Here is a list of all modules:

▼Accumulator Data Types
Complex Accumulator Types
Floating-Point Accumulator Types
►Integer Accumulator Types
1024-bit accumulator types
256-bit accumulator types
512-bit accumulator types
▼Load/Store Operations
Addressing intrinsics
►Compressed Load Operations	Compressed load operations load a compressed vector and expand it into an AIE-ML register
Compressed Load Reset Operations
Compressed Load of Eight Vectors
Compressed Load of Four Vectors
Compressed Load of One Vector
Compressed Load of Two Vectors
►Compressed Sparse Load Operations	Compressed sparse load operations load a compressed sparse vector and expand it into an AIE-ML register
Sparse Load Fill Operations
Sparse Load Peek Operations
Sparse Load Pop Operations
Sparse Load Reset Operations
Load 4x Operations	Load 4x intrinsics load four 64-bit values to a vector register from data memory
►Streams
Cascade read
Cascade write
Stream read
Stream write
Scalar Data Types	All the standard C scalar data-types are supported
▼Scalar Operations
►Configuration
►Mode Settings
Control registers	Intrinsics to set, get, and clear the control registers
Status registers	Intrinsics to set, get, and clear the status registers
Core ID
Cycle Counter
Events
Initialization
Integer Operations	Intrinsics allowing you to perform select, absolute and delay operations on integer scalars
Locks	Intrinsics to acquire and release locks
Scalar Conversions
Scalar updates and extracts
Stream access	These functions set up stream accesses in native mode
▼Vector Conversions	Various forms of conversions between vector data-types
►Broadcast	Broadcasts input value to all vector lanes
Broadcast from scalar	Broadcasts input value to all vector lanes (alternative syntax to broadcast to vector)
Broadcast to vector	Broadcasts input value to all vector lanes (alternative syntax to broadcast from scalar)
Updating all elements with element extracted from vector	Extracts element "idx" from vector "v" and broadcasts its value to all lanes of the destination vector
Updating all elements with one	Broadcasts value one (1) to all vector lanes
Updating all elements with zero	Broadcasts value zero (0) to all vector lanes
Casting	Casting intrinsics allow casting (bit-reinterpretation) between vector types of the same size
►Concatenate vectors	Vector concat intrinsic functions allow concatenation of vector values to create a larger one
Concatenate four vectors
Concatenate two vectors
►Extract vector	Extraction intrinsics enable lanes to be selected from vector and accumulator types
Extract element from vector
Extract integer and float data
Extract sparsity and data from sparse vector
Update sparse vectors
Extract/insert element	These intrinsics allow inserting an individual element into a vector or extracting one from it
Float to integer conversions	Conversion from bfloat16 vector to integer vector
►Insert vector	Vector insert intrinsic functions allow substitution of the lanes within a vector value
Insert a vector into a vector
Insert an element into a vector
Pack/Unpack
►Set vector	Vector set intrinsic functions allow setting the lanes within a vector value
Set an element of a vector
Set specific lanes of a vector
►Shift-Round-Saturate	Intrinsics for moving values from accumulator data-types to vector data-types
AIE interface
Floating-point interface
Size interface
►Upshift	Intrinsics for moving values from vector data-types to accumulator data-types
AIE interface
Floating-point
Size interface
▼Vector Data Types
Complex Vector Types
Compressed Complex Vector Types
Compressed Floating-Point Vector Types
►Compressed Integer Vector Types
Compressed 256-bit vector types
Compressed 512-bit vector types
►Compressed Sparse Vector Types
Compressed sparse floating-point vector types
Compressed sparse integer vector types
Floating-Point Vector Types
►Integer Vector Types
1024-bit vector types
128-bit vector types
16-bit vector types
256-bit vector types
32-bit vector types
512-bit vector types
64-bit vector types
8-bit vector types
►Sparse Vector Types
Sparse floating-point vector types
Sparse integer vector types
▼Vector Operations
Add/Subtract	Intrinsics and operators that allow you to perform addition and subtraction operations on all types of vectors
Bitwise logical	Intrinsics and operators that allow you to perform bitwise logical operations on all types of vectors
Compare/Select	Intrinsics allowing you to perform compare and select operations on all types of vectors
Initialization
►Multiply Accumulate	Intrinsics allowing you to perform MUL/MAC operations and a few of their variants
Emulated Multiply-accumulate of 16b x 32b datatypes	Matrix multiplications in which matrix A has data elements of 16 bit and matrix B has data elements of 32 bit. These operations are emulated on top of Multiply-accumulate of 16b x 16b integer datatypes and might not have optimal performance
Emulated Multiply-accumulate of 32b x 16b datatypes	Matrix multiplications in which matrix A has data elements of 32 bit and matrix B has data elements of 16 bit. These operations are emulated on top of Multiply-accumulate of 16b x 16b integer datatypes and might not have optimal performance
Emulated Multiply-accumulate of 32b x 32b datatypes	Matrix multiplications in which matrix A has data elements of 32 bit and matrix B has data elements of 32 bit. These operations are emulated on top of Multiply-accumulate of 32b x 16b integer datatypes and Multiply-accumulate of 16b x 16b integer datatypes and might not have optimal performance
Emulated Multiply-accumulate of Complex 32b x Complex 32b datatypes	Matrix multiplications in which matrix A has data elements of complex 32 bit and matrix B has data elements of complex 32 bit. These operations are emulated on top of Multiply-accumulate of 32b x 16b complex integer datatypes and might not have optimal performance
Emulated Multiply-accumulate of fp32 x fp32 datatypes	Element-wise multiplication and matrix multiplication using the bfloat16 datapath. Two options are available: with or without set_rnd(0) for truncation before using these intrinsics. Use the AIE_FP32_EMULATION_SET_RND_MODE flag to set the rounding mode to truncation. For an explanation of how these operations work, see Multiply Accumulate
Multiply-accumulate of 16b x 16b complex integer datatypes	Matrix multiplications in which matrix A and matrix B have complex data elements of 16 bit. For an explanation of how these operations work, see Multiply Accumulate
Multiply-accumulate of 16b x 16b integer datatypes	Matrix multiplications in which matrix A and matrix B have data elements of 16 bit
Multiply-accumulate of 16b x 8b integer datatypes	Matrix multiplications in which matrix A has data elements of 16 bit and matrix B has data elements of 8 bit
Multiply-accumulate of 32b x 16b complex integer datatypes	Matrix multiplications in which matrix A has complex data elements of 32 bit and matrix B has complex data elements of 16 bit
Multiply-accumulate of 32b x 16b integer datatypes	Matrix multiplications in which matrix A has data elements of 32 bit and matrix B has data elements of 16 bit
Multiply-accumulate of 8b x 4b datatypes	Matrix multiplications in which matrix A has data elements of 8 bit and matrix B has data elements of 4 bit. These operations are emulated on top of int8 x int8
Multiply-accumulate of 8b x 8b integer datatypes	Matrix multiplications in which matrix A and matrix B have data elements of 8 bit
Multiply-accumulate of bfloat16 datatypes	Matrix multiplications in which matrix A and B have bfloat16 data elements
Multiply-accumulate with a sparse matrix	Matrix multiplications in which matrix B is a sparse matrix
Negation control in complex multiplication modes	In order to do complex multiplications, some terms need to be negated
Shift	These intrinsics allow shifting full vectors
Shift element
►Shuffle	Intrinsics allowing you to perform vector shuffles
Illustration of Shuffle Modes
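
Several Multiply Accumulate entries above are described as emulated on top of narrower multiply-accumulate operations. As a rough illustration of that general idea only, the following plain C++ sketch builds a scalar 32b x 32b product from four 16b x 16b partial products. It does not use any AIE-ML intrinsic and is not the decomposition the hardware or the library actually performs; the names `Halves`, `split16` and `mul32_from_16bit_parts` are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type (not part of the intrinsics API): the two 16-bit
// halves of a signed 32-bit value, chosen so that value == hi * 65536 + lo.
struct Halves {
    std::int64_t hi;  // signed high half, in [-32768, 32767]
    std::int64_t lo;  // unsigned low half, in [0, 65535]
};

// Split a signed 32-bit value without relying on shifts of negative numbers.
inline Halves split16(std::int32_t v) {
    const std::int64_t lo = static_cast<std::uint32_t>(v) & 0xFFFFu;
    // (v - lo) is an exact multiple of 65536, so this division has no remainder.
    const std::int64_t hi = (static_cast<std::int64_t>(v) - lo) / 65536;
    return {hi, lo};
}

// Scalar model (an assumption for illustration): combine four 16b x 16b
// partial products into the full 64-bit result of a 32b x 32b multiply.
inline std::int64_t mul32_from_16bit_parts(std::int32_t a, std::int32_t b) {
    const Halves x = split16(a);
    const Halves y = split16(b);
    const std::int64_t hh = x.hi * y.hi;  // weight 2^32
    const std::int64_t hl = x.hi * y.lo;  // weight 2^16
    const std::int64_t lh = x.lo * y.hi;  // weight 2^16
    const std::int64_t ll = x.lo * y.lo;  // weight 2^0
    return hh * 4294967296LL + (hl + lh) * 65536 + ll;
}
```

The result of `mul32_from_16bit_parts(a, b)` should match `static_cast<std::int64_t>(a) * b` for any pair of 32-bit inputs. Needing several narrow products per wide product is one plausible reason the emulated entries are noted as possibly not having optimal performance.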