▼Accumulator Data Types | |
Complex Accumulator Types | |
Floating-Point Accumulator Types | |
►Integer Accumulator Types | |
1024-bit accumulator types | |
256-bit accumulator types | |
512-bit accumulator types | |
▼Load/Store Operations | |
Addressing intrinsics | |
►Compressed Load Operations | Compressed load operations load a compressed vector and expand it into an AIE-ML register |
Compressed Load Reset Operations | |
Compressed Load of Eight Vectors | |
Compressed Load of Four Vectors | |
Compressed Load of One Vector | |
Compressed Load of Two Vectors | |
►Compressed Sparse Load Operations | Compressed sparse load operations load a compressed sparse vector and expand it into an AIE-ML register |
Sparse Load Fill Operations | |
Sparse Load Peek Operations | |
Sparse Load Pop Operations | |
Sparse Load Reset Operations | |
Load 4x Operations | Load 4x intrinsics load four 64-bit values to a vector register from data memory |
►Streams | |
Cascade read | |
Cascade write | |
Stream read | |
Stream write | |
Scalar Data Types | All the standard C scalar data-types are supported |
▼Scalar Operations | |
►Configuration | |
►Mode Settings | |
Control registers | Intrinsics to set, get, and clear the control registers |
Status registers | Intrinsics to set, get, and clear the status registers |
Core ID | |
Cycle Counter | |
Events | |
Initialization | |
Integer Operations | Intrinsics allowing you to perform select, absolute and delay operations on integer scalars |
Locks | Intrinsics to acquire and release locks |
Scalar Conversions | |
Scalar updates and extracts | |
Stream access | These functions set up stream accesses in native mode |
▼Vector Conversions | Various forms of conversions between vector data-types |
►Broadcast | Broadcasts input value to all vector lanes |
Broadcast from scalar | Broadcasts input value to all vector lanes (alternative syntax to broadcast to vector) |
Broadcast to vector | Broadcasts input value to all vector lanes (alternative syntax to broadcast from scalar) |
Updating all elements with element extracted from vector | Extracts element "idx" from vector "v" and broadcasts its value to all lanes of the destination vector |
Updating all elements with one | Broadcasts value one (1) to all vector lanes |
Updating all elements with zero | Broadcasts value zero (0) to all vector lanes |
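A plain-C sketch of the broadcast entries above. The 16-lane `v16i32_model` struct and both helper names are hypothetical stand-ins for a 512-bit vector of 32-bit integers, not AIE-ML types or intrinsics. The sketch only models the lane-level result described in this group.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 512-bit vector: 16 lanes of int32. */
#define LANES 16
typedef struct { int32_t lane[LANES]; } v16i32_model;

/* Broadcast: copy one scalar value into every lane. */
static v16i32_model broadcast_model(int32_t value) {
    v16i32_model r;
    for (int i = 0; i < LANES; ++i)
        r.lane[i] = value;
    return r;
}

/* Extract lane idx of v and broadcast its value to every lane of the result. */
static v16i32_model broadcast_elem_model(v16i32_model v, int idx) {
    assert(idx >= 0 && idx < LANES);
    return broadcast_model(v.lane[idx]);
}
```

In this model, "updating all elements with zero" and "with one" are simply `broadcast_model(0)` and `broadcast_model(1)`.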
Casting | Casting intrinsics allow casting (bit-reinterpretation) between vector types of the same size |
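Casting is a bit reinterpretation, so no lane value is converted. A minimal C sketch of that idea, assuming two hypothetical 512-bit lane layouts (32 x int16 and 16 x int32). It uses `memcpy`, the portable C way to reinterpret bytes, and is not the AIE-ML cast intrinsic itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical models of two 512-bit vector types with different lane widths. */
typedef struct { int16_t lane[32]; } v32i16_model;
typedef struct { int32_t lane[16]; } v16i32_model;

/* Reinterpret the same 512 bits as 16 int32 lanes; the bits are unchanged. */
static v16i32_model cast_v32i16_to_v16i32(v32i16_model v) {
    v16i32_model r;
    static_assert(sizeof r == sizeof v, "cast requires equal sizes");
    memcpy(&r, &v, sizeof r);
    return r;
}
```

Which int16 lanes end up in the low half of each int32 lane depends on byte order. On a little-endian host, lane 2k becomes the low half of lane k.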
►Concatenate vectors | Vector concat intrinsic functions allow concatenation of vector values to create a larger one |
Concatenate four vectors | |
Concatenate two vectors | |
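A plain-C sketch of concatenating two vectors into a larger one. It assumes the first operand fills the low lanes and the second the high lanes. The types and function name are hypothetical models, not the AIE-ML concat intrinsic.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical models: a 256-bit (8 x int32) and a 512-bit (16 x int32) vector. */
typedef struct { int32_t lane[8]; } v8i32_model;
typedef struct { int32_t lane[16]; } v16i32_model;

/* Assumed lane order: a -> lanes 0..7, b -> lanes 8..15. */
static v16i32_model concat_model(v8i32_model a, v8i32_model b) {
    v16i32_model r;
    memcpy(&r.lane[0], a.lane, sizeof a.lane);
    memcpy(&r.lane[8], b.lane, sizeof b.lane);
    return r;
}
```

Concatenating four vectors follows the same pattern, with one copy per quarter of the result.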
►Extract vector | Extraction intrinsics enable lanes to be selected from vector and accumulator types |
Extract element from vector | |
Extract integer and float data | |
Extract sparsity and data from sparse vector | |
Update sparse vectors | |
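Extracting a sub-vector selects a contiguous group of lanes. The minimal C model below shows this for the two 256-bit halves of a 512-bit vector. The types, the function name, and the index-to-lane mapping (index 0 = low lanes) are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical models: 512-bit (16 x int32) and 256-bit (8 x int32) vectors. */
typedef struct { int32_t lane[16]; } v16i32_model;
typedef struct { int32_t lane[8]; } v8i32_model;

/* Return slice idx: idx 0 -> lanes 0..7, idx 1 -> lanes 8..15 (assumed order). */
static v8i32_model extract_half_model(v16i32_model v, int idx) {
    assert(idx == 0 || idx == 1);
    v8i32_model r;
    memcpy(r.lane, &v.lane[8 * idx], sizeof r.lane);
    return r;
}
```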
Extract/insert element | These intrinsics allow an individual element to be inserted into or extracted from a vector |
Float to integer conversions | Conversion from bfloat16 vector to integer vector |
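bfloat16 keeps the upper 16 bits of an IEEE-754 single-precision value, so widening it to `float` is a 16-bit left shift of the raw bits. The scalar sketch below builds on that. Its conversion policy (truncate toward zero, saturate to the int32 range, map NaN to 0) is an assumption for illustration only; the rounding and saturation behavior of the actual intrinsics is not specified here.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Widen raw bfloat16 bits to float: bf16 is the top half of a binary32. */
static float bf16_to_float(uint16_t bits) {
    uint32_t wide = (uint32_t)bits << 16;
    float f;
    memcpy(&f, &wide, sizeof f);
    return f;
}

/* Hypothetical policy: truncate toward zero, saturate, NaN -> 0. */
static int32_t bf16_to_int32_model(uint16_t bits) {
    float f = bf16_to_float(bits);
    if (isnan(f)) return 0;
    if (f >= 2147483648.0f) return INT32_MAX;
    if (f < -2147483648.0f) return INT32_MIN;
    return (int32_t)f;  /* C float-to-int conversion truncates toward zero */
}
```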
►Insert vector | Vector insert intrinsic functions allow substitution of the lanes within a vector value |
Insert a vector into a vector | |
Insert an element into a vector | |
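Insertion is the inverse of extraction: it replaces a group of lanes, or a single lane, and leaves the rest untouched. The plain-C sketch below uses hypothetical types and names, and the same assumed index-to-lane mapping (index 0 = low lanes).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical models: 512-bit (16 x int32) and 256-bit (8 x int32) vectors. */
typedef struct { int32_t lane[16]; } v16i32_model;
typedef struct { int32_t lane[8]; } v8i32_model;

/* Replace slice idx of v (0 -> lanes 0..7, 1 -> lanes 8..15) with part. */
static v16i32_model insert_half_model(v16i32_model v, int idx, v8i32_model part) {
    assert(idx == 0 || idx == 1);
    memcpy(&v.lane[8 * idx], part.lane, sizeof part.lane);
    return v;
}

/* Replace a single lane of v with value. */
static v16i32_model insert_elem_model(v16i32_model v, int idx, int32_t value) {
    assert(idx >= 0 && idx < 16);
    v.lane[idx] = value;
    return v;
}
```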
Pack/Unpack | |
►Set vector | Vector set intrinsic functions allow setting the lanes within a vector value |
Set an element of a vector | |
Set specific lanes of a vector | |
►Shift-Round-Saturate | Intrinsics for moving values from accumulator data-types to vector data-types |
AIE interface | |
Floating-point interface | |
Size interface | |
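Shift-round-saturate narrows an accumulator lane back to a vector lane in three steps: right shift, rounding, and saturation. The per-lane C model below makes two assumptions. It uses a 64-bit accumulator lane, and it rounds to nearest with ties toward +infinity. The hardware supports several rounding modes, and this sketch fixes just one of them.

```c
#include <assert.h>
#include <stdint.h>

/* Floor division by 2^shift, avoiding right shifts of negative values
   (implementation-defined in C). */
static int64_t floor_shift(int64_t x, unsigned shift) {
    int64_t d = (int64_t)1 << shift;
    int64_t q = x / d;          /* C division truncates toward zero */
    if (x % d != 0 && x < 0)
        q -= 1;                 /* adjust to round toward -infinity */
    return q;
}

/* One lane: shift right, round half up (assumed mode), saturate to int16. */
static int16_t srs_model(int64_t acc, unsigned shift) {
    const int64_t limit = (int64_t)1 << 62;
    assert(shift < 62 && acc > -limit && acc < limit);
    int64_t r = shift ? floor_shift(acc + ((int64_t)1 << (shift - 1)), shift) : acc;
    if (r > INT16_MAX) return INT16_MAX;
    if (r < INT16_MIN) return INT16_MIN;
    return (int16_t)r;
}
```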
►Upshift | Intrinsics for moving values from vector data-types to accumulator data-types |
AIE interface | |
Floating-point | |
Size interface | |
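Upshift goes the other way: each vector lane is widened to an accumulator lane and shifted left. The minimal C model below assumes 16 int16 lanes and 64-bit accumulator lanes. The function name is hypothetical. Multiplying by a power of two avoids left-shifting negative values, which is undefined in C.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 16

/* Widen each int16 lane to an int64 accumulator lane, then scale by 2^shift. */
static void ups_model(const int16_t v[LANES], unsigned shift, int64_t acc[LANES]) {
    assert(shift < 48);  /* keeps |v| * 2^shift well inside int64 */
    for (int i = 0; i < LANES; ++i)
        acc[i] = (int64_t)v[i] * ((int64_t)1 << shift);
}
```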
▼Vector Data Types | |
Complex Vector Types | |
Compressed Complex Vector Types | |
Compressed Floating-Point Vector Types | |
►Compressed Integer Vector Types | |
Compressed 256-bit vector types | |
Compressed 512-bit vector types | |
►Compressed Sparse Vector Types | |
Compressed sparse floating-point vector types | |
Compressed sparse integer vector types | |
Floating-Point Vector Types | |
►Integer Vector Types | |
1024-bit vector types | |
128-bit vector types | |
16-bit vector types | |
256-bit vector types | |
32-bit vector types | |
512-bit vector types | |
64-bit vector types | |
8-bit vector types | |
►Sparse Vector Types | |
Sparse floating-point vector types | |
Sparse integer vector types | |
▼Vector Operations | |
Add/Subtract | Intrinsics and operators that allow you to perform addition and subtraction operations on all types of vectors |
Bitwise logical | Intrinsics and operators that allow you to perform bitwise logical operations on all types of vectors |
Compare/Select | Intrinsics allowing you to perform compare and select operations on all types of vectors |
Initialization | |
►Multiply Accumulate | Intrinsics allowing you to perform MUL/MAC operations and a few of their variants |
Emulated Multiply-accumulate of 16b x 32b datatypes | Matrix multiplications in which matrix A has 16-bit data elements and matrix B has 32-bit data elements. These operations are emulated on top of Multiply-accumulate of 16b x 16b integer datatypes and might not have optimal performance |
Emulated Multiply-accumulate of 32b x 16b datatypes | Matrix multiplications in which matrix A has 32-bit data elements and matrix B has 16-bit data elements. These operations are emulated on top of Multiply-accumulate of 16b x 16b integer datatypes and might not have optimal performance |
Emulated Multiply-accumulate of 32b x 32b datatypes | Matrix multiplications in which matrix A and matrix B both have 32-bit data elements. These operations are emulated on top of Multiply-accumulate of 32b x 16b integer datatypes and Multiply-accumulate of 16b x 16b integer datatypes and might not have optimal performance |
Emulated Multiply-accumulate of Complex 32b x Complex 32b datatypes | Matrix multiplications in which matrix A and matrix B both have complex 32-bit data elements. These operations are emulated on top of Multiply-accumulate of 32b x 16b complex integer datatypes and might not have optimal performance |
Emulated Multiply-accumulate of fp32 x fp32 datatypes | Elementwise multiplication and matrix multiplication using the bfloat16 datapath. Two options are available: with or without set_rnd(0) (truncation) before using these intrinsics. Use the AIE_FP32_EMULATION_SET_RND_MODE flag to set the rounding mode to truncation. For an explanation of how these operations work, see Multiply Accumulate |
Multiply-accumulate of 16b x 16b complex integer datatypes | Matrix multiplications in which matrix A and matrix B have 16-bit complex data elements. For an explanation of how these operations work, see Multiply Accumulate |
Multiply-accumulate of 16b x 16b integer datatypes | Matrix multiplications in which matrix A and matrix B have 16-bit data elements |
Multiply-accumulate of 16b x 8b integer datatypes | Matrix multiplications in which matrix A has 16-bit data elements and matrix B has 8-bit data elements |
Multiply-accumulate of 32b x 16b complex integer datatypes | Matrix multiplications in which matrix A has 32-bit complex data elements and matrix B has 16-bit complex data elements |
Multiply-accumulate of 32b x 16b integer datatypes | Matrix multiplications in which matrix A has 32-bit data elements and matrix B has 16-bit data elements |
Multiply-accumulate of 8b x 4b datatypes | Matrix multiplications in which matrix A has 8-bit data elements and matrix B has 4-bit data elements. These operations are emulated on top of int8 x int8 multiply-accumulate |
Multiply-accumulate of 8b x 8b integer datatypes | Matrix multiplications in which matrix A and matrix B have 8-bit data elements |
Multiply-accumulate of bfloat16 datatypes | Matrix multiplications in which matrix A and matrix B have bfloat16 data elements |
Multiply-accumulate with a sparse matrix | Matrix multiplications in which matrix B is a sparse matrix |
Negation control in complex multiplication modes | In order to do complex multiplications, some terms need to be negated |
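The negation comes from the complex product itself: (a_re + j a_im)(b_re + j b_im) = (a_re b_re - a_im b_im) + j(a_re b_im + a_im b_re). The a_im b_im term is negated. If one operand is conjugated, a different pair of terms changes sign. The scalar C model below shows both cases. The types and function names are hypothetical, and the model does not reproduce the real negation-control intrinsics.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical models of a complex int16 value and a complex accumulator. */
typedef struct { int16_t re, im; } cint16_model;
typedef struct { int64_t re, im; } cacc_model;

/* acc += a * b */
static cacc_model cmac_model(cacc_model acc, cint16_model a, cint16_model b) {
    acc.re += (int64_t)a.re * b.re;
    acc.re -= (int64_t)a.im * b.im;   /* negated term */
    acc.im += (int64_t)a.re * b.im;
    acc.im += (int64_t)a.im * b.re;
    return acc;
}

/* acc += a * conj(b): negating b.im flips the sign of two terms. */
static cacc_model cmac_conj_b_model(cacc_model acc, cint16_model a, cint16_model b) {
    acc.re += (int64_t)a.re * b.re;
    acc.re += (int64_t)a.im * b.im;
    acc.im += (int64_t)a.im * b.re;
    acc.im -= (int64_t)a.re * b.im;   /* negated term */
    return acc;
}
```

For example, (1 + 2j)(3 + 4j) = -5 + 10j, while (1 + 2j)(3 - 4j) = 11 + 2j.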
Shift | These intrinsics allow shifting full vectors |
Shift element | |
►Shuffle | Intrinsics allowing you to perform vector shuffles |
Illustration of Shuffle Modes | |
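The emulated 32b x 32b Multiply-accumulate entries state that these products are built from narrower multiplies. The scalar C sketch below shows the underlying arithmetic identity. Each 32-bit operand is split into a signed high half and an unsigned low half, and the full 64-bit product is rebuilt from four 16b x 16b partial products. This illustrates the decomposition idea only. It is not the instruction sequence the library actually uses, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Split x so that x == hi * 65536 + lo, with hi signed 16-bit and lo in [0, 65535]. */
static void split32(int32_t x, int32_t *hi, uint32_t *lo) {
    *lo = (uint32_t)x & 0xFFFFu;
    *hi = (int32_t)(((int64_t)x - (int64_t)*lo) / 65536);  /* exact division */
}

/* Full 64-bit product of two int32 values from four 16-bit partial products. */
static int64_t mul32x32_via_16(int32_t a, int32_t b) {
    int32_t ah, bh;
    uint32_t al, bl;
    split32(a, &ah, &al);
    split32(b, &bh, &bl);
    int64_t hh = (int64_t)ah * bh;          /* signed x signed     */
    int64_t hl = (int64_t)ah * (int64_t)bl; /* signed x unsigned   */
    int64_t lh = (int64_t)al * bh;          /* unsigned x signed   */
    int64_t ll = (int64_t)al * (int64_t)bl; /* unsigned x unsigned */
    return hh * 65536 * 65536 + (hl + lh) * 65536 + ll;
}
```

The multiplier sees only 16-bit operands, but the reconstruction needs the extra additions and shifts. This is consistent with the note that emulated modes might not have optimal performance.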