AI Engine-ML Intrinsics User Guide
(v2023.2)
Load 4x intrinsics load four 64-bit values from data memory into a vector register.
These intrinsics read vectors from lookup tables (LUTs) through four LUT pointers. The caller passes two 256-bit aligned pointers, lut1 and lut2; the remaining two pointers are generated automatically so that they point to uneven memory banks. All four pointers are used to read the low and high parts of the output vector: the first read access returns the low part of the vector, and the second read access returns the high part. Each pointer is offset by the value of the corresponding lane of the offset input.
The load_lut_2x variants return two vectors; they also require the high part of the offset input.
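The access pattern described above can be modeled in ordinary C++ to make the lane bookkeeping concrete. This is a hypothetical host-side sketch, not the intrinsic and not its exact hardware semantics: `model_load_lut`, `Chunk`, and `Vec512` are illustrative names. The sketch assumes that (1) the two automatically generated pointers are simply passed in as `bases[2]` and `bases[3]`, because their derivation is not specified here; (2) offset lanes count 64-bit words; (3) a read `r` through pointer `p` uses offset lane `4*r + p`; and (4) the second vector of a load_lut_2x variant uses offset lanes 8 to 15, which is one reading of "the high part of the offset input".

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model for illustration only; not the AIE-ML intrinsic.
using Chunk = std::uint64_t;          // one 64-bit value read through one pointer
using Vec512 = std::array<Chunk, 8>;  // a 512-bit vector viewed as 8 x 64-bit chunks

// Models one output vector: two reads, each fetching four 64-bit values
// (one per pointer). Read 0 fills the low 256 bits, read 1 the high 256 bits.
// ASSUMPTIONS: offsets are in 64-bit words; read r through pointer p uses
// offset lane first_lane + 4*r + p; bases[2] and bases[3] stand in for the
// two automatically generated pointers. Offsets are not bounds-checked.
Vec512 model_load_lut(const std::array<const Chunk*, 4>& bases,
                      const std::array<std::uint32_t, 16>& offset,
                      int first_lane) {
  // first_lane 0: single-vector variant, or v1 of a 2x variant.
  // first_lane 8: v2 of a 2x variant (assumed to use the high offset lanes).
  assert(first_lane == 0 || first_lane == 8);
  Vec512 out{};
  for (int read = 0; read < 2; ++read) {
    for (int p = 0; p < 4; ++p) {
      const int lane = first_lane + 4 * read + p;
      out[4 * read + p] = bases[p][offset[lane]];
    }
  }
  return out;
}
```

Under these assumptions, a load_lut_2x_* call corresponds to `model_load_lut(bases, offset, 0)` for v1 and `model_load_lut(bases, offset, 8)` for v2, while a single-vector load_lut_* call corresponds to the first of these only.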
Functions
void load_lut_2x_float(const void *lut1, const void *lut2, v16int32 offset, chess_output v32bfloat16 &v1, chess_output v32bfloat16 &v2)
    Reads two 32-lane vectors of bfloat16 from the LUTs and stores the results in v1 and v2.
void load_lut_2x_float(const void *lut1, const void *lut2, v16uint32 offset, chess_output v32bfloat16 &v1, chess_output v32bfloat16 &v2)
    Reads two 32-lane vectors of bfloat16 from the LUTs and stores the results in v1 and v2.
void load_lut_2x_int16(const void *lut1, const void *lut2, v16int32 offset, chess_output v32int16 &v1, chess_output v32int16 &v2)
    Reads two 32-lane vectors of int16 from the LUTs and stores the results in v1 and v2.
void load_lut_2x_int16(const void *lut1, const void *lut2, v16uint32 offset, chess_output v32int16 &v1, chess_output v32int16 &v2)
    Reads two 32-lane vectors of int16 from the LUTs and stores the results in v1 and v2.
void load_lut_2x_int32(const void *lut1, const void *lut2, v16int32 offset, chess_output v16int32 &v1, chess_output v16int32 &v2)
    Reads two 16-lane vectors of int32 from the LUTs and stores the results in v1 and v2.
void load_lut_2x_int32(const void *lut1, const void *lut2, v16uint32 offset, chess_output v16int32 &v1, chess_output v16int32 &v2)
    Reads two 16-lane vectors of int32 from the LUTs and stores the results in v1 and v2.
void load_lut_2x_int8(const void *lut1, const void *lut2, v16int32 offset, chess_output v64int8 &v1, chess_output v64int8 &v2)
    Reads two 64-lane vectors of int8 from the LUTs and stores the results in v1 and v2.
void load_lut_2x_int8(const void *lut1, const void *lut2, v16uint32 offset, chess_output v64int8 &v1, chess_output v64int8 &v2)
    Reads two 64-lane vectors of int8 from the LUTs and stores the results in v1 and v2.
void load_lut_float(const void *lut1, const void *lut2, v16int32 offset, chess_output v32bfloat16 &v1)
    Reads a 32-lane vector of bfloat16 from the LUTs and stores the result in v1.
void load_lut_float(const void *lut1, const void *lut2, v16uint32 offset, chess_output v32bfloat16 &v1)
    Reads a 32-lane vector of bfloat16 from the LUTs and stores the result in v1.
void load_lut_int16(const void *lut1, const void *lut2, v16int32 offset, chess_output v32int16 &v1)
    Reads a 32-lane vector of int16 from the LUTs and stores the result in v1.
void load_lut_int16(const void *lut1, const void *lut2, v16uint32 offset, chess_output v32int16 &v1)
    Reads a 32-lane vector of int16 from the LUTs and stores the result in v1.
void load_lut_int32(const void *lut1, const void *lut2, v16int32 offset, chess_output v16int32 &v1)
    Reads a 16-lane vector of int32 from the LUTs and stores the result in v1.
void load_lut_int32(const void *lut1, const void *lut2, v16uint32 offset, chess_output v16int32 &v1)
    Reads a 16-lane vector of int32 from the LUTs and stores the result in v1.
void load_lut_int8(const void *lut1, const void *lut2, v16int32 offset, chess_output v64int8 &v1)
    Reads a 64-lane vector of int8 from the LUTs and stores the result in v1.
void load_lut_int8(const void *lut1, const void *lut2, v16uint32 offset, chess_output v64int8 &v1)
    Reads a 64-lane vector of int8 from the LUTs and stores the result in v1.
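Every variant listed above produces 512-bit vectors, which matches two read accesses of four 64-bit values (256 bits) each. The compile-time checks below spell out that arithmetic as a small sketch; `lanes_per_read` is a hypothetical helper, and bfloat16 is represented only by its 16-bit width because it is not a standard C++ type before C++23.

```cpp
#include <cstddef>

// Bits fetched by one read access: four 64-bit values.
constexpr std::size_t kReadBits = 4 * 64;  // 256 bits
constexpr std::size_t kVectorBits = 512;

// Element widths of the output types (bfloat16 is 16 bits wide).
constexpr std::size_t kInt8Bits = 8;
constexpr std::size_t kInt16Bits = 16;
constexpr std::size_t kBfloat16Bits = 16;
constexpr std::size_t kInt32Bits = 32;

// Each output vector type is 512 bits wide.
static_assert(64 * kInt8Bits == kVectorBits, "v64int8 is 512 bits");
static_assert(32 * kInt16Bits == kVectorBits, "v32int16 is 512 bits");
static_assert(32 * kBfloat16Bits == kVectorBits, "v32bfloat16 is 512 bits");
static_assert(16 * kInt32Bits == kVectorBits, "v16int32 is 512 bits");

// Hence one vector takes two reads: the low half first, then the high half.
static_assert(kVectorBits / kReadBits == 2, "two reads per vector");

// Hypothetical helper: how many lanes a single 256-bit read fills.
constexpr std::size_t lanes_per_read(std::size_t elem_bits) {
  return kReadBits / elem_bits;
}
```

For example, assuming the low part of the vector holds the low-numbered lanes, the first read of load_lut_int16 fills lanes 0 to 15 of v1 and the second read fills lanes 16 to 31.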
void load_lut_2x_float(const void *lut1, const void *lut2, v16int32 offset, chess_output v32bfloat16 &v1, chess_output v32bfloat16 &v2)
Reads two 32-lane vectors of bfloat16 from the LUTs and stores the results in v1 and v2.
Parameters:
lut1: Pointer to LUT 1. Must be 256-bit aligned.
lut2: Pointer to LUT 2. Must be 256-bit aligned.
offset: Offsets used to generate the LUT access addresses.
v1: Reference to the vector in which the data of the first read is stored.
v2: Reference to the vector in which the data of the second read is stored.
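Both lut1 and lut2 must be 256-bit (32-byte) aligned. A minimal sketch of how LUT storage might be declared and checked in standard C++, assuming the tool chain honors `alignas` for these buffers; the buffer names, the int16 element type, and the 64-entry size are hypothetical placeholders.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLutAlignBytes = 256 / 8;  // 256 bits = 32 bytes

// Hypothetical LUT buffers; size and contents are placeholders.
alignas(kLutAlignBytes) std::int16_t example_lut1[64];
alignas(kLutAlignBytes) std::int16_t example_lut2[64];

// Returns true if p satisfies the 256-bit alignment required for lut1/lut2.
inline bool is_256bit_aligned(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % kLutAlignBytes == 0;
}

// Debug-time check to run before passing the pointers to a load_lut_* call.
inline void check_lut_pointers(const void* lut1, const void* lut2) {
  assert(is_256bit_aligned(lut1) && "lut1 must be 256-bit aligned");
  assert(is_256bit_aligned(lut2) && "lut2 must be 256-bit aligned");
}
```

Checking alignment explicitly is mainly useful when the pointers are computed (for example, offset into a larger buffer) rather than taken directly from an `alignas` declaration.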
void load_lut_2x_float(const void *lut1, const void *lut2, v16uint32 offset, chess_output v32bfloat16 &v1, chess_output v32bfloat16 &v2)
Reads two 32-lane vectors of bfloat16 from the LUTs and stores the results in v1 and v2.
Parameters:
lut1: Pointer to LUT 1. Must be 256-bit aligned.
lut2: Pointer to LUT 2. Must be 256-bit aligned.
offset: Offsets used to generate the LUT access addresses.
v1: Reference to the vector in which the data of the first read is stored.
v2: Reference to the vector in which the data of the second read is stored.
void load_lut_2x_int16(const void *lut1, const void *lut2, v16int32 offset, chess_output v32int16 &v1, chess_output v32int16 &v2)
Reads two 32-lane vectors of int16 from the LUTs and stores the results in v1 and v2.
Parameters:
lut1: Pointer to LUT 1. Must be 256-bit aligned.
lut2: Pointer to LUT 2. Must be 256-bit aligned.
offset: Offsets used to generate the LUT access addresses.
v1: Reference to the vector in which the data of the first read is stored.
v2: Reference to the vector in which the data of the second read is stored.
void load_lut_2x_int16(const void *lut1, const void *lut2, v16uint32 offset, chess_output v32int16 &v1, chess_output v32int16 &v2)
Reads two 32-lane vectors of int16 from the LUTs and stores the results in v1 and v2.
Parameters:
lut1: Pointer to LUT 1. Must be 256-bit aligned.
lut2: Pointer to LUT 2. Must be 256-bit aligned.
offset: Offsets used to generate the LUT access addresses.
v1: Reference to the vector in which the data of the first read is stored.
v2: Reference to the vector in which the data of the second read is stored.
void load_lut_2x_int32(const void *lut1, const void *lut2, v16int32 offset, chess_output v16int32 &v1, chess_output v16int32 &v2)
Reads two 16-lane vectors of int32 from the LUTs and stores the results in v1 and v2.
Parameters:
lut1: Pointer to LUT 1. Must be 256-bit aligned.
lut2: Pointer to LUT 2. Must be 256-bit aligned.
offset: Offsets used to generate the LUT access addresses.
v1: Reference to the vector in which the data of the first read is stored.
v2: Reference to the vector in which the data of the second read is stored.
void load_lut_2x_int32(const void *lut1, const void *lut2, v16uint32 offset, chess_output v16int32 &v1, chess_output v16int32 &v2)
Reads two 16-lane vectors of int32 from the LUTs and stores the results in v1 and v2.
Parameters:
lut1: Pointer to LUT 1. Must be 256-bit aligned.
lut2: Pointer to LUT 2. Must be 256-bit aligned.
offset: Offsets used to generate the LUT access addresses.
v1: Reference to the vector in which the data of the first read is stored.
v2: Reference to the vector in which the data of the second read is stored.
void load_lut_2x_int8(const void *lut1, const void *lut2, v16int32 offset, chess_output v64int8 &v1, chess_output v64int8 &v2)
Reads two 64-lane vectors of int8 from the LUTs and stores the results in v1 and v2.
Parameters:
lut1: Pointer to LUT 1. Must be 256-bit aligned.
lut2: Pointer to LUT 2. Must be 256-bit aligned.
offset: Offsets used to generate the LUT access addresses.
v1: Reference to the vector in which the data of the first read is stored.
v2: Reference to the vector in which the data of the second read is stored.
void load_lut_2x_int8(const void *lut1, const void *lut2, v16uint32 offset, chess_output v64int8 &v1, chess_output v64int8 &v2)
Reads two 64-lane vectors of int8 from the LUTs and stores the results in v1 and v2.
Parameters:
lut1: Pointer to LUT 1. Must be 256-bit aligned.
lut2: Pointer to LUT 2. Must be 256-bit aligned.
offset: Offsets used to generate the LUT access addresses.
v1: Reference to the vector in which the data of the first read is stored.
v2: Reference to the vector in which the data of the second read is stored.
void load_lut_float(const void *lut1, const void *lut2, v16int32 offset, chess_output v32bfloat16 &v1)
Reads a 32-lane vector of bfloat16 from the LUTs and stores the result in v1.
Parameters:
lut1: Pointer to LUT 1. Must be 256-bit aligned.
lut2: Pointer to LUT 2. Must be 256-bit aligned.
offset: Offsets used to generate the LUT access addresses.
v1: Reference to the vector in which the data is stored.
void load_lut_float(const void *lut1, const void *lut2, v16uint32 offset, chess_output v32bfloat16 &v1)
Reads a 32-lane vector of bfloat16 from the LUTs and stores the result in v1.
Parameters:
lut1: Pointer to LUT 1. Must be 256-bit aligned.
lut2: Pointer to LUT 2. Must be 256-bit aligned.
offset: Offsets used to generate the LUT access addresses.
v1: Reference to the vector in which the data is stored.
void load_lut_int16(const void *lut1, const void *lut2, v16int32 offset, chess_output v32int16 &v1)
Reads a 32-lane vector of int16 from the LUTs and stores the result in v1.
Parameters:
lut1: Pointer to LUT 1. Must be 256-bit aligned.
lut2: Pointer to LUT 2. Must be 256-bit aligned.
offset: Offsets used to generate the LUT access addresses.
v1: Reference to the vector in which the data is stored.
void load_lut_int16(const void *lut1, const void *lut2, v16uint32 offset, chess_output v32int16 &v1)
Reads a 32-lane vector of int16 from the LUTs and stores the result in v1.
Parameters:
lut1: Pointer to LUT 1. Must be 256-bit aligned.
lut2: Pointer to LUT 2. Must be 256-bit aligned.
offset: Offsets used to generate the LUT access addresses.
v1: Reference to the vector in which the data is stored.
void load_lut_int32(const void *lut1, const void *lut2, v16int32 offset, chess_output v16int32 &v1)
Reads a 16-lane vector of int32 from the LUTs and stores the result in v1.
Parameters:
lut1: Pointer to LUT 1. Must be 256-bit aligned.
lut2: Pointer to LUT 2. Must be 256-bit aligned.
offset: Offsets used to generate the LUT access addresses.
v1: Reference to the vector in which the data is stored.
void load_lut_int32(const void *lut1, const void *lut2, v16uint32 offset, chess_output v16int32 &v1)
Reads a 16-lane vector of int32 from the LUTs and stores the result in v1.
Parameters:
lut1: Pointer to LUT 1. Must be 256-bit aligned.
lut2: Pointer to LUT 2. Must be 256-bit aligned.
offset: Offsets used to generate the LUT access addresses.
v1: Reference to the vector in which the data is stored.
void load_lut_int8(const void *lut1, const void *lut2, v16int32 offset, chess_output v64int8 &v1)
Reads a 64-lane vector of int8 from the LUTs and stores the result in v1.
Parameters:
lut1: Pointer to LUT 1. Must be 256-bit aligned.
lut2: Pointer to LUT 2. Must be 256-bit aligned.
offset: Offsets used to generate the LUT access addresses.
v1: Reference to the vector in which the data is stored.
void load_lut_int8(const void *lut1, const void *lut2, v16uint32 offset, chess_output v64int8 &v1)
Reads a 64-lane vector of int8 from the LUTs and stores the result in v1.
Parameters:
lut1: Pointer to LUT 1. Must be 256-bit aligned.
lut2: Pointer to LUT 2. Must be 256-bit aligned.
offset: Offsets used to generate the LUT access addresses.
v1: Reference to the vector in which the data is stored.