AI Engine API User Guide (AIE) 2022.1
The AIE API encapsulates the matrix multiplication functionality in the aie::mmul class template. This class template is parameterized with the matrix multiplication shape (MxKxN), the data types and, optionally, the requested accumulation precision. The resulting class defines the functions that perform the multiplication (mul, plus mac to accumulate further products) and holds the result, which can be converted to an accumulator or a vector. These functions interpret the input vectors as matrices as described by the shape parameters.
A typical use of the aie::mmul class is a blocked multiplication in which the matrices are assumed to be pre-tiled as defined by the mmul shape (MxK tiles for A, KxN tiles for B, and MxN tiles for C).
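The pre-tiled blocked scheme can be modeled on the host in standard C++. The sketch below does not use the AIE API: `tile_product` is a hypothetical stand-in for `aie::mmul` that keeps its result in 64-bit integers, and `blocked_mmul` is a hypothetical driver. It assumes each tile is stored row-major and that tiles are stored contiguously in row-major tile order, so tile (z, i) of A starts at offset (z * colA + i) * M * K. The first partial product of each output tile uses `mul` and the remaining ones use `mac`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical host-side stand-in for aie::mmul<M, K, N, int32_t, int32_t>.
// Assumption: each tile is row-major; A is M x K, B is K x N, C is M x N.
template <unsigned M, unsigned K, unsigned N>
struct tile_product {
    static constexpr unsigned size_A = M * K;
    static constexpr unsigned size_B = K * N;
    static constexpr unsigned size_C = M * N;

    std::array<std::int64_t, size_C> acc{};  // models the accumulator contents

    // C = A * B: overwrites the current result (the role of mul).
    void mul(const std::int32_t* a, const std::int32_t* b) {
        acc.fill(0);
        mac(a, b);
    }

    // C += A * B: adds onto the current result (the role of mac).
    void mac(const std::int32_t* a, const std::int32_t* b) {
        for (unsigned m = 0; m < M; ++m)
            for (unsigned n = 0; n < N; ++n)
                for (unsigned k = 0; k < K; ++k)
                    acc[m * N + n] += std::int64_t{a[m * K + k]} * b[k * N + n];
    }
};

// Hypothetical driver. A has rowA x colA tiles, B has colA x colB tiles and
// C has rowA x colB tiles, each stored contiguously in row-major tile order.
template <unsigned M, unsigned K, unsigned N>
void blocked_mmul(unsigned rowA, unsigned colA, unsigned colB,
                  const std::int32_t* pA, const std::int32_t* pB, std::int64_t* pC) {
    using MMUL = tile_product<M, K, N>;
    assert(colA >= 1);
    for (unsigned z = 0; z < rowA; ++z) {
        for (unsigned j = 0; j < colB; ++j) {
            MMUL c;
            // The first partial product initializes the output tile...
            c.mul(pA + (z * colA) * MMUL::size_A, pB + j * MMUL::size_B);
            // ...and the rest along the shared K dimension are accumulated.
            for (unsigned i = 1; i < colA; ++i)
                c.mac(pA + (z * colA + i) * MMUL::size_A,
                      pB + (i * colB + j) * MMUL::size_B);
            std::int64_t* out = pC + (z * colB + j) * MMUL::size_C;
            for (unsigned e = 0; e < MMUL::size_C; ++e) out[e] = c.acc[e];
        }
    }
}
```

For example, `blocked_mmul<2, 2, 2>(1, 2, 1, A, B, C)` multiplies a 2x4 matrix A (two 2x2 tiles) by a 4x2 matrix B (two 2x2 tiles) into a single 2x2 tile of C.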
Classes | |
struct | aie::mmul< M, K, N, TypeA, TypeB, AccumTag > |
8b x 8b | 16b x 8b | 8b x 16b | 16b x 16b | 32b x 16b | 16b x 32b | 32b x 32b | float |
---|---|---|---|---|---|---|---|
4x8x4 4x16x4 8x8x4 2x8x8 4x8x8 2x16x8 4x16x8 | 4x4x4 8x4x4 4x8x4 4x4x8 | 4x4x8 4x4x4 | 4x4x4 2x4x8 4x4x8 4x2x8 | 2x4x8 4x4x4 4x2x4 2x2x4 2x4x4 4x4x2 2x2x8 | 4x2x2 2x4x8 4x4x4 | 4x2x4 2x2x2 2x4x2 2x8x2 4x2x2 4x4x2 2x4x4 | 4x2x4 2x2x2 2x4x2 2x8x2 4x2x2 4x4x2 2x4x4 |
16b x c16b | 16b x c32b | c16b x 16b | c16b x c16b | c16b x 32b | c16b x c32b | 32b x c16b | 32b x c32b | c32b x 16b | c32b x c16b | c32b x 32b | c32b x c32b | float x cfloat | cfloat x float | cfloat x cfloat |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4x2x2 4x4x4 | 2x4x2 2x4x4 2x8x2 4x4x2 | 2x2x4 2x2x8 2x4x4 2x4x8 4x2x4 4x4x2 4x4x4 | 2x2x2 2x4x2 2x8x2 2x4x4 4x2x2 4x4x2 4x2x4 | 2x2x2 2x4x2 2x8x2 2x4x4 4x2x2 4x4x2 4x2x4 | 2x2x2 2x4x2 | 2x2x2 2x4x2 2x8x2 2x4x4 4x2x2 4x4x2 4x2x4 | 2x2x2 2x4x2 | 2x4x2 2x8x2 2x4x4 4x4x2 | 2x2x2 2x4x2 | 1x2x2 2x2x2 2x4x2 | 1x2x2 2x2x1 2x2x2 | 2x2x2 2x4x2 | 2x2x2 2x4x2 | 2x2x2 2x2x4 2x4x2 4x2x2 |
8b x 4b | 8b x 8b | 16b x 8b | 8b x 16b | 16b x 16b | 32b x 16b | 16b x 32b | 32b x 32b | bfloat16 x bfloat16 |
---|---|---|---|---|---|---|---|---|
4x16x16 | 4x8x8 8x8x8 | 8x4x8 4x8x8 | 4x4x8 8x2x8 | 8x2x8 4x4x8 | 4x2x8 4x4x8 | 4x4x8 | 4x2x8 4x4x8 8x2x8 |
8b x 4b | 8b x 8b | 16b x 8b | 16b x 16b | bfloat16 x bfloat16 |
---|---|---|---|---|
c16b x 16b | c16b x c16b | c32b x c16b | c32b x c32b |
---|---|---|---|
1x4x8 2x2x16 | 1x2x4 1x2x8 1x2x16 | 1x2x8 |
struct aie::mmul |
Type that encapsulates a blocked matrix multiplication C = A x B.
Objects of this type encapsulate the current result of the multiplication. The first result is computed with the mul method. New multiplications can be accumulated using the mac method.
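With pre-tiled operands, the split between mul and mac follows from the block-matrix product. Writing $A_{z,i}$ for the MxK tiles of A, $B_{i,j}$ for the KxN tiles of B, and $T$ for the number of tiles along the shared K dimension, each MxN output tile of C is:

$$
C_{z,j} = \underbrace{A_{z,0}\,B_{0,j}}_{\texttt{mul}} + \underbrace{\sum_{i=1}^{T-1} A_{z,i}\,B_{i,j}}_{\texttt{mac}}
$$

so one aie::mmul object per output tile is initialized with mul and then receives $T - 1$ calls to mac.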
Template Parameters | |
M_Elems | Number of rows in matrix A (and in matrix C). |
K_Elems | Number of columns in matrix A and number of rows in matrix B. |
N_Elems | Number of columns in matrix B (and in matrix C). |
TypeA | Type of the elements in matrix A. It must meet ElemBaseType. |
TypeB | Type of the elements in matrix B. Defaults to TypeA. It must meet ElemBaseType. |
AccumTag | Element type of the accumulator that holds the results to be written to matrix C. It must meet AccumElemBaseType. If not specified, the default accumulation type for TypeA x TypeB multiplications is used. |
Public Types | |
using | accum_type = typename mmul_impl::accum_type |
using | mmul_impl = detail::mmul< M_Elems, K_Elems, N_Elems, TypeA, TypeB, detail::to_native_accum_bits_for_mul_types_tag< TypeA, TypeB, AccumTag >()> |
Public Member Functions | |
mmul () | |
mmul (const accum_type &acc) | |
template<typename T > | |
mmul (const vector< T, M *N > &v, int shift=0) | |
template<VectorOrOp VecA, VectorOrOp VecB> | |
void | mac (const VecA &a, const VecB &b) |
template<VectorOrOp VecA, VectorOrOp VecB> | |
void | mul (const VecA &a, const VecB &b) |
operator accum_type () const | |
accum_type | to_accum () const |
template<typename T > | |
vector< T, M *N > | to_vector (int shift=0) const |
Static Public Member Functions | |
static constexpr unsigned | size () |
Static Public Attributes | |
static constexpr unsigned | K = K_Elems |
static constexpr unsigned | M = M_Elems |
static constexpr unsigned | N = N_Elems |
static constexpr unsigned | size_A = M * K |
static constexpr unsigned | size_B = K * N |
static constexpr unsigned | size_C = M * N |
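The static members are plain compile-time arithmetic on the shape. The sketch below mirrors them in a hypothetical `mmul_shape` template (not part of the AIE API) to show how they relate; it omits `AccumTag` and performs no multiplication. The 4x8x4 shape with 8-bit operands is one of the shapes listed for 8b x 8b in the shape table above.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in mirroring only the shape-related members of aie::mmul.
// AccumTag is deliberately left out of this sketch.
template <unsigned M_Elems, unsigned K_Elems, unsigned N_Elems,
          typename TypeA, typename TypeB = TypeA>
struct mmul_shape {
    using type_a = TypeA;
    using type_b = TypeB;  // defaults to TypeA, as documented for aie::mmul
    static constexpr unsigned M = M_Elems;
    static constexpr unsigned K = K_Elems;
    static constexpr unsigned N = N_Elems;
    static constexpr unsigned size_A = M * K;  // elements in A (M x K)
    static constexpr unsigned size_B = K * N;  // elements in B (K x N)
    static constexpr unsigned size_C = M * N;  // elements in C (M x N)
    static constexpr unsigned size() { return size_C; }
};

using shape_4x8x4 = mmul_shape<4, 8, 4, std::int8_t>;
static_assert(shape_4x8x4::size_A == 32, "A is M x K");
static_assert(shape_4x8x4::size_B == 32, "B is K x N");
static_assert(shape_4x8x4::size() == 16, "size() is the element count of C");
static_assert(std::is_same<shape_4x8x4::type_b, std::int8_t>::value,
              "TypeB defaults to TypeA");
```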
mmul () [inline]
Constructor. Data is undefined.
mmul (const accum_type &acc) [inline]
Constructor. Data is initialized from the given accumulator.
acc | Accumulator from which the data is initialized. |
template<typename T > mmul (const vector< T, M *N > &v, int shift=0) [inline]
Constructor. Data is initialized from the given vector.
v | Vector from which the data is initialized. |
shift | Upshift in bits to be applied to input data. This parameter is ignored for floating-point types. |
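One way to picture the upshift is as scaling each input element by 2^shift when it is placed into the wider accumulator. The sketch below is a host-side model under that assumption, using a hypothetical `upshift_to_acc` function with `int16_t` inputs and `int64_t` values standing in for accumulator lanes; it is not the AIE API implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model (not the AIE API): each accumulator lane receives
// v[i] * 2^shift. Multiplication is used instead of << so negative inputs
// are well defined. Assumes 0 <= shift < 48, so 16-bit inputs fit in 64 bits.
std::vector<std::int64_t> upshift_to_acc(const std::vector<std::int16_t>& v, int shift) {
    assert(shift >= 0 && shift < 48);
    std::vector<std::int64_t> acc;
    acc.reserve(v.size());
    for (std::int16_t x : v)
        acc.push_back(static_cast<std::int64_t>(x) * (std::int64_t{1} << shift));
    return acc;
}
```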
template<VectorOrOp VecA, VectorOrOp VecB> void mac (const VecA &a, const VecB &b) [inline]
Multiply the two given matrices and add the product to the current result.
a | Vector that represents the A input matrix. The number of elements must be M * K (size_A). Must meet VectorOrOp. |
b | Vector that represents the B input matrix. The number of elements must be K * N (size_B). Must meet VectorOrOp. |
template<VectorOrOp VecA, VectorOrOp VecB> void mul (const VecA &a, const VecB &b) [inline]
Initialize the result with the product of the two given matrices.
a | Vector that represents the A input matrix. The number of elements must be M * K (size_A). Must meet VectorOrOp. |
b | Vector that represents the B input matrix. The number of elements must be K * N (size_B). Must meet VectorOrOp. |
operator accum_type () const [inline]
Conversion operator to accumulator.
static constexpr unsigned size () [inline]
Returns the number of elements in matrix C (size_C).
accum_type to_accum () const [inline]
Return the result of the multiplication as an accumulator.
template<typename T > vector< T, M *N > to_vector (int shift=0) const [inline]
Return the result of the multiplication as a vector.
shift | Downshift in bits to be applied to output data. This parameter is ignored for floating-point types. |
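to_vector narrows the accumulator to the element type T. A plausible host-side model, assuming a signed 16-bit target type, an arithmetic right shift by `shift`, and saturation to the target range, is sketched below with a hypothetical `acc_to_int16` function. The rounding and saturation behavior on the device is expected to depend on the modes configured in the AIE API, which this sketch does not model.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical model (not the AIE API) of narrowing accumulator lanes to int16_t:
// arithmetic right shift by `shift` (rounds toward negative infinity; >> on
// negative values is arithmetic as of C++20 and on mainstream compilers before),
// then clamp to the int16_t range.
std::vector<std::int16_t> acc_to_int16(const std::vector<std::int64_t>& acc, int shift) {
    assert(shift >= 0 && shift < 64);
    constexpr std::int64_t lo = std::numeric_limits<std::int16_t>::min();
    constexpr std::int64_t hi = std::numeric_limits<std::int16_t>::max();
    std::vector<std::int16_t> out;
    out.reserve(acc.size());
    for (std::int64_t x : acc) {
        std::int64_t y = x >> shift;
        if (y < lo) y = lo;  // saturate below
        if (y > hi) y = hi;  // saturate above
        out.push_back(static_cast<std::int16_t>(y));
    }
    return out;
}
```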
static constexpr unsigned K = K_Elems
Number of columns in matrix A, and number of rows in matrix B.
static constexpr unsigned M = M_Elems
Number of rows in matrix A.
static constexpr unsigned N = N_Elems
Number of columns in matrix B.
static constexpr unsigned size_A = M * K
Number of elements in matrix A.
static constexpr unsigned size_B = K * N
Number of elements in matrix B.