Overview

The AIE API encapsulates matrix multiplication in the aie::mmul class template. The template is parameterized with the matrix multiplication shape (MxKxN), the element data types and, optionally, the requested accumulation precision. The resulting class defines functions that perform the multiplication (mul) and the multiply-accumulate (mac), and a data type for the result that can be converted to an accumulator or a vector. These functions interpret the input vectors as matrices whose dimensions are given by the shape parameters.
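To make the "vectors interpreted as matrices" idea concrete, the following is a minimal scalar sketch in plain C++. It is not the AIE API: the name ref_mmul, the use of std::array, the 64-bit accumulator and the row-major element order inside each matrix are all assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Plain C++ reference model (NOT the AIE API). It mimics the mul/mac
// semantics on flat arrays. Row-major element order is an assumption.
template <unsigned M, unsigned K, unsigned N, typename T, typename Acc = std::int64_t>
struct ref_mmul {
    static constexpr unsigned size_A = M * K;  // elements in an A tile
    static constexpr unsigned size_B = K * N;  // elements in a B tile
    static constexpr unsigned size_C = M * N;  // elements in a C tile

    // Zero-initialised for simplicity in this model.
    std::array<Acc, size_C> acc{};

    // C = A x B: discard the previous result and start a new accumulation.
    void mul(const std::array<T, size_A>& a, const std::array<T, size_B>& b) {
        acc.fill(Acc{0});
        mac(a, b);
    }

    // C += A x B: accumulate onto the current result.
    void mac(const std::array<T, size_A>& a, const std::array<T, size_B>& b) {
        assert(a.size() == size_A && b.size() == size_B);
        for (unsigned m = 0; m < M; ++m)
            for (unsigned n = 0; n < N; ++n)
                for (unsigned k = 0; k < K; ++k)
                    acc[m * N + n] += static_cast<Acc>(a[m * K + k]) *
                                      static_cast<Acc>(b[k * N + n]);
    }
};
```

A model like this can serve as a golden reference when checking the output of a kernel on small inputs, provided the element order assumption is verified first.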

The following code snippet shows a sample blocked multiplication using the aie::mmul class. The matrices are assumed to be pre-tiled as defined by the mmul shape (MxK for A, KxN for B, and MxN for C).

template <unsigned M, unsigned K, unsigned N>
void mmul_blocked(unsigned rowA, unsigned colA, unsigned colB,
                  const int16 * __restrict pA, const int16 * __restrict pB, int16 * __restrict pC)
{
    // rowA, colA and colB count tiles, not elements:
    // A is rowA x colA tiles, B is colA x colB tiles and C is rowA x colB tiles.
    using MMUL = aie::mmul<M, K, N, int16, int16>;

    // Each iteration produces two tile rows of C, so rowA is expected to be even.
    for (unsigned z = 0; z < rowA; z += 2) chess_loop_range(2,) {
        int16 * __restrict pC1 = pC + (      z * colB +       0) * MMUL::size_C;
        int16 * __restrict pC2 = pC + ((z + 1) * colB +       0) * MMUL::size_C;

        // Each iteration produces a 2x2 block of C tiles, so colB is expected to be even.
        for (unsigned j = 0; j < colB; j += 2) chess_loop_range(2,) {
            // Start of two consecutive tile rows of A and two consecutive tile columns of B.
            const int16 * __restrict pA1 = pA + (      z * colA +       0) * MMUL::size_A;
            const int16 * __restrict pA2 = pA + ((z + 1) * colA +       0) * MMUL::size_A;
            const int16 * __restrict pB1 = pB + (      0 * colB +       j) * MMUL::size_B;
            const int16 * __restrict pB2 = pB + (      0 * colB + (j + 1)) * MMUL::size_B;

            // A pointers advance along a tile row; B pointers advance down a tile column.
            aie::vector<int16, MMUL::size_A> A0 = aie::load_v<MMUL::size_A>(pA1); pA1 += MMUL::size_A;
            aie::vector<int16, MMUL::size_A> A1 = aie::load_v<MMUL::size_A>(pA2); pA2 += MMUL::size_A;
            aie::vector<int16, MMUL::size_B> B0 = aie::load_v<MMUL::size_B>(pB1); pB1 += MMUL::size_B * colB;
            aie::vector<int16, MMUL::size_B> B1 = aie::load_v<MMUL::size_B>(pB2); pB2 += MMUL::size_B * colB;

            // Initialize the four accumulators with the first tile products.
            MMUL C00; C00.mul(A0, B0);
            MMUL C01; C01.mul(A0, B1);
            MMUL C10; C10.mul(A1, B0);
            MMUL C11; C11.mul(A1, B1);

            // Accumulate the remaining colA - 1 tile products along the shared dimension.
            for (unsigned i = 1; i < colA; ++i) chess_prepare_for_pipelining chess_loop_range(3,) {
                A0 = aie::load_v<MMUL::size_A>(pA1); pA1 += MMUL::size_A;
                A1 = aie::load_v<MMUL::size_A>(pA2); pA2 += MMUL::size_A;
                B0 = aie::load_v<MMUL::size_B>(pB1); pB1 += MMUL::size_B * colB;
                B1 = aie::load_v<MMUL::size_B>(pB2); pB2 += MMUL::size_B * colB;

                C00.mac(A0, B0);
                C01.mac(A0, B1);
                C10.mac(A1, B0);
                C11.mac(A1, B1);
            }

            // Convert the accumulators to int16 vectors and store the 2x2 block of C tiles.
            aie::store_v(pC1, C00.template to_vector<int16>()); pC1 += MMUL::size_C;
            aie::store_v(pC1, C01.template to_vector<int16>()); pC1 += MMUL::size_C;
            aie::store_v(pC2, C10.template to_vector<int16>()); pC2 += MMUL::size_C;
            aie::store_v(pC2, C11.template to_vector<int16>()); pC2 += MMUL::size_C;
        }
    }
}
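
The sample above expects its inputs to be pre-tiled. As a rough host-side illustration, the sketch below reorders a row-major matrix so that tiles are stored one after another in row-major tile order, which matches the pointer arithmetic in the sample. The helper name tile_row_major is hypothetical, and the row-major element order inside each tile is an assumption rather than something this page states.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the AIE API): reorders a row-major
// rows x cols matrix into tiles of tile_rows x tile_cols elements.
// Tiles are emitted in row-major tile order; elements inside each tile
// are emitted in row-major order as well (assumed layout).
template <typename T>
std::vector<T> tile_row_major(const std::vector<T>& src,
                              std::size_t rows, std::size_t cols,
                              std::size_t tile_rows, std::size_t tile_cols)
{
    assert(src.size() == rows * cols);
    assert(rows % tile_rows == 0 && cols % tile_cols == 0);

    std::vector<T> dst;
    dst.reserve(src.size());
    for (std::size_t tr = 0; tr < rows; tr += tile_rows)          // tile row
        for (std::size_t tc = 0; tc < cols; tc += tile_cols)      // tile column
            for (std::size_t r = 0; r < tile_rows; ++r)           // row inside tile
                for (std::size_t c = 0; c < tile_cols; ++c)       // column inside tile
                    dst.push_back(src[(tr + r) * cols + (tc + c)]);
    return dst;
}
```

Under these assumptions, A would be tiled with MxK tiles and B with KxN tiles before calling the kernel, and the MxN tiles written to C would need the inverse reordering to recover a row-major result.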

Classes
struct	aie::mmul< M_Elems, K_Elems, N_Elems, TypeA, TypeB, AccumTag >

Supported matrix multiplication shapes

Matrix multiplication modes for real types
Types (A x B)	Supported shapes (MxKxN)
8b x 8b	4x8x4, 4x16x4, 8x8x4, 2x8x8, 4x8x8, 2x16x8, 4x16x8
16b x 8b	4x4x4, 8x4x4, 4x8x4, 4x4x8
8b x 16b	4x4x8, 4x4x4, 8x8x1
16b x 16b	4x4x4, 2x4x8, 4x4x8, 4x2x8, 8x8x1
32b x 16b	2x4x8, 4x4x4, 4x2x4, 2x2x4, 2x4x4, 4x4x2, 2x2x8
16b x 32b	4x2x2, 2x4x8, 4x4x4
32b x 32b	4x2x4, 2x2x2, 2x4x2, 2x8x2, 4x2x2, 4x4x2, 2x4x4, 4x4x1
float	4x2x4, 2x2x2, 2x4x2, 2x8x2, 4x2x2, 4x4x2, 2x4x4, 4x4x1

Matrix multiplication modes for complex types (c16b/c32b/cfloat represent complex types)
Types (A x B)	Supported shapes (MxKxN)
16b x c16b	4x2x2, 4x4x4, 4x4x1
16b x c32b	2x4x2, 2x4x4, 2x8x2, 4x4x2, 4x4x1
c16b x 16b	2x2x4, 2x2x8, 2x4x4, 2x4x8, 4x2x4, 4x4x2, 4x4x4
c16b x c16b	2x2x2, 2x4x2, 2x8x2, 2x4x4, 4x2x2, 4x4x2, 4x2x4, 4x4x1
c16b x 32b	2x2x2, 2x4x2, 2x8x2, 2x4x4, 4x2x2, 4x4x2, 4x2x4, 4x4x1
c16b x c32b	2x2x2, 2x4x2, 4x2x1
32b x c16b	2x2x2, 2x4x2, 2x8x2, 2x4x4, 4x2x2, 4x4x2, 4x2x4, 4x4x1
32b x c32b	2x2x2, 2x4x2, 4x2x1
c32b x 16b	2x4x2, 2x8x2, 2x4x4, 4x4x2
c32b x c16b	2x2x2, 2x4x2, 4x4x1
c32b x 32b	1x2x2, 2x2x2, 2x4x2, 4x4x1
c32b x c32b	1x2x2, 2x2x1, 2x2x2, 2x2x1
float x cfloat	2x2x2, 2x4x2, 4x2x1
cfloat x float	2x2x2, 2x4x2, 2x4x1
cfloat x cfloat	2x2x2, 2x2x4, 2x4x2, 4x2x2, 4x2x1

Class Documentation

◆ aie::mmul

struct aie::mmul

template<unsigned M_Elems, unsigned K_Elems, unsigned N_Elems, typename TypeA, typename TypeB = TypeA, typename AccumTag = accauto>
struct aie::mmul< M_Elems, K_Elems, N_Elems, TypeA, TypeB, AccumTag >

Type that encapsulates a blocked matrix multiplication C = A x B

Objects of this type encapsulate the current result of the multiplication. The first result is computed with the mul method. New multiplications can be accumulated using the mac method.

Template Parameters

M_Elems	Rows in matrix A.
K_Elems	Columns in matrix A / Rows in matrix B.
N_Elems	Columns in matrix B.
TypeA	Type of the elements in matrix A. It must meet ElemBaseType.
TypeB	Type of the elements in matrix B. Defaults to TypeA. It must meet ElemBaseType.
AccumTag	Type of the elements of the accumulator that holds the results to be written to matrix C. It must meet AccumElemBaseType. If not specified, the default accumulation type for multiplications of TypeA x TypeB is used.
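
As a small illustration of how the shape parameters translate into vector lengths and tile counts, the following plain C++ sketch mirrors the size constants documented on this page. The names mmul_shape, tile_counts and count_tiles are hypothetical and do not exist in the AIE API. The tile counts correspond to the rowA, colA and colB arguments used by the blocked sample in the Overview.

```cpp
#include <cassert>

// Hypothetical stand-in for the shape-related constants of aie::mmul.
// Plain C++, no AIE headers required.
template <unsigned M_Elems, unsigned K_Elems, unsigned N_Elems>
struct mmul_shape {
    static constexpr unsigned M = M_Elems;   // rows in A
    static constexpr unsigned K = K_Elems;   // columns in A / rows in B
    static constexpr unsigned N = N_Elems;   // columns in B
    static constexpr unsigned size_A = M * K;
    static constexpr unsigned size_B = K * N;
    static constexpr unsigned size_C = M * N;
};

// Number of tiles along each dimension of a blocked multiplication.
struct tile_counts {
    unsigned rowA;  // tile rows in A (and C)
    unsigned colA;  // tile columns in A / tile rows in B
    unsigned colB;  // tile columns in B (and C)
};

// Tile counts for multiplying a rows_a x cols_a matrix by a cols_a x cols_b
// matrix. Dimensions must be exact multiples of the tile shape.
template <typename Shape>
inline tile_counts count_tiles(unsigned rows_a, unsigned cols_a, unsigned cols_b)
{
    assert(rows_a % Shape::M == 0);
    assert(cols_a % Shape::K == 0);
    assert(cols_b % Shape::N == 0);
    return { rows_a / Shape::M, cols_a / Shape::K, cols_b / Shape::N };
}
```

For example, with a 4x16x8 shape, an A tile holds 64 elements, a B tile 128 and a C tile 32, and a 64x64 by 64x64 product is covered by 16, 4 and 8 tiles along the respective dimensions.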


Public Types
using	accum_type = typename mmul_impl::accum_type

using	mmul_impl = detail::mmul< M_Elems, K_Elems, N_Elems, TypeA, TypeB, detail::to_native_accum_bits_for_mul_types_tag< TypeA, TypeB, AccumTag >()>

Public Member Functions
	mmul ()

	mmul (const accum_type &acc)

template<typename T >
	mmul (const T &acc)

template<typename T >
	mmul (const vector< T, M * N > &v, int shift=0)

template<VectorOrOp VecA, VectorOrOp VecB>
void	mac (const VecA &a, const VecB &b)

template<VectorOrOp VecA, VectorOrOp VecB>
void	mul (const VecA &a, const VecB &b)

	operator accum_type () const

accum_type	to_accum () const

template<typename T >
vector< T, M * N >	to_vector (int shift=0) const

Static Public Member Functions
static constexpr unsigned	size ()

Static Public Attributes
static constexpr unsigned	K = K_Elems

static constexpr unsigned	M = M_Elems

static constexpr unsigned	N = N_Elems

static constexpr unsigned	size_A = M * K

static constexpr unsigned	size_B = K * N

static constexpr unsigned	size_C = M * N

Constructor & Destructor Documentation

◆ mmul() [1/3]

template<unsigned M_Elems, unsigned K_Elems, unsigned N_Elems, typename TypeA, typename TypeB = TypeA, typename AccumTag = accauto>

aie::mmul< M_Elems, K_Elems, N_Elems, TypeA, TypeB, AccumTag >::mmul ( )

inline

Default constructor. The contents of the result are undefined.

◆ mmul() [2/3]

template<unsigned M_Elems, unsigned K_Elems, unsigned N_Elems, typename TypeA, typename TypeB = TypeA, typename AccumTag = accauto>

aie::mmul< M_Elems, K_Elems, N_Elems, TypeA, TypeB, AccumTag >::mmul ( const accum_type & acc )

inline

Constructor. Data is initialized from the given accumulator.

Parameters

acc	Accumulator from which the data is initialized.

◆ mmul() [3/3]

template<unsigned M_Elems, unsigned K_Elems, unsigned N_Elems, typename TypeA, typename TypeB = TypeA, typename AccumTag = accauto>

template<typename T >

aie::mmul< M_Elems, K_Elems, N_Elems, TypeA, TypeB, AccumTag >::mmul ( const vector< T, M * N > & v, int shift = 0 )

inline

Constructor. Data is initialized from the given vector.

Parameters

v	Vector from which the data is initialized.
shift	Upshift in bits to be applied to input data. This parameter is ignored for floating-point types.

Member Function Documentation

◆ mac()

template<unsigned M_Elems, unsigned K_Elems, unsigned N_Elems, typename TypeA, typename TypeB = TypeA, typename AccumTag = accauto>

template<VectorOrOp VecA, VectorOrOp VecB>

void aie::mmul< M_Elems, K_Elems, N_Elems, TypeA, TypeB, AccumTag >::mac ( const VecA & a, const VecB & b )

inline

Multiply the two given matrices and add the product to the current result.

Parameters

a	Vector that represents the A input matrix. The number of elements must be M * K (size_A). Must meet VectorOrOp.
b	Vector that represents the B input matrix. The number of elements must be K * N (size_B). Must meet VectorOrOp.

◆ mul()

template<unsigned M_Elems, unsigned K_Elems, unsigned N_Elems, typename TypeA, typename TypeB = TypeA, typename AccumTag = accauto>

template<VectorOrOp VecA, VectorOrOp VecB>

void aie::mmul< M_Elems, K_Elems, N_Elems, TypeA, TypeB, AccumTag >::mul ( const VecA & a, const VecB & b )

inline

Initialize the result value with the multiplication of the two given matrices.

Parameters

a	Vector that represents the A input matrix. The number of elements must be M * K (size_A). Must meet VectorOrOp.
b	Vector that represents the B input matrix. The number of elements must be K * N (size_B). Must meet VectorOrOp.

◆ operator accum_type()

template<unsigned M_Elems, unsigned K_Elems, unsigned N_Elems, typename TypeA, typename TypeB = TypeA, typename AccumTag = accauto>

aie::mmul< M_Elems, K_Elems, N_Elems, TypeA, TypeB, AccumTag >::operator accum_type ( ) const

inline

Conversion operator to accumulator.

◆ size()

template<unsigned M_Elems, unsigned K_Elems, unsigned N_Elems, typename TypeA, typename TypeB = TypeA, typename AccumTag = accauto>

static constexpr unsigned aie::mmul< M_Elems, K_Elems, N_Elems, TypeA, TypeB, AccumTag >::size ( )

inline static constexpr

Returns the number of elements in matrix C.

◆ to_accum()

template<unsigned M_Elems, unsigned K_Elems, unsigned N_Elems, typename TypeA, typename TypeB = TypeA, typename AccumTag = accauto>

accum_type aie::mmul< M_Elems, K_Elems, N_Elems, TypeA, TypeB, AccumTag >::to_accum ( ) const

inline

Return the result of the multiplication as an accumulator.

◆ to_vector()

template<unsigned M_Elems, unsigned K_Elems, unsigned N_Elems, typename TypeA, typename TypeB = TypeA, typename AccumTag = accauto>

template<typename T >

vector<T, M * N> aie::mmul< M_Elems, K_Elems, N_Elems, TypeA, TypeB, AccumTag >::to_vector ( int shift = 0 ) const

inline

Return the result of the multiplication as a vector of the requested type.

Parameters

shift	Downshift in bits to be applied to output data. This parameter is ignored for floating-point types.
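
To illustrate what a downshift on integer output data means, here is a scalar sketch in plain C++ for a single accumulator value converted to a 16-bit element. The function name downshift_to_int16 is hypothetical, and floor rounding together with saturation to the int16 range are assumptions of this model. The rounding and saturation actually applied on the device depend on how it is configured, which this page does not describe.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Reference model only (NOT the AIE API). Assumes floor rounding and
// saturation to the int16 range; the device behaviour may differ.
inline std::int16_t downshift_to_int16(std::int64_t acc, int shift)
{
    assert(shift >= 0 && shift < 62);

    // Floor division by 2^shift, written without relying on the behaviour
    // of right-shifting negative values.
    const std::int64_t div = std::int64_t{1} << shift;
    std::int64_t q = acc / div;      // C++ division truncates toward zero
    if (acc % div < 0) --q;          // adjust negative remainders down to floor

    // Saturate to the destination range.
    const std::int64_t lo = std::numeric_limits<std::int16_t>::min();
    const std::int64_t hi = std::numeric_limits<std::int16_t>::max();
    if (q < lo) q = lo;
    if (q > hi) q = hi;
    return static_cast<std::int16_t>(q);
}
```

In fixed-point code the shift is typically chosen so that the scaled products fit the output type, for example a shift equal to the number of fractional bits of one operand.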

Member Data Documentation

◆ K

template<unsigned M_Elems, unsigned K_Elems, unsigned N_Elems, typename TypeA, typename TypeB = TypeA, typename AccumTag = accauto>

constexpr unsigned aie::mmul< M_Elems, K_Elems, N_Elems, TypeA, TypeB, AccumTag >::K = K_Elems

static constexpr

Number of columns in matrix A, and number of rows in matrix B.

◆ M

template<unsigned M_Elems, unsigned K_Elems, unsigned N_Elems, typename TypeA, typename TypeB = TypeA, typename AccumTag = accauto>

constexpr unsigned aie::mmul< M_Elems, K_Elems, N_Elems, TypeA, TypeB, AccumTag >::M = M_Elems

static constexpr

Number of rows in matrix A.

◆ N

template<unsigned M_Elems, unsigned K_Elems, unsigned N_Elems, typename TypeA, typename TypeB = TypeA, typename AccumTag = accauto>

constexpr unsigned aie::mmul< M_Elems, K_Elems, N_Elems, TypeA, TypeB, AccumTag >::N = N_Elems

static constexpr

Number of columns in matrix B.

◆ size_A

template<unsigned M_Elems, unsigned K_Elems, unsigned N_Elems, typename TypeA, typename TypeB = TypeA, typename AccumTag = accauto>

constexpr unsigned aie::mmul< M_Elems, K_Elems, N_Elems, TypeA, TypeB, AccumTag >::size_A = M * K

static constexpr

Number of elements in matrix A.

◆ size_B

template<unsigned M_Elems, unsigned K_Elems, unsigned N_Elems, typename TypeA, typename TypeB = TypeA, typename AccumTag = accauto>

constexpr unsigned aie::mmul< M_Elems, K_Elems, N_Elems, TypeA, TypeB, AccumTag >::size_B = K * N

static constexpr

Number of elements in matrix B.

◆ size_C

template<unsigned M_Elems, unsigned K_Elems, unsigned N_Elems, typename TypeA, typename TypeB = TypeA, typename AccumTag = accauto>

constexpr unsigned aie::mmul< M_Elems, K_Elems, N_Elems, TypeA, TypeB, AccumTag >::size_C = M * N

static constexpr

Number of elements in matrix C.
