Vitis 2024.2

Documentation changes

Global AIE API changes

Implement arbitrary vector and accum size support
Add cbfloat16 support on AIE-ML

Changes to data types

accum: Implement construction from block_vector
accum: Implement grow_replicate
mask: Enable get_submask to work with ElemsOut less than word size
mask: Implement insert and extract

Changes to operations

fft: Implement dynamic vectorization support for AIE
fft: Implement radix 2 and radix 4 cbfloat16 support on AIE-ML
mmul: Implement 8x8x8 bfloat16 mmul mode on AIE-ML
shuffle: Fix optimized code paths for when the shuffle mode is known at compile time

ADF integration

Enable cbfloat16 streams for AIE-ML
Implement vector reads and writes on cascades

Vitis 2024.1

Documentation changes

Further refactoring to remove detail namespace from documentation
Document default accumulator type for given mul inputs
Document float conversion implementations and behaviour

Global AIE API changes

Add vector and accum template deduction guides
Leverage aie::zeros in place of aie::broadcast(0) internally to prevent non-trivial conversions
Several refactors to isolate external interfaces from detail namespace
Add clamp function
Add aie::saturating_add and aie::saturating_sub functions
Add aie::real and aie::imag helpers to get real and imaginary parts of vectors
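
As a rough illustration of what clamping and saturating addition mean for a single lane, here is a minimal scalar sketch in plain C++. It does not use the AIE API: the names clamp_model and saturating_add_model are hypothetical, and the real signatures and supported types of aie::clamp and aie::saturating_add should be taken from the API reference.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical scalar model of clamping: limit v to the range [lo, hi].
constexpr std::int32_t clamp_model(std::int32_t v, std::int32_t lo, std::int32_t hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

// Hypothetical scalar model of a saturating 16-bit add (assumption: the result
// is limited to the int16 range instead of wrapping around on overflow).
// The sum is computed in 32 bits so that it cannot overflow before clamping.
constexpr std::int16_t saturating_add_model(std::int16_t a, std::int16_t b) {
    return static_cast<std::int16_t>(
        clamp_model(static_cast<std::int32_t>(a) + static_cast<std::int32_t>(b),
                    std::numeric_limits<std::int16_t>::min(),
                    std::numeric_limits<std::int16_t>::max()));
}

// Compile-time checks of the model.
static_assert(saturating_add_model(30000, 10000) == 32767, "saturates at the upper bound");
static_assert(saturating_add_model(-30000, -10000) == -32768, "saturates at the lower bound");
static_assert(saturating_add_model(100, 23) == 123, "in-range sums are unchanged");
```

The vector functions in the API can be thought of as applying this kind of per-lane behaviour across all elements; this sketch makes no claim about how they are implemented.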

Changes to data types

accum: Fix sub-accum insertion
vector: Optimize vector::get for 1K vectors
vector: Extend pack/unpack to work for arbitrary conversions
tensor_buffer_stream: Fix native implementations for non-native vector sizes

Changes to operations

accumulate: Add array-based interface as alternative to variadic interface
compare: Add workaround for IEEE non-compliance in equality comparisons for bfloat16 on AIE-ML
compare: Optimize comparisons with zero
compare: Fix scalar/vector comparisons on AIE
concat: Add support for concatenating tuples of vectors/accumulators
elementary: Optimize vector unrolling for scalar functions
fft: Fix vectorization limits for odd-radix implementations
fft: Add radix2 combiner for cint16 input/output with cint32 twiddles
fft: Add radix3 and radix5 stage0 implementations on AIE-ML
fft: Add cfloat radix3 and radix5 stage0 implementations on AIE
fft: Add 32b twiddle up/down dit FFTs
mmul: Add dynamic sign support for sparse_vector
mmul: Implement additional modes
- AIE-ML: 16b x 8b - 8x4x8, 32b x 16b - 4x4x8, bf16 x bf16 - 4x8x8
mmul: Add some outer product modes as aie::mmul<M,1,N>
mmul: Enable zeroization for fp32 mmuls
mul: Implement cint32 * int16 and int16 * cint32 muls
neg: Add bfloat16 support for AIE-ML
sliding_mul: Optimize complex x real implementations
sliding_mul: Add bfloat16 support on AIE-ML
to_fixed/to_float: Add support for unsigned types on AIE-ML

ADF integration

Fix cfloat stream reads

Vitis 2023.2

Documentation changes

Integrate AIE-ML documentation
Document rounding modes
Expand accumulate documentation
Clarify limitations on 8b parallel lookup
Fix mmul member documentation
Clarify requirement for linear_approx step bits
Improve documentation of vector, accum, and mask
Highlight architecture requirements of functions using C++ requires clauses
Document FFT twiddle factor generation
Clarify internal rounding mode for bfloat16 to integer conversion
Clarify native and emulated modes for mmul
Clarify native and emulated modes for sliding_mul
Document sparse_vector_input_buffer_stream with memory layout and GEMM example
Document tensor_buffer_stream with a GEMM example

Global AIE API changes

Add cfloat support for AIE-ML

Changes to data types

vector: Optimize grow_replicate on AIE-ML
mmul: Support reinitialization from an accum
DM resources: Add compound aie_dm_resource variants
streams: Add sparse_vector_input_buffer_stream for loading sparse data on AIE-ML
streams: Add tensor_buffer_stream to handle multi-dimensional addressing for AIE-ML
bfloat16: Add specialization for std::numeric_limits on AIE-ML

Changes to operations

abs: Fix for float input
add_reduce: Optimize for 8b and 16b types on AIE-ML
div: Implement vector-vector and vector-scalar division
downshift: Implement logical_downshift for AIE
fft: Add support for 32 bit twiddles on AIE
fft: Fix for radix-3 and radix-5 FFTs on AIE
fft: Fix radix-5 performance for low vectorizations on AIE
fft: Add stage-based FFT functions and deprecate iterator interface
mul: Fix for vector * vector_elem_ref on AIE
print_fixed: Support printing Q format data
print_matrix: Added accumulator support
sliding_mul: Add support for float
sliding_mul: Add support for remaining 32b modes for AIE-ML
sliding_mul: Add support for Points < Native Points
sliding_mul_ch: Fix DataStepX == DataStepY requirement
sincos: Optimize AIE implementation
to_fixed: Fix for AIE-ML
to_fixed/to_float: Add vectorized float conversions for AIE
to_fixed/to_float: Add generic conversions ((int8, int16, int32) <-> (bfloat16, float)) for AIE-ML

ADF integration

Add TLAST support for stream reads on AIE-ML
Add support for input_cascade and output_cascade types
Deprecate accum reads from input_stream and output_stream

Vitis 2023.1

Documentation changes

Add explanation of FFT inputs
Use block_size in FFT docs
Clarify matrix data layout expectations
Clarify downshift being arithmetic
Correct description of bfloat16 linear_approx lookup table

Global AIE API changes

Do not explicitly initialize inferred template arguments
More aggressive inlining of internal functions
Avoid using 128b vectors in stream helper functions for AIE-ML

Changes to data types

iterator: Do not declare iterator data members as const
mask: Optimized implementation for 64b masks on AIE-ML
mask: New constructors added to initialize the mask from uint32 or uint64 values
vector: Fix 1024b inserts
vector: Use 128b concats in upd_all
vector: Fix 8b unsigned to_vector for AIE-ML

Changes to operations

add/sub: Support for dynamic accumulator zeroization
begin_restrict_vector: Add implementation for io_buffer
eq: Add support for complex numbers
fft: Correctly set radix configuration in fft::begin_stage calls
inv/invsqrt: Add implementation for AIE-ML
linear_approx: Performance optimization for AIE-ML
logical_downshift: New function that implements a logical downshift (as opposed to aie::downshift, which is arithmetic)
max/min/maxdiff: Add support for dynamic sign
mmul: Implement 16b 8x2x8 mode for AIE-ML
mmul: Implement 8b 8x8x8 mode for AIE-ML
mmul: Implement missing 16b x 8b and 8b x 4b sparse multiplication modes for AIE-ML
neq: Add support for complex numbers
parallel_lookup: Optimize implementation for signed truncation
print_matrix: New function that prints vectors with the specified matrix shape
shuffle_up/down: Minor optimization for 16b
shuffle_up/down: Optimized implementation for AIE-ML
sliding_mul: Support data_start/coeff_start values larger than vector size
sliding_mul: Add support for 32b modes for AIE-ML
sliding_mul: Add 2 point 16b 16 channel for AIE-ML
sliding_mul_ch: New function for multi-channel multiplication modes for AIE-ML
sliding_mul_sym_uct: Fix for 16b two-buffer implementation
store_unaligned_v: Optimized implementation for AIE-ML
transpose: Add support for 64b and 32b types
transpose: Enable transposition of 256 element 4b vectors (scalar implementation for now)
to_fixed: Add bfloat16 to int32 conversion on AIE-ML
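
The distinction between the arithmetic aie::downshift and the new logical_downshift can be illustrated with a scalar sketch in plain C++. This is only a model of the two shift semantics for one 32-bit lane, not the AIE API; the function names below are hypothetical.

```cpp
#include <cstdint>

// Arithmetic downshift: the sign bit is replicated into the vacated high bits,
// so negative values stay negative. (Right-shifting a negative signed value is
// arithmetic on mainstream compilers and guaranteed to be so since C++20.)
constexpr std::int32_t arithmetic_downshift_model(std::int32_t v, unsigned shift) {
    return v >> shift;
}

// Logical downshift: the bit pattern is shifted as if unsigned, so zeros are
// inserted into the vacated high bits regardless of the sign.
// For shift >= 1 the result always fits in a non-negative int32_t.
constexpr std::int32_t logical_downshift_model(std::int32_t v, unsigned shift) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(v) >> shift);
}

// -16 is 0xFFFFFFF0; shifting right by 2 gives 0xFFFFFFFC (-4) arithmetically
// and 0x3FFFFFFC (1073741820) logically.
static_assert(arithmetic_downshift_model(-16, 2) == -4, "sign bit is preserved");
static_assert(logical_downshift_model(-16, 2) == 1073741820, "zeros are shifted in");
static_assert(logical_downshift_model(16, 2) == 4, "non-negative inputs behave the same");
```

For non-negative inputs both shifts give the same result; they differ only when the sign bit is set.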

Vitis 2022.2

Documentation changes

Add code samples for load_v/store_v and load_unaligned_v/store_unaligned_v
Enhanced documentation for parallel_lookup and linear_approx
Clarify coeff vector size limit on AIE-ML

Global AIE API changes

Remove usage of srs in compare functions, to avoid compilation warnings as it is deprecated
Add support for stream ADF vector types on AIE-ML

Changes to data types

mask: add shift operators
saturation_mode: add saturate value. This value was previously named truncate, which was incorrect. The old name is kept for now, until it is deprecated

Changes to operations

add: support accumulator addition on AIE-ML
add_reduce: add optimized implementation for cfloat on AIE
add_reduce: add optimized implementation for bfloat16 on AIE-ML
eq/neq: enhanced implementation on AIE-ML
le: enhanced implementation on AIE-ML
load_unaligned_v: leverage pointer truncation to 128b done by HW on AIE
fft: add support for radix 3/5 on AIE
mmul: add matrix x vector multiplication modes on AIE
mmul: add support for dynamic accumulator zeroization
to_fixed: added implementation for AIE-ML
to_fixed: provide a default return type
to_float: added implementation for AIE-ML
reverse: optimized implementation for 32b and 64b on AIE-ML
zeros: include fixes on AIE

Vitis 2022.1

Documentation changes

Small documentation fixes for operators
Fix documentation issues for msc_square and mmul
Enhance documentation for sliding_mul operations
Change logo in documentation
Add documentation for ADF stream operators

Global AIE API changes

Add support for emulated FP32 data types and operations on AIE-ML

Changes to data types

unaligned_vector_iterator: add new type and helper functions
random_circular_vector_iterator: add new type and helper functions
iterator: add linear iterator type and helper functions for scalar values
accum: add support for dynamic sign in to/from_vector on AIE-ML
accum: add implicit conversion to float on AIE-ML
vector: add support for dynamic sign in pack/unpack
vector: optimization of initialization by value on AIE-ML
vector: add constructor from 1024b native types on AIE-ML
vector: fixes and optimizations for unaligned_load/store

Changes to operations

adf::buffer_port: add many wrapper iterators
adf::stream: annotate read/write functions with stream resource so they can be scheduled in parallel
adf::stream: add stream operator overloading
fft: performance fixes on AIE-ML
max/min/maxdiff: add support for bfloat16 and float on AIE-ML
mul/mmul: add support for bfloat16 and float on AIE-ML
mul/mmul: add support for dynamic sign on AIE-ML
parallel_lookup: expanded to int16->bfloat, performance optimisations, and softmax kernel
print: add support to print accumulators
add/max/min_reduce: add support for float on AIE-ML
reverse: add optimized implementation on AIE-ML using matrix multiplications
shuffle_down_replicate: add new function
sliding_mul: add 32b for 8b * 8b and 16b * 16b on AIE-ML
transpose: add new function and implementation for AIE-ML
upshift/downshift: add implementation for AIE-ML

Vitis 2021.2

Documentation changes

Fix description of sliding_mul_sym_uct
Make return types explicit for better documentation
Fix documentation for sin/cos so that it says that the input must be in radians
Add support for concepts
Add documentation for missing arguments and fix wrong argument names
Fixes in documentation for int4/uint4 AIE-ML types
Add documentation for the mmul class
Update documentation about supported accumulator sizes
Update the matrix multiplication example to use the new MxKxN scheme and size_A/size_B/size_C

Global AIE API changes

Make all entry points always_inline
Add declaration macros to aie_declaration.hpp so that they can be used in headers parsed by aiecompiler

Changes to data types

Add support for bfloat16 data type on AIE-ML
Add support for cint16/cint32 data types on AIE-ML
Add an argument to vector::grow, to specify where the input vector will be located in the output vector
Remove copy constructor so that the vector type becomes trivial
Remove copy constructor so that the mask type becomes trivial
Make all member functions in circular_index constexpr
Add tiled_mdspan::begin_vector_dim functions that return vector iterators
Initial support for sparse vectors on AIE-ML, including iterators to read from memory
Make vector methods always_inline
Make vector::push be applied to the object it is called on and return a reference

Changes to operations

add: Implementation optimization on AIE-ML
add_reduce: Implement on AIE-ML
bit/or/xor: Implement scalar x vector variants of bit operations
equal/not_equal: Fix an issue in which not all lanes were compared for certain vector sizes.
fft: Interface change to enhance portability across AIE/AIE-ML
fft: Add initial support on AIE-ML
fft: Add alignment checks for x86sim in FFT iterators
fft: Make FFT output interface uniform for radix 2 cint16 upscale version on AIE
filter_even/filter_odd: Functional fixes
filter_even/filter_odd: Performance improvement for 4b/8b/16b implementations
filter_even/filter_odd: Performance optimization on AIE-ML
filter_even/filter_odd: Do not require step argument to be a compile-time constant
interleave_zip/interleave_unzip: Improve performance when configuration is a run-time value
interleave_*: Do not require step argument to be a compile-time constant
load_floor_v/load_floor_bytes_v: New functions that floor the pointer to a requested boundary before performing the load.
load_unaligned_v/store_unaligned_v: Performance optimization on AIE-ML
lut/parallel_lookup/linear_approx: First implementation of look-up based linear functions on AIE-ML.
max_reduce/min_reduce: Add 8b implementation
max_reduce/min_reduce: Implement on AIE-ML
mmul: Implement new shapes for AIE-ML
mmul: Initial support for 4b multiplication
mmul: Add support for 80b accumulation for 16b x 32b / 32b x 16b cases
mmul: Change dimension names from MxNxK to MxKxN
mmul: Add size_A/size_B/size_C data members
mul: Optimize mul+conj operations by merging them into a single intrinsic call on AIE-ML
sin/cos/sincos: Fix to avoid int -> unsigned conversions that reduce the range
sin/cos/sincos: Use a compile-time division to compute 1/PI
sin/cos/sincos: Fix floating-point range
sin/cos/sincos: Optimized implementation for float vector
shuffle_up/shuffle_down: Elements don't wrap around anymore. Instead, new elements are undefined.
shuffle_up_rotate/shuffle_down_rotate: New variants added for the cases in which elements need to wrap-around
shuffle_up_replicate: Variant added which replicates the first element.
shuffle_up_fill: Variant added which fills new elements with elements from another vector.
shuffle_*: Optimization in shuffle primitives on AIE, especially for 8b/16b cases
sliding_mul: Fixes to handle larger Step values for cfloat variants
sliding_mul: Initial implementation for 16b x 16b and cint16b x cint16b on AIE-ML
sliding_mul: Optimize mul+conj operations by merging them into a single intrinsic call on AIE-ML
sliding_mul_sym: Fixes in start computation for filters with DataStepX > 1
sliding_mul_sym: Add missing int32 x int16 / int16 x int32 type combinations
sliding_mul_sym: Fix two-buffer sliding_mul_sym acc80
sliding_mul_sym: Add support for separate left/right start arguments
store_v: Support pointers annotated with storage attributes
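
The MxKxN naming adopted for mmul in this release can be read as: A is an M x K matrix, B is a K x N matrix, and C = A x B is an M x N matrix. The sketch below models only that size arithmetic in plain C++. The type mmul_shape_model is hypothetical and is not the aie::mmul class; it assumes that size_A, size_B, and size_C count the elements of each matrix.

```cpp
// Hypothetical model of the element counts implied by an MxKxN shape.
template <unsigned M, unsigned K, unsigned N>
struct mmul_shape_model {
    static constexpr unsigned size_A = M * K;  // A: M rows, K columns
    static constexpr unsigned size_B = K * N;  // B: K rows, N columns
    static constexpr unsigned size_C = M * N;  // C: M rows, N columns
};

// Example: a 4x8x4 shape multiplies a 4x8 matrix by an 8x4 matrix.
static_assert(mmul_shape_model<4, 8, 4>::size_A == 32, "A holds 4*8 elements");
static_assert(mmul_shape_model<4, 8, 4>::size_B == 32, "B holds 8*4 elements");
static_assert(mmul_shape_model<4, 8, 4>::size_C == 16, "C holds 4*4 elements");
```

Under this reading, the shared inner dimension K sits in the middle of the name, which is the usual convention for matrix products.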
