-
add: Implementation optimization on AIE-ML
-
add_reduce: Implement on AIE-ML
-
bit/or/xor: Implement scalar x vector variants of bit operations
-
equal/not_equal: Fix an issue in which not all lanes were compared for certain vector sizes.
-
fft: Interface change to enhance portability across AIE/AIE-ML
-
fft: Add initial support on AIE-ML
-
fft: Add alignment checks for x86sim in FFT iterators
-
fft: Make FFT output interface uniform for radix 2 cint16 upscale version on AIE
-
filter_even/filter_odd: Functional fixes
-
filter_even/filter_odd: Performance improvement for 4b/8b/16b implementations
-
filter_even/filter_odd: Performance optimization on AIE-ML
-
filter_even/filter_odd: Do not require step argument to be a compile-time constant
-
interleave_zip/interleave_unzip: Improve performance when configuration is a run-time value
-
interleave_*: Do not require step argument to be a compile-time constant
-
load_floor_v/load_floor_bytes_v: New functions that floor the pointer to a requested boundary before performing the load.
-
load_unaligned_v/store_unaligned_v: Performance optimization on AIE-ML
-
lut/parallel_lookup/linear_approx: First implementation of look-up based linear functions on AIE-ML.
-
max_reduce/min_reduce: Add 8b implementation
-
max_reduce/min_reduce: Implement on AIE-ML
-
mmul: Implement new shapes for AIE-ML
-
mmul: Initial support for 4b multiplication
-
mmul: Add support for 80b accumulation for 16b x 32b / 32b x 16b cases
-
mmul: Change dimension names from MxNxK to MxKxN
-
mmul: Add size_A/size_B/size_C data members
-
mul: Optimize mul+conj operations by merging them into a single intrinsic call on AIE-ML
-
sin/cos/sincos: Fix to avoid int -> unsigned conversions that reduce the range
-
sin/cos/sincos: Use a compile-time division to compute 1/PI
-
sin/cos/sincos: Fix floating-point range
-
sin/cos/sincos: Optimized implementation for float vector
-
shuffle_up/shuffle_down: Elements don't wrap around anymore. Instead, new elements are undefined.
-
shuffle_up_rotate/shuffle_down_rotate: New variants added for cases in which elements need to wrap around
-
shuffle_up_replicate: Variant added which replicates the first element.
-
shuffle_up_fill: Variant added which fills new elements with elements from another vector.
-
shuffle_*: Optimization in shuffle primitives on AIE, especially for 8b/16b cases
-
sliding_mul: Fixes to handle larger Step values for cfloat variants
-
sliding_mul: Initial implementation for 16b x 16b and cint16 x cint16 on AIE-ML
-
sliding_mul: Optimize mul+conj operations by merging them into a single intrinsic call on AIE-ML
-
sliding_mul_sym: Fixes in start computation for filters with DataStepX > 1
-
sliding_mul_sym: Add missing int32 x int16 / int16 x int32 type combinations
-
sliding_mul_sym: Fix two-buffer sliding_mul_sym acc80
-
sliding_mul_sym: Add support for separate left/right start arguments
-
store_v: Support pointers annotated with storage attributes
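
Note on the shuffle_up variants listed above: the scalar sketch below models how the new variants differ in the way they populate the vacated low positions. It is a plain C++ reference model on std::array, not the AIE API; the model_* helper names are hypothetical. That shuffle_up_fill takes its new elements from the top of the fill vector is an assumption. Plain shuffle_up is not modelled because its new elements are undefined.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar reference model only. These helpers are hypothetical, not the AIE API.
// "Up" by n moves the element at position i to position i + n.

// Rotate variant: elements pushed past the top re-enter at the bottom.
template <typename T, std::size_t N>
std::array<T, N> model_shuffle_up_rotate(const std::array<T, N>& v, std::size_t n) {
    static_assert(N > 0, "vector must not be empty");
    assert(n <= N);
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N; ++i)
        out[(i + n) % N] = v[i];
    return out;
}

// Replicate variant: vacated low positions receive a copy of the first element.
template <typename T, std::size_t N>
std::array<T, N> model_shuffle_up_replicate(const std::array<T, N>& v, std::size_t n) {
    static_assert(N > 0, "vector must not be empty");
    assert(n <= N);
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N; ++i)
        out[i] = (i < n) ? v[0] : v[i - n];
    return out;
}

// Fill variant: vacated low positions come from a second vector.
// Assumption: they are taken from the top n elements of `fill`.
template <typename T, std::size_t N>
std::array<T, N> model_shuffle_up_fill(const std::array<T, N>& v,
                                       const std::array<T, N>& fill,
                                       std::size_t n) {
    static_assert(N > 0, "vector must not be empty");
    assert(n <= N);
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N; ++i)
        out[i] = (i < n) ? fill[N - n + i] : v[i - n];
    return out;
}
```

Code that relied on the old wrap-around behaviour of shuffle_up/shuffle_down would, under this model, map to the rotate variants.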
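
Note on load_floor_v/load_floor_bytes_v: the flooring step described in that entry can be sketched in plain C++ as rounding an address down to a power-of-two boundary. The helper below is hypothetical and is not the AIE API. It only illustrates the pointer arithmetic, with the boundary given in bytes; the load that follows is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not the AIE API: round a pointer down to the nearest
// multiple of `bytes`. `bytes` must be a non-zero power of two.
template <typename T>
const T* model_floor_ptr(const T* p, std::size_t bytes) {
    assert(bytes != 0 && (bytes & (bytes - 1)) == 0);
    auto addr = reinterpret_cast<std::uintptr_t>(p);
    // Clearing the low bits floors the address to the requested boundary.
    addr &= ~(static_cast<std::uintptr_t>(bytes) - 1);
    return reinterpret_cast<const T*>(addr);
}
```

A pointer that already sits on the boundary is returned unchanged, so the floored load coincides with an aligned load in that case.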
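
Note on the mmul MxKxN naming: under the new order, K is the shared (inner) dimension, so A is M x K, B is K x N and C is M x N. The scalar sketch below is a hypothetical reference model, not the AIE API. It assumes row-major storage and that size_A/size_B/size_C are the element counts of the three tiles.

```cpp
#include <array>
#include <cstddef>

// Hypothetical shape model: element counts of the A, B and C tiles.
template <std::size_t M, std::size_t K, std::size_t N>
struct model_mmul_shape {
    static constexpr std::size_t size_A = M * K;  // A is M x K
    static constexpr std::size_t size_B = K * N;  // B is K x N
    static constexpr std::size_t size_C = M * N;  // C is M x N
};

// Scalar row-major reference multiply: C = A * B.
template <typename T, std::size_t M, std::size_t K, std::size_t N>
std::array<T, M * N> model_matmul(const std::array<T, M * K>& a,
                                  const std::array<T, K * N>& b) {
    std::array<T, M * N> c{};
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t n = 0; n < N; ++n)
            for (std::size_t k = 0; k < K; ++k)
                c[m * N + n] += a[m * K + k] * b[k * N + n];
    return c;
}
```

The template arguments must be given explicitly, for example model_matmul<int, 2, 3, 2>(a, b), because the dimensions cannot be deduced from the array sizes.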