AI Engine API User Guide (AIE) 2022.1
All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Macros Modules Pages Concepts
Fast Fourier Transform (FFT)

Overview

The AIE API offers the aie::fft_dit class template that provides the building blocks to implement all the stages of the FFT decimation-in-time computation. This class template is parametrized with the data type, the radix to be used and the vectorization of the FFT stage to be computed. The resulting class defines a stage iterator that contains the state of the computation (pointers to the input and twiddle arrays), and a function that performs the decimation in time.

By default the input and output type are the same, but you can optionally specify a second type for a different output type.

The following snippet shows the sample implementation of a radix-2 FFT stage for any vectorization value.

template <unsigned Vectorization>
void radix2_dit(const cint32 * __restrict x,
const cint16 * __restrict tw,
unsigned n, unsigned shift_tw, unsigned shift, bool inv,
cint32 * __restrict y)
{
using FFT = aie::fft_dit<Vectorization, 2, cint32>; // radix = 2, input_type = cint32, output_type = cint32 (output defaults to input)
FFT fft;
auto it_stage = fft.begin_stage(x, tw);
auto it_out0 = aie::begin_vector<FFT::out_vector_size>(y);
auto it_out1 = aie::begin_vector<FFT::out_vector_size>(y + n / 2);
for (int j = 0; j < n / (2 * FFT::out_vector_size); ++j)
chess_prepare_for_pipelining
chess_loop_range(1,)
{
const auto out = fft.dit(*it_stage++, shift_tw, shift, inv);
*it_out0++ = out[0];
*it_out1++ = out[1];
}
}
auto inv(E a)
Definition: aie.hpp:7513
shift_bits< T, type_bits_v< T >, Elems > shift
Definition: shift.hpp:119
Definition: aie_declaration.hpp:85

Typedefs

template<unsigned Vectorization, unsigned Radix, typename T1 , typename T2 = T1>
using aie::fft_dit = detail::fft_dit< Vectorization, detail::fft_get_stage< Vectorization, Radix, T1, T2 >(), Radix, T1, T2 >
  More...
 

Supported Fast Fourier Transform Modes

Supported FFT/IFFT modes
Data typeRadix
c16b 2
c32b 2
c32b 4
cfloat 2
For a given radix R with n number of points, there will be R number of outputs spaced by n/R. The above example shows 2 outputs for a radix 2, for radix 4 we would have:
template <unsigned Vectorization>
void radix4_dit(const cint32 * __restrict x,
const cint16 * __restrict tw1,
const cint16 * __restrict tw2,
unsigned n, unsigned shift_tw, unsigned shift, bool inv,
cint32 * __restrict y)
{
FFT fft;
auto it_stage = fft.begin_stage(x, tw1, tw2);
auto it_out0 = aie::begin_restrict_vector<FFT::out_vector_size>(y);
auto it_out1 = aie::begin_restrict_vector<FFT::out_vector_size>(y + n / 4);
auto it_out2 = aie::begin_restrict_vector<FFT::out_vector_size>(y + 2*n / 4);
auto it_out3 = aie::begin_restrict_vector<FFT::out_vector_size>(y + 3*n / 4);
chess_report(FFT::out_vector_size);
for (int j = 0; j < n / (4 * FFT::out_vector_size); ++j)
chess_prepare_for_pipelining
chess_loop_range(1,)
{
const auto out = fft.dit(*it_stage++, shift_tw, shift, inv);
*it_out0++ = out[0];
*it_out1++ = out[1];
*it_out2++ = out[2];
*it_out3++ = out[3];
}
}
However you can also use less with different incrementation schemes to achieve the same result. This approach is currently recommended for optimal performance on c32b radix 4 for Vectorization > 1 due to internal pointer requirements:
template <unsigned Vectorization>
void radix4_dit(const cint32 * __restrict x,
const cint16 * __restrict tw1,
const cint16 * __restrict tw2,
unsigned n, unsigned shift_tw, unsigned shift, bool inv,
cint32 * __restrict y)
{
FFT fft;
auto it_stage = fft.begin_stage(x, tw1, tw2);
auto it_out0 = aie::begin_restrict_vector<FFT::out_vector_size>(y);
auto it_out1 = aie::begin_restrict_vector<FFT::out_vector_size>(y + n / 2);
for (int j = 0; j < n / (4 * FFT::out_vector_size); ++j)
chess_prepare_for_pipelining
chess_loop_range(1,)
{
const auto out = fft.dit(*it_stage++, shift_tw, shift, inv);
*it_out0 = out[0]; it_out0 += n/16;
*it_out0 = out[1]; it_out0 += -(int)(n/16)+1;
*it_out1 = out[2]; it_out1 += n/16;
*it_out1 = out[3]; it_out1 += -(int)(n/16)+1;
}
}
Going to/from cint16/cint32 is also supported. Here is an example of a radix 2 downscaling from cint32 to cint16:
template <unsigned Vectorization>
void radix2_down_dit(const cint32_t * __restrict x,
const cint16_t * __restrict tw,
int n, int shift_tw, int shift, bool inv,
cint16_t * __restrict y)
{
FFT fft;
auto it_stage = fft.begin_stage(x, tw);
auto it_out0 = aie::begin_vector<FFT::out_vector_size>(y);
auto it_out1 = aie::begin_vector<FFT::out_vector_size>(y + n / 2);
for (int j = 0; j < n / (2 * FFT::out_vector_size); ++j)
chess_prepare_for_pipelining
chess_loop_range(1,)
{
const auto out = fft.dit(*it_stage++, shift_tw, shift, inv);
*it_out0++ = out[0];
*it_out1++ = out[1];
}
}
cint16 cint16_t
Definition: types.hpp:197
cint32 cint32_t
Definition: types.hpp:198

Typedef Documentation

◆ fft_dit

template<unsigned Vectorization, unsigned Radix, typename T1 , typename T2 = T1>
using aie::fft_dit = typedef detail::fft_dit<Vectorization, detail::fft_get_stage<Vectorization, Radix, T1, T2>(), Radix, T1, T2>

Type that encapsulates the functionality for decimation-in-time FFTs.

Template Parameters
VectorizationVectorization of the FFT stage
RadixNumber which selects the FFT radix.
T1Type of the input elements.
T2Type of the output elements, defaults to input type.