template<unsigned ParallelAccesses, typename OffsetType, typename SlopeType = OffsetType>
struct aie::lut< ParallelAccesses, OffsetType, SlopeType >
Abstraction to represent a LUT that is stored in memory, instantiated with pointer(s) to the already populated memory and the number of elements. To achieve a given level of access parallelism, the LUT values need to have a specific layout in memory:
- To achieve 2 loads in parallel, the LUT needs 2 copies of the desired values, interleaved so that each 128b block is immediately repeated. For example, with 32b values, memory holds the first 4 values (128b), then the same 4 values again, then the next 4 values followed by their repetition, and so on.
- For 4 loads in parallel, the same interleaved layout is required twice: two distinct copies, each placed in a different memory bank. In total, the LUT values appear 4 times in memory.
- For a single load without parallelism, the values are just stored linearly in memory.
Currently, the only implementation supported on this architecture is 4 parallel accesses.
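The interleaved layout described above can be sketched in portable C++. This is a minimal illustration only, not part of the AIE API: the helper name `interleave_for_2_loads` is hypothetical, and padding an incomplete final 128b group with value-initialized entries is an assumption made for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the AIE API): builds the layout needed
// for 2 parallel loads by writing every 128-bit group of values twice.
template <typename T>
std::vector<T> interleave_for_2_loads(const std::vector<T>& values)
{
    static_assert(16 % sizeof(T) == 0, "T must evenly divide a 128-bit group");
    constexpr std::size_t per_group = 16 / sizeof(T); // values per 128 bits

    std::vector<T> out;
    out.reserve(2 * (values.size() + per_group - 1));

    for (std::size_t base = 0; base < values.size(); base += per_group) {
        // Emit the same group twice, back to back.
        for (int copy = 0; copy < 2; ++copy) {
            for (std::size_t i = 0; i < per_group; ++i) {
                const std::size_t idx = base + i;
                // Assumption: pad an incomplete final group with T{}.
                out.push_back(idx < values.size() ? values[idx] : T{});
            }
        }
    }
    return out;
}
```

For 4 parallel loads, two buffers with this layout would be needed, each placed in a different memory bank; bank placement is target-specific and is not shown here.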
- Template Parameters
-
ParallelAccesses | Defines how many parallel accesses are performed in a single LUT access. The possible values depend on the hardware available for the given architecture. |
OffsetType | Type of values stored within the lookup table. |
SlopeType | Optional template parameter, only needed in certain cases of linear approximation where the offset/slope value pair uses two different types. |
template<unsigned ParallelAccesses, typename OffsetType, typename SlopeType = OffsetType>
aie::lut< ParallelAccesses, OffsetType, SlopeType >::lut(unsigned LUT_elems, const void * LUT_ab, const void * LUT_cd)
inline
Constructor for 4 parallel accesses. Each pointer points to an equivalent, already populated LUT in which every value is stored twice, interleaved at a 128b granularity. In total, the same values need to be present 4 times in memory to allow for the 4 parallel accesses.
- Parameters
-
LUT_elems | Number of elements in the LUT (not accounting for repetition). |
LUT_ab | First two copies of the data, with the values repeated and interleaved at a 128b granularity. |
LUT_cd | Next two copies of the data, with the values repeated and interleaved at a 128b granularity. |
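As a sketch of how the three arguments relate, the following shows an 8-entry LUT of 32b values laid out for 4 parallel accesses. The array names and values are illustrative assumptions. The constructor call appears only as a comment because it requires the AIE toolchain, and placing the two arrays in different memory banks is target-specific and not shown.

```cpp
#include <cstdint>

// Logical number of LUT entries; this is the value passed as LUT_elems.
constexpr unsigned lut_elems = 8;

// First two copies: each 128-bit group (4 x 32-bit values) is stored twice.
alignas(16) const std::int32_t lut_ab[2 * lut_elems] = {
    0, 1, 2, 3,   0, 1, 2, 3,
    4, 5, 6, 7,   4, 5, 6, 7,
};

// Next two copies: identical contents in a separate array, intended for a
// different memory bank (bank placement is not expressed in this sketch).
alignas(16) const std::int32_t lut_cd[2 * lut_elems] = {
    0, 1, 2, 3,   0, 1, 2, 3,
    4, 5, 6, 7,   4, 5, 6, 7,
};

// On the target, following the constructor signature documented above:
//   aie::lut<4, int32> my_lut(lut_elems, lut_ab, lut_cd);
```

Each array holds twice `lut_elems` values, while `LUT_elems` itself stays at 8 because it does not account for repetition.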