AI Engine Intrinsics User Guide
(AIE) v2024.1
Vector MAC intrinsics with pre-add: 16-bit real by 16-bit real
Functions
v16acc48 | mac16_sym (v16acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Symmetric multiply-accumulate intrinsic function with pre-add from the x input buffer.
v16acc48 | mac16_sym (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Symmetric multiply-accumulate intrinsic function with pre-add from the x input buffer, using the small X input buffer.
v16acc48 | mac16_sym (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Symmetric multiply-accumulate intrinsic function with pre-add from the x and y input buffers, using the small X input buffer.
v8acc48 | mac8_sym (v8acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Symmetric multiply-accumulate intrinsic function with pre-add from the x input buffer.
v8acc48 | mac8_sym (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Symmetric multiply-accumulate intrinsic function with pre-add from the x input buffer, using the small X input buffer.
v8acc48 | mac8_sym (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Symmetric multiply-accumulate intrinsic function with pre-add from the x and y input buffers, using the small X input buffer.
v16acc48 | msc16_sym (v16acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Symmetric multiply-subtract intrinsic function with pre-add from the x input buffer.
v16acc48 | msc16_sym (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Symmetric multiply-subtract intrinsic function with pre-add from the x input buffer, using the small X input buffer.
v16acc48 | msc16_sym (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Symmetric multiply-subtract intrinsic function with pre-add from the x and y input buffers, using the small X input buffer.
v8acc48 | msc8_sym (v8acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Symmetric multiply-subtract intrinsic function with pre-add from the x input buffer.
v8acc48 | msc8_sym (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Symmetric multiply-subtract intrinsic function with pre-add from the x input buffer, using the small X input buffer.
v8acc48 | msc8_sym (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Symmetric multiply-subtract intrinsic function with pre-add from the x and y input buffers, using the small X input buffer.
v16acc48 | mul16_sym (v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Symmetric multiply intrinsic function with pre-add from the x input buffer.
v16acc48 | mul16_sym (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Symmetric multiply intrinsic function with pre-add from the x input buffer, using the small X input buffer.
v16acc48 | mul16_sym (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Symmetric multiply intrinsic function with pre-add from the x and y input buffers, using the small X input buffer.
v8acc48 | mul8_sym (v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Symmetric multiply intrinsic function with pre-add from the x input buffer.
v8acc48 | mul8_sym (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Symmetric multiply intrinsic function with pre-add from the x input buffer, using the small X input buffer.
v8acc48 | mul8_sym (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Symmetric multiply intrinsic function with pre-add from the x and y input buffers, using the small X input buffer.
v16acc48 | negmul16_sym (v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Symmetric multiply-negate intrinsic function with pre-add from the x input buffer.
v16acc48 | negmul16_sym (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Symmetric multiply-negate intrinsic function with pre-add from the x input buffer, using the small X input buffer.
v16acc48 | negmul16_sym (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Symmetric multiply-negate intrinsic function with pre-add from the x and y input buffers, using the small X input buffer.
v8acc48 | negmul8_sym (v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Symmetric multiply-negate intrinsic function with pre-add from the x input buffer.
v8acc48 | negmul8_sym (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Symmetric multiply-negate intrinsic function with pre-add from the x input buffer, using the small X input buffer.
v8acc48 | negmul8_sym (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Symmetric multiply-negate intrinsic function with pre-add from the x and y input buffers, using the small X input buffer.
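The four families above differ only in how the pre-added products are combined with the accumulator: mac adds them to acc, msc subtracts them from acc, mul returns them without an incoming accumulator, and negmul returns their negation. The sketch below models a single output lane in plain C++ under simplifying assumptions: it receives already-selected x, y and z samples, so lane offsets, square permutes and the int48 accumulator width are not modeled. `SymOp` and `sym_lane` are hypothetical names, not part of the intrinsics API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of one output lane of the *_sym intrinsics.
// Column c contributes (x[c] + y[c]) * z[c]; the pre-add happens first.
enum class SymOp { Mac, Msc, Mul, NegMul };

int64_t sym_lane(SymOp op, int64_t acc, const int16_t* x, const int16_t* y,
                 const int16_t* z, std::size_t columns) {
    int64_t sum = 0;
    for (std::size_t c = 0; c < columns; ++c) {
        // Pre-add in a wider type so x + y cannot overflow int16.
        int64_t pre = static_cast<int64_t>(x[c]) + y[c];
        sum += pre * z[c];
    }
    switch (op) {
        case SymOp::Mac:    return acc + sum;  // mac*_sym: accumulate
        case SymOp::Msc:    return acc - sum;  // msc*_sym: subtract from acc
        case SymOp::Mul:    return sum;        // mul*_sym: no incoming acc
        case SymOp::NegMul: return -sum;       // negmul*_sym: negated result
    }
    assert(false && "unknown SymOp");
    return 0;
}
```

With x = {1, 2}, y = {3, 4} and z = {5, 6}, the pre-added sum is (1 + 3) * 5 + (2 + 4) * 6 = 56, so Mac with acc = 100 gives 156 and Msc gives 44.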
v16acc48 mac16_sym (v16acc48 acc,
    v64int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    unsigned int xoffsets_hi,
    unsigned int xsquare,
    int ystart,
    unsigned int ysquare,
    v16int16 zbuff,
    int zstart,
    unsigned int zoffsets,
    unsigned int zoffsets_hi,
    int zstep)
Symmetric multiply-accumulate intrinsic function with pre-add from the x input buffer.
acc | Incoming accumulation vector (16 x int48 lanes) |
xbuff | Input buffer of 64 elements of type int16 |
xstart | Starting position offset applied to all lanes of input from the X buffer. xstart must be a multiple of 2 because xbuff is addressed with 32-bit granularity. |
xoffsets | 4-bit offset for each lane, scaled by 2 (one unit is two int16 elements, matching the 32-bit granularity); each odd lane's offset is relative to the preceding lane's offset, plus 1. The LSB nibble applies to lane 0. |
xoffsets_hi | 4-bit offset for each lane, encoded as for xoffsets. The LSB nibble applies to lane 8. |
xsquare | Selects the order of the mini-permute square (default = 0x3210). The LSB field applies to the first element. |
ystart | Starting position offset applied to all lanes of input from the Y buffer. In this variant the Y input is also read from xbuff. |
ysquare | Selects the order of the mini-permute square (default = 0x3210). The LSB field applies to the first element. |
zbuff | Input buffer of 16 elements of type int16 |
zstart | Starting position offset applied to all lanes of input from the Z buffer. This must be a compile-time constant; only the 4 LSBs of the argument are used. |
zoffsets | 4-bit offset for each lane, applied to input from the Z buffer. The LSB nibble applies to lane 0. |
zoffsets_hi | 4-bit offset for each lane, applied to input from the Z buffer. The LSB nibble applies to lane 8. |
zstep | Step between columns when selecting from the Z buffer. |
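The xoffsets/xoffsets_hi pair (and likewise zoffsets/zoffsets_hi) packs one 4-bit field per lane: lanes 0 to 7 come from the low word and lanes 8 to 15 from the high word, least significant nibble first. A minimal decoding sketch, using the hypothetical helper name `lane_nibble`:

```cpp
#include <cassert>
#include <cstdint>

// Returns the 4-bit offset field that applies to `lane` (0..15).
// Lanes 0..7 read nibble `lane` of `offsets` (LSB nibble = lane 0);
// lanes 8..15 read nibble `lane - 8` of `offsets_hi` (LSB nibble = lane 8).
unsigned lane_nibble(uint32_t offsets, uint32_t offsets_hi, unsigned lane) {
    assert(lane < 16);
    uint32_t word = lane < 8 ? offsets : offsets_hi;
    unsigned shift = 4u * (lane % 8u);
    return (word >> shift) & 0xFu;
}
```

For example, offsets = 0x76543210 with offsets_hi = 0xFEDCBA98 decodes to the value n for every lane n.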
v16acc48 mac16_sym (v16acc48 acc,
    v32int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    unsigned int xoffsets_hi,
    unsigned int xsquare,
    int ystart,
    unsigned int ysquare,
    v16int16 zbuff,
    int zstart,
    unsigned int zoffsets,
    unsigned int zoffsets_hi,
    int zstep)
Symmetric multiply-accumulate intrinsic function with pre-add from the x input buffer, using the small X input buffer.
acc | Incoming accumulation vector (16 x int48 lanes) |
xbuff | Input buffer of 32 elements of type int16 |
xstart | Starting position offset applied to all lanes of input from the X buffer. xstart must be a multiple of 2 because xbuff is addressed with 32-bit granularity. |
xoffsets | 4-bit offset for each lane, scaled by 2 (one unit is two int16 elements, matching the 32-bit granularity); each odd lane's offset is relative to the preceding lane's offset, plus 1. The LSB nibble applies to lane 0. |
xoffsets_hi | 4-bit offset for each lane, encoded as for xoffsets. The LSB nibble applies to lane 8. |
xsquare | Selects the order of the mini-permute square (default = 0x3210). The LSB field applies to the first element. |
ystart | Starting position offset applied to all lanes of input from the Y buffer. In this variant the Y input is also read from xbuff. |
ysquare | Selects the order of the mini-permute square (default = 0x3210). The LSB field applies to the first element. |
zbuff | Input buffer of 16 elements of type int16 |
zstart | Starting position offset applied to all lanes of input from the Z buffer. This must be a compile-time constant; only the 4 LSBs of the argument are used. |
zoffsets | 4-bit offset for each lane, applied to input from the Z buffer. The LSB nibble applies to lane 0. |
zoffsets_hi | 4-bit offset for each lane, applied to input from the Z buffer. The LSB nibble applies to lane 8. |
zstep | Step between columns when selecting from the Z buffer. |
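The xoffsets description combines two rules: offsets count 32-bit words (two int16 elements), and an odd lane's offset is taken relative to the preceding even lane, plus 1. The sketch below is one literal reading of those rules that computes the first int16 element selected for a lane. It is an assumption to check against the AI Engine architecture documentation, not a statement of the hardware behavior, and it ignores the square permute and any wraparound inside the 32-element xbuff. `x_lane_index` and `nibble` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: 4-bit field for `lane` (LSB nibble = lane 0 / lane 8).
static unsigned nibble(uint32_t lo, uint32_t hi, unsigned lane) {
    return ((lane < 8 ? lo : hi) >> (4u * (lane % 8u))) & 0xFu;
}

// ASSUMED reading of the xoffsets rule for 16-bit data: an even lane selects
// the 32-bit word given by its own nibble; an odd lane selects the word at the
// preceding even lane's offset + its own nibble + 1. Word w starts at int16
// element 2 * w, hence the scaling by 2.
int x_lane_index(int xstart, uint32_t xoffsets, uint32_t xoffsets_hi, unsigned lane) {
    assert(lane < 16);
    assert(xstart % 2 == 0);  // xstart must be a multiple of 2
    unsigned even = lane & ~1u;
    unsigned word = nibble(xoffsets, xoffsets_hi, even);
    if (lane & 1u) {
        word += nibble(xoffsets, xoffsets_hi, lane) + 1;
    }
    return xstart + 2 * static_cast<int>(word);
}
```

Under this reading, xoffsets = 0x03020100 with xoffsets_hi = 0x07060504 gives each lane pair (2p, 2p + 1) the start indices 2p and 2p + 2, shifted by xstart.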
v16acc48 mac16_sym (v16acc48 acc,
    v32int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    unsigned int xoffsets_hi,
    unsigned int xsquare,
    v32int16 ybuff,
    int ystart,
    unsigned int ysquare,
    v16int16 zbuff,
    int zstart,
    unsigned int zoffsets,
    unsigned int zoffsets_hi,
    int zstep)
Symmetric multiply-accumulate intrinsic function with pre-add from the x and y input buffers, using the small X input buffer.
acc | Incoming accumulation vector (16 x int48 lanes) |
xbuff | Input buffer of 32 elements of type int16 |
xstart | Starting position offset applied to all lanes of input from the X buffer. xstart must be a multiple of 2 because xbuff is addressed with 32-bit granularity. |
xoffsets | 4-bit offset for each lane, scaled by 2 (one unit is two int16 elements, matching the 32-bit granularity); each odd lane's offset is relative to the preceding lane's offset, plus 1. The LSB nibble applies to lane 0. |
xoffsets_hi | 4-bit offset for each lane, encoded as for xoffsets. The LSB nibble applies to lane 8. |
xsquare | Selects the order of the mini-permute square (default = 0x3210). The LSB field applies to the first element. |
ybuff | Right input buffer of 32 elements of type int16 |
ystart | Starting position offset applied to all lanes of input from the Y buffer. ystart must be a multiple of 2 because ybuff is addressed with 32-bit granularity. |
ysquare | Selects the order of the mini-permute square (default = 0x3210). The LSB field applies to the first element. |
zbuff | Input buffer of 16 elements of type int16 |
zstart | Starting position offset applied to all lanes of input from the Z buffer. This must be a compile-time constant; only the 4 LSBs of the argument are used. |
zoffsets | 4-bit offset for each lane, applied to input from the Z buffer. The LSB nibble applies to lane 0. |
zoffsets_hi | 4-bit offset for each lane, applied to input from the Z buffer. The LSB nibble applies to lane 8. |
zstep | Step between columns when selecting from the Z buffer. |
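The pre-add is presumably aimed at filters with symmetric coefficients: when c[k] == c[N-1-k], the two samples that share a coefficient can be summed first, halving the number of multiplications. The sketch below illustrates that identity in plain C++ by comparing a direct FIR with a pre-add version. It shows the arithmetic only, not the lane or permute behavior of mac16_sym, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Direct-form FIR: out[n] = sum_k c[k] * x[n + k]  (k = 0..N-1)
std::vector<int64_t> fir_direct(const std::vector<int16_t>& x,
                                const std::vector<int16_t>& c) {
    std::vector<int64_t> out;
    for (std::size_t n = 0; n + c.size() <= x.size(); ++n) {
        int64_t acc = 0;
        for (std::size_t k = 0; k < c.size(); ++k) acc += int64_t{c[k]} * x[n + k];
        out.push_back(acc);
    }
    return out;
}

// Same filter for symmetric c (c[k] == c[N-1-k]): pre-add each sample with
// its mirrored partner, then multiply once per pair.
std::vector<int64_t> fir_symmetric(const std::vector<int16_t>& x,
                                   const std::vector<int16_t>& c) {
    const std::size_t N = c.size();
    std::vector<int64_t> out;
    for (std::size_t n = 0; n + N <= x.size(); ++n) {
        int64_t acc = 0;
        for (std::size_t k = 0; k < N / 2; ++k) {
            int64_t pre = int64_t{x[n + k]} + x[n + N - 1 - k];  // left + mirrored right
            acc += pre * c[k];
        }
        if (N % 2 == 1) acc += int64_t{c[N / 2]} * x[n + N / 2];  // centre tap
        out.push_back(acc);
    }
    return out;
}
```

For an odd-length filter the centre tap has no partner, which is why it is handled separately after the pre-added pairs.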
v8acc48 mac8_sym (v8acc48 acc,
    v64int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    int xstep,
    unsigned int xsquare,
    int ystart,
    unsigned int ysquare,
    v16int16 zbuff,
    int zstart,
    unsigned int zoffsets,
    int zstep)
Symmetric multiply-accumulate intrinsic function with pre-add from the x input buffer.
acc | Incoming accumulation vector (8 x int48 lanes) |
xbuff | Input buffer of 64 elements of type int16 |
xstart | Starting position offset applied to all lanes of input from the X buffer. xstart must be a multiple of 2 because xbuff is addressed with 32-bit granularity. |
xoffsets | 4-bit offset for each lane, applied to both the X and Y selections; each odd lane's offset is relative to the preceding lane's offset, plus 1. The LSB nibble applies to lane 0. |
xstep | Step between columns when selecting from the X and Y buffers. The Y step mirrors xstep: the Y selection advances by -xstep. xstep must be a multiple of 2 because xbuff is addressed with 32-bit granularity. |
xsquare | Selects the order of the mini-permute square (default = 0x3210). The LSB field applies to the first element. |
ystart | Starting position offset applied to all lanes of input from the Y buffer. In this variant the Y input is also read from xbuff. |
ysquare | Selects the order of the mini-permute square (default = 0x3210). The LSB field applies to the first element. |
zbuff | Input buffer of 16 elements of type int16 |
zstart | Starting position offset applied to all lanes of input from the Z buffer. This must be a compile-time constant; only the 4 LSBs of the argument are used. |
zoffsets | 4-bit offset for each lane, applied to input from the Z buffer. The LSB nibble applies to lane 0. |
zstep | Step between columns when selecting from the Z buffer. |
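For the 8-lane variants, xstep moves the X selection forward from column to column while the Y selection moves backward by the same amount; in the variant above both selections read from xbuff. The following hypothetical scalar model of one lane shows that stepping. Lane offsets, square permutes, index wraparound and the int48 accumulator are not modeled, and the column count is left as a parameter because this section does not state it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of the column stepping of one mac8_sym lane:
// column c reads x at xi + c*xstep and y at yi - c*xstep (ystep = -xstep),
// pre-adds them and multiplies by z at zi + c*zstep. Both reads come from
// the same buffer, as in the single-xbuff variant.
int64_t mac8_sym_lane(int64_t acc, const int16_t* buf, std::size_t buf_len,
                      int xi, int yi, int xstep, const int16_t* z, int zi,
                      int zstep, int columns) {
    assert(xstep % 2 == 0);  // xstep must be a multiple of 2
    for (int c = 0; c < columns; ++c) {
        int xa = xi + c * xstep;
        int ya = yi - c * xstep;  // Y walks in the opposite direction
        assert(xa >= 0 && ya >= 0 && static_cast<std::size_t>(xa) < buf_len &&
               static_cast<std::size_t>(ya) < buf_len);
        int64_t pre = int64_t{buf[xa]} + buf[ya];
        acc += pre * z[zi + c * zstep];
    }
    return acc;
}
```

With buf = {1, ..., 8}, xi = 0, yi = 7, xstep = 2 and two columns, the lane pre-adds buf[0] + buf[7] and then buf[2] + buf[5], the outermost mirrored pairs at that stride.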
v8acc48 mac8_sym (v8acc48 acc,
    v32int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    int xstep,
    unsigned int xsquare,
    int ystart,
    unsigned int ysquare,
    v16int16 zbuff,
    int zstart,
    unsigned int zoffsets,
    int zstep)
Symmetric multiply-accumulate intrinsic function with pre-add from the x input buffer, using the small X input buffer.
acc | Incoming accumulation vector (8 x int48 lanes) |
xbuff | Input buffer of 32 elements of type int16 |
xstart | Starting position offset applied to all lanes of input from the X buffer. xstart must be a multiple of 2 because xbuff is addressed with 32-bit granularity. |
xoffsets | 4-bit offset for each lane, applied to both the X and Y selections; each odd lane's offset is relative to the preceding lane's offset, plus 1. The LSB nibble applies to lane 0. |
xstep | Step between columns when selecting from the X and Y buffers. The Y step mirrors xstep: the Y selection advances by -xstep. xstep must be a multiple of 2 because xbuff is addressed with 32-bit granularity. |
xsquare | Selects the order of the mini-permute square (default = 0x3210). The LSB field applies to the first element. |
ystart | Starting position offset applied to all lanes of input from the Y buffer. In this variant the Y input is also read from xbuff. |
ysquare | Selects the order of the mini-permute square (default = 0x3210). The LSB field applies to the first element. |
zbuff | Input buffer of 16 elements of type int16 |
zstart | Starting position offset applied to all lanes of input from the Z buffer. This must be a compile-time constant; only the 4 LSBs of the argument are used. |
zoffsets | 4-bit offset for each lane, applied to input from the Z buffer. The LSB nibble applies to lane 0. |
zstep | Step between columns when selecting from the Z buffer. |
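Z-buffer selection combines zstart, the per-lane zoffsets field and zstep. The sketch below makes zstart a template parameter to mirror the compile-time-constant requirement and keeps only its 4 LSBs. Wrapping the final index modulo 16 is an assumption based on zbuff holding 16 elements, not a documented rule; `z_index` is a hypothetical name and only the 8-lane zoffsets word is modeled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of Z-buffer element selection for one lane and column.
// Only the 4 LSBs of zstart are used. The final index is taken modulo 16
// on the ASSUMPTION that selection wraps inside the 16-element zbuff.
template <int ZStart>  // zstart must be a compile-time constant
unsigned z_index(uint32_t zoffsets, unsigned lane, int column, int zstep) {
    assert(lane < 8);
    constexpr unsigned start = static_cast<unsigned>(ZStart) & 0xFu;
    unsigned off = (zoffsets >> (4u * lane)) & 0xFu;  // LSB nibble = lane 0
    int idx = static_cast<int>(start + off) + column * zstep;
    return static_cast<unsigned>(((idx % 16) + 16) % 16);
}
```

For example, z_index<17> behaves like z_index<1>, because only the low 4 bits of zstart survive.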
v8acc48 mac8_sym (v8acc48 acc,
    v32int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    int xstep,
    unsigned int xsquare,
    v32int16 ybuff,
    int ystart,
    unsigned int ysquare,
    v16int16 zbuff,
    int zstart,
    unsigned int zoffsets,
    int zstep)
Symmetric multiply-accumulate intrinsic function with pre-add from the x and y input buffers, using the small X input buffer.
acc | Incoming accumulation vector (8 x int48 lanes) |
xbuff | Input buffer of 32 elements of type int16 |
xstart | Starting position offset applied to all lanes of input from the X buffer. xstart must be a multiple of 2 because xbuff is addressed with 32-bit granularity. |
xoffsets | 4-bit offset for each lane, applied to both the X and Y selections; each odd lane's offset is relative to the preceding lane's offset, plus 1. The LSB nibble applies to lane 0. |
xstep | Step between columns when selecting from the X and Y buffers. The Y step mirrors xstep: the Y selection advances by -xstep. xstep must be a multiple of 2 because xbuff is addressed with 32-bit granularity. |
xsquare | Selects the order of the mini-permute square (default = 0x3210). The LSB field applies to the first element. |
ybuff | Right input buffer of 32 elements of type int16 |
ystart | Starting position offset applied to all lanes of input from the Y buffer. ystart must be a multiple of 2 because ybuff is addressed with 32-bit granularity. |
ysquare | Selects the order of the mini-permute square (default = 0x3210). The LSB field applies to the first element. |
zbuff | Input buffer of 16 elements of type int16 |
zstart | Starting position offset applied to all lanes of input from the Z buffer. This must be a compile-time constant; only the 4 LSBs of the argument are used. |
zoffsets | 4-bit offset for each lane, applied to input from the Z buffer. The LSB nibble applies to lane 0. |
zstep | Step between columns when selecting from the Z buffer. |
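Building offset words by hand is error-prone; a small helper that packs a per-lane table, lane 0 in the least significant nibble, can make kernel code easier to check. `pack_offsets` is a hypothetical helper, not part of the intrinsics API; it only handles the packing and does not apply the odd-lane rule described for xoffsets.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: packs eight 4-bit per-lane offsets into the 32-bit
// layout used by xoffsets/zoffsets, placing lane 0 in the least significant
// nibble.
uint32_t pack_offsets(const std::array<unsigned, 8>& lanes) {
    uint32_t packed = 0;
    for (unsigned i = 0; i < 8; ++i) {
        assert(lanes[i] < 16);  // each field is only 4 bits wide
        packed |= static_cast<uint32_t>(lanes[i]) << (4u * i);
    }
    return packed;
}
```

For example, the per-lane table {0, 0, 1, 0, 2, 0, 3, 0} packs to 0x03020100.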
v16acc48 msc16_sym (v16acc48 acc,
    v64int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    unsigned int xoffsets_hi,
    unsigned int xsquare,
    int ystart,
    unsigned int ysquare,
    v16int16 zbuff,
    int zstart,
    unsigned int zoffsets,
    unsigned int zoffsets_hi,
    int zstep)
Symmetric multiply-subtract intrinsic function with pre-add from the x input buffer.
acc | Incoming accumulation vector (16 x int48 lanes) |
xbuff | Input buffer of 64 elements of type int16 |
xstart | Starting position offset applied to all lanes of input from the X buffer. xstart must be a multiple of 2 because xbuff is addressed with 32-bit granularity. |
xoffsets | 4-bit offset for each lane, scaled by 2 (one unit is two int16 elements, matching the 32-bit granularity); each odd lane's offset is relative to the preceding lane's offset, plus 1. The LSB nibble applies to lane 0. |
xoffsets_hi | 4-bit offset for each lane, encoded as for xoffsets. The LSB nibble applies to lane 8. |
xsquare | Selects the order of the mini-permute square (default = 0x3210). The LSB field applies to the first element. |
ystart | Starting position offset applied to all lanes of input from the Y buffer. In this variant the Y input is also read from xbuff. |
ysquare | Selects the order of the mini-permute square (default = 0x3210). The LSB field applies to the first element. |
zbuff | Input buffer of 16 elements of type int16 |
zstart | Starting position offset applied to all lanes of input from the Z buffer. This must be a compile-time constant; only the 4 LSBs of the argument are used. |
zoffsets | 4-bit offset for each lane, applied to input from the Z buffer. The LSB nibble applies to lane 0. |
zoffsets_hi | 4-bit offset for each lane, applied to input from the Z buffer. The LSB nibble applies to lane 8. |
zstep | Step between columns when selecting from the Z buffer. |
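The square parameters reorder a group of four int16 candidates using one 4-bit field per output element, least significant field first, with 0x3210 as the identity. The sketch below models only that reordering; which output position feeds which lane and column is not stated in this section and is left out. `apply_square` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of a 2x2 "mini-permute square": four candidate int16
// elements are reordered according to the 4-bit fields of `square`.
// Field k (LSB first) picks which candidate lands in output position k,
// so the default 0x3210 is the identity.
std::array<int16_t, 4> apply_square(uint32_t square, const std::array<int16_t, 4>& in) {
    std::array<int16_t, 4> out{};
    for (unsigned k = 0; k < 4; ++k) {
        unsigned sel = (square >> (4u * k)) & 0xFu;
        assert(sel < 4);  // only four candidates exist
        out[k] = in[sel];
    }
    return out;
}
```

A value such as 0x2110 duplicates the middle candidates, producing {in[0], in[1], in[1], in[2]}.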
v16acc48 msc16_sym (v16acc48 acc,
    v32int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    unsigned int xoffsets_hi,
    unsigned int xsquare,
    int ystart,
    unsigned int ysquare,
    v16int16 zbuff,
    int zstart,
    unsigned int zoffsets,
    unsigned int zoffsets_hi,
    int zstep)
Symmetric multiply-subtract intrinsic function with pre-add from the x input buffer, using the small X input buffer.
acc | Incoming accumulation vector (16 x int48 lanes) |
xbuff | Input buffer of 32 elements of type int16 |
xstart | Starting position offset applied to all lanes of input from the X buffer. xstart must be a multiple of 2 because xbuff is addressed with 32-bit granularity. |
xoffsets | 4-bit offset for each lane, scaled by 2 (one unit is two int16 elements, matching the 32-bit granularity); each odd lane's offset is relative to the preceding lane's offset, plus 1. The LSB nibble applies to lane 0. |
xoffsets_hi | 4-bit offset for each lane, encoded as for xoffsets. The LSB nibble applies to lane 8. |
xsquare | Selects the order of the mini-permute square (default = 0x3210). The LSB field applies to the first element. |
ystart | Starting position offset applied to all lanes of input from the Y buffer. In this variant the Y input is also read from xbuff. |
ysquare | Selects the order of the mini-permute square (default = 0x3210). The LSB field applies to the first element. |
zbuff | Input buffer of 16 elements of type int16 |
zstart | Starting position offset applied to all lanes of input from the Z buffer. This must be a compile-time constant; only the 4 LSBs of the argument are used. |
zoffsets | 4-bit offset for each lane, applied to input from the Z buffer. The LSB nibble applies to lane 0. |
zoffsets_hi | 4-bit offset for each lane, applied to input from the Z buffer. The LSB nibble applies to lane 8. |
zstep | Step between columns when selecting from the Z buffer. |
v16acc48 msc16_sym (v16acc48 acc,
    v32int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    unsigned int xoffsets_hi,
    unsigned int xsquare,
    v32int16 ybuff,
    int ystart,
    unsigned int ysquare,
    v16int16 zbuff,
    int zstart,
    unsigned int zoffsets,
    unsigned int zoffsets_hi,
    int zstep)
Symmetric multiply-subtract intrinsic function with pre-add from the x and y input buffers, using the small X input buffer.
acc | Incoming accumulation vector (16 x int48 lanes) |
xbuff | Input buffer of 32 elements of type int16 |
xstart | Starting position offset applied to all lanes of input from the X buffer. xstart must be a multiple of 2 because xbuff is addressed with 32-bit granularity. |
xoffsets | 4-bit offset for each lane, scaled by 2 (one unit is two int16 elements, matching the 32-bit granularity); each odd lane's offset is relative to the preceding lane's offset, plus 1. The LSB nibble applies to lane 0. |
xoffsets_hi | 4-bit offset for each lane, encoded as for xoffsets. The LSB nibble applies to lane 8. |
xsquare | Selects the order of the mini-permute square (default = 0x3210). The LSB field applies to the first element. |
ybuff | Right input buffer of 32 elements of type int16 |
ystart | Starting position offset applied to all lanes of input from the Y buffer. ystart must be a multiple of 2 because ybuff is addressed with 32-bit granularity. |
ysquare | Selects the order of the mini-permute square (default = 0x3210). The LSB field applies to the first element. |
zbuff | Input buffer of 16 elements of type int16 |
zstart | Starting position offset applied to all lanes of input from the Z buffer. This must be a compile-time constant; only the 4 LSBs of the argument are used. |
zoffsets | 4-bit offset for each lane, applied to input from the Z buffer. The LSB nibble applies to lane 0. |
zoffsets_hi | 4-bit offset for each lane, applied to input from the Z buffer. The LSB nibble applies to lane 8. |
zstep | Step between columns when selecting from the Z buffer. |
v8acc48 msc8_sym (v8acc48 acc,
    v64int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    int xstep,
    unsigned int xsquare,
    int ystart,
    unsigned int ysquare,
    v16int16 zbuff,
    int zstart,
    unsigned int zoffsets,
    int zstep)
Symmetric multiply-subtract intrinsic function with pre-add from the x input buffer.
acc | Incoming accumulation vector (8 x int48 lanes) |
xbuff | Input buffer of 64 elements of type int16 |
xstart | Starting position offset applied to all lanes of input from the X buffer. xstart must be a multiple of 2 because xbuff is addressed with 32-bit granularity. |
xoffsets | 4-bit offset for each lane, applied to both the X and Y selections; each odd lane's offset is relative to the preceding lane's offset, plus 1. The LSB nibble applies to lane 0. |
xstep | Step between columns when selecting from the X and Y buffers. The Y step mirrors xstep: the Y selection advances by -xstep. xstep must be a multiple of 2 because xbuff is addressed with 32-bit granularity. |
xsquare | Selects the order of the mini-permute square (default = 0x3210). The LSB field applies to the first element. |
ystart | Starting position offset applied to all lanes of input from the Y buffer. In this variant the Y input is also read from xbuff. |
ysquare | Selects the order of the mini-permute square (default = 0x3210). The LSB field applies to the first element. |
zbuff | Input buffer of 16 elements of type int16 |
zstart | Starting position offset applied to all lanes of input from the Z buffer. This must be a compile-time constant; only the 4 LSBs of the argument are used. |
zoffsets | 4-bit offset for each lane, applied to input from the Z buffer. The LSB nibble applies to lane 0. |
zstep | Step between columns when selecting from the Z buffer. |
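When chaining msc8_sym or mac8_sym calls, it helps to know how much headroom the 48-bit accumulator has. A pre-add of two int16 values spans [-65536, 65534], so the largest product magnitude with an int16 z value is 65536 * 32768 = 2^31. The compile-time sketch below derives how many such worst-case products fit in a signed 48-bit accumulator starting from zero; it says nothing about whether the hardware saturates or wraps on overflow, which this section does not cover. The constant names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Worst-case headroom of a signed 48-bit accumulator for pre-added
// int16 x int16 products.
constexpr int64_t kAcc48Max = (int64_t{1} << 47) - 1;           // largest int48 value
constexpr int64_t kPreAddMin = int64_t{INT16_MIN} + INT16_MIN;  // -65536
constexpr int64_t kZMin = INT16_MIN;                            // -32768
constexpr int64_t kMaxProduct = kPreAddMin * kZMin;             // +2^31
constexpr int64_t kSafeTerms = kAcc48Max / kMaxProduct;         // floor of the ratio

static_assert(kMaxProduct == (int64_t{1} << 31), "largest pre-added product is 2^31");
static_assert(kSafeTerms * kMaxProduct <= kAcc48Max, "kSafeTerms products still fit");
static_assert((kSafeTerms + 1) * kMaxProduct > kAcc48Max, "one more could overflow");
```

The result is 65535 terms: 2^47 / 2^31 is exactly 65536, and the "- 1" in the int48 maximum removes the last one.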
v8acc48 msc8_sym (v8acc48 acc,
    v32int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    int xstep,
    unsigned int xsquare,
    int ystart,
    unsigned int ysquare,
    v16int16 zbuff,
    int zstart,
    unsigned int zoffsets,
    int zstep)
Symmetric multiply-subtract intrinsic function with pre-add from the x input buffer, using the small X input buffer.
acc | Incoming accumulation vector (8 x int48 lanes) |
xbuff | Input buffer of 32 elements of type int16 |
xstart | Starting position offset applied to all lanes of input from the X buffer. xstart must be a multiple of 2 because xbuff is addressed with 32-bit granularity. |
xoffsets | 4-bit offset for each lane, applied to both the X and Y selections; each odd lane's offset is relative to the preceding lane's offset, plus 1. The LSB nibble applies to lane 0. |
xstep | Step between columns when selecting from the X and Y buffers. The Y step mirrors xstep: the Y selection advances by -xstep. xstep must be a multiple of 2 because xbuff is addressed with 32-bit granularity. |
xsquare | Selects the order of the mini-permute square (default = 0x3210). The LSB field applies to the first element. |
ystart | Starting position offset applied to all lanes of input from the Y buffer. In this variant the Y input is also read from xbuff. |
ysquare | Selects the order of the mini-permute square (default = 0x3210). The LSB field applies to the first element. |
zbuff | Input buffer of 16 elements of type int16 |
zstart | Starting position offset applied to all lanes of input from the Z buffer. This must be a compile-time constant; only the 4 LSBs of the argument are used. |
zoffsets | 4-bit offset for each lane, applied to input from the Z buffer. The LSB nibble applies to lane 0. |
zstep | Step between columns when selecting from the Z buffer. |
v8acc48 msc8_sym (v8acc48 acc,
    v32int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    int xstep,
    unsigned int xsquare,
    v32int16 ybuff,
    int ystart,
    unsigned int ysquare,
    v16int16 zbuff,
    int zstart,
    unsigned int zoffsets,
    int zstep)
Symmetric multiply-subtract intrinsic function with pre-add from the x and y input buffers, using the small X input buffer.
acc | Incoming accumulation vector (8 x int48 lanes) |
xbuff | Input buffer of 32 elements of type int16 |
xstart | Starting position offset applied to all lanes of input from the X buffer. xstart must be a multiple of 2 because xbuff is addressed with 32-bit granularity. |
xoffsets | 4-bit offset for each lane, applied to both the X and Y selections; each odd lane's offset is relative to the preceding lane's offset, plus 1. The LSB nibble applies to lane 0. |
xstep | Step between columns when selecting from the X and Y buffers. The Y step mirrors xstep: the Y selection advances by -xstep. xstep must be a multiple of 2 because xbuff is addressed with 32-bit granularity. |
xsquare | Selects the order of the mini-permute square (default = 0x3210). The LSB field applies to the first element. |
ybuff | Right input buffer of 32 elements of type int16 |
ystart | Starting position offset applied to all lanes of input from the Y buffer. ystart must be a multiple of 2 because ybuff is addressed with 32-bit granularity. |
ysquare | Selects the order of the mini-permute square (default = 0x3210). The LSB field applies to the first element. |
zbuff | Input buffer of 16 elements of type int16 |
zstart | Starting position offset applied to all lanes of input from the Z buffer. This must be a compile-time constant; only the 4 LSBs of the argument are used. |
zoffsets | 4-bit offset for each lane, applied to input from the Z buffer. The LSB nibble applies to lane 0. |
zstep | Step between columns when selecting from the Z buffer. |
v16acc48 mul16_sym (v64int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    unsigned int xoffsets_hi,
    unsigned int xsquare,
    int ystart,
    unsigned int ysquare,
    v16int16 zbuff,
    int zstart,
    unsigned int zoffsets,
    unsigned int zoffsets_hi,
    int zstep)
Symmetric multiply intrinsic function with pre-add from the x input buffer.
xbuff | Input buffer of 64 elements of type int16 |
xstart | Starting position offset applied to all lanes of input from the X buffer. xstart must be a multiple of 2 because xbuff is addressed with 32-bit granularity. |
xoffsets | 4-bit offset for each lane, scaled by 2 (one unit is two int16 elements, matching the 32-bit granularity); each odd lane's offset is relative to the preceding lane's offset, plus 1. The LSB nibble applies to lane 0. |
xoffsets_hi | 4-bit offset for each lane, encoded as for xoffsets. The LSB nibble applies to lane 8. |
xsquare | Selects the order of the mini-permute square (default = 0x3210). The LSB field applies to the first element. |
ystart | Starting position offset applied to all lanes of input from the Y buffer. In this variant the Y input is also read from xbuff. |
ysquare | Selects the order of the mini-permute square (default = 0x3210). The LSB field applies to the first element. |
zbuff | Input buffer of 16 elements of type int16 |
zstart | Starting position offset applied to all lanes of input from the Z buffer. This must be a compile-time constant; only the 4 LSBs of the argument are used. |
zoffsets | 4-bit offset for each lane, applied to input from the Z buffer. The LSB nibble applies to lane 0. |
zoffsets_hi | 4-bit offset for each lane, applied to input from the Z buffer. The LSB nibble applies to lane 8. |
zstep | Step between columns when selecting from the Z buffer. |
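mul16_sym takes no accumulator, so a long symmetric sum can be started with a mul call and continued with mac calls that pass the running accumulator. The scalar sketch below models that split and checks that the chunked result equals a single pass; the chunk size and the names `sym_chunk` and `chained` are hypothetical, and lane selection and the int48 width are not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pre-added products for n columns: sum of (x[i] + y[i]) * z[i].
int64_t sym_chunk(const int16_t* x, const int16_t* y, const int16_t* z, std::size_t n) {
    int64_t s = 0;
    for (std::size_t i = 0; i < n; ++i) s += (int64_t{x[i]} + y[i]) * z[i];
    return s;
}

// Hypothetical chaining model: the first chunk is a mul-style step (no
// incoming accumulator); every later chunk is a mac-style step.
int64_t chained(const std::vector<int16_t>& x, const std::vector<int16_t>& y,
                const std::vector<int16_t>& z, std::size_t chunk) {
    assert(x.size() == y.size() && y.size() == z.size() && chunk > 0);
    int64_t acc = 0;
    for (std::size_t i = 0; i < x.size(); i += chunk) {
        std::size_t n = std::min(chunk, x.size() - i);
        int64_t part = sym_chunk(&x[i], &y[i], &z[i], n);
        acc = (i == 0) ? part : acc + part;  // mul first, then mac
    }
    return acc;
}
```

Because addition is associative, any chunk size gives the same total; in a real kernel the chunk size would follow from the number of columns each call processes.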
v16acc48 mul16_sym | ( | v32int16 | xbuff, |
int | xstart, | ||
unsigned int | xoffsets, | ||
unsigned int | xoffsets_hi, | ||
unsigned int | xsquare, | ||
int | ystart, | ||
unsigned int | ysquare, | ||
v16int16 | zbuff, | ||
int | zstart, | ||
unsigned int | zoffsets, | ||
unsigned int | zoffsets_hi, | ||
int | zstep | ||
) |
Symmetric multiply intrinsic function with pre-add from the X input buffer, using the small X input buffer (v32int16).
xbuff | Input buffer of 32 elements of type int16 |
xstart | Starting position offset applied to all lanes for input from the X buffer. Must be a multiple of 2, because xbuff has 32-bit granularity. |
xoffsets | 4-bit offset for each lane; each offset corresponds to twice the lane number, and every second lane is specified as an offset relative to the preceding lane + 1. The least significant bits apply to the first lane. |
xoffsets_hi | 4-bit offset for each lane, encoded as for xoffsets. The least significant bits apply to lane 8 (zero-based); this word covers lanes 8 to 15. |
xsquare | Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element. |
ystart | Starting position offset applied to all lanes for input from the Y buffer. In this variant the Y operand is also read from xbuff. |
ysquare | Selects the order of the mini-permute square for the Y input (default 0x3210). The least significant bits apply to the first element. |
zbuff | Input buffer of 16 elements of type int16 |
zstart | Starting position offset applied to all lanes for input from the Z buffer. Must be a compile-time constant; only the 4 LSBs of the argument are used. |
zoffsets | 4-bit offset for each lane, applied to input from the Z buffer. The least significant bits apply to the first lane. |
zoffsets_hi | 4-bit offset for each lane, applied to input from the Z buffer. The least significant bits apply to lane 8 (zero-based); this word covers lanes 8 to 15. |
zstep | Step between consecutive columns when selecting from the Z buffer. |
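A minimal model of the `xsquare`/`ysquare` selection, assuming each 4-bit field names the source element (0 to 3) for that position, with the field for the first element in the least significant bits. Under that reading the documented default `0x3210` is the identity order. `apply_square` is a hypothetical helper; how the four elements map onto lanes and columns in hardware is not modeled here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model: field i of `square` (bits 4i+3..4i) names the source
// element that ends up in position i. 0x3210 therefore keeps the input order.
inline std::array<std::int16_t, 4> apply_square(const std::array<std::int16_t, 4>& in,
                                                unsigned int square) {
    std::array<std::int16_t, 4> out{};
    for (unsigned int i = 0; i < 4; ++i) {
        const unsigned int sel = (square >> (4 * i)) & 0xFu;
        assert(sel < 4 && "each field selects one of four elements");
        out[i] = in[sel];
    }
    return out;
}
```

Under the same assumption, `0x2301` swaps neighbouring elements, turning `{a, b, c, d}` into `{b, a, d, c}`.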
v16acc48 mul16_sym (
    v32int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    unsigned int xoffsets_hi,
    unsigned int xsquare,
    v32int16 ybuff,
    int ystart,
    unsigned int ysquare,
    v16int16 zbuff,
    int zstart,
    unsigned int zoffsets,
    unsigned int zoffsets_hi,
    int zstep
)
Symmetric multiply intrinsic function with pre-add from the X and Y input buffers, using the small X input buffer (v32int16).
xbuff | Input buffer of 32 elements of type int16 |
xstart | Starting position offset applied to all lanes for input from the X buffer. Must be a multiple of 2, because xbuff has 32-bit granularity. |
xoffsets | 4-bit offset for each lane; each offset corresponds to twice the lane number, and every second lane is specified as an offset relative to the preceding lane + 1. The least significant bits apply to the first lane. |
xoffsets_hi | 4-bit offset for each lane, encoded as for xoffsets. The least significant bits apply to lane 8 (zero-based); this word covers lanes 8 to 15. |
xsquare | Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element. |
ybuff | Right input buffer of 32 elements of type int16 |
ystart | Starting position offset applied to all lanes for input from the Y buffer. Must be a multiple of 2, because ybuff has 32-bit granularity. |
ysquare | Selects the order of the mini-permute square for the Y input (default 0x3210). The least significant bits apply to the first element. |
zbuff | Input buffer of 16 elements of type int16 |
zstart | Starting position offset applied to all lanes for input from the Z buffer. Must be a compile-time constant; only the 4 LSBs of the argument are used. |
zoffsets | 4-bit offset for each lane, applied to input from the Z buffer. The least significant bits apply to the first lane. |
zoffsets_hi | 4-bit offset for each lane, applied to input from the Z buffer. The least significant bits apply to lane 8 (zero-based); this word covers lanes 8 to 15. |
zstep | Step between consecutive columns when selecting from the Z buffer. |
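The pre-add in these intrinsics sums two X/Y operands before multiplying by a Z coefficient, the arithmetic that symmetric FIR filters typically exploit. The scalar sketch below shows only that identity. It is not a model of the intrinsic's lane, offset or square selection, and `fir_direct` and `fir_symmetric_preadd` are illustrative names. It assumes an even tap count with `h[k] == h[N-1-k]` and uses `int64_t` as a stand-in for the wider accumulator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Direct N-tap FIR output at sample n (requires n >= N - 1).
inline std::int64_t fir_direct(const std::vector<std::int16_t>& x,
                               const std::vector<std::int16_t>& h, std::size_t n) {
    std::int64_t acc = 0;
    for (std::size_t k = 0; k < h.size(); ++k) {
        acc += static_cast<std::int64_t>(h[k]) * x[n - k];
    }
    return acc;
}

// Same output using pre-add: with symmetric taps h[k] == h[N-1-k], the two
// samples sharing a tap are summed first, so only N/2 multiplies are needed.
inline std::int64_t fir_symmetric_preadd(const std::vector<std::int16_t>& x,
                                         const std::vector<std::int16_t>& h,
                                         std::size_t n) {
    const std::size_t taps = h.size();
    assert(taps % 2 == 0 && "sketch assumes an even number of taps");
    assert(n + 1 >= taps && "need N samples of history");
    std::int64_t acc = 0;
    for (std::size_t k = 0; k < taps / 2; ++k) {
        assert(h[k] == h[taps - 1 - k] && "taps must be symmetric");
        // Pre-add the newer and the older sample that share coefficient h[k].
        const std::int32_t pre =
            static_cast<std::int32_t>(x[n - k]) + x[n - (taps - 1) + k];
        acc += static_cast<std::int64_t>(h[k]) * pre;
    }
    return acc;
}
```

Both functions return the same value for every valid `n`; the pre-add version performs half as many multiplications.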
v8acc48 mul8_sym (
    v64int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    int xstep,
    unsigned int xsquare,
    int ystart,
    unsigned int ysquare,
    v16int16 zbuff,
    int zstart,
    unsigned int zoffsets,
    int zstep
)
Symmetric multiply intrinsic function with pre-add from the X input buffer.
xbuff | Input buffer of 64 elements of type int16 |
xstart | Starting position offset applied to all lanes for input from the X buffer. Must be a multiple of 2, because xbuff has 32-bit granularity. |
xoffsets | 4-bit offset for each lane, applied to both the X and Y buffers; every second lane is specified as an offset relative to the preceding lane + 1. The least significant bits apply to the first lane. |
xstep | Step between consecutive columns when selecting from the X and Y buffers. The Y step mirrors xstep: the Y position advances by -xstep per column. Must be a multiple of 2, because xbuff has 32-bit granularity. |
xsquare | Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element. |
ystart | Starting position offset applied to all lanes for input from the Y buffer. In this variant the Y operand is also read from xbuff. |
ysquare | Selects the order of the mini-permute square for the Y input (default 0x3210). The least significant bits apply to the first element. |
zbuff | Input buffer of 16 elements of type int16 |
zstart | Starting position offset applied to all lanes for input from the Z buffer. Must be a compile-time constant; only the 4 LSBs of the argument are used. |
zoffsets | 4-bit offset for each lane, applied to input from the Z buffer. The least significant bits apply to the first lane. |
zstep | Step between consecutive columns when selecting from the Z buffer. |
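In the 8-lane variants the X position advances by `xstep` per column while the Y position advances by `-xstep`, so the two reads walk toward each other, as a symmetric filter needs. The sketch below is a hypothetical, simplified model of that column walk only. It ignores `xoffsets` and the mini-permute squares, and takes the number of columns as a parameter because this section does not state how many columns each call processes. `ColumnPositions` and `sym_columns` are illustrative names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model: per-lane offsets and squares are ignored.
struct ColumnPositions {
    std::vector<int> x;  // X start position of each column
    std::vector<int> y;  // Y start position of each column
};

inline ColumnPositions sym_columns(int xstart, int ystart, int xstep, int columns) {
    assert(xstart % 2 == 0 && "xstart must be a multiple of 2 (32-bit granularity)");
    assert(xstep % 2 == 0 && "xstep must be a multiple of 2 (32-bit granularity)");
    ColumnPositions pos;
    for (int k = 0; k < columns; ++k) {
        pos.x.push_back(xstart + k * xstep);  // X walks forward by xstep
        pos.y.push_back(ystart - k * xstep);  // Y walks by -xstep (mirrored)
    }
    return pos;
}
```

With `xstart = 0`, `ystart = 14`, `xstep = 2` and four columns, X visits 0, 2, 4, 6 while Y visits 14, 12, 10, 8.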
v8acc48 mul8_sym (
    v32int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    int xstep,
    unsigned int xsquare,
    int ystart,
    unsigned int ysquare,
    v16int16 zbuff,
    int zstart,
    unsigned int zoffsets,
    int zstep
)
Symmetric multiply intrinsic function with pre-add from the X input buffer, using the small X input buffer (v32int16).
xbuff | Input buffer of 32 elements of type int16 |
xstart | Starting position offset applied to all lanes for input from the X buffer. Must be a multiple of 2, because xbuff has 32-bit granularity. |
xoffsets | 4-bit offset for each lane, applied to both the X and Y buffers; every second lane is specified as an offset relative to the preceding lane + 1. The least significant bits apply to the first lane. |
xstep | Step between consecutive columns when selecting from the X and Y buffers. The Y step mirrors xstep: the Y position advances by -xstep per column. Must be a multiple of 2, because xbuff has 32-bit granularity. |
xsquare | Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element. |
ystart | Starting position offset applied to all lanes for input from the Y buffer. In this variant the Y operand is also read from xbuff. |
ysquare | Selects the order of the mini-permute square for the Y input (default 0x3210). The least significant bits apply to the first element. |
zbuff | Input buffer of 16 elements of type int16 |
zstart | Starting position offset applied to all lanes for input from the Z buffer. Must be a compile-time constant; only the 4 LSBs of the argument are used. |
zoffsets | 4-bit offset for each lane, applied to input from the Z buffer. The least significant bits apply to the first lane. |
zstep | Step between consecutive columns when selecting from the Z buffer. |
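Because only the 4 LSBs of `zstart` are used, any value reduces to an index from 0 to 15 into the 16-element `zbuff`. The sketch below expresses that reduction as a `constexpr` function, which fits the requirement that `zstart` be a compile-time constant. `effective_zstart` is a hypothetical name; the sketch does not describe how the reduced start combines with `zoffsets` or `zstep`.

```cpp
#include <cassert>

// Keeps only the 4 LSBs of zstart, giving an index 0..15 into the
// 16-element zbuff.
constexpr unsigned int effective_zstart(int zstart) {
    return static_cast<unsigned int>(zstart) & 0xFu;
}

// Since zstart is a compile-time constant, the reduction can be checked
// at compile time.
static_assert(effective_zstart(3) == 3u, "values below 16 are unchanged");
static_assert(effective_zstart(19) == 3u, "bit 4 and above are ignored");
```

Under this reading, passing 19 behaves the same as passing 3.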
v8acc48 mul8_sym (
    v32int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    int xstep,
    unsigned int xsquare,
    v32int16 ybuff,
    int ystart,
    unsigned int ysquare,
    v16int16 zbuff,
    int zstart,
    unsigned int zoffsets,
    int zstep
)
Symmetric multiply intrinsic function with pre-add from the X and Y input buffers, using the small X input buffer (v32int16).
xbuff | Input buffer of 32 elements of type int16 |
xstart | Starting position offset applied to all lanes for input from the X buffer. Must be a multiple of 2, because xbuff has 32-bit granularity. |
xoffsets | 4-bit offset for each lane, applied to both the X and Y buffers; every second lane is specified as an offset relative to the preceding lane + 1. The least significant bits apply to the first lane. |
xstep | Step between consecutive columns when selecting from the X and Y buffers. The Y step mirrors xstep: the Y position advances by -xstep per column. Must be a multiple of 2, because xbuff has 32-bit granularity. |
xsquare | Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element. |
ybuff | Right input buffer of 32 elements of type int16 |
ystart | Starting position offset applied to all lanes for input from the Y buffer. Must be a multiple of 2, because ybuff has 32-bit granularity. |
ysquare | Selects the order of the mini-permute square for the Y input (default 0x3210). The least significant bits apply to the first element. |
zbuff | Input buffer of 16 elements of type int16 |
zstart | Starting position offset applied to all lanes for input from the Z buffer. Must be a compile-time constant; only the 4 LSBs of the argument are used. |
zoffsets | 4-bit offset for each lane, applied to input from the Z buffer. The least significant bits apply to the first lane. |
zstep | Step between consecutive columns when selecting from the Z buffer. |
v16acc48 negmul16_sym (
    v64int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    unsigned int xoffsets_hi,
    unsigned int xsquare,
    int ystart,
    unsigned int ysquare,
    v16int16 zbuff,
    int zstart,
    unsigned int zoffsets,
    unsigned int zoffsets_hi,
    int zstep
)
Symmetric multiply-negate intrinsic function with pre-add from the X input buffer.
xbuff | Input buffer of 64 elements of type int16 |
xstart | Starting position offset applied to all lanes for input from the X buffer. Must be a multiple of 2, because xbuff has 32-bit granularity. |
xoffsets | 4-bit offset for each lane; each offset corresponds to twice the lane number, and every second lane is specified as an offset relative to the preceding lane + 1. The least significant bits apply to the first lane. |
xoffsets_hi | 4-bit offset for each lane, encoded as for xoffsets. The least significant bits apply to lane 8 (zero-based); this word covers lanes 8 to 15. |
xsquare | Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element. |
ystart | Starting position offset applied to all lanes for input from the Y buffer. In this variant the Y operand is also read from xbuff. |
ysquare | Selects the order of the mini-permute square for the Y input (default 0x3210). The least significant bits apply to the first element. |
zbuff | Input buffer of 16 elements of type int16 |
zstart | Starting position offset applied to all lanes for input from the Z buffer. Must be a compile-time constant; only the 4 LSBs of the argument are used. |
zoffsets | 4-bit offset for each lane, applied to input from the Z buffer. The least significant bits apply to the first lane. |
zoffsets_hi | 4-bit offset for each lane, applied to input from the Z buffer. The least significant bits apply to lane 8 (zero-based); this word covers lanes 8 to 15. |
zstep | Step between consecutive columns when selecting from the Z buffer. |
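The multiply, multiply-negate and multiply-accumulate variants differ only in what happens to the sum of pre-added products. The scalar sketch below models one output lane under that reading: `mul*_sym` returns the sum, `negmul*_sym` returns its negation, and `mac*_sym` adds it to the incoming accumulator. `SymOp` and `sym_lane` are hypothetical names. The operand lists stand for whatever the lane and column selection would deliver, and `int64_t` is only a stand-in for a 48-bit accumulator lane (overflow behaviour is not modeled).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class SymOp { Mul, NegMul, Mac };

// One output lane: sum over columns of (a[k] + b[k]) * z[k], then apply the
// variant's final step. acc_in is only used by Mac.
inline std::int64_t sym_lane(SymOp op, std::int64_t acc_in,
                             const std::vector<std::int16_t>& a,
                             const std::vector<std::int16_t>& b,
                             const std::vector<std::int16_t>& z) {
    assert(a.size() == b.size() && b.size() == z.size());
    std::int64_t sum = 0;
    for (std::size_t k = 0; k < z.size(); ++k) {
        const std::int32_t pre = static_cast<std::int32_t>(a[k]) + b[k];  // pre-add
        sum += static_cast<std::int64_t>(pre) * z[k];
    }
    switch (op) {
        case SymOp::Mul:    return sum;           // mul*_sym
        case SymOp::NegMul: return -sum;          // negmul*_sym
        case SymOp::Mac:    return acc_in + sum;  // mac*_sym
    }
    return 0;  // unreachable for valid SymOp values
}
```

For `a = {1, 2}`, `b = {3, 4}` and `z = {5, -1}`, the pre-added sum is (1 + 3) * 5 + (2 + 4) * (-1) = 14, so the negate variant yields -14.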
v16acc48 negmul16_sym (
    v32int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    unsigned int xoffsets_hi,
    unsigned int xsquare,
    int ystart,
    unsigned int ysquare,
    v16int16 zbuff,
    int zstart,
    unsigned int zoffsets,
    unsigned int zoffsets_hi,
    int zstep
)
Symmetric multiply-negate intrinsic function with pre-add from the X input buffer, using the small X input buffer (v32int16).
xbuff | Input buffer of 32 elements of type int16 |
xstart | Starting position offset applied to all lanes for input from the X buffer. Must be a multiple of 2, because xbuff has 32-bit granularity. |
xoffsets | 4-bit offset for each lane; each offset corresponds to twice the lane number, and every second lane is specified as an offset relative to the preceding lane + 1. The least significant bits apply to the first lane. |
xoffsets_hi | 4-bit offset for each lane, encoded as for xoffsets. The least significant bits apply to lane 8 (zero-based); this word covers lanes 8 to 15. |
xsquare | Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element. |
ystart | Starting position offset applied to all lanes for input from the Y buffer. In this variant the Y operand is also read from xbuff. |
ysquare | Selects the order of the mini-permute square for the Y input (default 0x3210). The least significant bits apply to the first element. |
zbuff | Input buffer of 16 elements of type int16 |
zstart | Starting position offset applied to all lanes for input from the Z buffer. Must be a compile-time constant; only the 4 LSBs of the argument are used. |
zoffsets | 4-bit offset for each lane, applied to input from the Z buffer. The least significant bits apply to the first lane. |
zoffsets_hi | 4-bit offset for each lane, applied to input from the Z buffer. The least significant bits apply to lane 8 (zero-based); this word covers lanes 8 to 15. |
zstep | Step between consecutive columns when selecting from the Z buffer. |
v16acc48 negmul16_sym (
    v32int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    unsigned int xoffsets_hi,
    unsigned int xsquare,
    v32int16 ybuff,
    int ystart,
    unsigned int ysquare,
    v16int16 zbuff,
    int zstart,
    unsigned int zoffsets,
    unsigned int zoffsets_hi,
    int zstep
)
Symmetric multiply-negate intrinsic function with pre-add from the X and Y input buffers, using the small X input buffer (v32int16).
xbuff | Input buffer of 32 elements of type int16 |
xstart | Starting position offset applied to all lanes for input from the X buffer. Must be a multiple of 2, because xbuff has 32-bit granularity. |
xoffsets | 4-bit offset for each lane; each offset corresponds to twice the lane number, and every second lane is specified as an offset relative to the preceding lane + 1. The least significant bits apply to the first lane. |
xoffsets_hi | 4-bit offset for each lane, encoded as for xoffsets. The least significant bits apply to lane 8 (zero-based); this word covers lanes 8 to 15. |
xsquare | Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element. |
ybuff | Right input buffer of 32 elements of type int16 |
ystart | Starting position offset applied to all lanes for input from the Y buffer. Must be a multiple of 2, because ybuff has 32-bit granularity. |
ysquare | Selects the order of the mini-permute square for the Y input (default 0x3210). The least significant bits apply to the first element. |
zbuff | Input buffer of 16 elements of type int16 |
zstart | Starting position offset applied to all lanes for input from the Z buffer. Must be a compile-time constant; only the 4 LSBs of the argument are used. |
zoffsets | 4-bit offset for each lane, applied to input from the Z buffer. The least significant bits apply to the first lane. |
zoffsets_hi | 4-bit offset for each lane, applied to input from the Z buffer. The least significant bits apply to lane 8 (zero-based); this word covers lanes 8 to 15. |
zstep | Step between consecutive columns when selecting from the Z buffer. |
v8acc48 negmul8_sym (
    v64int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    int xstep,
    unsigned int xsquare,
    int ystart,
    unsigned int ysquare,
    v16int16 zbuff,
    int zstart,
    unsigned int zoffsets,
    int zstep
)
Symmetric multiply-negate intrinsic function with pre-add from the X input buffer.
xbuff | Input buffer of 64 elements of type int16 |
xstart | Starting position offset applied to all lanes for input from the X buffer. Must be a multiple of 2, because xbuff has 32-bit granularity. |
xoffsets | 4-bit offset for each lane, applied to both the X and Y buffers; every second lane is specified as an offset relative to the preceding lane + 1. The least significant bits apply to the first lane. |
xstep | Step between consecutive columns when selecting from the X and Y buffers. The Y step mirrors xstep: the Y position advances by -xstep per column. Must be a multiple of 2, because xbuff has 32-bit granularity. |
xsquare | Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element. |
ystart | Starting position offset applied to all lanes for input from the Y buffer. In this variant the Y operand is also read from xbuff. |
ysquare | Selects the order of the mini-permute square for the Y input (default 0x3210). The least significant bits apply to the first element. |
zbuff | Input buffer of 16 elements of type int16 |
zstart | Starting position offset applied to all lanes for input from the Z buffer. Must be a compile-time constant; only the 4 LSBs of the argument are used. |
zoffsets | 4-bit offset for each lane, applied to input from the Z buffer. The least significant bits apply to the first lane. |
zstep | Step between consecutive columns when selecting from the Z buffer. |
v8acc48 negmul8_sym (
    v32int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    int xstep,
    unsigned int xsquare,
    int ystart,
    unsigned int ysquare,
    v16int16 zbuff,
    int zstart,
    unsigned int zoffsets,
    int zstep
)
Symmetric multiply-negate intrinsic function with pre-add from the X input buffer, using the small X input buffer (v32int16).
xbuff | Input buffer of 32 elements of type int16 |
xstart | Starting position offset applied to all lanes for input from the X buffer. Must be a multiple of 2, because xbuff has 32-bit granularity. |
xoffsets | 4-bit offset for each lane, applied to both the X and Y buffers; every second lane is specified as an offset relative to the preceding lane + 1. The least significant bits apply to the first lane. |
xstep | Step between consecutive columns when selecting from the X and Y buffers. The Y step mirrors xstep: the Y position advances by -xstep per column. Must be a multiple of 2, because xbuff has 32-bit granularity. |
xsquare | Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element. |
ystart | Starting position offset applied to all lanes for input from the Y buffer. In this variant the Y operand is also read from xbuff. |
ysquare | Selects the order of the mini-permute square for the Y input (default 0x3210). The least significant bits apply to the first element. |
zbuff | Input buffer of 16 elements of type int16 |
zstart | Starting position offset applied to all lanes for input from the Z buffer. Must be a compile-time constant; only the 4 LSBs of the argument are used. |
zoffsets | 4-bit offset for each lane, applied to input from the Z buffer. The least significant bits apply to the first lane. |
zstep | Step between consecutive columns when selecting from the Z buffer. |
v8acc48 negmul8_sym (
    v32int16 xbuff,
    int xstart,
    unsigned int xoffsets,
    int xstep,
    unsigned int xsquare,
    v32int16 ybuff,
    int ystart,
    unsigned int ysquare,
    v16int16 zbuff,
    int zstart,
    unsigned int zoffsets,
    int zstep
)
Symmetric multiply-negate intrinsic function with pre-add from the X and Y input buffers, using the small X input buffer (v32int16).
xbuff | Input buffer of 32 elements of type int16 |
xstart | Starting position offset applied to all lanes for input from the X buffer. Must be a multiple of 2, because xbuff has 32-bit granularity. |
xoffsets | 4-bit offset for each lane, applied to both the X and Y buffers; every second lane is specified as an offset relative to the preceding lane + 1. The least significant bits apply to the first lane. |
xstep | Step between consecutive columns when selecting from the X and Y buffers. The Y step mirrors xstep: the Y position advances by -xstep per column. Must be a multiple of 2, because xbuff has 32-bit granularity. |
xsquare | Selects the order of the mini-permute square (default 0x3210). The least significant bits apply to the first element. |
ybuff | Right input buffer of 32 elements of type int16 |
ystart | Starting position offset applied to all lanes for input from the Y buffer. Must be a multiple of 2, because ybuff has 32-bit granularity. |
ysquare | Selects the order of the mini-permute square for the Y input (default 0x3210). The least significant bits apply to the first element. |
zbuff | Input buffer of 16 elements of type int16 |
zstart | Starting position offset applied to all lanes for input from the Z buffer. Must be a compile-time constant; only the 4 LSBs of the argument are used. |
zoffsets | 4-bit offset for each lane, applied to input from the Z buffer. The least significant bits apply to the first lane. |
zstep | Step between consecutive columns when selecting from the Z buffer. |