AI Engine Intrinsics User Guide (AIE) v2024.2

Overview

Vector MAC intrinsics with pre-subtraction and upshifting, for 16-bit real by 16-bit real operands.

Functions

v16acc48 mac8_antisym_uct (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, int uct_col, int uct_shift, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
 Anti-symmetric multiply-accumulate intrinsic function with unit center-tap optimization with pre-sub from x input buffer using small X input buffer.
 
v16acc48 mac8_antisym_uct (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, int uct_col, int uct_shift, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
 Anti-symmetric multiply-accumulate intrinsic function with unit center-tap optimization with pre-sub from x and y input buffers using small X input buffer.
 
v16acc48 mac8_antisym_uct (v16acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, int uct_col, int uct_shift, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
 Anti-symmetric multiply-accumulate intrinsic function with unit center-tap optimization with pre-sub from x input buffer.
 
v16acc48 msc8_antisym_uct (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, int uct_col, int uct_shift, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
 Anti-symmetric multiply-subtract intrinsic function with unit center-tap optimization with pre-sub from x input buffer using small X input buffer.
 
v16acc48 msc8_antisym_uct (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, int uct_col, int uct_shift, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
 Anti-symmetric multiply-subtract intrinsic function with unit center-tap optimization with pre-sub from x and y input buffers using small X input buffer.
 
v16acc48 msc8_antisym_uct (v16acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, int uct_col, int uct_shift, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
 Anti-symmetric multiply-subtract intrinsic function with unit center-tap optimization with pre-sub from x input buffer.
 
v16acc48 mul8_antisym_uct (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, int uct_col, int uct_shift, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
 Anti-symmetric multiply intrinsic function with unit center-tap optimization with pre-sub from x input buffer using small X input buffer.
 
v16acc48 mul8_antisym_uct (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, int uct_col, int uct_shift, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
 Anti-symmetric multiply intrinsic function with unit center-tap optimization with pre-sub from x and y input buffers using small X input buffer.
 
v16acc48 mul8_antisym_uct (v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, int uct_col, int uct_shift, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
 Anti-symmetric multiply intrinsic function with unit center-tap optimization with pre-sub from x input buffer.
 
v16acc48 negmul8_antisym_uct (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, int uct_col, int uct_shift, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
 Anti-symmetric multiply-negate intrinsic function with unit center-tap optimization with pre-sub from x input buffer using small X input buffer.
 
v16acc48 negmul8_antisym_uct (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, int uct_col, int uct_shift, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
 Anti-symmetric multiply-negate intrinsic function with unit center-tap optimization with pre-sub from x and y input buffers using small X input buffer.
 
v16acc48 negmul8_antisym_uct (v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, int uct_col, int uct_shift, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
 Anti-symmetric multiply-negate intrinsic function with unit center-tap optimization with pre-sub from x input buffer.
 

Function Documentation

◆ mac8_antisym_uct() [1/3]

v16acc48 mac8_antisym_uct ( v16acc48  acc,
v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
int  xstep,
unsigned int  xsquare,
int  ystart,
unsigned int  ysquare,
int  uct_col,
int  uct_shift,
v16int16  zbuff,
int  zstart,
unsigned int  zoffsets,
int  zstep 
)

Anti-symmetric multiply-accumulate intrinsic function with unit center-tap optimization with pre-sub from x input buffer using small X input buffer.

acc0 += z00*(x00 - y00) + z01*(x01 - y01) + z02*(x02 - y02) + z03*(x03 - y03)
acc1 += z10*(x10 - y10) + z11*(x11 - y11) + z12*(x12 - y12) + z13*(x13 - y13)
acc2 += z20*(x20 - y20) + z21*(x21 - y21) + z22*(x22 - y22) + z23*(x23 - y23)
acc3 += z30*(x30 - y30) + z31*(x31 - y31) + z32*(x32 - y32) + z33*(x33 - y33)
acc4 += z40*(x40 - y40) + z41*(x41 - y41) + z42*(x42 - y42) + z43*(x43 - y43)
acc5 += z50*(x50 - y50) + z51*(x51 - y51) + z52*(x52 - y52) + z53*(x53 - y53)
acc6 += z60*(x60 - y60) + z61*(x61 - y61) + z62*(x62 - y62) + z63*(x63 - y63)
acc7 += z70*(x70 - y70) + z71*(x71 - y71) + z72*(x72 - y72) + z73*(x73 - y73)
acc8 = y0uct_select << uct_shift
acc9 = y1uct_select << uct_shift
acc10 = y2uct_select << uct_shift
acc11 = y3uct_select << uct_shift
acc12 = y4uct_select << uct_shift
acc13 = y5uct_select << uct_shift
acc14 = y6uct_select << uct_shift
acc15 = y7uct_select << uct_shift
Returns
Returned accumulation vector (16 x int48 lanes)
Parameters
acc        Incoming accumulation vector (16 x int48 lanes).
xbuff      Input buffer of 32 elements of type int16.
xstart     Starting position offset applied to all lanes of input from the X buffer. xstart is restricted to multiples of 2 because the granularity of xbuff is 32 bits.
xoffsets   4-bit offset for each lane, applied to both the x and y buffers. For every second lane, the offset is relative to the previous lane + 1. The LSBs apply to the first lane.
xstep      Step between columns for selection in the x and y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 because the granularity of xbuff is 32 bits.
xsquare    Selection order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
ystart     Starting position offset applied to all lanes of input from the Y buffer.
ysquare    Selection order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
uct_col    Selects which column of data from the Y buffer to upshift for the lower half of the output accumulator. This must be a compile-time constant.
uct_shift  Upshift value applied to the four upshifted lanes.
zbuff      Input buffer of 16 elements of type int16.
zstart     Starting position offset applied to all lanes of input from the Z buffer. This must be a compile-time constant. Only the 4 LSBs of the argument are used.
zoffsets   4-bit offset for each lane, applied to input from the Z buffer. The LSBs apply to the first lane.
zstep      Step between columns for selection in the Z buffer.
Note
  • This intrinsic uses the 'square' parameters (xsquare and ysquare). See the description of the mini-permute square in this guide for details on how to use them.
  • For details on how data is selected from the buffers, see the data selection description in this guide. For this intrinsic, the data buffer uses the 16bx16b scheme and the coefficient buffer uses the general scheme.
  • Parameter 'zstart' must be a compile time constant.
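
The lane equations for mac8_antisym_uct can be checked against a scalar reference model. The sketch below is a hypothetical helper, not part of the intrinsics API. It assumes the operands x, y and z have already been selected per lane and column, so xstart, xoffsets, the squares and the z selection are not modeled. It also assumes that "y{lane}uct_select" means the y element of that lane in column uct_col, and it uses a 64-bit integer in place of a 48-bit accumulator lane without modeling 48-bit wraparound.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar model; these names do not exist in the AIE intrinsics.
// Sel holds already-selected operands: s[l][c] for lane l (0..7), column c (0..3).
using Sel = std::array<std::array<std::int16_t, 4>, 8>;
// int64_t stands in for one 48-bit accumulator lane (no 48-bit wraparound).
using Acc16 = std::array<std::int64_t, 16>;

Acc16 mac8_antisym_uct_model(Acc16 acc, const Sel& x, const Sel& y, const Sel& z,
                             int uct_col, int uct_shift) {
    assert(uct_col >= 0 && uct_col < 4);
    assert(uct_shift >= 0 && uct_shift < 32);
    for (int l = 0; l < 8; ++l) {
        std::int64_t sum = 0;
        for (int c = 0; c < 4; ++c) {
            // Pre-subtraction (x - y) followed by the multiply with z.
            sum += std::int64_t(z[l][c]) *
                   (std::int64_t(x[l][c]) - std::int64_t(y[l][c]));
        }
        acc[l] += sum;  // acc0..acc7: multiply-accumulate lanes
        // acc8..acc15: assumed to be y[l][uct_col] upshifted by uct_shift.
        // A multiply by 2^uct_shift avoids left-shifting a negative value.
        acc[8 + l] = std::int64_t(y[l][uct_col]) * (std::int64_t(1) << uct_shift);
    }
    return acc;
}
```

With every x = 10, y = 4 and z = 2, each of acc0..acc7 grows by 4 * 2 * (10 - 4) = 48, and with uct_shift = 3 each of acc8..acc15 becomes 4 * 8 = 32.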

◆ mac8_antisym_uct() [2/3]

v16acc48 mac8_antisym_uct ( v16acc48  acc,
v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
int  xstep,
unsigned int  xsquare,
v32int16  ybuff,
int  ystart,
unsigned int  ysquare,
int  uct_col,
int  uct_shift,
v16int16  zbuff,
int  zstart,
unsigned int  zoffsets,
int  zstep 
)

Anti-symmetric multiply-accumulate intrinsic function with unit center-tap optimization with pre-sub from x and y input buffers using small X input buffer.

acc0 += z00*(x00 - y00) + z01*(x01 - y01) + z02*(x02 - y02) + z03*(x03 - y03)
acc1 += z10*(x10 - y10) + z11*(x11 - y11) + z12*(x12 - y12) + z13*(x13 - y13)
acc2 += z20*(x20 - y20) + z21*(x21 - y21) + z22*(x22 - y22) + z23*(x23 - y23)
acc3 += z30*(x30 - y30) + z31*(x31 - y31) + z32*(x32 - y32) + z33*(x33 - y33)
acc4 += z40*(x40 - y40) + z41*(x41 - y41) + z42*(x42 - y42) + z43*(x43 - y43)
acc5 += z50*(x50 - y50) + z51*(x51 - y51) + z52*(x52 - y52) + z53*(x53 - y53)
acc6 += z60*(x60 - y60) + z61*(x61 - y61) + z62*(x62 - y62) + z63*(x63 - y63)
acc7 += z70*(x70 - y70) + z71*(x71 - y71) + z72*(x72 - y72) + z73*(x73 - y73)
acc8 = y0uct_select << uct_shift
acc9 = y1uct_select << uct_shift
acc10 = y2uct_select << uct_shift
acc11 = y3uct_select << uct_shift
acc12 = y4uct_select << uct_shift
acc13 = y5uct_select << uct_shift
acc14 = y6uct_select << uct_shift
acc15 = y7uct_select << uct_shift
Returns
Returned accumulation vector (16 x int48 lanes)
Parameters
acc        Incoming accumulation vector (16 x int48 lanes).
xbuff      Input buffer of 32 elements of type int16.
xstart     Starting position offset applied to all lanes of input from the X buffer. xstart is restricted to multiples of 2 because the granularity of xbuff is 32 bits.
xoffsets   4-bit offset for each lane, applied to both the x and y buffers. For every second lane, the offset is relative to the previous lane + 1. The LSBs apply to the first lane.
xstep      Step between columns for selection in the x and y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 because the granularity of xbuff is 32 bits.
xsquare    Selection order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
ybuff      Right input buffer of 32 elements of type int16.
ystart     Starting position offset applied to all lanes of input from the Y buffer. ystart is restricted to multiples of 2 because the granularity of ybuff is 32 bits.
ysquare    Selection order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
uct_col    Selects which column of data from the Y buffer to upshift for the lower half of the output accumulator. This must be a compile-time constant.
uct_shift  Upshift value applied to the four upshifted lanes.
zbuff      Input buffer of 16 elements of type int16.
zstart     Starting position offset applied to all lanes of input from the Z buffer. This must be a compile-time constant. Only the 4 LSBs of the argument are used.
zoffsets   4-bit offset for each lane, applied to input from the Z buffer. The LSBs apply to the first lane.
zstep      Step between columns for selection in the Z buffer.
Note
  • This intrinsic uses the 'square' parameters (xsquare and ysquare). See the description of the mini-permute square in this guide for details on how to use them.
  • For details on how data is selected from the buffers, see the data selection description in this guide. For this intrinsic, the data buffer uses the 16bx16b scheme and the coefficient buffer uses the general scheme.
  • Parameter 'zstart' must be a compile time constant.
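
The xsquare and ysquare words pack one entry per hex digit, with the least significant digit applying to the first element. A minimal sketch of that unpacking follows. The helper names are hypothetical and only the digit layout is taken from the parameter descriptions; how the hardware applies the entries to the mini-permute square is not modeled here.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper, not an AIE intrinsic: split a square word such as
// 0x3210 into its four entries, least significant hex digit first.
std::array<unsigned, 4> unpack_square(unsigned square) {
    std::array<unsigned, 4> entries{};
    for (int i = 0; i < 4; ++i) {
        entries[i] = (square >> (4 * i)) & 0xFu;
    }
    return entries;
}

// True when the word is the documented default 0x3210 (identity order).
bool is_default_square(unsigned square) {
    return unpack_square(square) == std::array<unsigned, 4>{0u, 1u, 2u, 3u};
}
```

For the default 0x3210 the entries come out as 0, 1, 2, 3, so the first element keeps position 0 and the last keeps position 3.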

◆ mac8_antisym_uct() [3/3]

v16acc48 mac8_antisym_uct ( v16acc48  acc,
v64int16  xbuff,
int  xstart,
unsigned int  xoffsets,
int  xstep,
unsigned int  xsquare,
int  ystart,
unsigned int  ysquare,
int  uct_col,
int  uct_shift,
v16int16  zbuff,
int  zstart,
unsigned int  zoffsets,
int  zstep 
)

Anti-symmetric multiply-accumulate intrinsic function with unit center-tap optimization with pre-sub from x input buffer.

acc0 += z00*(x00 - y00) + z01*(x01 - y01) + z02*(x02 - y02) + z03*(x03 - y03)
acc1 += z10*(x10 - y10) + z11*(x11 - y11) + z12*(x12 - y12) + z13*(x13 - y13)
acc2 += z20*(x20 - y20) + z21*(x21 - y21) + z22*(x22 - y22) + z23*(x23 - y23)
acc3 += z30*(x30 - y30) + z31*(x31 - y31) + z32*(x32 - y32) + z33*(x33 - y33)
acc4 += z40*(x40 - y40) + z41*(x41 - y41) + z42*(x42 - y42) + z43*(x43 - y43)
acc5 += z50*(x50 - y50) + z51*(x51 - y51) + z52*(x52 - y52) + z53*(x53 - y53)
acc6 += z60*(x60 - y60) + z61*(x61 - y61) + z62*(x62 - y62) + z63*(x63 - y63)
acc7 += z70*(x70 - y70) + z71*(x71 - y71) + z72*(x72 - y72) + z73*(x73 - y73)
acc8 = y0uct_select << uct_shift
acc9 = y1uct_select << uct_shift
acc10 = y2uct_select << uct_shift
acc11 = y3uct_select << uct_shift
acc12 = y4uct_select << uct_shift
acc13 = y5uct_select << uct_shift
acc14 = y6uct_select << uct_shift
acc15 = y7uct_select << uct_shift
Returns
Returned accumulation vector (16 x int48 lanes)
Parameters
acc        Incoming accumulation vector (16 x int48 lanes).
xbuff      Input buffer of 64 elements of type int16.
xstart     Starting position offset applied to all lanes of input from the X buffer. xstart is restricted to multiples of 2 because the granularity of xbuff is 32 bits.
xoffsets   4-bit offset for each lane, applied to both the x and y buffers. For every second lane, the offset is relative to the previous lane + 1. The LSBs apply to the first lane.
xstep      Step between columns for selection in the x and y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 because the granularity of xbuff is 32 bits.
xsquare    Selection order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
ystart     Starting position offset applied to all lanes of input from the Y buffer.
ysquare    Selection order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
uct_col    Selects which column of data from the Y buffer to upshift for the lower half of the output accumulator. This must be a compile-time constant.
uct_shift  Upshift value applied to the four upshifted lanes.
zbuff      Input buffer of 16 elements of type int16.
zstart     Starting position offset applied to all lanes of input from the Z buffer. This must be a compile-time constant. Only the 4 LSBs of the argument are used.
zoffsets   4-bit offset for each lane, applied to input from the Z buffer. The LSBs apply to the first lane.
zstep      Step between columns for selection in the Z buffer.
Note
  • This intrinsic uses the 'square' parameters (xsquare and ysquare). See the description of the mini-permute square in this guide for details on how to use them.
  • For details on how data is selected from the buffers, see the data selection description in this guide. For this intrinsic, the data buffer uses the 16bx16b scheme and the coefficient buffer uses the general scheme.
  • Parameter 'zstart' must be a compile time constant.
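
The zoffsets and zstart descriptions can be illustrated with two small helpers. Both are hypothetical names introduced for this sketch. It relies only on what the parameter list states: one 4-bit offset per lane with the LSBs applying to the first lane, and only the 4 LSBs of zstart being used. How start, offset and step are combined by the general scheme is not modeled.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper, not an AIE intrinsic: split a 32-bit offsets word
// into eight 4-bit per-lane offsets, lane 0 in the least significant bits.
std::array<unsigned, 8> unpack_lane_offsets(unsigned offsets) {
    std::array<unsigned, 8> per_lane{};
    for (int lane = 0; lane < 8; ++lane) {
        per_lane[lane] = (offsets >> (4 * lane)) & 0xFu;
    }
    return per_lane;
}

// Hypothetical helper: keep only the 4 LSBs of zstart, as the parameter
// description says the remaining bits are not used.
unsigned effective_zstart(int zstart) {
    return static_cast<unsigned>(zstart) & 0xFu;
}
```

For example, zoffsets = 0x76543210 assigns offset 0 to lane 0 and offset 7 to lane 7, and a zstart of 0x13 behaves like 3 under this reading.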

◆ msc8_antisym_uct() [1/3]

v16acc48 msc8_antisym_uct ( v16acc48  acc,
v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
int  xstep,
unsigned int  xsquare,
int  ystart,
unsigned int  ysquare,
int  uct_col,
int  uct_shift,
v16int16  zbuff,
int  zstart,
unsigned int  zoffsets,
int  zstep 
)

Anti-symmetric multiply-subtract intrinsic function with unit center-tap optimization with pre-sub from x input buffer using small X input buffer.

acc0 -= z00*(x00 - y00) + z01*(x01 - y01) + z02*(x02 - y02) + z03*(x03 - y03)
acc1 -= z10*(x10 - y10) + z11*(x11 - y11) + z12*(x12 - y12) + z13*(x13 - y13)
acc2 -= z20*(x20 - y20) + z21*(x21 - y21) + z22*(x22 - y22) + z23*(x23 - y23)
acc3 -= z30*(x30 - y30) + z31*(x31 - y31) + z32*(x32 - y32) + z33*(x33 - y33)
acc4 -= z40*(x40 - y40) + z41*(x41 - y41) + z42*(x42 - y42) + z43*(x43 - y43)
acc5 -= z50*(x50 - y50) + z51*(x51 - y51) + z52*(x52 - y52) + z53*(x53 - y53)
acc6 -= z60*(x60 - y60) + z61*(x61 - y61) + z62*(x62 - y62) + z63*(x63 - y63)
acc7 -= z70*(x70 - y70) + z71*(x71 - y71) + z72*(x72 - y72) + z73*(x73 - y73)
acc8 = y0uct_select << uct_shift
acc9 = y1uct_select << uct_shift
acc10 = y2uct_select << uct_shift
acc11 = y3uct_select << uct_shift
acc12 = y4uct_select << uct_shift
acc13 = y5uct_select << uct_shift
acc14 = y6uct_select << uct_shift
acc15 = y7uct_select << uct_shift
Returns
Returned accumulation vector (16 x int48 lanes)
Parameters
acc        Incoming accumulation vector (16 x int48 lanes).
xbuff      Input buffer of 32 elements of type int16.
xstart     Starting position offset applied to all lanes of input from the X buffer. xstart is restricted to multiples of 2 because the granularity of xbuff is 32 bits.
xoffsets   4-bit offset for each lane, applied to both the x and y buffers. For every second lane, the offset is relative to the previous lane + 1. The LSBs apply to the first lane.
xstep      Step between columns for selection in the x and y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 because the granularity of xbuff is 32 bits.
xsquare    Selection order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
ystart     Starting position offset applied to all lanes of input from the Y buffer.
ysquare    Selection order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
uct_col    Selects which column of data from the Y buffer to upshift for the lower half of the output accumulator. This must be a compile-time constant.
uct_shift  Upshift value applied to the four upshifted lanes.
zbuff      Input buffer of 16 elements of type int16.
zstart     Starting position offset applied to all lanes of input from the Z buffer. This must be a compile-time constant. Only the 4 LSBs of the argument are used.
zoffsets   4-bit offset for each lane, applied to input from the Z buffer. The LSBs apply to the first lane.
zstep      Step between columns for selection in the Z buffer.
Note
  • This intrinsic uses the 'square' parameters (xsquare and ysquare). See the description of the mini-permute square in this guide for details on how to use them.
  • For details on how data is selected from the buffers, see the data selection description in this guide. For this intrinsic, the data buffer uses the 16bx16b scheme and the coefficient buffer uses the general scheme.
  • Parameter 'zstart' must be a compile time constant.
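
The msc8_antisym_uct equations differ from the mac8 form only in the sign applied to the lane sum. A scalar sketch of that difference is given below under the same assumptions as any reference model of this kind: the names are hypothetical, operands are passed in already selected, "y{lane}uct_select" is read as the y element in column uct_col, and a 64-bit integer replaces the 48-bit lane.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar model; these names do not exist in the AIE intrinsics.
using Sel = std::array<std::array<std::int16_t, 4>, 8>;  // s[lane][column]
using Acc16 = std::array<std::int64_t, 16>;              // stand-in for int48 lanes

// Pre-subtracted dot product of one lane: sum over columns of z * (x - y).
std::int64_t lane_sum(const Sel& x, const Sel& y, const Sel& z, int l) {
    std::int64_t sum = 0;
    for (int c = 0; c < 4; ++c) {
        sum += std::int64_t(z[l][c]) *
               (std::int64_t(x[l][c]) - std::int64_t(y[l][c]));
    }
    return sum;
}

Acc16 msc8_antisym_uct_model(Acc16 acc, const Sel& x, const Sel& y, const Sel& z,
                             int uct_col, int uct_shift) {
    assert(uct_col >= 0 && uct_col < 4);
    assert(uct_shift >= 0 && uct_shift < 32);
    for (int l = 0; l < 8; ++l) {
        acc[l] -= lane_sum(x, y, z, l);  // multiply-subtract: acc -= sum
        // Upshifted lanes are overwritten, exactly as in the mac8 equations.
        acc[8 + l] = std::int64_t(y[l][uct_col]) * (std::int64_t(1) << uct_shift);
    }
    return acc;
}
```

Starting from acc0 = 100 with every x = 10, y = 4 and z = 2, the model returns acc0 = 100 - 48 = 52, while acc8..acc15 are unaffected by the sign change.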

◆ msc8_antisym_uct() [2/3]

v16acc48 msc8_antisym_uct ( v16acc48  acc,
v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
int  xstep,
unsigned int  xsquare,
v32int16  ybuff,
int  ystart,
unsigned int  ysquare,
int  uct_col,
int  uct_shift,
v16int16  zbuff,
int  zstart,
unsigned int  zoffsets,
int  zstep 
)

Anti-symmetric multiply-subtract intrinsic function with unit center-tap optimization with pre-sub from x and y input buffers using small X input buffer.

acc0 -= z00*(x00 - y00) + z01*(x01 - y01) + z02*(x02 - y02) + z03*(x03 - y03)
acc1 -= z10*(x10 - y10) + z11*(x11 - y11) + z12*(x12 - y12) + z13*(x13 - y13)
acc2 -= z20*(x20 - y20) + z21*(x21 - y21) + z22*(x22 - y22) + z23*(x23 - y23)
acc3 -= z30*(x30 - y30) + z31*(x31 - y31) + z32*(x32 - y32) + z33*(x33 - y33)
acc4 -= z40*(x40 - y40) + z41*(x41 - y41) + z42*(x42 - y42) + z43*(x43 - y43)
acc5 -= z50*(x50 - y50) + z51*(x51 - y51) + z52*(x52 - y52) + z53*(x53 - y53)
acc6 -= z60*(x60 - y60) + z61*(x61 - y61) + z62*(x62 - y62) + z63*(x63 - y63)
acc7 -= z70*(x70 - y70) + z71*(x71 - y71) + z72*(x72 - y72) + z73*(x73 - y73)
acc8 = y0uct_select << uct_shift
acc9 = y1uct_select << uct_shift
acc10 = y2uct_select << uct_shift
acc11 = y3uct_select << uct_shift
acc12 = y4uct_select << uct_shift
acc13 = y5uct_select << uct_shift
acc14 = y6uct_select << uct_shift
acc15 = y7uct_select << uct_shift
Returns
Returned accumulation vector (16 x int48 lanes)
Parameters
acc        Incoming accumulation vector (16 x int48 lanes).
xbuff      Input buffer of 32 elements of type int16.
xstart     Starting position offset applied to all lanes of input from the X buffer. xstart is restricted to multiples of 2 because the granularity of xbuff is 32 bits.
xoffsets   4-bit offset for each lane, applied to both the x and y buffers. For every second lane, the offset is relative to the previous lane + 1. The LSBs apply to the first lane.
xstep      Step between columns for selection in the x and y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 because the granularity of xbuff is 32 bits.
xsquare    Selection order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
ybuff      Right input buffer of 32 elements of type int16.
ystart     Starting position offset applied to all lanes of input from the Y buffer. ystart is restricted to multiples of 2 because the granularity of ybuff is 32 bits.
ysquare    Selection order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
uct_col    Selects which column of data from the Y buffer to upshift for the lower half of the output accumulator. This must be a compile-time constant.
uct_shift  Upshift value applied to the four upshifted lanes.
zbuff      Input buffer of 16 elements of type int16.
zstart     Starting position offset applied to all lanes of input from the Z buffer. This must be a compile-time constant. Only the 4 LSBs of the argument are used.
zoffsets   4-bit offset for each lane, applied to input from the Z buffer. The LSBs apply to the first lane.
zstep      Step between columns for selection in the Z buffer.
Note
  • This intrinsic uses the 'square' parameters (xsquare and ysquare). See the description of the mini-permute square in this guide for details on how to use them.
  • For details on how data is selected from the buffers, see the data selection description in this guide. For this intrinsic, the data buffer uses the 16bx16b scheme and the coefficient buffer uses the general scheme.
  • Parameter 'zstart' must be a compile time constant.
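
The xstep description says the y side steps in the opposite direction to the x side. The sketch below reads that statement literally and computes a per-column base position for both sides. The struct and function are hypothetical. It is an assumption that column 0 starts exactly at xstart and ystart, and per-lane offsets, the square permutation and buffer wraparound are deliberately left out.

```cpp
#include <array>
#include <cassert>

// Hypothetical illustration, not an AIE intrinsic.
struct ColumnBases {
    std::array<int, 4> x;  // x-side base position for columns 0..3
    std::array<int, 4> y;  // y-side base position for columns 0..3
};

ColumnBases column_bases(int xstart, int ystart, int xstep) {
    // Documented restrictions for the two-buffer variants: multiples of 2.
    assert(xstart % 2 == 0 && ystart % 2 == 0 && xstep % 2 == 0);
    ColumnBases b{};
    for (int c = 0; c < 4; ++c) {
        b.x[c] = xstart + c * xstep;  // x advances by +xstep per column
        b.y[c] = ystart - c * xstep;  // y advances by -xstep per column
    }
    return b;
}
```

With xstart = 0, ystart = 14 and xstep = 2, the x side visits 0, 2, 4, 6 while the y side visits 14, 12, 10, 8, which is the mirrored access pattern an anti-symmetric filter needs.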

◆ msc8_antisym_uct() [3/3]

v16acc48 msc8_antisym_uct ( v16acc48  acc,
v64int16  xbuff,
int  xstart,
unsigned int  xoffsets,
int  xstep,
unsigned int  xsquare,
int  ystart,
unsigned int  ysquare,
int  uct_col,
int  uct_shift,
v16int16  zbuff,
int  zstart,
unsigned int  zoffsets,
int  zstep 
)

Anti-symmetric multiply-subtract intrinsic function with unit center-tap optimization with pre-sub from x input buffer.

acc0 -= z00*(x00 - y00) + z01*(x01 - y01) + z02*(x02 - y02) + z03*(x03 - y03)
acc1 -= z10*(x10 - y10) + z11*(x11 - y11) + z12*(x12 - y12) + z13*(x13 - y13)
acc2 -= z20*(x20 - y20) + z21*(x21 - y21) + z22*(x22 - y22) + z23*(x23 - y23)
acc3 -= z30*(x30 - y30) + z31*(x31 - y31) + z32*(x32 - y32) + z33*(x33 - y33)
acc4 -= z40*(x40 - y40) + z41*(x41 - y41) + z42*(x42 - y42) + z43*(x43 - y43)
acc5 -= z50*(x50 - y50) + z51*(x51 - y51) + z52*(x52 - y52) + z53*(x53 - y53)
acc6 -= z60*(x60 - y60) + z61*(x61 - y61) + z62*(x62 - y62) + z63*(x63 - y63)
acc7 -= z70*(x70 - y70) + z71*(x71 - y71) + z72*(x72 - y72) + z73*(x73 - y73)
acc8 = y0uct_select << uct_shift
acc9 = y1uct_select << uct_shift
acc10 = y2uct_select << uct_shift
acc11 = y3uct_select << uct_shift
acc12 = y4uct_select << uct_shift
acc13 = y5uct_select << uct_shift
acc14 = y6uct_select << uct_shift
acc15 = y7uct_select << uct_shift
Returns
Returned accumulation vector (16 x int48 lanes)
Parameters
acc        Incoming accumulation vector (16 x int48 lanes).
xbuff      Input buffer of 64 elements of type int16.
xstart     Starting position offset applied to all lanes of input from the X buffer. xstart is restricted to multiples of 2 because the granularity of xbuff is 32 bits.
xoffsets   4-bit offset for each lane, applied to both the x and y buffers. For every second lane, the offset is relative to the previous lane + 1. The LSBs apply to the first lane.
xstep      Step between columns for selection in the x and y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 because the granularity of xbuff is 32 bits.
xsquare    Selection order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
ystart     Starting position offset applied to all lanes of input from the Y buffer.
ysquare    Selection order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
uct_col    Selects which column of data from the Y buffer to upshift for the lower half of the output accumulator. This must be a compile-time constant.
uct_shift  Upshift value applied to the four upshifted lanes.
zbuff      Input buffer of 16 elements of type int16.
zstart     Starting position offset applied to all lanes of input from the Z buffer. This must be a compile-time constant. Only the 4 LSBs of the argument are used.
zoffsets   4-bit offset for each lane, applied to input from the Z buffer. The LSBs apply to the first lane.
zstep      Step between columns for selection in the Z buffer.
Note
  • This intrinsic uses the 'square' parameters (xsquare and ysquare). See the description of the mini-permute square in this guide for details on how to use them.
  • For details on how data is selected from the buffers, see the data selection description in this guide. For this intrinsic, the data buffer uses the 16bx16b scheme and the coefficient buffer uses the general scheme.
  • Parameter 'zstart' must be a compile time constant.

◆ mul8_antisym_uct() [1/3]

v16acc48 mul8_antisym_uct ( v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
int  xstep,
unsigned int  xsquare,
int  ystart,
unsigned int  ysquare,
int  uct_col,
int  uct_shift,
v16int16  zbuff,
int  zstart,
unsigned int  zoffsets,
int  zstep 
)

Anti-symmetric multiply intrinsic function with unit center-tap optimization with pre-sub from x input buffer using small X input buffer.

acc0 = z00*(x00 - y00) + z01*(x01 - y01) + z02*(x02 - y02) + z03*(x03 - y03)
acc1 = z10*(x10 - y10) + z11*(x11 - y11) + z12*(x12 - y12) + z13*(x13 - y13)
acc2 = z20*(x20 - y20) + z21*(x21 - y21) + z22*(x22 - y22) + z23*(x23 - y23)
acc3 = z30*(x30 - y30) + z31*(x31 - y31) + z32*(x32 - y32) + z33*(x33 - y33)
acc4 = z40*(x40 - y40) + z41*(x41 - y41) + z42*(x42 - y42) + z43*(x43 - y43)
acc5 = z50*(x50 - y50) + z51*(x51 - y51) + z52*(x52 - y52) + z53*(x53 - y53)
acc6 = z60*(x60 - y60) + z61*(x61 - y61) + z62*(x62 - y62) + z63*(x63 - y63)
acc7 = z70*(x70 - y70) + z71*(x71 - y71) + z72*(x72 - y72) + z73*(x73 - y73)
acc8 = y0uct_select << uct_shift
acc9 = y1uct_select << uct_shift
acc10 = y2uct_select << uct_shift
acc11 = y3uct_select << uct_shift
acc12 = y4uct_select << uct_shift
acc13 = y5uct_select << uct_shift
acc14 = y6uct_select << uct_shift
acc15 = y7uct_select << uct_shift
Returns
Returned accumulation vector (16 x int48 lanes)
Parameters
xbuff      Input buffer of 32 elements of type int16.
xstart     Starting position offset applied to all lanes of input from the X buffer. xstart is restricted to multiples of 2 because the granularity of xbuff is 32 bits.
xoffsets   4-bit offset for each lane, applied to both the x and y buffers. For every second lane, the offset is relative to the previous lane + 1. The LSBs apply to the first lane.
xstep      Step between columns for selection in the x and y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 because the granularity of xbuff is 32 bits.
xsquare    Selection order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
ystart     Starting position offset applied to all lanes of input from the Y buffer.
ysquare    Selection order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
uct_col    Selects which column of data from the Y buffer to upshift for the lower half of the output accumulator. This must be a compile-time constant.
uct_shift  Upshift value applied to the four upshifted lanes.
zbuff      Input buffer of 16 elements of type int16.
zstart     Starting position offset applied to all lanes of input from the Z buffer. This must be a compile-time constant. Only the 4 LSBs of the argument are used.
zoffsets   4-bit offset for each lane, applied to input from the Z buffer. The LSBs apply to the first lane.
zstep      Step between columns for selection in the Z buffer.
Note
  • This intrinsic uses the 'square' parameters (xsquare and ysquare). See the description of the mini-permute square in this guide for details on how to use them.
  • For details on how data is selected from the buffers, see the data selection description in this guide. For this intrinsic, the data buffer uses the 16bx16b scheme and the coefficient buffer uses the general scheme.
  • Parameter 'zstart' must be a compile time constant.
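
The mul8_antisym_uct equations take no incoming accumulator, so every output lane is written rather than updated. A scalar sketch of that behavior follows. The function name is hypothetical, the operands are assumed to be already selected per lane and column, "y{lane}uct_select" is read as the y element in column uct_col, and a 64-bit integer stands in for a 48-bit lane.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar model; these names do not exist in the AIE intrinsics.
using Sel = std::array<std::array<std::int16_t, 4>, 8>;  // s[lane][column]
using Acc16 = std::array<std::int64_t, 16>;              // stand-in for int48 lanes

Acc16 mul8_antisym_uct_model(const Sel& x, const Sel& y, const Sel& z,
                             int uct_col, int uct_shift) {
    assert(uct_col >= 0 && uct_col < 4);
    assert(uct_shift >= 0 && uct_shift < 32);
    Acc16 acc{};  // no incoming accumulator: all lanes start from zero
    for (int l = 0; l < 8; ++l) {
        for (int c = 0; c < 4; ++c) {
            // acc0..acc7 = sum over columns of z * (x - y)
            acc[l] += std::int64_t(z[l][c]) *
                      (std::int64_t(x[l][c]) - std::int64_t(y[l][c]));
        }
        // acc8..acc15 = selected y element upshifted by uct_shift.
        acc[8 + l] = std::int64_t(y[l][uct_col]) * (std::int64_t(1) << uct_shift);
    }
    return acc;
}
```

The result equals what a multiply-accumulate model would return for an all-zero incoming accumulator, which is a convenient cross-check when starting an accumulation chain.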

◆ mul8_antisym_uct() [2/3]

v16acc48 mul8_antisym_uct ( v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
int  xstep,
unsigned int  xsquare,
v32int16  ybuff,
int  ystart,
unsigned int  ysquare,
int  uct_col,
int  uct_shift,
v16int16  zbuff,
int  zstart,
unsigned int  zoffsets,
int  zstep 
)

Anti-symmetric multiply intrinsic function with unit center-tap optimization with pre-sub from x and y input buffers using small X input buffer.

acc0 = z00*(x00 - y00) + z01*(x01 - y01) + z02*(x02 - y02) + z03*(x03 - y03)
acc1 = z10*(x10 - y10) + z11*(x11 - y11) + z12*(x12 - y12) + z13*(x13 - y13)
acc2 = z20*(x20 - y20) + z21*(x21 - y21) + z22*(x22 - y22) + z23*(x23 - y23)
acc3 = z30*(x30 - y30) + z31*(x31 - y31) + z32*(x32 - y32) + z33*(x33 - y33)
acc4 = z40*(x40 - y40) + z41*(x41 - y41) + z42*(x42 - y42) + z43*(x43 - y43)
acc5 = z50*(x50 - y50) + z51*(x51 - y51) + z52*(x52 - y52) + z53*(x53 - y53)
acc6 = z60*(x60 - y60) + z61*(x61 - y61) + z62*(x62 - y62) + z63*(x63 - y63)
acc7 = z70*(x70 - y70) + z71*(x71 - y71) + z72*(x72 - y72) + z73*(x73 - y73)
acc8 = y0uct_select << uct_shift
acc9 = y1uct_select << uct_shift
acc10 = y2uct_select << uct_shift
acc11 = y3uct_select << uct_shift
acc12 = y4uct_select << uct_shift
acc13 = y5uct_select << uct_shift
acc14 = y6uct_select << uct_shift
acc15 = y7uct_select << uct_shift
Returns
Returned accumulation vector (16 x int48 lanes)
Parameters
xbuff      Input buffer of 32 elements of type int16.
xstart     Starting position offset applied to all lanes of input from the X buffer. xstart is restricted to multiples of 2 because the granularity of xbuff is 32 bits.
xoffsets   4-bit offset for each lane, applied to both the x and y buffers. For every second lane, the offset is relative to the previous lane + 1. The LSBs apply to the first lane.
xstep      Step between columns for selection in the x and y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 because the granularity of xbuff is 32 bits.
xsquare    Selection order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
ybuff      Right input buffer of 32 elements of type int16.
ystart     Starting position offset applied to all lanes of input from the Y buffer. ystart is restricted to multiples of 2 because the granularity of ybuff is 32 bits.
ysquare    Selection order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
uct_col    Selects which column of data from the Y buffer to upshift for the lower half of the output accumulator. This must be a compile-time constant.
uct_shift  Upshift value applied to the four upshifted lanes.
zbuff      Input buffer of 16 elements of type int16.
zstart     Starting position offset applied to all lanes of input from the Z buffer. This must be a compile-time constant. Only the 4 LSBs of the argument are used.
zoffsets   4-bit offset for each lane, applied to input from the Z buffer. The LSBs apply to the first lane.
zstep      Step between columns for selection in the Z buffer.
Note
  • This intrinsic uses the 'square' parameters (xsquare and ysquare). See the description of the mini-permute square in this guide for details on how to use them.
  • For details on how data is selected from the buffers, see the data selection description in this guide. For this intrinsic, the data buffer uses the 16bx16b scheme and the coefficient buffer uses the general scheme.
  • Parameter 'zstart' must be a compile time constant.

◆ mul8_antisym_uct() [3/3]

v16acc48 mul8_antisym_uct ( v64int16  xbuff,
int  xstart,
unsigned int  xoffsets,
int  xstep,
unsigned int  xsquare,
int  ystart,
unsigned int  ysquare,
int  uct_col,
int  uct_shift,
v16int16  zbuff,
int  zstart,
unsigned int  zoffsets,
int  zstep 
)

Anti-symmetric multiply intrinsic function with unit center-tap optimization with pre-sub from x input buffer.

acc0 = z00*(x00 - y00) + z01*(x01 - y01) + z02*(x02 - y02) + z03*(x03 - y03)
acc1 = z10*(x10 - y10) + z11*(x11 - y11) + z12*(x12 - y12) + z13*(x13 - y13)
acc2 = z20*(x20 - y20) + z21*(x21 - y21) + z22*(x22 - y22) + z23*(x23 - y23)
acc3 = z30*(x30 - y30) + z31*(x31 - y31) + z32*(x32 - y32) + z33*(x33 - y33)
acc4 = z40*(x40 - y40) + z41*(x41 - y41) + z42*(x42 - y42) + z43*(x43 - y43)
acc5 = z50*(x50 - y50) + z51*(x51 - y51) + z52*(x52 - y52) + z53*(x53 - y53)
acc6 = z60*(x60 - y60) + z61*(x61 - y61) + z62*(x62 - y62) + z63*(x63 - y63)
acc7 = z70*(x70 - y70) + z71*(x71 - y71) + z72*(x72 - y72) + z73*(x73 - y73)
acc8 = y0uct_select << uct_shift
acc9 = y1uct_select << uct_shift
acc10 = y2uct_select << uct_shift
acc11 = y3uct_select << uct_shift
acc12 = y4uct_select << uct_shift
acc13 = y5uct_select << uct_shift
acc14 = y6uct_select << uct_shift
acc15 = y7uct_select << uct_shift
Returns
Returned accumulation vector (16 x int48 lanes)
Parameters
xbuff  Input buffer of 64 elements of type int16.
xstart  Starting position offset applied to all lanes for input from the X buffer. xstart is restricted to multiples of 2 because the granularity of xbuff is 32 bits.
xoffsets  4-bit offset for each lane, applied to both the X and Y buffers. For every second lane, the offset is relative to the previous lane + 1. The LSBs apply to the first lane.
xstep  Step between each column for selection in the X and Y buffers. The Y step mirrors xstep (the Y position advances by -xstep). xstep is restricted to multiples of 2 because the granularity of xbuff is 32 bits.
xsquare  Selects the order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
ystart  Starting position offset applied to all lanes for input from the Y buffer.
ysquare  Selects the order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
uct_col  Selects which column of data from the Y buffer to upshift for the lower half of the output accumulator. This must be a compile-time constant.
uct_shift  Upshift value applied to the upshifted lanes (acc8 to acc15).
zbuff  Input buffer of 16 elements of type int16.
zstart  Starting position offset applied to all lanes for input from the Z buffer. This must be a compile-time constant. Only the 4 LSBs of the argument are used.
zoffsets  4-bit offset for each lane, applied to input from the Z buffer. The LSBs apply to the first lane.
zstep  Step between each column for selection in the Z buffer.
Note
  • This intrinsic uses the 'square' parameters (xsquare, ysquare); see the mini-permute square description in this guide for how to use them.
  • For details on how data is selected from the buffers, see the data selection description in this guide. For this intrinsic, the data buffer uses the 16bx16b scheme and the coefficient buffer uses the general scheme.
  • Parameter 'zstart' must be a compile time constant.
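
The lane equations for mul8_antisym_uct can be checked against a plain scalar model. The sketch below is a hypothetical reference function, not part of the intrinsics API. It assumes the operands have already been selected from the buffers, so x[i][j], y[i][j] and z[i][j] are the values for lane i and column j, and y_uct[i] is the Y sample picked by uct_col for lane i. The 48-bit accumulator lanes are held in int64_t and no wrap-around is modelled.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Operands for 8 lanes x 4 columns, already selected from the buffers
// (hypothetical layout chosen for this sketch).
using Lanes8x4 = std::array<std::array<int16_t, 4>, 8>;

// Hypothetical scalar reference for the mul8_antisym_uct lane equations.
std::array<int64_t, 16> mul8_antisym_uct_ref(const Lanes8x4& x,
                                             const Lanes8x4& y,
                                             const Lanes8x4& z,
                                             const std::array<int16_t, 8>& y_uct,
                                             int uct_shift)
{
    assert(uct_shift >= 0 && uct_shift < 32);
    std::array<int64_t, 16> acc{};
    for (int i = 0; i < 8; ++i) {
        int64_t sum = 0;
        for (int j = 0; j < 4; ++j) {
            // Pre-subtraction x - y, then multiply by the coefficient.
            int32_t pre = int32_t(x[i][j]) - int32_t(y[i][j]);
            sum += int64_t(z[i][j]) * pre;
        }
        acc[i] = sum;  // acc0 .. acc7
        // acc8 .. acc15: selected Y sample upshifted by uct_shift
        // (written as a multiply so negative samples are well defined).
        acc[8 + i] = int64_t(y_uct[i]) * (int64_t(1) << uct_shift);
    }
    return acc;
}
```

A model like this is mainly useful in a testbench, where its output can be compared lane by lane with the accumulator returned by the intrinsic.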

◆ negmul8_antisym_uct() [1/3]

v16acc48 negmul8_antisym_uct ( v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
int  xstep,
unsigned int  xsquare,
int  ystart,
unsigned int  ysquare,
int  uct_col,
int  uct_shift,
v16int16  zbuff,
int  zstart,
unsigned int  zoffsets,
int  zstep 
)

Anti-symmetric multiply-negate intrinsic function with unit center-tap optimization with pre-sub from x input buffer using small X input buffer.

acc0 = -( z00*(x00 - y00) + z01*(x01 - y01) + z02*(x02 - y02) + z03*(x03 - y03) )
acc1 = -( z10*(x10 - y10) + z11*(x11 - y11) + z12*(x12 - y12) + z13*(x13 - y13) )
acc2 = -( z20*(x20 - y20) + z21*(x21 - y21) + z22*(x22 - y22) + z23*(x23 - y23) )
acc3 = -( z30*(x30 - y30) + z31*(x31 - y31) + z32*(x32 - y32) + z33*(x33 - y33) )
acc4 = -( z40*(x40 - y40) + z41*(x41 - y41) + z42*(x42 - y42) + z43*(x43 - y43) )
acc5 = -( z50*(x50 - y50) + z51*(x51 - y51) + z52*(x52 - y52) + z53*(x53 - y53) )
acc6 = -( z60*(x60 - y60) + z61*(x61 - y61) + z62*(x62 - y62) + z63*(x63 - y63) )
acc7 = -( z70*(x70 - y70) + z71*(x71 - y71) + z72*(x72 - y72) + z73*(x73 - y73) )
acc8 = y0uct_select << uct_shift
acc9 = y1uct_select << uct_shift
acc10 = y2uct_select << uct_shift
acc11 = y3uct_select << uct_shift
acc12 = y4uct_select << uct_shift
acc13 = y5uct_select << uct_shift
acc14 = y6uct_select << uct_shift
acc15 = y7uct_select << uct_shift
Returns
Returned accumulation vector (16 x int48 lanes)
Parameters
xbuff  Input buffer of 32 elements of type int16.
xstart  Starting position offset applied to all lanes for input from the X buffer. xstart is restricted to multiples of 2 because the granularity of xbuff is 32 bits.
xoffsets  4-bit offset for each lane, applied to both the X and Y buffers. For every second lane, the offset is relative to the previous lane + 1. The LSBs apply to the first lane.
xstep  Step between each column for selection in the X and Y buffers. The Y step mirrors xstep (the Y position advances by -xstep). xstep is restricted to multiples of 2 because the granularity of xbuff is 32 bits.
xsquare  Selects the order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
ystart  Starting position offset applied to all lanes for input from the Y buffer.
ysquare  Selects the order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
uct_col  Selects which column of data from the Y buffer to upshift for the lower half of the output accumulator. This must be a compile-time constant.
uct_shift  Upshift value applied to the upshifted lanes (acc8 to acc15).
zbuff  Input buffer of 16 elements of type int16.
zstart  Starting position offset applied to all lanes for input from the Z buffer. This must be a compile-time constant. Only the 4 LSBs of the argument are used.
zoffsets  4-bit offset for each lane, applied to input from the Z buffer. The LSBs apply to the first lane.
zstep  Step between each column for selection in the Z buffer.
Note
  • This intrinsic uses the 'square' parameters (xsquare, ysquare); see the mini-permute square description in this guide for how to use them.
  • For details on how data is selected from the buffers, see the data selection description in this guide. For this intrinsic, the data buffer uses the 16bx16b scheme and the coefficient buffer uses the general scheme.
  • Parameter 'zstart' must be a compile time constant.
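
For the negated variant, a single multiply lane can be modelled as the negated sum of the four pre-subtracted products. The function negmul_lane_ref below is a hypothetical scalar sketch, not part of the intrinsics API. It covers only the multiply lanes acc0 to acc7 and assumes the four x, y and z operands of the lane have already been selected from the buffers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of one multiply lane (acc0 .. acc7) of
// negmul8_antisym_uct: acc = -( sum over j of z[j] * (x[j] - y[j]) ).
int64_t negmul_lane_ref(const std::array<int16_t, 4>& x,
                        const std::array<int16_t, 4>& y,
                        const std::array<int16_t, 4>& z)
{
    int64_t sum = 0;
    for (std::size_t j = 0; j < 4; ++j) {
        sum += int64_t(z[j]) * (int32_t(x[j]) - int32_t(y[j]));
    }
    // Four 16b x 17b products always fit in a 48-bit accumulator lane.
    assert(sum > -(int64_t(1) << 47) && sum < (int64_t(1) << 47));
    return -sum;
}
```

With x = {5, 6, 7, 8}, y = {1, 2, 3, 4} and z = {1, 2, 3, 4}, every difference is 4, so the lane evaluates to -(4 * (1 + 2 + 3 + 4)) = -40.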

◆ negmul8_antisym_uct() [2/3]

v16acc48 negmul8_antisym_uct ( v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
int  xstep,
unsigned int  xsquare,
v32int16  ybuff,
int  ystart,
unsigned int  ysquare,
int  uct_col,
int  uct_shift,
v16int16  zbuff,
int  zstart,
unsigned int  zoffsets,
int  zstep 
)

Anti-symmetric multiply-negate intrinsic function with unit center-tap optimization with pre-sub from x and y input buffers using small X input buffer.

acc0 = -( z00*(x00 - y00) + z01*(x01 - y01) + z02*(x02 - y02) + z03*(x03 - y03) )
acc1 = -( z10*(x10 - y10) + z11*(x11 - y11) + z12*(x12 - y12) + z13*(x13 - y13) )
acc2 = -( z20*(x20 - y20) + z21*(x21 - y21) + z22*(x22 - y22) + z23*(x23 - y23) )
acc3 = -( z30*(x30 - y30) + z31*(x31 - y31) + z32*(x32 - y32) + z33*(x33 - y33) )
acc4 = -( z40*(x40 - y40) + z41*(x41 - y41) + z42*(x42 - y42) + z43*(x43 - y43) )
acc5 = -( z50*(x50 - y50) + z51*(x51 - y51) + z52*(x52 - y52) + z53*(x53 - y53) )
acc6 = -( z60*(x60 - y60) + z61*(x61 - y61) + z62*(x62 - y62) + z63*(x63 - y63) )
acc7 = -( z70*(x70 - y70) + z71*(x71 - y71) + z72*(x72 - y72) + z73*(x73 - y73) )
acc8 = y0uct_select << uct_shift
acc9 = y1uct_select << uct_shift
acc10 = y2uct_select << uct_shift
acc11 = y3uct_select << uct_shift
acc12 = y4uct_select << uct_shift
acc13 = y5uct_select << uct_shift
acc14 = y6uct_select << uct_shift
acc15 = y7uct_select << uct_shift
Returns
Returned accumulation vector (16 x int48 lanes)
Parameters
xbuff  Input buffer of 32 elements of type int16.
xstart  Starting position offset applied to all lanes for input from the X buffer. xstart is restricted to multiples of 2 because the granularity of xbuff is 32 bits.
xoffsets  4-bit offset for each lane, applied to both the X and Y buffers. For every second lane, the offset is relative to the previous lane + 1. The LSBs apply to the first lane.
xstep  Step between each column for selection in the X and Y buffers. The Y step mirrors xstep (the Y position advances by -xstep). xstep is restricted to multiples of 2 because the granularity of xbuff is 32 bits.
xsquare  Selects the order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
ybuff  Right input buffer of 32 elements of type int16.
ystart  Starting position offset applied to all lanes for input from the Y buffer. ystart is restricted to multiples of 2 because the granularity of ybuff is 32 bits.
ysquare  Selects the order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
uct_col  Selects which column of data from the Y buffer to upshift for the lower half of the output accumulator. This must be a compile-time constant.
uct_shift  Upshift value applied to the upshifted lanes (acc8 to acc15).
zbuff  Input buffer of 16 elements of type int16.
zstart  Starting position offset applied to all lanes for input from the Z buffer. This must be a compile-time constant. Only the 4 LSBs of the argument are used.
zoffsets  4-bit offset for each lane, applied to input from the Z buffer. The LSBs apply to the first lane.
zstep  Step between each column for selection in the Z buffer.
Note
  • This intrinsic uses the 'square' parameters (xsquare, ysquare); see the mini-permute square description in this guide for how to use them.
  • For details on how data is selected from the buffers, see the data selection description in this guide. For this intrinsic, the data buffer uses the 16bx16b scheme and the coefficient buffer uses the general scheme.
  • Parameter 'zstart' must be a compile time constant.
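
The mirrored stepping of the X and Y selections (the Y position advances by -xstep) can be pictured with the small sketch below. The helpers are hypothetical and not part of the intrinsics API. They compute only the per-column base position, before the per-lane xoffsets and the square permutation are applied, and they assume that positions wrap modulo the buffer length n.

```cpp
#include <cassert>

// Hypothetical helper: wrap a position into [0, n).
// Assumption: buffer positions wrap modulo the buffer length.
int wrap_index(int idx, int n)
{
    int r = idx % n;
    return r < 0 ? r + n : r;
}

// Base X position for column `col`: xstart + col * xstep.
int x_column_base(int xstart, int xstep, int col, int n)
{
    // xstart and xstep are restricted to multiples of 2.
    assert(xstart % 2 == 0 && xstep % 2 == 0);
    return wrap_index(xstart + col * xstep, n);
}

// Base Y position for column `col`: ystart - col * xstep,
// i.e. Y moves in the opposite direction to X.
int y_column_base(int ystart, int xstep, int col, int n)
{
    assert(xstep % 2 == 0);
    return wrap_index(ystart - col * xstep, n);
}
```

For example, with xstart = 0, ystart = 14, xstep = 2 and a 32-element buffer, the four columns read X base positions 0, 2, 4, 6 and Y base positions 14, 12, 10, 8, which is the pairing an anti-symmetric filter needs.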

◆ negmul8_antisym_uct() [3/3]

v16acc48 negmul8_antisym_uct ( v64int16  xbuff,
int  xstart,
unsigned int  xoffsets,
int  xstep,
unsigned int  xsquare,
int  ystart,
unsigned int  ysquare,
int  uct_col,
int  uct_shift,
v16int16  zbuff,
int  zstart,
unsigned int  zoffsets,
int  zstep 
)

Anti-symmetric multiply-negate intrinsic function with unit center-tap optimization with pre-sub from x input buffer.

acc0 = -( z00*(x00 - y00) + z01*(x01 - y01) + z02*(x02 - y02) + z03*(x03 - y03) )
acc1 = -( z10*(x10 - y10) + z11*(x11 - y11) + z12*(x12 - y12) + z13*(x13 - y13) )
acc2 = -( z20*(x20 - y20) + z21*(x21 - y21) + z22*(x22 - y22) + z23*(x23 - y23) )
acc3 = -( z30*(x30 - y30) + z31*(x31 - y31) + z32*(x32 - y32) + z33*(x33 - y33) )
acc4 = -( z40*(x40 - y40) + z41*(x41 - y41) + z42*(x42 - y42) + z43*(x43 - y43) )
acc5 = -( z50*(x50 - y50) + z51*(x51 - y51) + z52*(x52 - y52) + z53*(x53 - y53) )
acc6 = -( z60*(x60 - y60) + z61*(x61 - y61) + z62*(x62 - y62) + z63*(x63 - y63) )
acc7 = -( z70*(x70 - y70) + z71*(x71 - y71) + z72*(x72 - y72) + z73*(x73 - y73) )
acc8 = y0uct_select << uct_shift
acc9 = y1uct_select << uct_shift
acc10 = y2uct_select << uct_shift
acc11 = y3uct_select << uct_shift
acc12 = y4uct_select << uct_shift
acc13 = y5uct_select << uct_shift
acc14 = y6uct_select << uct_shift
acc15 = y7uct_select << uct_shift
Returns
Returned accumulation vector (16 x int48 lanes)
Parameters
xbuff  Input buffer of 64 elements of type int16.
xstart  Starting position offset applied to all lanes for input from the X buffer. xstart is restricted to multiples of 2 because the granularity of xbuff is 32 bits.
xoffsets  4-bit offset for each lane, applied to both the X and Y buffers. For every second lane, the offset is relative to the previous lane + 1. The LSBs apply to the first lane.
xstep  Step between each column for selection in the X and Y buffers. The Y step mirrors xstep (the Y position advances by -xstep). xstep is restricted to multiples of 2 because the granularity of xbuff is 32 bits.
xsquare  Selects the order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
ystart  Starting position offset applied to all lanes for input from the Y buffer.
ysquare  Selects the order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
uct_col  Selects which column of data from the Y buffer to upshift for the lower half of the output accumulator. This must be a compile-time constant.
uct_shift  Upshift value applied to the upshifted lanes (acc8 to acc15).
zbuff  Input buffer of 16 elements of type int16.
zstart  Starting position offset applied to all lanes for input from the Z buffer. This must be a compile-time constant. Only the 4 LSBs of the argument are used.
zoffsets  4-bit offset for each lane, applied to input from the Z buffer. The LSBs apply to the first lane.
zstep  Step between each column for selection in the Z buffer.
Note
  • This intrinsic uses the 'square' parameters (xsquare, ysquare); see the mini-permute square description in this guide for how to use them.
  • For details on how data is selected from the buffers, see the data selection description in this guide. For this intrinsic, the data buffer uses the 16bx16b scheme and the coefficient buffer uses the general scheme.
  • Parameter 'zstart' must be a compile time constant.
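
The coefficient selection described by zstart, zoffsets and zstep can be sketched in scalar form as follows. Both helpers are hypothetical and not part of the intrinsics API. The index formula (start plus per-lane offset plus column times step, wrapped to the 16-element Z buffer) is an assumption about the general scheme and should be checked against the data selection description before relying on it.

```cpp
#include <cassert>

// Hypothetical helper: extract the 4-bit offset of one lane.
// The LSBs apply to the first lane, so lane k uses bits [4k+3 : 4k].
unsigned int lane_offset(unsigned int offsets, int lane)
{
    assert(lane >= 0 && lane < 8);
    return (offsets >> (4 * lane)) & 0xFu;
}

// Hypothetical model of the Z element read by `lane` in column `col`.
// Assumption: index = zstart + lane offset + col * zstep, wrapped to
// the 16-element Z buffer.
int z_index(int zstart, unsigned int zoffsets, int zstep, int lane, int col)
{
    int start = zstart & 0xF;  // only the 4 LSBs of zstart are used
    int raw = start + static_cast<int>(lane_offset(zoffsets, lane)) + col * zstep;
    // Conversion to unsigned is modular, so masking also handles negative steps.
    return static_cast<int>(static_cast<unsigned int>(raw) & 0xFu);
}
```

With zoffsets = 0x76543210 each lane starts one coefficient further along, while zoffsets = 0x00000000 makes all lanes share the same coefficients, as in a typical FIR filter.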