AI Engine Intrinsics User Guide  (v2023.2)
16-bit Real x 16-bit Real

Overview

16-bit Real self multiplication intrinsics.

Functions

v16acc48 mac16 (v16acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Multiply-accumulate intrinsic function.
 
v16acc48 mac16 (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Multiply-accumulate intrinsic function using small X input buffer.
 
v16acc48 mac16 (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Multiply-accumulate intrinsic function using small X input buffer.
 
v8acc48 mac8 (v8acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int yoffsets, int ystep, unsigned int ysquare)
 Multiply-accumulate intrinsic function.
 
v8acc48 mac8 (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int yoffsets, int ystep, unsigned int ysquare)
 Multiply-accumulate intrinsic function using small X input buffer.
 
v8acc48 mac8 (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int yoffsets, int ystep, unsigned int ysquare)
 Multiply-accumulate intrinsic function using small X input buffer.
 
v16acc48 msc16 (v16acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Multiply-subtract intrinsic function.
 
v16acc48 msc16 (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Multiply-subtract intrinsic function using small X input buffer.
 
v16acc48 msc16 (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Multiply-subtract intrinsic function using small X input buffer.
 
v8acc48 msc8 (v8acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int yoffsets, int ystep, unsigned int ysquare)
 Multiply-subtract intrinsic function.
 
v8acc48 msc8 (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int yoffsets, int ystep, unsigned int ysquare)
 Multiply-subtract intrinsic function using small X input buffer.
 
v8acc48 msc8 (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int yoffsets, int ystep, unsigned int ysquare)
 Multiply-subtract intrinsic function using small X input buffer.
 
v16acc48 mul16 (v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Multiply intrinsic function.
 
v16acc48 mul16 (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Multiply intrinsic function using small X input buffer.
 
v16acc48 mul16 (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Multiply intrinsic function using small X input buffer.
 
v8acc48 mul8 (v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int yoffsets, int ystep, unsigned int ysquare)
 Multiply intrinsic function.
 
v8acc48 mul8 (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int yoffsets, int ystep, unsigned int ysquare)
 Multiply intrinsic function using small X input buffer.
 
v8acc48 mul8 (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int yoffsets, int ystep, unsigned int ysquare)
 Multiply intrinsic function using small X input buffer.
 
v16acc48 negmul16 (v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Multiply-negate intrinsic function.
 
v16acc48 negmul16 (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Multiply-negate intrinsic function using small X input buffer.
 
v16acc48 negmul16 (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Multiply-negate intrinsic function using small X input buffer.
 
v8acc48 negmul8 (v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int yoffsets, int ystep, unsigned int ysquare)
 Multiply-negate intrinsic function.
 
v8acc48 negmul8 (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int yoffsets, int ystep, unsigned int ysquare)
 Multiply-negate intrinsic function using small X input buffer.
 
v8acc48 negmul8 (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int yoffsets, int ystep, unsigned int ysquare)
 Multiply-negate intrinsic function using small X input buffer.
 

Function Documentation

v16acc48 mac16 ( v16acc48  acc,
v64int16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
unsigned int  xsquare,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi,
unsigned int  ysquare 
)

Multiply-accumulate intrinsic function.

acc0 += x00*y00 + x01*y01
acc1 += x10*y10 + x11*y11
acc2 += x20*y20 + x21*y21
acc3 += x30*y30 + x31*y31
acc4 += x40*y40 + x41*y41
acc5 += x50*y50 + x51*y51
acc6 += x60*y60 + x61*y61
acc7 += x70*y70 + x71*y71
acc8 += x80*y80 + x81*y81
acc9 += x90*y90 + x91*y91
acc10 += x100*y100 + x101*y101
acc11 += x110*y110 + x111*y111
acc12 += x120*y120 + x121*y121
acc13 += x130*y130 + x131*y131
acc14 += x140*y140 + x141*y141
acc15 += x150*y150 + x151*y151
Returns
Returned accumulation vector (16 x int48 lanes)
Parameters
acc  Incoming accumulation vector (16 x int48 lanes)
xbuff  Input buffer of 64 elements of type int16
xstart  Starting position offset applied to all lanes of input from the X buffer
xoffsets  4-bit offset for each lane; corresponds to 2x the lane number, and each second lane is an offset to the lane before + 1. The LSBs apply to the first lane
xoffsets_hi  4-bit offset for each lane. The LSBs apply to the 8th lane
xsquare  Select order of the mini-permute square (default=0x3210). The LSBs apply to the first element
ystart  Starting position offset applied to all lanes of input from the X buffer for the second input
yoffsets  4-bit offset for each lane; corresponds to 2x the lane number, and each second lane is an offset to the lane before + 1. The LSBs apply to the first lane
yoffsets_hi  4-bit offset for each lane. The LSBs apply to the 8th lane
ysquare  Select order of the mini-permute square (default=0x3210). The LSBs apply to the first element
Note
  • This intrinsic uses the 'square' parameters (xsquare, ysquare); see the guide's description of the mini-permute square for how to use them.
  • For how data is selected from the buffers, see the guide's description of data selection.
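The offset parameters pack one 4-bit field per lane into a 32-bit word, so a 16-lane intrinsic needs two words. A minimal sketch of pulling out the raw field for a given lane is shown below. The helper name lane_offset_nibble is hypothetical and not part of the intrinsics API, and the sketch assumes that "8th lane" means lane index 8, so that xoffsets covers lanes 0 to 7 and xoffsets_hi covers lanes 8 to 15. It only extracts the field; it does not model how the hardware turns the field into a buffer index.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the intrinsics API): returns the raw 4-bit
// offset field for `lane` (0..15) of a 16-lane intrinsic.
// Assumption: lanes 0..7 read `offsets`, lanes 8..15 read `offsets_hi`, and the
// least significant nibble of each word belongs to its lowest lane.
inline unsigned lane_offset_nibble(std::uint32_t offsets,
                                   std::uint32_t offsets_hi,
                                   unsigned lane) {
    const std::uint32_t word = (lane < 8u) ? offsets : offsets_hi;
    return (word >> (4u * (lane % 8u))) & 0xFu;
}
```

Under these assumptions, the pair 0x76543210 / 0xFEDCBA98 gives every lane a field equal to its own lane number, which makes it a convenient pattern for checking the packing order.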
v16acc48 mac16 ( v16acc48  acc,
v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
unsigned int  xsquare,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi,
unsigned int  ysquare 
)

Multiply-accumulate intrinsic function using small X input buffer.

acc0 += x00*y00 + x01*y01
acc1 += x10*y10 + x11*y11
acc2 += x20*y20 + x21*y21
acc3 += x30*y30 + x31*y31
acc4 += x40*y40 + x41*y41
acc5 += x50*y50 + x51*y51
acc6 += x60*y60 + x61*y61
acc7 += x70*y70 + x71*y71
acc8 += x80*y80 + x81*y81
acc9 += x90*y90 + x91*y91
acc10 += x100*y100 + x101*y101
acc11 += x110*y110 + x111*y111
acc12 += x120*y120 + x121*y121
acc13 += x130*y130 + x131*y131
acc14 += x140*y140 + x141*y141
acc15 += x150*y150 + x151*y151
Returns
Returned accumulation vector (16 x int48 lanes)
Parameters
acc  Incoming accumulation vector (16 x int48 lanes)
xbuff  Input buffer of 32 elements of type int16
xstart  Starting position offset applied to all lanes of input from the X buffer
xoffsets  4-bit offset for each lane; corresponds to 2x the lane number, and each second lane is an offset to the lane before + 1. The LSBs apply to the first lane
xoffsets_hi  4-bit offset for each lane. The LSBs apply to the 8th lane
xsquare  Select order of the mini-permute square (default=0x3210). The LSBs apply to the first element
ystart  Starting position offset applied to all lanes of input from the X buffer for the second input
yoffsets  4-bit offset for each lane; corresponds to 2x the lane number, and each second lane is an offset to the lane before + 1. The LSBs apply to the first lane
yoffsets_hi  4-bit offset for each lane. The LSBs apply to the 8th lane
ysquare  Select order of the mini-permute square (default=0x3210). The LSBs apply to the first element
Note
  • This intrinsic uses the 'square' parameters (xsquare, ysquare); see the guide's description of the mini-permute square for how to use them.
  • For how data is selected from the buffers, see the guide's description of data selection.
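The arithmetic in the mac16 listing can be written as a scalar reference model, which is handy for checking kernel output on a host. The sketch below reads xij and yij as the operands for lane i, column j, and takes them as already selected: it does not model xstart, the offsets, or the square parameters. The names Operands16, Acc16 and mac16_reference are hypothetical, and int64_t stands in for an int48 lane, so 48-bit overflow behaviour is not modelled.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// x[i][j] and y[i][j] correspond to xij and yij in the listing
// (lane i, column j), after data selection has already been applied.
using Operands16 = std::array<std::array<std::int16_t, 2>, 16>;
using Acc16 = std::array<std::int64_t, 16>;  // int64_t stands in for int48

// Hypothetical scalar model of the mac16 arithmetic only:
// acc[i] += x[i][0]*y[i][0] + x[i][1]*y[i][1] for each of the 16 lanes.
inline Acc16 mac16_reference(Acc16 acc, const Operands16& x, const Operands16& y) {
    for (std::size_t lane = 0; lane < acc.size(); ++lane) {
        for (std::size_t col = 0; col < 2; ++col) {
            acc[lane] += static_cast<std::int64_t>(x[lane][col]) * y[lane][col];
        }
    }
    return acc;
}
```

Each product is widened to 64 bits before it is added, so the model itself cannot overflow on a single call.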
v16acc48 mac16 ( v16acc48  acc,
v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
unsigned int  xsquare,
v32int16  ybuff,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi,
unsigned int  ysquare 
)

Multiply-accumulate intrinsic function using small X input buffer.

acc0 += x00*y00 + x01*y01
acc1 += x10*y10 + x11*y11
acc2 += x20*y20 + x21*y21
acc3 += x30*y30 + x31*y31
acc4 += x40*y40 + x41*y41
acc5 += x50*y50 + x51*y51
acc6 += x60*y60 + x61*y61
acc7 += x70*y70 + x71*y71
acc8 += x80*y80 + x81*y81
acc9 += x90*y90 + x91*y91
acc10 += x100*y100 + x101*y101
acc11 += x110*y110 + x111*y111
acc12 += x120*y120 + x121*y121
acc13 += x130*y130 + x131*y131
acc14 += x140*y140 + x141*y141
acc15 += x150*y150 + x151*y151
Returns
Returned accumulation vector (16 x int48 lanes)
Parameters
acc  Incoming accumulation vector (16 x int48 lanes)
xbuff  Input buffer of 32 elements of type int16
xstart  Starting position offset applied to all lanes of input from the X buffer
xoffsets  4-bit offset for each lane; corresponds to 2x the lane number, and each second lane is an offset to the lane before + 1. The LSBs apply to the first lane
xoffsets_hi  4-bit offset for each lane. The LSBs apply to the 8th lane
xsquare  Select order of the mini-permute square (default=0x3210). The LSBs apply to the first element
ybuff  Right input buffer of 32 elements of type int16
ystart  Starting position offset applied to all lanes of input from the Y buffer for the second input
yoffsets  4-bit offset for each lane; corresponds to 2x the lane number, and each second lane is an offset to the lane before + 1. The LSBs apply to the first lane
yoffsets_hi  4-bit offset for each lane. The LSBs apply to the 8th lane
ysquare  Select order of the mini-permute square (default=0x3210). The LSBs apply to the first element
Note
  • This intrinsic uses the 'square' parameters (xsquare, ysquare); see the guide's description of the mini-permute square for how to use them.
  • For how data is selected from the buffers, see the guide's description of data selection.
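The square parameters are described as an order selector whose least significant bits apply to the first element, with 0x3210 as the default. A minimal sketch of splitting such a value into its four 4-bit fields follows. The helper name square_fields is hypothetical, and the sketch only unpacks the fields; what each field selects inside the mini-permute square is covered by the guide's own description and is not modelled here.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper (not part of the intrinsics API): unpacks a square
// selector into four 4-bit fields, least significant nibble first.
inline std::array<unsigned, 4> square_fields(std::uint32_t square) {
    std::array<unsigned, 4> fields{};
    for (unsigned element = 0; element < 4; ++element) {
        fields[element] = (square >> (4u * element)) & 0xFu;
    }
    return fields;
}
```

With the default 0x3210 the fields come out as 0, 1, 2, 3, that is, each element position names itself.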
v8acc48 mac8 ( v8acc48  acc,
v64int16  xbuff,
int  xstart,
unsigned int  xoffsets,
int  xstep,
unsigned int  xsquare,
int  ystart,
unsigned int  yoffsets,
int  ystep,
unsigned int  ysquare 
)

Multiply-accumulate intrinsic function.

acc0 += x00*y00 + x01*y01 + x02*y02 + x03*y03
acc1 += x10*y10 + x11*y11 + x12*y12 + x13*y13
acc2 += x20*y20 + x21*y21 + x22*y22 + x23*y23
acc3 += x30*y30 + x31*y31 + x32*y32 + x33*y33
acc4 += x40*y40 + x41*y41 + x42*y42 + x43*y43
acc5 += x50*y50 + x51*y51 + x52*y52 + x53*y53
acc6 += x60*y60 + x61*y61 + x62*y62 + x63*y63
acc7 += x70*y70 + x71*y71 + x72*y72 + x73*y73
Returns
Returned accumulation vector (8 x int48 lanes)
Parameters
acc  Incoming accumulation vector (8 x int48 lanes)
xbuff  Input buffer of 64 elements of type int16
xstart  Starting position offset applied to all lanes of input from the X buffer
xoffsets  4-bit offset for each lane in the X buffer, where each second lane is an offset to the lane before + 1. The LSBs apply to the first lane
xstep  Step between each column for selection in the X buffer
xsquare  Select order of the mini-permute square (default=0x3210). The LSBs apply to the first element
ystart  Starting position offset applied to all lanes of input from the X buffer for the second input
yoffsets  4-bit offset for each lane in the X buffer, where each second lane is an offset to the lane before + 1. The LSBs apply to the first lane
ystep  Step between each column for selection in the X buffer
ysquare  Select order of the mini-permute square (default=0x3210). The LSBs apply to the first element
Note
  • This intrinsic uses the 'square' parameters (xsquare, ysquare); see the guide's description of the mini-permute square for how to use them.
  • For how data is selected from the buffers, see the guide's description of data selection.
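The mac8 listing has half as many lanes as mac16 but twice as many products per lane, so both forms perform 32 multiplications per call (8 x 4 and 16 x 2). The sketch below expresses that as one scalar template parameterised on lanes and columns. The name mac_reference is hypothetical, operands are taken as already selected (xstart, xoffsets, xstep and xsquare are not modelled), and int64_t stands in for an int48 lane.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Both shapes described on this page use the same number of multipliers.
static_assert(8 * 4 == 16 * 2, "mac8 and mac16 both perform 32 multiplies");

// Hypothetical scalar model: acc[i] += sum over j of x[i][j] * y[i][j].
// Lanes = 8, Cols = 4 matches the mac8 listing; Lanes = 16, Cols = 2 matches mac16.
template <std::size_t Lanes, std::size_t Cols>
std::array<std::int64_t, Lanes> mac_reference(
    std::array<std::int64_t, Lanes> acc,
    const std::array<std::array<std::int16_t, Cols>, Lanes>& x,
    const std::array<std::array<std::int16_t, Cols>, Lanes>& y) {
    for (std::size_t lane = 0; lane < Lanes; ++lane) {
        for (std::size_t col = 0; col < Cols; ++col) {
            acc[lane] += static_cast<std::int64_t>(x[lane][col]) * y[lane][col];
        }
    }
    return acc;
}
```

Lanes and Cols are deduced from the argument types, so a call with an 8-lane accumulator and 4-column operands needs no explicit template arguments.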
v8acc48 mac8 ( v8acc48  acc,
v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
int  xstep,
unsigned int  xsquare,
int  ystart,
unsigned int  yoffsets,
int  ystep,
unsigned int  ysquare 
)

Multiply-accumulate intrinsic function using small X input buffer.

acc0 += x00*y00 + x01*y01 + x02*y02 + x03*y03
acc1 += x10*y10 + x11*y11 + x12*y12 + x13*y13
acc2 += x20*y20 + x21*y21 + x22*y22 + x23*y23
acc3 += x30*y30 + x31*y31 + x32*y32 + x33*y33
acc4 += x40*y40 + x41*y41 + x42*y42 + x43*y43
acc5 += x50*y50 + x51*y51 + x52*y52 + x53*y53
acc6 += x60*y60 + x61*y61 + x62*y62 + x63*y63
acc7 += x70*y70 + x71*y71 + x72*y72 + x73*y73
Returns
Returned accumulation vector (8 x int48 lanes)
Parameters
acc  Incoming accumulation vector (8 x int48 lanes)
xbuff  Input buffer of 32 elements of type int16
xstart  Starting position offset applied to all lanes of input from the X buffer
xoffsets  4-bit offset for each lane in the X buffer, where each second lane is an offset to the lane before + 1. The LSBs apply to the first lane
xstep  Step between each column for selection in the X buffer
xsquare  Select order of the mini-permute square (default=0x3210). The LSBs apply to the first element
ystart  Starting position offset applied to all lanes of input from the X buffer for the second input
yoffsets  4-bit offset for each lane in the X buffer, where each second lane is an offset to the lane before + 1. The LSBs apply to the first lane
ystep  Step between each column for selection in the X buffer
ysquare  Select order of the mini-permute square (default=0x3210). The LSBs apply to the first element
Note
  • This intrinsic uses the 'square' parameters (xsquare, ysquare); see the guide's description of the mini-permute square for how to use them.
  • For how data is selected from the buffers, see the guide's description of data selection.
v8acc48 mac8 ( v8acc48  acc,
v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
int  xstep,
unsigned int  xsquare,
v32int16  ybuff,
int  ystart,
unsigned int  yoffsets,
int  ystep,
unsigned int  ysquare 
)

Multiply-accumulate intrinsic function using small X input buffer.

acc0 += x00*y00 + x01*y01 + x02*y02 + x03*y03
acc1 += x10*y10 + x11*y11 + x12*y12 + x13*y13
acc2 += x20*y20 + x21*y21 + x22*y22 + x23*y23
acc3 += x30*y30 + x31*y31 + x32*y32 + x33*y33
acc4 += x40*y40 + x41*y41 + x42*y42 + x43*y43
acc5 += x50*y50 + x51*y51 + x52*y52 + x53*y53
acc6 += x60*y60 + x61*y61 + x62*y62 + x63*y63
acc7 += x70*y70 + x71*y71 + x72*y72 + x73*y73
Returns
Returned accumulation vector (8 x int48 lanes)
Parameters
acc  Incoming accumulation vector (8 x int48 lanes)
xbuff  Input buffer of 32 elements of type int16
xstart  Starting position offset applied to all lanes of input from the X buffer
xoffsets  4-bit offset for each lane in the X buffer, where each second lane is an offset to the lane before + 1. The LSBs apply to the first lane
xstep  Step between each column for selection in the X buffer
xsquare  Select order of the mini-permute square (default=0x3210). The LSBs apply to the first element
ybuff  Right input buffer of 32 elements of type int16
ystart  Starting position offset applied to all lanes of input from the Y buffer for the second input
yoffsets  4-bit offset for each lane in the Y buffer, where each second lane is an offset to the lane before + 1. The LSBs apply to the first lane
ystep  Step between each column for selection in the Y buffer
ysquare  Select order of the mini-permute square (default=0x3210). The LSBs apply to the first element
Note
  • This intrinsic uses the 'square' parameters (xsquare, ysquare); see the guide's description of the mini-permute square for how to use them.
  • For how data is selected from the buffers, see the guide's description of data selection.
v16acc48 msc16 ( v16acc48  acc,
v64int16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
unsigned int  xsquare,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi,
unsigned int  ysquare 
)

Multiply-subtract intrinsic function.

acc0 -= x00*y00 + x01*y01
acc1 -= x10*y10 + x11*y11
acc2 -= x20*y20 + x21*y21
acc3 -= x30*y30 + x31*y31
acc4 -= x40*y40 + x41*y41
acc5 -= x50*y50 + x51*y51
acc6 -= x60*y60 + x61*y61
acc7 -= x70*y70 + x71*y71
acc8 -= x80*y80 + x81*y81
acc9 -= x90*y90 + x91*y91
acc10 -= x100*y100 + x101*y101
acc11 -= x110*y110 + x111*y111
acc12 -= x120*y120 + x121*y121
acc13 -= x130*y130 + x131*y131
acc14 -= x140*y140 + x141*y141
acc15 -= x150*y150 + x151*y151
Returns
Returned accumulation vector (16 x int48 lanes)
Parameters
acc  Incoming accumulation vector (16 x int48 lanes)
xbuff  Input buffer of 64 elements of type int16
xstart  Starting position offset applied to all lanes of input from the X buffer
xoffsets  4-bit offset for each lane; corresponds to 2x the lane number, and each second lane is an offset to the lane before + 1. The LSBs apply to the first lane
xoffsets_hi  4-bit offset for each lane. The LSBs apply to the 8th lane
xsquare  Select order of the mini-permute square (default=0x3210). The LSBs apply to the first element
ystart  Starting position offset applied to all lanes of input from the X buffer for the second input
yoffsets  4-bit offset for each lane; corresponds to 2x the lane number, and each second lane is an offset to the lane before + 1. The LSBs apply to the first lane
yoffsets_hi  4-bit offset for each lane. The LSBs apply to the 8th lane
ysquare  Select order of the mini-permute square (default=0x3210). The LSBs apply to the first element
Note
  • This intrinsic uses the 'square' parameters (xsquare, ysquare); see the guide's description of the mini-permute square for how to use them.
  • For how data is selected from the buffers, see the guide's description of data selection.
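In the msc16 listing the `-=` applies to the whole right-hand side, so both products are subtracted from the accumulator. The single-lane sketch below makes that explicit and shows msc undoing a mac performed with the same operands. The names Pair, lane_sum, mac_lane and msc_lane are hypothetical, operands are taken as already selected, and int64_t stands in for an int48 lane.

```cpp
#include <array>
#include <cstdint>

using Pair = std::array<std::int16_t, 2>;  // the two selected operands of one lane

// Sum of the two products for one lane: x0*y0 + x1*y1, widened to 64 bits.
inline std::int64_t lane_sum(const Pair& x, const Pair& y) {
    return static_cast<std::int64_t>(x[0]) * y[0] +
           static_cast<std::int64_t>(x[1]) * y[1];
}

// Hypothetical single-lane models of the mac16 and msc16 arithmetic.
inline std::int64_t mac_lane(std::int64_t acc, const Pair& x, const Pair& y) {
    return acc + lane_sum(x, y);
}

inline std::int64_t msc_lane(std::int64_t acc, const Pair& x, const Pair& y) {
    return acc - lane_sum(x, y);  // both products are subtracted
}
```

For x = {3, -4} and y = {5, 6} the lane sum is 15 - 24 = -9, so starting from 100 the mac model gives 91 and the msc model gives 109.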
v16acc48 msc16 ( v16acc48  acc,
v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
unsigned int  xsquare,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi,
unsigned int  ysquare 
)

Multiply-subtract intrinsic function using small X input buffer.

acc0 -= x00*y00 + x01*y01
acc1 -= x10*y10 + x11*y11
acc2 -= x20*y20 + x21*y21
acc3 -= x30*y30 + x31*y31
acc4 -= x40*y40 + x41*y41
acc5 -= x50*y50 + x51*y51
acc6 -= x60*y60 + x61*y61
acc7 -= x70*y70 + x71*y71
acc8 -= x80*y80 + x81*y81
acc9 -= x90*y90 + x91*y91
acc10 -= x100*y100 + x101*y101
acc11 -= x110*y110 + x111*y111
acc12 -= x120*y120 + x121*y121
acc13 -= x130*y130 + x131*y131
acc14 -= x140*y140 + x141*y141
acc15 -= x150*y150 + x151*y151
Returns
Returned accumulation vector (16 x int48 lanes)
Parameters
acc  Incoming accumulation vector (16 x int48 lanes)
xbuff  Input buffer of 32 elements of type int16
xstart  Starting position offset applied to all lanes of input from the X buffer
xoffsets  4-bit offset for each lane; corresponds to 2x the lane number, and each second lane is an offset to the lane before + 1. The LSBs apply to the first lane
xoffsets_hi  4-bit offset for each lane. The LSBs apply to the 8th lane
xsquare  Select order of the mini-permute square (default=0x3210). The LSBs apply to the first element
ystart  Starting position offset applied to all lanes of input from the X buffer for the second input
yoffsets  4-bit offset for each lane; corresponds to 2x the lane number, and each second lane is an offset to the lane before + 1. The LSBs apply to the first lane
yoffsets_hi  4-bit offset for each lane. The LSBs apply to the 8th lane
ysquare  Select order of the mini-permute square (default=0x3210). The LSBs apply to the first element
Note
  • This intrinsic uses the 'square' parameters (xsquare, ysquare); see the guide's description of the mini-permute square for how to use them.
  • For how data is selected from the buffers, see the guide's description of data selection.
v16acc48 msc16 ( v16acc48  acc,
v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
unsigned int  xsquare,
v32int16  ybuff,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi,
unsigned int  ysquare 
)

Multiply-subtract intrinsic function using small X input buffer.

acc0 -= x00*y00 + x01*y01
acc1 -= x10*y10 + x11*y11
acc2 -= x20*y20 + x21*y21
acc3 -= x30*y30 + x31*y31
acc4 -= x40*y40 + x41*y41
acc5 -= x50*y50 + x51*y51
acc6 -= x60*y60 + x61*y61
acc7 -= x70*y70 + x71*y71
acc8 -= x80*y80 + x81*y81
acc9 -= x90*y90 + x91*y91
acc10 -= x100*y100 + x101*y101
acc11 -= x110*y110 + x111*y111
acc12 -= x120*y120 + x121*y121
acc13 -= x130*y130 + x131*y131
acc14 -= x140*y140 + x141*y141
acc15 -= x150*y150 + x151*y151
Returns
Returned accumulation vector (16 x int48 lanes)
Parameters
acc  Incoming accumulation vector (16 x int48 lanes)
xbuff  Input buffer of 32 elements of type int16
xstart  Starting position offset applied to all lanes of input from the X buffer
xoffsets  4-bit offset for each lane; corresponds to 2x the lane number, and each second lane is an offset to the lane before + 1. The LSBs apply to the first lane
xoffsets_hi  4-bit offset for each lane. The LSBs apply to the 8th lane
xsquare  Select order of the mini-permute square (default=0x3210). The LSBs apply to the first element
ybuff  Right input buffer of 32 elements of type int16
ystart  Starting position offset applied to all lanes of input from the Y buffer for the second input
yoffsets  4-bit offset for each lane; corresponds to 2x the lane number, and each second lane is an offset to the lane before + 1. The LSBs apply to the first lane
yoffsets_hi  4-bit offset for each lane. The LSBs apply to the 8th lane
ysquare  Select order of the mini-permute square (default=0x3210). The LSBs apply to the first element
Note
  • This intrinsic uses the 'square' parameters (xsquare, ysquare); see the guide's description of the mini-permute square for how to use them.
  • For how data is selected from the buffers, see the guide's description of data selection.
v8acc48 msc8 ( v8acc48  acc,
v64int16  xbuff,
int  xstart,
unsigned int  xoffsets,
int  xstep,
unsigned int  xsquare,
int  ystart,
unsigned int  yoffsets,
int  ystep,
unsigned int  ysquare 
)

Multiply-subtract intrinsic function.

acc0 -= x00*y00 + x01*y01 + x02*y02 + x03*y03
acc1 -= x10*y10 + x11*y11 + x12*y12 + x13*y13
acc2 -= x20*y20 + x21*y21 + x22*y22 + x23*y23
acc3 -= x30*y30 + x31*y31 + x32*y32 + x33*y33
acc4 -= x40*y40 + x41*y41 + x42*y42 + x43*y43
acc5 -= x50*y50 + x51*y51 + x52*y52 + x53*y53
acc6 -= x60*y60 + x61*y61 + x62*y62 + x63*y63
acc7 -= x70*y70 + x71*y71 + x72*y72 + x73*y73
Returns
Returned accumulation vector (8 x int48 lanes)
Parameters
acc  Incoming accumulation vector (8 x int48 lanes)
xbuff  Input buffer of 64 elements of type int16
xstart  Starting position offset applied to all lanes of input from the X buffer
xoffsets  4-bit offset for each lane in the X buffer, where each second lane is an offset to the lane before + 1. The LSBs apply to the first lane
xstep  Step between each column for selection in the X buffer
xsquare  Select order of the mini-permute square (default=0x3210). The LSBs apply to the first element
ystart  Starting position offset applied to all lanes of input from the X buffer for the second input
yoffsets  4-bit offset for each lane in the X buffer, where each second lane is an offset to the lane before + 1. The LSBs apply to the first lane
ystep  Step between each column for selection in the X buffer
ysquare  Select order of the mini-permute square (default=0x3210). The LSBs apply to the first element
Note
  • This intrinsic uses the 'square' parameters (xsquare, ysquare); see the guide's description of the mini-permute square for how to use them.
  • For how data is selected from the buffers, see the guide's description of data selection.
v8acc48 msc8 ( v8acc48  acc,
v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
int  xstep,
unsigned int  xsquare,
int  ystart,
unsigned int  yoffsets,
int  ystep,
unsigned int  ysquare 
)

Multiply-subtract intrinsic function using small X input buffer.

acc0 -= x00*y00 + x01*y01 + x02*y02 + x03*y03
acc1 -= x10*y10 + x11*y11 + x12*y12 + x13*y13
acc2 -= x20*y20 + x21*y21 + x22*y22 + x23*y23
acc3 -= x30*y30 + x31*y31 + x32*y32 + x33*y33
acc4 -= x40*y40 + x41*y41 + x42*y42 + x43*y43
acc5 -= x50*y50 + x51*y51 + x52*y52 + x53*y53
acc6 -= x60*y60 + x61*y61 + x62*y62 + x63*y63
acc7 -= x70*y70 + x71*y71 + x72*y72 + x73*y73
Returns
Returned accumulation vector (8 x int48 lanes)
Parameters
acc  Incoming accumulation vector (8 x int48 lanes)
xbuff  Input buffer of 32 elements of type int16
xstart  Starting position offset applied to all lanes of input from the X buffer
xoffsets  4-bit offset for each lane in the X buffer, where each second lane is an offset to the lane before + 1. The LSBs apply to the first lane
xstep  Step between each column for selection in the X buffer
xsquare  Select order of the mini-permute square (default=0x3210). The LSBs apply to the first element
ystart  Starting position offset applied to all lanes of input from the X buffer for the second input
yoffsets  4-bit offset for each lane in the X buffer, where each second lane is an offset to the lane before + 1. The LSBs apply to the first lane
ystep  Step between each column for selection in the X buffer
ysquare  Select order of the mini-permute square (default=0x3210). The LSBs apply to the first element
Note
  • This intrinsic uses the 'square' parameters (xsquare, ysquare); see the guide's description of the mini-permute square for how to use them.
  • For how data is selected from the buffers, see the guide's description of data selection.
v8acc48 msc8 ( v8acc48  acc,
v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
int  xstep,
unsigned int  xsquare,
v32int16  ybuff,
int  ystart,
unsigned int  yoffsets,
int  ystep,
unsigned int  ysquare 
)

Multiply-subtract intrinsic function using small X input buffer.

acc0 -= x00*y00 + x01*y01 + x02*y02 + x03*y03
acc1 -= x10*y10 + x11*y11 + x12*y12 + x13*y13
acc2 -= x20*y20 + x21*y21 + x22*y22 + x23*y23
acc3 -= x30*y30 + x31*y31 + x32*y32 + x33*y33
acc4 -= x40*y40 + x41*y41 + x42*y42 + x43*y43
acc5 -= x50*y50 + x51*y51 + x52*y52 + x53*y53
acc6 -= x60*y60 + x61*y61 + x62*y62 + x63*y63
acc7 -= x70*y70 + x71*y71 + x72*y72 + x73*y73
Returns
Returned accumulation vector (8 x int48 lanes)
Parameters
acc  Incoming accumulation vector (8 x int48 lanes)
xbuff  Input buffer of 32 elements of type int16
xstart  Starting position offset applied to all lanes of input from the X buffer
xoffsets  4-bit offset for each lane in the X buffer, where each second lane is an offset to the lane before + 1. The LSBs apply to the first lane
xstep  Step between each column for selection in the X buffer
xsquare  Select order of the mini-permute square (default=0x3210). The LSBs apply to the first element
ybuff  Right input buffer of 32 elements of type int16
ystart  Starting position offset applied to all lanes of input from the Y buffer for the second input
yoffsets  4-bit offset for each lane in the Y buffer, where each second lane is an offset to the lane before + 1. The LSBs apply to the first lane
ystep  Step between each column for selection in the Y buffer
ysquare  Select order of the mini-permute square (default=0x3210). The LSBs apply to the first element
Note
  • This intrinsic uses the 'square' parameters (xsquare, ysquare); see the guide's description of the mini-permute square for how to use them.
  • For how data is selected from the buffers, see the guide's description of data selection.
v16acc48 mul16 ( v64int16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
unsigned int  xsquare,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi,
unsigned int  ysquare 
)

Multiply intrinsic function.

acc0 = x00*y00 + x01*y01
acc1 = x10*y10 + x11*y11
acc2 = x20*y20 + x21*y21
acc3 = x30*y30 + x31*y31
acc4 = x40*y40 + x41*y41
acc5 = x50*y50 + x51*y51
acc6 = x60*y60 + x61*y61
acc7 = x70*y70 + x71*y71
acc8 = x80*y80 + x81*y81
acc9 = x90*y90 + x91*y91
acc10 = x100*y100 + x101*y101
acc11 = x110*y110 + x111*y111
acc12 = x120*y120 + x121*y121
acc13 = x130*y130 + x131*y131
acc14 = x140*y140 + x141*y141
acc15 = x150*y150 + x151*y151
Returns
Returned accumulation vector (16 x int48 lanes)
Parameters
xbuff  Input buffer of 64 elements of type int16
xstart  Starting position offset applied to all lanes of input from the X buffer
xoffsets  4-bit offset for each lane; corresponds to 2x the lane number, and each second lane is an offset to the lane before + 1. The LSBs apply to the first lane
xoffsets_hi  4-bit offset for each lane. The LSBs apply to the 8th lane
xsquare  Select order of the mini-permute square (default=0x3210). The LSBs apply to the first element
ystart  Starting position offset applied to all lanes of input from the X buffer for the second input
yoffsets  4-bit offset for each lane; corresponds to 2x the lane number, and each second lane is an offset to the lane before + 1. The LSBs apply to the first lane
yoffsets_hi  4-bit offset for each lane. The LSBs apply to the 8th lane
ysquare  Select order of the mini-permute square (default=0x3210). The LSBs apply to the first element
Note
  • This intrinsic uses the 'square' parameters (xsquare, ysquare); see the guide's description of the mini-permute square for how to use them.
  • For how data is selected from the buffers, see the guide's description of data selection.
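Because mul16 assigns to the accumulator instead of adding to it, it is a natural first step in a chain that continues with mac16 calls, and the 48-bit lane width bounds how long such a chain can run in the worst case. The sketch below works that bound out at compile time. It assumes an int48 lane is a signed two's complement value with a maximum of 2^47 - 1, and that every int16 operand takes the extreme value -32768; the names kInt48Max, kMaxProduct and worst_case_calls are hypothetical. What the hardware does once a lane exceeds its range is not described on this page and is not modelled.

```cpp
#include <cstdint>

// Assumption: an int48 lane holds signed values up to 2^47 - 1.
constexpr std::int64_t kInt48Max = (std::int64_t{1} << 47) - 1;

// Largest magnitude of one int16 x int16 product: (-32768) * (-32768) = 2^30.
constexpr std::int64_t kMaxProduct = std::int64_t{-32768} * std::int64_t{-32768};

// Number of worst-case calls (the first being mul, the rest mac) whose running
// sum is guaranteed to stay within kInt48Max, given products_per_lane products
// added to each lane per call (2 for the 16-lane forms, 4 for the 8-lane forms).
constexpr std::int64_t worst_case_calls(int products_per_lane) {
    return kInt48Max / (kMaxProduct * products_per_lane);
}

static_assert(worst_case_calls(2) == 65535, "16-lane forms: 2 products per lane");
static_assert(worst_case_calls(4) == 32767, "8-lane forms: 4 products per lane");
```

Real data rarely sits at the extreme value, so this is a conservative bound, not a limit on practical chain lengths.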
v16acc48 mul16 ( v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
unsigned int  xsquare,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi,
unsigned int  ysquare 
)

Multiply intrinsic function using small X input buffer.

acc0 = x00*y00 + x01*y01
acc1 = x10*y10 + x11*y11
acc2 = x20*y20 + x21*y21
acc3 = x30*y30 + x31*y31
acc4 = x40*y40 + x41*y41
acc5 = x50*y50 + x51*y51
acc6 = x60*y60 + x61*y61
acc7 = x70*y70 + x71*y71
acc8 = x80*y80 + x81*y81
acc9 = x90*y90 + x91*y91
acc10 = x100*y100 + x101*y101
acc11 = x110*y110 + x111*y111
acc12 = x120*y120 + x121*y121
acc13 = x130*y130 + x131*y131
acc14 = x140*y140 + x141*y141
acc15 = x150*y150 + x151*y151
Returns
Returned accumulation vector (16 x int48 lanes)
Parameters
xbuff  Input buffer of 32 elements of type int16
xstart  Starting position offset applied to all lanes of input from the X buffer
xoffsets  4-bit offset for each lane; corresponds to 2x the lane number, and each second lane is an offset to the lane before + 1. The LSBs apply to the first lane
xoffsets_hi  4-bit offset for each lane. The LSBs apply to the 8th lane
xsquare  Select order of the mini-permute square (default=0x3210). The LSBs apply to the first element
ystart  Starting position offset applied to all lanes of input from the X buffer for the second input
yoffsets  4-bit offset for each lane; corresponds to 2x the lane number, and each second lane is an offset to the lane before + 1. The LSBs apply to the first lane
yoffsets_hi  4-bit offset for each lane. The LSBs apply to the 8th lane
ysquare  Select order of the mini-permute square (default=0x3210). The LSBs apply to the first element
Note
  • This intrinsic uses the 'square' parameters (xsquare, ysquare); see the guide's description of the mini-permute square for how to use them.
  • For how data is selected from the buffers, see the guide's description of data selection.
v16acc48 mul16 ( v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
unsigned int  xsquare,
v32int16  ybuff,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi,
unsigned int  ysquare 
)

Multiply intrinsic function using small X input buffer.

acc0 = x00*y00 + x01*y01
acc1 = x10*y10 + x11*y11
acc2 = x20*y20 + x21*y21
acc3 = x30*y30 + x31*y31
acc4 = x40*y40 + x41*y41
acc5 = x50*y50 + x51*y51
acc6 = x60*y60 + x61*y61
acc7 = x70*y70 + x71*y71
acc8 = x80*y80 + x81*y81
acc9 = x90*y90 + x91*y91
acc10 = x100*y100 + x101*y101
acc11 = x110*y110 + x111*y111
acc12 = x120*y120 + x121*y121
acc13 = x130*y130 + x131*y131
acc14 = x140*y140 + x141*y141
acc15 = x150*y150 + x151*y151
Returns
Returned accumulation vector (16 x int48 lanes)
Parameters
xbuff        Input buffer of 32 elements of type int16.
xstart       Starting position offset applied to all lanes of input from the X buffer.
xoffsets     4-bit offset for each lane, corresponding to 2x the lane number; each second lane is an offset to the lane before + 1. The LSBs apply to the first lane.
xoffsets_hi  4-bit offset for each lane. The LSBs apply to the 8th lane.
xsquare      Selects the order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
ybuff        Right input buffer of 32 elements of type int16.
ystart       Starting position offset applied to all lanes of input from the ybuffer for the second input.
yoffsets     4-bit offset for each lane, corresponding to 2x the lane number; each second lane is an offset to the lane before + 1. The LSBs apply to the first lane.
yoffsets_hi  4-bit offset for each lane. The LSBs apply to the 8th lane.
ysquare      Selects the order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
Note
  • This intrinsic uses the 'square' parameter; for more information on how to use it, go here.
  • For more information on how data is selected from the buffers, go here.
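The mul16 equations above reduce to a two-term dot product per lane. The following scalar sketch models only that arithmetic, assuming the operands x_ik and y_ik have already been selected for each lane. The names `Operands16x2` and `mul16_reference` are hypothetical, a 64-bit integer stands in for each 48-bit accumulator lane, and the data-selection step (xstart, xoffsets, xsquare and their Y counterparts) is deliberately left out.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Operands already selected for each lane: element [i][k] stands for x_ik
// (or y_ik) in the equations above. How they are picked from xbuff/ybuff
// is not modelled here.
using Operands16x2 = std::array<std::array<std::int16_t, 2>, 16>;

// Hypothetical scalar model of the mul16 arithmetic (not the intrinsic itself).
// A 64-bit integer stands in for each 48-bit accumulator lane.
std::array<std::int64_t, 16> mul16_reference(const Operands16x2& x,
                                             const Operands16x2& y) {
    std::array<std::int64_t, 16> acc{};
    for (std::size_t lane = 0; lane < 16; ++lane) {
        acc[lane] = static_cast<std::int64_t>(x[lane][0]) * y[lane][0]
                  + static_cast<std::int64_t>(x[lane][1]) * y[lane][1];
    }
    return acc;
}
```

A model like this can be handy for checking kernel output on the host once the selected operands are known.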
v8acc48 mul8 ( v64int16  xbuff,
int  xstart,
unsigned int  xoffsets,
int  xstep,
unsigned int  xsquare,
int  ystart,
unsigned int  yoffsets,
int  ystep,
unsigned int  ysquare 
)

Multiply intrinsic function.

acc0 = x00*y00 + x01*y01 + x02*y02 + x03*y03
acc1 = x10*y10 + x11*y11 + x12*y12 + x13*y13
acc2 = x20*y20 + x21*y21 + x22*y22 + x23*y23
acc3 = x30*y30 + x31*y31 + x32*y32 + x33*y33
acc4 = x40*y40 + x41*y41 + x42*y42 + x43*y43
acc5 = x50*y50 + x51*y51 + x52*y52 + x53*y53
acc6 = x60*y60 + x61*y61 + x62*y62 + x63*y63
acc7 = x70*y70 + x71*y71 + x72*y72 + x73*y73
Returns
Returned accumulation vector (8 x int48 lanes)
Parameters
xbuff     Input buffer of 64 elements of type int16.
xstart    Starting position offset applied to all lanes of input from the X buffer.
xoffsets  4-bit offset for each lane in the xbuffer; each second lane is an offset to the lane before + 1. The LSBs apply to the first lane.
xstep     Step between each column for selection in the xbuffer.
xsquare   Selects the order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
ystart    Starting position offset applied to all lanes of input from the xbuffer for the second input.
yoffsets  4-bit offset for each lane in the xbuffer; each second lane is an offset to the lane before + 1. The LSBs apply to the first lane.
ystep     Step between each column for selection in the xbuffer.
ysquare   Selects the order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
Note
  • This intrinsic uses the 'square' parameter; for more information on how to use it, go here.
  • For more information on how data is selected from the buffers, go here.
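In the mul8 equations above, each of the 8 lanes sums four products rather than two. A generic scalar sketch of that pattern, parameterised on the lane and term counts, might look like this. The names `lane_dot_reference`, `Operands8x4` and `mul8_reference` are hypothetical, the operands are assumed to be already selected, and xstep/ystep column selection is not modelled.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model: each lane is the sum of `Terms` products of
// pre-selected int16 operands. mul8 above corresponds to Lanes=8, Terms=4.
template <std::size_t Lanes, std::size_t Terms>
std::array<std::int64_t, Lanes> lane_dot_reference(
    const std::array<std::array<std::int16_t, Terms>, Lanes>& x,
    const std::array<std::array<std::int16_t, Terms>, Lanes>& y) {
    std::array<std::int64_t, Lanes> acc{};  // 64-bit stand-in for int48 lanes
    for (std::size_t lane = 0; lane < Lanes; ++lane) {
        for (std::size_t k = 0; k < Terms; ++k) {
            acc[lane] += static_cast<std::int64_t>(x[lane][k]) * y[lane][k];
        }
    }
    return acc;
}

// Convenience alias for the 8-lane, 4-term shape of mul8.
using Operands8x4 = std::array<std::array<std::int16_t, 4>, 8>;

inline std::array<std::int64_t, 8> mul8_reference(const Operands8x4& x,
                                                  const Operands8x4& y) {
    return lane_dot_reference<8, 4>(x, y);
}
```

The same template covers the 16-lane, two-term shape by instantiating it with `<16, 2>`.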
v8acc48 mul8 ( v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
int  xstep,
unsigned int  xsquare,
int  ystart,
unsigned int  yoffsets,
int  ystep,
unsigned int  ysquare 
)

Multiply intrinsic function using small X input buffer.

acc0 = x00*y00 + x01*y01 + x02*y02 + x03*y03
acc1 = x10*y10 + x11*y11 + x12*y12 + x13*y13
acc2 = x20*y20 + x21*y21 + x22*y22 + x23*y23
acc3 = x30*y30 + x31*y31 + x32*y32 + x33*y33
acc4 = x40*y40 + x41*y41 + x42*y42 + x43*y43
acc5 = x50*y50 + x51*y51 + x52*y52 + x53*y53
acc6 = x60*y60 + x61*y61 + x62*y62 + x63*y63
acc7 = x70*y70 + x71*y71 + x72*y72 + x73*y73
Returns
Returned accumulation vector (8 x int48 lanes)
Parameters
xbuff     Input buffer of 32 elements of type int16.
xstart    Starting position offset applied to all lanes of input from the X buffer.
xoffsets  4-bit offset for each lane in the xbuffer; each second lane is an offset to the lane before + 1. The LSBs apply to the first lane.
xstep     Step between each column for selection in the xbuffer.
xsquare   Selects the order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
ystart    Starting position offset applied to all lanes of input from the xbuffer for the second input.
yoffsets  4-bit offset for each lane in the xbuffer; each second lane is an offset to the lane before + 1. The LSBs apply to the first lane.
ystep     Step between each column for selection in the xbuffer.
ysquare   Selects the order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
Note
  • This intrinsic uses the 'square' parameter; for more information on how to use it, go here.
  • For more information on how data is selected from the buffers, go here.
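The xsquare and ysquare descriptions above give a default of 0x3210 and state that the least significant bits apply to the first element. Reading each hexadecimal digit as one field (an assumption that matches the default value), a word can be split into its four selectors as sketched below. The helper names `unpack_square` and `is_default_square` are hypothetical, and the sketch does not model what the mini-permute does with the selectors.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: split a 'square' word such as 0x3210 into four
// 4-bit fields, least significant field first (field i belongs to element i).
std::array<std::uint8_t, 4> unpack_square(std::uint32_t square) {
    std::array<std::uint8_t, 4> sel{};
    for (unsigned i = 0; i < 4; ++i) {
        sel[i] = static_cast<std::uint8_t>((square >> (4u * i)) & 0xFu);
    }
    return sel;
}

// True when the low 16 bits equal the documented default 0x3210,
// in which case element i carries selector i.
bool is_default_square(std::uint32_t square) {
    return (square & 0xFFFFu) == 0x3210u;
}
```

With the default, `unpack_square(0x3210)` gives the selectors 0, 1, 2, 3; a word such as 0x2301 gives 1, 0, 3, 2.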
v8acc48 mul8 ( v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
int  xstep,
unsigned int  xsquare,
v32int16  ybuff,
int  ystart,
unsigned int  yoffsets,
int  ystep,
unsigned int  ysquare 
)

Multiply intrinsic function using small X input buffer.

acc0 = x00*y00 + x01*y01 + x02*y02 + x03*y03
acc1 = x10*y10 + x11*y11 + x12*y12 + x13*y13
acc2 = x20*y20 + x21*y21 + x22*y22 + x23*y23
acc3 = x30*y30 + x31*y31 + x32*y32 + x33*y33
acc4 = x40*y40 + x41*y41 + x42*y42 + x43*y43
acc5 = x50*y50 + x51*y51 + x52*y52 + x53*y53
acc6 = x60*y60 + x61*y61 + x62*y62 + x63*y63
acc7 = x70*y70 + x71*y71 + x72*y72 + x73*y73
Returns
Returned accumulation vector (8 x int48 lanes)
Parameters
xbuff     Input buffer of 32 elements of type int16.
xstart    Starting position offset applied to all lanes of input from the X buffer.
xoffsets  4-bit offset for each lane in the xbuffer; each second lane is an offset to the lane before + 1. The LSBs apply to the first lane.
xstep     Step between each column for selection in the xbuffer.
xsquare   Selects the order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
ybuff     Right input buffer of 32 elements of type int16.
ystart    Starting position offset applied to all lanes of input from the ybuffer for the second input.
yoffsets  4-bit offset for each lane in the ybuffer; each second lane is an offset to the lane before + 1. The LSBs apply to the first lane.
ystep     Step between each column for selection in the ybuffer.
ysquare   Selects the order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
Note
  • This intrinsic uses the 'square' parameter; for more information on how to use it, go here.
  • For more information on how data is selected from the buffers, go here.
v16acc48 negmul16 ( v64int16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
unsigned int  xsquare,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi,
unsigned int  ysquare 
)

Multiply-negate intrinsic function.

acc0 = -( x00*y00 + x01*y01 )
acc1 = -( x10*y10 + x11*y11 )
acc2 = -( x20*y20 + x21*y21 )
acc3 = -( x30*y30 + x31*y31 )
acc4 = -( x40*y40 + x41*y41 )
acc5 = -( x50*y50 + x51*y51 )
acc6 = -( x60*y60 + x61*y61 )
acc7 = -( x70*y70 + x71*y71 )
acc8 = -( x80*y80 + x81*y81 )
acc9 = -( x90*y90 + x91*y91 )
acc10 = -( x100*y100 + x101*y101 )
acc11 = -( x110*y110 + x111*y111 )
acc12 = -( x120*y120 + x121*y121 )
acc13 = -( x130*y130 + x131*y131 )
acc14 = -( x140*y140 + x141*y141 )
acc15 = -( x150*y150 + x151*y151 )
Returns
Returned accumulation vector (16 x int48 lanes)
Parameters
xbuff        Input buffer of 64 elements of type int16.
xstart       Starting position offset applied to all lanes of input from the X buffer.
xoffsets     4-bit offset for each lane, corresponding to 2x the lane number; each second lane is an offset to the lane before + 1. The LSBs apply to the first lane.
xoffsets_hi  4-bit offset for each lane. The LSBs apply to the 8th lane.
xsquare      Selects the order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
ystart       Starting position offset applied to all lanes of input from the xbuffer for the second input.
yoffsets     4-bit offset for each lane, corresponding to 2x the lane number; each second lane is an offset to the lane before + 1. The LSBs apply to the first lane.
yoffsets_hi  4-bit offset for each lane. The LSBs apply to the 8th lane.
ysquare      Selects the order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
Note
  • This intrinsic uses the 'square' parameter; for more information on how to use it, go here.
  • For more information on how data is selected from the buffers, go here.
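According to the negmul16 equations above, each lane holds the negated two-term sum of products. A scalar sketch of that arithmetic, again assuming pre-selected operands, is shown below. The names `Pairs16` and `negmul16_reference` are hypothetical, and a 64-bit integer stands in for each 48-bit accumulator lane.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Pre-selected operands: element [i][k] stands for x_ik (or y_ik).
using Pairs16 = std::array<std::array<std::int16_t, 2>, 16>;

// Hypothetical scalar model of the negmul16 arithmetic (not the intrinsic):
// acc_i = -( x_i0*y_i0 + x_i1*y_i1 ).
std::array<std::int64_t, 16> negmul16_reference(const Pairs16& x,
                                                const Pairs16& y) {
    std::array<std::int64_t, 16> acc{};
    for (std::size_t lane = 0; lane < 16; ++lane) {
        const std::int64_t sum =
            static_cast<std::int64_t>(x[lane][0]) * y[lane][0] +
            static_cast<std::int64_t>(x[lane][1]) * y[lane][1];
        acc[lane] = -sum;  // negate after summing, as in the equations
    }
    return acc;
}
```

Because the negation is applied to the whole sum, the result is the lane-wise negative of what the corresponding multiply equations would give for the same operands.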
v16acc48 negmul16 ( v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
unsigned int  xsquare,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi,
unsigned int  ysquare 
)

Multiply-negate intrinsic function using small X input buffer.

acc0 = -( x00*y00 + x01*y01 )
acc1 = -( x10*y10 + x11*y11 )
acc2 = -( x20*y20 + x21*y21 )
acc3 = -( x30*y30 + x31*y31 )
acc4 = -( x40*y40 + x41*y41 )
acc5 = -( x50*y50 + x51*y51 )
acc6 = -( x60*y60 + x61*y61 )
acc7 = -( x70*y70 + x71*y71 )
acc8 = -( x80*y80 + x81*y81 )
acc9 = -( x90*y90 + x91*y91 )
acc10 = -( x100*y100 + x101*y101 )
acc11 = -( x110*y110 + x111*y111 )
acc12 = -( x120*y120 + x121*y121 )
acc13 = -( x130*y130 + x131*y131 )
acc14 = -( x140*y140 + x141*y141 )
acc15 = -( x150*y150 + x151*y151 )
Returns
Returned accumulation vector (16 x int48 lanes)
Parameters
xbuff        Input buffer of 32 elements of type int16.
xstart       Starting position offset applied to all lanes of input from the X buffer.
xoffsets     4-bit offset for each lane, corresponding to 2x the lane number; each second lane is an offset to the lane before + 1. The LSBs apply to the first lane.
xoffsets_hi  4-bit offset for each lane. The LSBs apply to the 8th lane.
xsquare      Selects the order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
ystart       Starting position offset applied to all lanes of input from the xbuffer for the second input.
yoffsets     4-bit offset for each lane, corresponding to 2x the lane number; each second lane is an offset to the lane before + 1. The LSBs apply to the first lane.
yoffsets_hi  4-bit offset for each lane. The LSBs apply to the 8th lane.
ysquare      Selects the order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
Note
  • This intrinsic uses the 'square' parameter; for more information on how to use it, go here.
  • For more information on how data is selected from the buffers, go here.
v16acc48 negmul16 ( v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
unsigned int  xsquare,
v32int16  ybuff,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi,
unsigned int  ysquare 
)

Multiply-negate intrinsic function using small X input buffer.

acc0 = -( x00*y00 + x01*y01 )
acc1 = -( x10*y10 + x11*y11 )
acc2 = -( x20*y20 + x21*y21 )
acc3 = -( x30*y30 + x31*y31 )
acc4 = -( x40*y40 + x41*y41 )
acc5 = -( x50*y50 + x51*y51 )
acc6 = -( x60*y60 + x61*y61 )
acc7 = -( x70*y70 + x71*y71 )
acc8 = -( x80*y80 + x81*y81 )
acc9 = -( x90*y90 + x91*y91 )
acc10 = -( x100*y100 + x101*y101 )
acc11 = -( x110*y110 + x111*y111 )
acc12 = -( x120*y120 + x121*y121 )
acc13 = -( x130*y130 + x131*y131 )
acc14 = -( x140*y140 + x141*y141 )
acc15 = -( x150*y150 + x151*y151 )
Returns
Returned accumulation vector (16 x int48 lanes)
Parameters
xbuff        Input buffer of 32 elements of type int16.
xstart       Starting position offset applied to all lanes of input from the X buffer.
xoffsets     4-bit offset for each lane, corresponding to 2x the lane number; each second lane is an offset to the lane before + 1. The LSBs apply to the first lane.
xoffsets_hi  4-bit offset for each lane. The LSBs apply to the 8th lane.
xsquare      Selects the order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
ybuff        Right input buffer of 32 elements of type int16.
ystart       Starting position offset applied to all lanes of input from the ybuffer for the second input.
yoffsets     4-bit offset for each lane, corresponding to 2x the lane number; each second lane is an offset to the lane before + 1. The LSBs apply to the first lane.
yoffsets_hi  4-bit offset for each lane. The LSBs apply to the 8th lane.
ysquare      Selects the order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
Note
  • This intrinsic uses the 'square' parameter; for more information on how to use it, go here.
  • For more information on how data is selected from the buffers, go here.
v8acc48 negmul8 ( v64int16  xbuff,
int  xstart,
unsigned int  xoffsets,
int  xstep,
unsigned int  xsquare,
int  ystart,
unsigned int  yoffsets,
int  ystep,
unsigned int  ysquare 
)

Multiply-negate intrinsic function.

acc0 = -( x00*y00 + x01*y01 + x02*y02 + x03*y03 )
acc1 = -( x10*y10 + x11*y11 + x12*y12 + x13*y13 )
acc2 = -( x20*y20 + x21*y21 + x22*y22 + x23*y23 )
acc3 = -( x30*y30 + x31*y31 + x32*y32 + x33*y33 )
acc4 = -( x40*y40 + x41*y41 + x42*y42 + x43*y43 )
acc5 = -( x50*y50 + x51*y51 + x52*y52 + x53*y53 )
acc6 = -( x60*y60 + x61*y61 + x62*y62 + x63*y63 )
acc7 = -( x70*y70 + x71*y71 + x72*y72 + x73*y73 )
Returns
Returned accumulation vector (8 x int48 lanes)
Parameters
xbuff     Input buffer of 64 elements of type int16.
xstart    Starting position offset applied to all lanes of input from the X buffer.
xoffsets  4-bit offset for each lane in the xbuffer; each second lane is an offset to the lane before + 1. The LSBs apply to the first lane.
xstep     Step between each column for selection in the xbuffer.
xsquare   Selects the order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
ystart    Starting position offset applied to all lanes of input from the xbuffer for the second input.
yoffsets  4-bit offset for each lane in the xbuffer; each second lane is an offset to the lane before + 1. The LSBs apply to the first lane.
ystep     Step between each column for selection in the xbuffer.
ysquare   Selects the order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
Note
  • This intrinsic uses the 'square' parameter; for more information on how to use it, go here.
  • For more information on how data is selected from the buffers, go here.
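Each negmul8 lane above is the negated sum of four int16 x int16 products stored in an int48 lane. As a quick sanity check on the numeric range, the compile-time sketch below confirms that the largest-magnitude result of a single call fits in 48 bits. It assumes that int48 denotes a signed two's-complement 48-bit value; all names are hypothetical, and nothing here describes what the hardware does when a value does not fit.

```cpp
#include <cstdint>

// The largest product magnitude of two int16 values is (-32768) * (-32768) = 2^30.
constexpr std::int64_t kInt16Min = -32768;
constexpr std::int64_t kMaxAbsProduct = kInt16Min * kInt16Min;

// Assumed range of a signed two's-complement 48-bit accumulator lane.
constexpr std::int64_t kAcc48Min = -(std::int64_t{1} << 47);
constexpr std::int64_t kAcc48Max = (std::int64_t{1} << 47) - 1;

// Most negative result of a negated sum of `terms` products.
constexpr std::int64_t worst_case_negmul(int terms) {
    return -(kMaxAbsProduct * terms);
}

constexpr bool fits_in_acc48(std::int64_t v) {
    return v >= kAcc48Min && v <= kAcc48Max;
}

// Four terms per lane (negmul8) and two terms per lane (negmul16) both fit.
static_assert(fits_in_acc48(worst_case_negmul(4)), "negmul8 lane fits in 48 bits");
static_assert(fits_in_acc48(worst_case_negmul(2)), "negmul16 lane fits in 48 bits");
```

The four-term extreme is -2^32, so a single call leaves considerable headroom in the lane.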
v8acc48 negmul8 ( v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
int  xstep,
unsigned int  xsquare,
int  ystart,
unsigned int  yoffsets,
int  ystep,
unsigned int  ysquare 
)

Multiply-negate intrinsic function using small X input buffer.

acc0 = -( x00*y00 + x01*y01 + x02*y02 + x03*y03 )
acc1 = -( x10*y10 + x11*y11 + x12*y12 + x13*y13 )
acc2 = -( x20*y20 + x21*y21 + x22*y22 + x23*y23 )
acc3 = -( x30*y30 + x31*y31 + x32*y32 + x33*y33 )
acc4 = -( x40*y40 + x41*y41 + x42*y42 + x43*y43 )
acc5 = -( x50*y50 + x51*y51 + x52*y52 + x53*y53 )
acc6 = -( x60*y60 + x61*y61 + x62*y62 + x63*y63 )
acc7 = -( x70*y70 + x71*y71 + x72*y72 + x73*y73 )
Returns
Returned accumulation vector (8 x int48 lanes)
Parameters
xbuff     Input buffer of 32 elements of type int16.
xstart    Starting position offset applied to all lanes of input from the X buffer.
xoffsets  4-bit offset for each lane in the xbuffer; each second lane is an offset to the lane before + 1. The LSBs apply to the first lane.
xstep     Step between each column for selection in the xbuffer.
xsquare   Selects the order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
ystart    Starting position offset applied to all lanes of input from the xbuffer for the second input.
yoffsets  4-bit offset for each lane in the xbuffer; each second lane is an offset to the lane before + 1. The LSBs apply to the first lane.
ystep     Step between each column for selection in the xbuffer.
ysquare   Selects the order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
Note
  • This intrinsic uses the 'square' parameter; for more information on how to use it, go here.
  • For more information on how data is selected from the buffers, go here.
v8acc48 negmul8 ( v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
int  xstep,
unsigned int  xsquare,
v32int16  ybuff,
int  ystart,
unsigned int  yoffsets,
int  ystep,
unsigned int  ysquare 
)

Multiply-negate intrinsic function using small X input buffer.

acc0 = -( x00*y00 + x01*y01 + x02*y02 + x03*y03 )
acc1 = -( x10*y10 + x11*y11 + x12*y12 + x13*y13 )
acc2 = -( x20*y20 + x21*y21 + x22*y22 + x23*y23 )
acc3 = -( x30*y30 + x31*y31 + x32*y32 + x33*y33 )
acc4 = -( x40*y40 + x41*y41 + x42*y42 + x43*y43 )
acc5 = -( x50*y50 + x51*y51 + x52*y52 + x53*y53 )
acc6 = -( x60*y60 + x61*y61 + x62*y62 + x63*y63 )
acc7 = -( x70*y70 + x71*y71 + x72*y72 + x73*y73 )
Returns
Returned accumulation vector (8 x int48 lanes)
Parameters
xbuff     Input buffer of 32 elements of type int16.
xstart    Starting position offset applied to all lanes of input from the X buffer.
xoffsets  4-bit offset for each lane in the xbuffer; each second lane is an offset to the lane before + 1. The LSBs apply to the first lane.
xstep     Step between each column for selection in the xbuffer.
xsquare   Selects the order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
ybuff     Right input buffer of 32 elements of type int16.
ystart    Starting position offset applied to all lanes of input from the ybuffer for the second input.
yoffsets  4-bit offset for each lane in the ybuffer; each second lane is an offset to the lane before + 1. The LSBs apply to the first lane.
ystep     Step between each column for selection in the ybuffer.
ysquare   Selects the order of the mini-permute square (default=0x3210). The LSBs apply to the first element.
Note
  • This intrinsic uses the 'square' parameter; for more information on how to use it, go here.
  • For more information on how data is selected from the buffers, go here.