AI Engine Intrinsics User Guide  (v2023.2)
 All Data Structures Namespaces Functions Variables Typedefs Groups Pages
Integer

Overview

Advanced Integer Vector Arithmetic Operations. To have more information in lane selection please refer to here.

Functions

v16int32 abs16 (v32int32 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi)
 Performs an absolute value computation between lanes of xbuff.
 
v16int32 abs16 (v16int32 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi)
 Performs an absolute value computation between lanes of xbuff.
 
v32int16 abs32 (v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare)
 Performs an absolute value computation between lanes of xbuff.
 
v32int16 abs32 (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare)
 Performs an absolute value computation between lanes of xbuff.
 
v16int32 add16 (v32int32 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi)
 Performs an addition between lanes of xbuff.
 
v16int32 add16 (v16int32 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi)
 Performs an addition between lanes of xbuff.
 
v16int32 add16 (v16int32 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, v16int32 ybuff, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi)
 Performs an addition between lanes of xbuff and ybuff.
 
v16cint16 add16 (v32cint16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Performs an addition between lanes of xbuff.
 
v16cint16 add16 (v16cint16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Performs an addition between lanes of xbuff.
 
v16cint16 add16 (v16cint16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v16cint16 ybuff, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Performs an addition between lanes of xbuff and ybuff.
 
v32int16 add32 (v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Performs an addition between lanes of xbuff.
 
v32int16 add32 (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Performs an addition between lanes of xbuff.
 
v32int16 add32 (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Performs an addition between lanes of xbuff and ybuff.
 
v8cint32 add8 (v16cint32 xbuff, int xstart, unsigned int xoffsets, int ystart, unsigned int yoffsets)
 Performs an addition between lanes of xbuff.
 
v8cint32 add8 (v8cint32 xbuff, int xstart, unsigned int xoffsets, int ystart, unsigned int yoffsets)
 Performs an addition between lanes of xbuff.
 
v8cint32 add8 (v8cint32 xbuff, int xstart, unsigned int xoffsets, v8cint32 ybuff, int ystart, unsigned int yoffsets)
 Performs an addition between lanes of xbuff and ybuff.
 
v16int32 sub16 (v32int32 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi)
 Performs a subtraction between lanes of xbuff.
 
v16int32 sub16 (v16int32 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi)
 Performs a subtraction between lanes of xbuff.
 
v16int32 sub16 (v16int32 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, v16int32 ybuff, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi)
 Performs a subtraction between lanes of xbuff and ybuff.
 
v16cint16 sub16 (v32cint16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Performs a subtraction between lanes of xbuff.
 
v16cint16 sub16 (v16cint16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Performs a subtraction between lanes of xbuff.
 
v16cint16 sub16 (v16cint16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v16cint16 ybuff, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Performs a subtraction between lanes of xbuff and ybuff.
 
v32int16 sub32 (v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Performs a subtraction between lanes of xbuff.
 
v32int16 sub32 (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Performs a subtraction between lanes of xbuff.
 
v32int16 sub32 (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Performs a subtraction between lanes of xbuff and ybuff.
 
v8cint32 sub8 (v16cint32 xbuff, int xstart, unsigned int xoffsets, int ystart, unsigned int yoffsets)
 Performs a subtraction between lanes of xbuff.
 
v8cint32 sub8 (v8cint32 xbuff, int xstart, unsigned int xoffsets, int ystart, unsigned int yoffsets)
 Performs a subtraction between lanes of xbuff.
 
v8cint32 sub8 (v8cint32 xbuff, int xstart, unsigned int xoffsets, v8cint32 ybuff, int ystart, unsigned int yoffsets)
 Performs a subtraction between lanes of xbuff and ybuff.
 

Function Documentation

v16int32 abs16 ( v32int32  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi 
)

Performs an absolute value computation between lanes of xbuff.

for (int i = 0; i < 16; i++)
idx = f( xstart, xoffsets[i]);
o[i] = abs(x[idx])
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0])
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7])
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of an absolute value computation between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuffInput buffer of 32 elements with 32-bit precision
xstartStarting position offset applied to all lanes of input from X buffer
xoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
xoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 8th lane
Note
  • For more information on how the function f() selects data from the buffers go here.
v16int32 abs16 ( v16int32  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi 
)

Performs an absolute value computation between lanes of xbuff.

for (int i = 0; i < 16; i++)
idx = f( xstart, xoffsets[i]);
o[i] = abs(x[idx])
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0])
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7])
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of an absolute value computation between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuffInput buffer of 16 elements with 32-bit precision
xstartStarting position offset applied to all lanes of input from X buffer
xoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
xoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 8th lane
Note
  • For more information on how the function f() selects data from the buffers go here.
v32int16 abs32 ( v64int16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
unsigned int  xsquare 
)

Performs an absolute value computation between lanes of xbuff.

for (int i = 0; i < 32; i++)
idx = f( xstart, xoffsets[i],xsquare);
o[i] = abs(x[idx])
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0],xsquare)
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7],xsquare)
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of an absolute value computation between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuffInput buffer of 64 elements with 16-bit precision
xstartStarting position offset applied to all lanes of input from X buffer
xoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
xoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 16th lane
xsquareSelect order of the mini-permute square (default=0x3210). LSB apply to first element. Value per lane needs to be less than 4. max value for this field is (0x3333)
Note
  • This intrinsic uses the 'square' parameter, to have more information on how to use this please go here
  • For more information on how the function f() selects data from the buffers go here.
v32int16 abs32 ( v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
unsigned int  xsquare 
)

Performs an absolute value computation between lanes of xbuff.

for (int i = 0; i < 32; i++)
idx = f( xstart, xoffsets[i],xsquare);
o[i] = abs(x[idx])
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0],xsquare)
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7],xsquare)
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of an absolute value computation between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuffInput buffer of 32 elements with 16-bit precision
xstartStarting position offset applied to all lanes of input from X buffer
xoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
xoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 16th lane
xsquareSelect order of the mini-permute square (default=0x3210). LSB apply to first element. Value per lane needs to be less than 4. max value for this field is (0x3333)
Note
  • This intrinsic uses the 'square' parameter, to have more information on how to use this please go here
  • For more information on how the function f() selects data from the buffers go here.
v16int32 add16 ( v32int32  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi 
)

Performs an addition between lanes of xbuff.

for (int i = 0; i < 16; i++)
idx = f( xstart, xoffsets[i]);
idy = f( ystart, yoffsets[i]);
o[i] = x[idx] + x[idy]
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0])
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7])
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of an addition between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuffInput buffer of 32 elements with 32-bit precision
xstartStarting position offset applied to all lanes of input from X buffer
xoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
xoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 8th lane
ystartStarting position offset applied to all lanes of input from xbuffer for the second input
yoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
yoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 8th lane
Note
  • For more information on how the function f() selects data from the buffers go here.
v16int32 add16 ( v16int32  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi 
)

Performs an addition between lanes of xbuff.

for (int i = 0; i < 16; i++)
idx = f( xstart, xoffsets[i]);
idy = f( ystart, yoffsets[i]);
o[i] = x[idx] + x[idy]
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0])
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7])
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of an addition between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuffInput buffer of 16 elements with 32-bit precision
xstartStarting position offset applied to all lanes of input from X buffer
xoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
xoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 8th lane
ystartStarting position offset applied to all lanes of input from xbuffer for the second input
yoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
yoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 8th lane
Note
  • For more information on how the function f() selects data from the buffers go here.
v16int32 add16 ( v16int32  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
v16int32  ybuff,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi 
)

Performs an addition between lanes of xbuff and ybuff.

for (int i = 0; i < 16; i++)
idx = f( xstart, xoffsets[i]);
idy = f( ystart, yoffsets[i]);
o[i] = x[idx] + y[idy]
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0])
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7])
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of an addition between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuffInput buffer of 16 elements with 32-bit precision
xstartStarting position offset applied to all lanes of input from X buffer
xoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
xoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 8th lane
ybuffInput buffer of 16 elements with 32-bit precision
ystartStarting position offset applied to all lanes of input from ybuffer for the second input
yoffsets4b offset for each lane, applied to the ybuffer. LSB apply to first lane
yoffsets_hi4b offset for each lane, applied to the ybuffer. LSB apply to 8th lane
Note
  • For more information on how the function f() selects data from the buffers go here.
v16cint16 add16 ( v32cint16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
unsigned int  xsquare,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi,
unsigned int  ysquare 
)

Performs an addition between lanes of xbuff.

for (int i = 0; i < 16; i++)
idx = f( xstart, xoffsets[i],xsquare);
idy = f( ystart, yoffsets[i],ysquare);
o[i] = x[idx] + x[idy]
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0],xsquare)
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7],xsquare)
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of an addition between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuffInput buffer of 32 elements with 16-bit precision
xstartStarting position offset applied to all lanes of input from X buffer
xoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
xoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 8th lane
xsquareSelect order of the mini-permute square (default=0x3210). LSB apply to first element. Value per lane needs to be less than 4. max value for this field is (0x3333)
ystartStarting position offset applied to all lanes of input from xbuffer for the second input
yoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
yoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 8th lane
ysquareSelect order of the mini-permute square (default=0x3210). LSB apply to first element. Value per lane needs be less than 4. max value for this field is (0x3333)
Note
  • This intrinsic uses the 'square' parameter, to have more information on how to use this please go here
  • When xoffsets or yoffsets is a runtime parameter, it might be more efficient to use a non-complex add instuction and calculate the offsets accordingly. Therefore both, real and imaginary (real+1) lane must be considered in the offsets
  • For more information on how the function f() selects data from the buffers go here.
v16cint16 add16 ( v16cint16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
unsigned int  xsquare,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi,
unsigned int  ysquare 
)

Performs an addition between lanes of xbuff.

for (int i = 0; i < 16; i++)
idx = f( xstart, xoffsets[i],xsquare);
idy = f( ystart, yoffsets[i],ysquare);
o[i] = x[idx] + x[idy]
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0],xsquare)
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7],xsquare)
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of an addition between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuffInput buffer of 16 elements with 16-bit precision
xstartStarting position offset applied to all lanes of input from X buffer
xoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
xoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 8th lane
xsquareSelect order of the mini-permute square (default=0x3210). LSB apply to first element. Value per lane needs to be less than 4. max value for this field is (0x3333)
ystartStarting position offset applied to all lanes of input from xbuffer for the second input
yoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
yoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 8th lane
ysquareSelect order of the mini-permute square (default=0x3210). LSB apply to first element. Value per lane needs be less than 4. max value for this field is (0x3333)
Note
  • This intrinsic uses the 'square' parameter, to have more information on how to use this please go here
  • When xoffsets or yoffsets is a runtime parameter, it might be more efficient to use a non-complex add instuction and calculate the offsets accordingly. Therefore both, real and imaginary (real+1) lane must be considered in the offsets
  • For more information on how the function f() selects data from the buffers go here.
v16cint16 add16 ( v16cint16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
unsigned int  xsquare,
v16cint16  ybuff,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi,
unsigned int  ysquare 
)

Performs an addition between lanes of xbuff and ybuff.

for (int i = 0; i < 16; i++)
idx = f( xstart, xoffsets[i],xsquare);
idy = f( ystart, yoffsets[i],ysquare);
o[i] = x[idx] + y[idy]
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0],xsquare)
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7],xsquare)
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of an addition between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuffInput buffer of 16 elements with 16-bit precision
xstartStarting position offset applied to all lanes of input from X buffer
xoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
xoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 8th lane
xsquareSelect order of the mini-permute square (default=0x3210). LSB apply to first element. Value per lane needs to be less than 4. max value for this field is (0x3333)
ybuffInput buffer of 16 elements with 16-bit precision
ystartStarting position offset applied to all lanes of input from ybuffer for the second input
yoffsets4b offset for each lane, applied to the ybuffer. LSB apply to first lane
yoffsets_hi4b offset for each lane, applied to the ybuffer. LSB apply to 8th lane
ysquareSelect order of the mini-permute square (default=0x3210). LSB apply to first element. Value per lane needs be less than 4. max value for this field is (0x3333)
Note
  • This intrinsic uses the 'square' parameter, to have more information on how to use this please go here
  • When xoffsets or yoffsets is a runtime parameter, it might be more efficient to use a non-complex add instuction and calculate the offsets accordingly. Therefore both, real and imaginary (real+1) lane must be considered in the offsets
  • For more information on how the function f() selects data from the buffers go here.
v32int16 add32 ( v64int16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
unsigned int  xsquare,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi,
unsigned int  ysquare 
)

Performs an addition between lanes of xbuff.

for (int i = 0; i < 32; i++)
idx = f( xstart, xoffsets[i],xsquare);
idy = f( ystart, yoffsets[i],ysquare);
o[i] = x[idx] + x[idy]
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0],xsquare)
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7],xsquare)
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of an addition between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuffInput buffer of 64 elements with 16-bit precision
xstartStarting position offset applied to all lanes of input from X buffer
xoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
xoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 16th lane
xsquareSelect order of the mini-permute square (default=0x3210). LSB apply to first element. Value per lane needs to be less than 4. max value for this field is (0x3333)
ystartStarting position offset applied to all lanes of input from xbuffer for the second input
yoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
yoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 16th lane
ysquareSelect order of the mini-permute square (default=0x3210). LSB apply to first element. Value per lane needs be less than 4. max value for this field is (0x3333)
Note
  • This intrinsic uses the 'square' parameter, to have more information on how to use this please go here
  • For more information on how the function f() selects data from the buffers go here.
v32int16 add32 ( v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
unsigned int  xsquare,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi,
unsigned int  ysquare 
)

Performs an addition between lanes of xbuff.

for (int i = 0; i < 32; i++)
idx = f( xstart, xoffsets[i],xsquare);
idy = f( ystart, yoffsets[i],ysquare);
o[i] = x[idx] + x[idy]
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0],xsquare)
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7],xsquare)
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of an addition between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuffInput buffer of 32 elements with 16-bit precision
xstartStarting position offset applied to all lanes of input from X buffer
xoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
xoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 16th lane
xsquareSelect order of the mini-permute square (default=0x3210). LSB apply to first element. Value per lane needs to be less than 4. max value for this field is (0x3333)
ystartStarting position offset applied to all lanes of input from xbuffer for the second input
yoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
yoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 16th lane
ysquareSelect order of the mini-permute square (default=0x3210). LSB apply to first element. Value per lane needs be less than 4. max value for this field is (0x3333)
Note
  • This intrinsic uses the 'square' parameter, to have more information on how to use this please go here
  • For more information on how the function f() selects data from the buffers go here.
v32int16 add32 ( v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
unsigned int  xsquare,
v32int16  ybuff,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi,
unsigned int  ysquare 
)

Performs an addition between lanes of xbuff and ybuff.

for (int i = 0; i < 32; i++)
idx = f( xstart, xoffsets[i],xsquare);
idy = f( ystart, yoffsets[i],ysquare);
o[i] = x[idx] + y[idy]
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0],xsquare)
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7],xsquare)
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of an addition between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuffInput buffer of 32 elements with 16-bit precision
xstartStarting position offset applied to all lanes of input from X buffer
xoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
xoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 16th lane
xsquareSelect order of the mini-permute square (default=0x3210). LSB apply to first element. Value per lane needs to be less than 4. max value for this field is (0x3333)
ybuffInput buffer of 32 elements with 16-bit precision
ystartStarting position offset applied to all lanes of input from ybuffer for the second input
yoffsets4b offset for each lane, applied to the ybuffer. LSB apply to first lane
yoffsets_hi4b offset for each lane, applied to the ybuffer. LSB apply to 16th lane
ysquareSelect order of the mini-permute square (default=0x3210). LSB apply to first element. Value per lane needs be less than 4. max value for this field is (0x3333)
Note
  • This intrinsic uses the 'square' parameter, to have more information on how to use this please go here
  • For more information on how the function f() selects data from the buffers go here.
v8cint32 add8 ( v16cint32  xbuff,
int  xstart,
unsigned int  xoffsets,
int  ystart,
unsigned int  yoffsets 
)

Performs an addition between lanes of xbuff.

for (int i = 0; i < 8; i++)
idx = f( xstart, xoffsets[i]);
idy = f( ystart, yoffsets[i]);
o[i] = x[idx] + x[idy]
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0])
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7])
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of an addition between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuffInput buffer of 16 elements with 32-bit precision
xstartStarting position offset applied to all lanes of input from X buffer
xoffsets3b (aligned to 4b) offset for each lane, applied to the xbuffer. LSB apply to first lane
ystartStarting position offset applied to all lanes of input from xbuffer for the second input
yoffsets3b (aligned to 4b) offset for each lane in the xbuffer for the second input. LSB apply to first lane
Note
  • When xoffsets or yoffsets is a runtime parameter, it might be more efficient to use a non-complex add instuction and calculate the offsets accordingly. Therefore both, real and imaginary (real+1) lane must be considered in the offsets
  • For more information on how the function f() selects data from the buffers go here.
v8cint32 add8 ( v8cint32  xbuff,
int  xstart,
unsigned int  xoffsets,
int  ystart,
unsigned int  yoffsets 
)

Performs an addition between lanes of xbuff.

for (int i = 0; i < 8; i++)
idx = f( xstart, xoffsets[i]);
idy = f( ystart, yoffsets[i]);
o[i] = x[idx] + x[idy]
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0])
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7])
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of an addition between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuffInput buffer of 8 elements with 32-bit precision
xstartStarting position offset applied to all lanes of input from X buffer
xoffsets3b (aligned to 4b) offset for each lane, applied to the xbuffer. LSB apply to first lane
ystartStarting position offset applied to all lanes of input from xbuffer for the second input
yoffsets3b (aligned to 4b) offset for each lane in the xbuffer for the second input. LSB apply to first lane
Note
  • When xoffsets or yoffsets is a runtime parameter, it might be more efficient to use a non-complex add instuction and calculate the offsets accordingly. Therefore both, real and imaginary (real+1) lane must be considered in the offsets
  • For more information on how the function f() selects data from the buffers go here.
v8cint32 add8 ( v8cint32  xbuff,
int  xstart,
unsigned int  xoffsets,
v8cint32  ybuff,
int  ystart,
unsigned int  yoffsets 
)

Performs an addition between lanes of xbuff and ybuff.

for (int i = 0; i < 8; i++)
idx = f( xstart, xoffsets[i]);
idy = f( ystart, yoffsets[i]);
o[i] = x[idx] + y[idy]
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0])
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7])
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of an addition between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuffInput buffer of 8 elements with 32-bit precision
xstartStarting position offset applied to all lanes of input from X buffer
xoffsets3b (aligned to 4b) offset for each lane, applied to the xbuffer. LSB apply to first lane
ybuffInput buffer of 8 elements with 32-bit precision
ystartStarting position offset applied to all lanes of input from ybuffer for the second input
yoffsets3b (aligned to 4b) offset for each lane in the ybuffer for the second input. LSB apply to first lane
Note
  • When xoffsets or yoffsets is a runtime parameter, it might be more efficient to use a non-complex add instuction and calculate the offsets accordingly. Therefore both, real and imaginary (real+1) lane must be considered in the offsets
  • For more information on how the function f() selects data from the buffers go here.
v16int32 sub16 ( v32int32  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi 
)

Performs a subtraction between lanes of xbuff.

for (int i = 0; i < 16; i++)
idx = f( xstart, xoffsets[i]);
idy = f( ystart, yoffsets[i]);
o[i] = x[idx] - x[idy]
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0])
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7])
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of a subtraction between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuffInput buffer of 32 elements with 32-bit precision
xstartStarting position offset applied to all lanes of input from X buffer
xoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
xoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 8th lane
ystartStarting position offset applied to all lanes of input from xbuffer for the second input
yoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
yoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 8th lane
Note
  • For more information on how the function f() selects data from the buffers go here.
  • Data from xbuff using 'xstart','xoffsets(_hi)' params is the left operand and data from xbuff using 'ystart','yoffsets(_hi)' params is the right operand.
v16int32 sub16 ( v16int32  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi 
)

Performs a subtraction between lanes of xbuff.

for (int i = 0; i < 16; i++)
idx = f( xstart, xoffsets[i]);
idy = f( ystart, yoffsets[i]);
o[i] = x[idx] - x[idy]
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0])
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7])
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of a subtraction between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuffInput buffer of 16 elements with 32-bit precision
xstartStarting position offset applied to all lanes of input from X buffer
xoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
xoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 8th lane
ystartStarting position offset applied to all lanes of input from xbuffer for the second input
yoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
yoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 8th lane
Note
  • For more information on how the function f() selects data from the buffers go here.
  • Data from xbuff using 'xstart','xoffsets(_hi)' params is the left operand and data from xbuff using 'ystart','yoffsets(_hi)' params is the right operand.
v16int32 sub16 ( v16int32  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
v16int32  ybuff,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi 
)

Performs a subtraction between lanes of xbuff and ybuff.

for (int i = 0; i < 16; i++)
idx = f( xstart, xoffsets[i]);
idy = f( ystart, yoffsets[i]);
o[i] = x[idx] - y[idy]
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0])
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7])
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of a subtraction between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuffInput buffer of 16 elements with 32-bit precision
xstartStarting position offset applied to all lanes of input from X buffer
xoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
xoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 8th lane
ybuffInput buffer of 16 elements with 32-bit precision
ystartStarting position offset applied to all lanes of input from ybuffer for the second input
yoffsets4b offset for each lane, applied to the ybuffer. LSB apply to first lane
yoffsets_hi4b offset for each lane, applied to the ybuffer. LSB apply to 8th lane
Note
  • For more information on how the function f() selects data from the buffers go here.
  • Data from xbuff is the left operand and data from ybuff is the right operand.
v16cint16 sub16 ( v32cint16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
unsigned int  xsquare,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi,
unsigned int  ysquare 
)

Performs a subtraction between lanes of xbuff.

for (int i = 0; i < 16; i++)
idx = f( xstart, xoffsets[i],xsquare);
idy = f( ystart, yoffsets[i],ysquare);
o[i] = x[idx] - x[idy]
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0],xsquare)
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7],xsquare)
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of a subtraction between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuffInput buffer of 32 elements with 16-bit precision
xstartStarting position offset applied to all lanes of input from X buffer
xoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
xoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 8th lane
xsquareSelect order of the mini-permute square (default=0x3210). LSB apply to first element. Value per lane needs to be less than 4. max value for this field is (0x3333)
ystartStarting position offset applied to all lanes of input from xbuffer for the second input
yoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
yoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 8th lane
ysquareSelect order of the mini-permute square (default=0x3210). LSB apply to first element. Value per lane needs be less than 4. max value for this field is (0x3333)
Note
  • This intrinsic uses the 'square' parameter, to have more information on how to use this please go here
  • When xoffsets or yoffsets is a runtime parameter, it might be more efficient to use a non-complex sub instuction and calculate the offsets accordingly. Therefore both, real and imaginary (real+1) lane must be considered in the offsets
  • For more information on how the function f() selects data from the buffers go here.
  • Data from xbuff using 'xstart','xoffsets(_hi)' params is the left operand and data from xbuff using 'ystart','yoffsets(_hi)' params is the right operand.
v16cint16 sub16 ( v16cint16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
unsigned int  xsquare,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi,
unsigned int  ysquare 
)

Performs a subtraction between lanes of xbuff.

for (int i = 0; i < 16; i++)
idx = f( xstart, xoffsets[i],xsquare);
idy = f( ystart, yoffsets[i],ysquare);
o[i] = x[idx] - x[idy]
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0],xsquare)
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7],xsquare)
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of a subtraction between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuffInput buffer of 16 elements with 16-bit precision
xstartStarting position offset applied to all lanes of input from X buffer
xoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
xoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 8th lane
xsquareSelect order of the mini-permute square (default=0x3210). LSB apply to first element. Value per lane needs to be less than 4. max value for this field is (0x3333)
ystartStarting position offset applied to all lanes of input from xbuffer for the second input
yoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
yoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 8th lane
ysquareSelect order of the mini-permute square (default=0x3210). LSB apply to first element. Value per lane needs be less than 4. max value for this field is (0x3333)
Note
  • This intrinsic uses the 'square' parameter, to have more information on how to use this please go here
  • When xoffsets or yoffsets is a runtime parameter, it might be more efficient to use a non-complex sub instuction and calculate the offsets accordingly. Therefore both, real and imaginary (real+1) lane must be considered in the offsets
  • For more information on how the function f() selects data from the buffers go here.
  • Data from xbuff using 'xstart','xoffsets(_hi)' params is the left operand and data from xbuff using 'ystart','yoffsets(_hi)' params is the right operand.
v16cint16 sub16 ( v16cint16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
unsigned int  xsquare,
v16cint16  ybuff,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi,
unsigned int  ysquare 
)

Performs a subtraction between lanes of xbuff and ybuff.

for (int i = 0; i < 16; i++)
idx = f( xstart, xoffsets[i],xsquare);
idy = f( ystart, yoffsets[i],ysquare);
o[i] = x[idx] - y[idy]
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0],xsquare)
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7],xsquare)
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of a subtraction between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuffInput buffer of 16 elements with 16-bit precision
xstartStarting position offset applied to all lanes of input from X buffer
xoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
xoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 8th lane
xsquareSelect order of the mini-permute square (default=0x3210). LSB apply to first element. Value per lane needs to be less than 4. max value for this field is (0x3333)
ybuffInput buffer of 16 elements with 16-bit precision
ystartStarting position offset applied to all lanes of input from ybuffer for the second input
yoffsets4b offset for each lane, applied to the ybuffer. LSB apply to first lane
yoffsets_hi4b offset for each lane, applied to the ybuffer. LSB apply to 8th lane
ysquareSelect order of the mini-permute square (default=0x3210). LSB apply to first element. Value per lane needs be less than 4. max value for this field is (0x3333)
Note
  • This intrinsic uses the 'square' parameter, to have more information on how to use this please go here
  • When xoffsets or yoffsets is a runtime parameter, it might be more efficient to use a non-complex sub instuction and calculate the offsets accordingly. Therefore both, real and imaginary (real+1) lane must be considered in the offsets
  • For more information on how the function f() selects data from the buffers go here.
  • Data from xbuff is the left operand and data from ybuff is the right operand.
v32int16 sub32 ( v64int16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
unsigned int  xsquare,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi,
unsigned int  ysquare 
)

Performs a subtraction between lanes of xbuff.

for (int i = 0; i < 32; i++)
idx = f( xstart, xoffsets[i],xsquare);
idy = f( ystart, yoffsets[i],ysquare);
o[i] = x[idx] - x[idy]
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0],xsquare)
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7],xsquare)
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of a subtraction between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuffInput buffer of 64 elements with 16-bit precision
xstartStarting position offset applied to all lanes of input from X buffer
xoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
xoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 16th lane
xsquareSelect order of the mini-permute square (default=0x3210). LSB apply to first element. Value per lane needs to be less than 4. max value for this field is (0x3333)
ystartStarting position offset applied to all lanes of input from xbuffer for the second input
yoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
yoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 16th lane
ysquareSelect order of the mini-permute square (default=0x3210). LSB apply to first element. Value per lane needs be less than 4. max value for this field is (0x3333)
Note
  • This intrinsic uses the 'square' parameter, to have more information on how to use this please go here
  • For more information on how the function f() selects data from the buffers go here.
  • Data from xbuff using 'xstart','xoffsets(_hi)' params is the left operand and data from xbuff using 'ystart','yoffsets(_hi)' params is the right operand.
v32int16 sub32 ( v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
unsigned int  xsquare,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi,
unsigned int  ysquare 
)

Performs a subtraction between lanes of xbuff.

for (int i = 0; i < 32; i++)
idx = f( xstart, xoffsets[i],xsquare);
idy = f( ystart, yoffsets[i],ysquare);
o[i] = x[idx] - x[idy]
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0],xsquare)
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7],xsquare)
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of a subtraction between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuffInput buffer of 32 elements with 16-bit precision
xstartStarting position offset applied to all lanes of input from X buffer
xoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
xoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 16th lane
xsquareSelect order of the mini-permute square (default=0x3210). LSB apply to first element. Value per lane needs to be less than 4. max value for this field is (0x3333)
ystartStarting position offset applied to all lanes of input from xbuffer for the second input
yoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
yoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 16th lane
ysquareSelect order of the mini-permute square (default=0x3210). LSB apply to first element. Value per lane needs be less than 4. max value for this field is (0x3333)
Note
  • This intrinsic uses the 'square' parameter, to have more information on how to use this please go here
  • For more information on how the function f() selects data from the buffers go here.
  • Data from xbuff using 'xstart','xoffsets(_hi)' params is the left operand and data from xbuff using 'ystart','yoffsets(_hi)' params is the right operand.
v32int16 sub32 ( v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
unsigned int  xsquare,
v32int16  ybuff,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi,
unsigned int  ysquare 
)

Performs a subtraction between lanes of xbuff and ybuff.

for (int i = 0; i < 32; i++)
idx = f( xstart, xoffsets[i],xsquare);
idy = f( ystart, yoffsets[i],ysquare);
o[i] = x[idx] - y[idy]
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0],xsquare)
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7],xsquare)
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of a subtraction between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuffInput buffer of 32 elements with 16-bit precision
xstartStarting position offset applied to all lanes of input from X buffer
xoffsets4b offset for each lane, applied to the xbuffer. LSB apply to first lane
xoffsets_hi4b offset for each lane, applied to the xbuffer. LSB apply to 16th lane
xsquareSelect order of the mini-permute square (default=0x3210). LSB apply to first element. Value per lane needs to be less than 4. max value for this field is (0x3333)
ybuffInput buffer of 32 elements with 16-bit precision
ystartStarting position offset applied to all lanes of input from ybuffer for the second input
yoffsets4b offset for each lane, applied to the ybuffer. LSB apply to first lane
yoffsets_hi4b offset for each lane, applied to the ybuffer. LSB apply to 16th lane
ysquareSelect order of the mini-permute square (default=0x3210). LSB apply to first element. Value per lane needs be less than 4. max value for this field is (0x3333)
Note
  • This intrinsic uses the 'square' parameter, to have more information on how to use this please go here
  • For more information on how the function f() selects data from the buffers go here.
  • Data from xbuff is the left operand and data from ybuff is the right operand.
v8cint32 sub8 ( v16cint32  xbuff,
int  xstart,
unsigned int  xoffsets,
int  ystart,
unsigned int  yoffsets 
)

Performs a subtraction between lanes of xbuff.

for (int i = 0; i < 8; i++)
idx = f( xstart, xoffsets[i]);
idy = f( ystart, yoffsets[i]);
o[i] = x[idx] - x[idy]
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0])
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7])
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of a subtraction between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuffInput buffer of 16 elements with 32-bit precision
xstartStarting position offset applied to all lanes of input from X buffer
xoffsets3b (aligned to 4b) offset for each lane, applied to the xbuffer. LSB apply to first lane
ystartStarting position offset applied to all lanes of input from xbuffer for the second input
yoffsets3b (aligned to 4b) offset for each lane in the xbuffer for the second input. LSB apply to first lane
Note
  • When xoffsets or yoffsets is a runtime parameter, it might be more efficient to use a non-complex sub instuction and calculate the offsets accordingly. Therefore both, real and imaginary (real+1) lane must be considered in the offsets
  • For more information on how the function f() selects data from the buffers go here.
  • Data from xbuff using 'xstart','xoffsets(_hi)' params is the left operand and data from xbuff using 'ystart','yoffsets(_hi)' params is the right operand.
v8cint32 sub8 ( v8cint32  xbuff,
int  xstart,
unsigned int  xoffsets,
int  ystart,
unsigned int  yoffsets 
)

Performs a subtraction between lanes of xbuff.

for (int i = 0; i < 8; i++)
idx = f( xstart, xoffsets[i]);
idy = f( ystart, yoffsets[i]);
o[i] = x[idx] - x[idy]
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0])
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7])
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of a subtraction between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuffInput buffer of 8 elements with 32-bit precision
xstartStarting position offset applied to all lanes of input from X buffer
xoffsets3b (aligned to 4b) offset for each lane, applied to the xbuffer. LSB apply to first lane
ystartStarting position offset applied to all lanes of input from xbuffer for the second input
yoffsets3b (aligned to 4b) offset for each lane in the xbuffer for the second input. LSB apply to first lane
Note
  • When xoffsets or yoffsets is a runtime parameter, it might be more efficient to use a non-complex sub instuction and calculate the offsets accordingly. Therefore both, real and imaginary (real+1) lane must be considered in the offsets
  • For more information on how the function f() selects data from the buffers go here.
  • Data from xbuff using 'xstart','xoffsets(_hi)' params is the left operand and data from xbuff using 'ystart','yoffsets(_hi)' params is the right operand.
v8cint32 sub8 ( v8cint32  xbuff,
int  xstart,
unsigned int  xoffsets,
v8cint32  ybuff,
int  ystart,
unsigned int  yoffsets 
)

Performs a subtraction between lanes of xbuff and ybuff.

for (int i = 0; i < 8; i++)
idx = f( xstart, xoffsets[i]);
idy = f( ystart, yoffsets[i]);
o[i] = x[idx] - y[idy]
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0])
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7])
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of a subtraction between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuffInput buffer of 8 elements with 32-bit precision
xstartStarting position offset applied to all lanes of input from X buffer
xoffsets3b (aligned to 4b) offset for each lane, applied to the xbuffer. LSB apply to first lane
ybuffInput buffer of 8 elements with 32-bit precision
ystartStarting position offset applied to all lanes of input from ybuffer for the second input
yoffsets3b (aligned to 4b) offset for each lane in the ybuffer for the second input. LSB apply to first lane
Note
  • When xoffsets or yoffsets is a runtime parameter, it might be more efficient to use a non-complex sub instuction and calculate the offsets accordingly. Therefore both, real and imaginary (real+1) lane must be considered in the offsets
  • For more information on how the function f() selects data from the buffers go here.
  • Data from xbuff is the left operand and data from ybuff is the right operand.