AI Engine Intrinsics User Guide  (v2023.2)
Integer

Overview

Advanced Integer Vector Lane Selection

Select: Selects between the first set of lanes and the second according to the value of 'select'. If the bit in 'select' corresponding to a lane is 0, the value from the first set of lanes is returned; if it is 1, the value from the second set of lanes is returned.

Shuffle: Selects lanes from a single input according to the start/offset computation.

For more information on lane selection, see here.
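As a rough illustration of the two operations, the sketch below models them on plain arrays in standard C++. It is not the intrinsic implementation: `select_lanes` and `shuffle_lanes` are hypothetical helper names, and the shuffle takes precomputed source indices instead of the start/offset encoding used by the intrinsics.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of Select: bit i of `select` chooses between
// lane i of the first set (bit = 0) and lane i of the second set (bit = 1).
template <typename T, std::size_t N>
std::array<T, N> select_lanes(std::uint32_t select,
                              const std::array<T, N>& first,
                              const std::array<T, N>& second) {
    static_assert(N <= 32, "one select bit per lane");
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        const bool take_second = ((select >> i) & 1u) != 0;
        out[i] = take_second ? second[i] : first[i];
    }
    return out;
}

// Hypothetical scalar model of Shuffle: every output lane copies one lane of
// a single input. `idx` holds source indices that were computed beforehand.
template <typename T, std::size_t N, std::size_t M>
std::array<T, N> shuffle_lanes(const std::array<T, M>& in,
                               const std::array<std::size_t, N>& idx) {
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = in[idx[i] % M];  // wrap-around keeps the model in bounds (assumption)
    }
    return out;
}
```

In this model, a select value of 0x5 over four lanes takes lanes 0 and 2 from the second set and lanes 1 and 3 from the first.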

Functions

v16int32 select16 (unsigned int select, v32int32 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi)
 Performs a selection between lanes of xbuff.
 
v16int32 select16 (unsigned int select, v16int32 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi)
 Performs a selection between lanes of xbuff.
 
v16int32 select16 (unsigned int select, v16int32 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, v16int32 ybuff, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi)
 Performs a selection between lanes of xbuff and ybuff.
 
v16cint16 select16 (unsigned int select, v32cint16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi)
 Performs a selection between lanes of xbuff.
 
v16cint16 select16 (unsigned int select, v16cint16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi)
 Performs a selection between lanes of xbuff.
 
v16cint16 select16 (unsigned int select, v16cint16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, v16cint16 ybuff, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi)
 Performs a selection between lanes of xbuff and ybuff.
 
v32int16 select32 (unsigned int select, v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Performs a selection between lanes of xbuff.
 
v32int16 select32 (unsigned int select, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Performs a selection between lanes of xbuff.
 
v32int16 select32 (unsigned int select, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi, unsigned int ysquare)
 Performs a selection between lanes of xbuff and ybuff.
 
v8cint32 select8 (unsigned int select, v16cint32 xbuff, int xstart, unsigned int xoffsets, int ystart, unsigned int yoffsets)
 Performs a selection between lanes of xbuff.
 
v8cint32 select8 (unsigned int select, v8cint32 xbuff, int xstart, unsigned int xoffsets, int ystart, unsigned int yoffsets)
 Performs a selection between lanes of xbuff.
 
v8cint32 select8 (unsigned int select, v8cint32 xbuff, int xstart, unsigned int xoffsets, v8cint32 ybuff, int ystart, unsigned int yoffsets)
 Performs a selection between lanes of xbuff and ybuff.
 
v16int32 shuffle16 (v32int32 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi)
 Performs a shuffle between lanes of xbuff.
 
v16int32 shuffle16 (v16int32 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi)
 Performs a shuffle between lanes of xbuff.
 
v16cint16 shuffle16 (v32cint16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi)
 Performs a shuffle between lanes of xbuff.
 
v16cint16 shuffle16 (v16cint16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi)
 Performs a shuffle between lanes of xbuff.
 
v32int16 shuffle32 (v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare)
 Performs a shuffle between lanes of xbuff.
 
v32int16 shuffle32 (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare)
 Performs a shuffle between lanes of xbuff.
 
v8cint32 shuffle8 (v16cint32 xbuff, int xstart, unsigned int xoffsets)
 Performs a shuffle between lanes of xbuff.
 
v8cint32 shuffle8 (v8cint32 xbuff, int xstart, unsigned int xoffsets)
 Performs a shuffle between lanes of xbuff.
 

Function Documentation

v16int32 select16 ( unsigned int  select,
v32int32  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi 
)

Performs a selection between lanes of xbuff.

select(a, b, s)
{
    if (s)
        return b;
    else
        return a;
}
for (int i = 0; i < 16; i++)
{
    idx = f(xstart, xoffsets[i]);
    idy = f(ystart, yoffsets[i]);
    o[i] = select(x[idx], x[idy], select[i]);
}
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0])
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7])
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of a selection between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
select: The value of each bit selects the value to be placed in the corresponding vector position
xbuff: Input buffer of 32 elements with 32-bit precision
xstart: Starting position offset applied to all lanes of input from xbuff
xoffsets: 4b offset for each lane, applied to xbuff. The LSBs apply to lane 0
xoffsets_hi: 4b offset for each lane, applied to xbuff. The LSBs apply to lane 8
ystart: Starting position offset applied to all lanes of input from xbuff for the second input
yoffsets: 4b offset for each lane, applied to xbuff for the second input. The LSBs apply to lane 0
yoffsets_hi: 4b offset for each lane, applied to xbuff for the second input. The LSBs apply to lane 8
Note
  • For more information on how the function f() selects data from the buffers go here.
  • The LSB of 'select' starts at the lower lanes. Value of 0 selects value computed from xbuff using 'xstart','xoffsets(_hi)' params and 1 selects value computed using 'ystart','yoffsets(_hi)' params.
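The offset encoding and the selection loop described above can be modeled in plain C++ as follows. This is a sketch for building intuition, not the intrinsic itself: `nibble`, `lane_offset`, `f_assumed`, and `select16_model` are hypothetical names, and `f(start, offset)` is assumed to be `(start + offset) mod 32` for 32-bit lanes. The actual definition of `f()` is given in the lane selection documentation and may differ.

```cpp
#include <array>
#include <cstdint>

// Extract the k-th 4-bit field of a packed offsets word (k = 0 is the LSBs).
inline unsigned nibble(std::uint32_t word, unsigned k) {
    return (word >> (4u * k)) & 0xFu;
}

// Offset for output lane i: lanes 0..7 read `lo`, lanes 8..15 read `hi`.
inline unsigned lane_offset(std::uint32_t lo, std::uint32_t hi, unsigned i) {
    return i < 8 ? nibble(lo, i) : nibble(hi, i - 8);
}

// ASSUMPTION: f(start, offset) = (start + offset) mod 32 for a 32-lane buffer.
inline unsigned f_assumed(int start, unsigned offset) {
    return static_cast<unsigned>(start + static_cast<int>(offset)) % 32u;
}

// Hypothetical scalar model of the single-buffer select16 on 32-bit lanes.
std::array<std::int32_t, 16> select16_model(
    std::uint32_t select, const std::array<std::int32_t, 32>& x,
    int xstart, std::uint32_t xoffsets, std::uint32_t xoffsets_hi,
    int ystart, std::uint32_t yoffsets, std::uint32_t yoffsets_hi) {
    std::array<std::int32_t, 16> o{};
    for (unsigned i = 0; i < 16; ++i) {
        const unsigned idx = f_assumed(xstart, lane_offset(xoffsets, xoffsets_hi, i));
        const unsigned idy = f_assumed(ystart, lane_offset(yoffsets, yoffsets_hi, i));
        // Bit i of `select`: 0 takes the x-indexed value, 1 the y-indexed value.
        o[i] = ((select >> i) & 1u) ? x[idy] : x[idx];
    }
    return o;
}
```

With `x[i] = i`, `select = 0xFF00`, `xstart = 0`, `ystart = 16`, and both offset pairs set to `0x76543210` / `0xFEDCBA98`, this model yields lanes 0 to 7 equal to 0 to 7 and lanes 8 to 15 equal to 24 to 31.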
v16int32 select16 ( unsigned int  select,
v16int32  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi 
)

Performs a selection between lanes of xbuff.

select(a, b, s)
{
    if (s)
        return b;
    else
        return a;
}
for (int i = 0; i < 16; i++)
{
    idx = f(xstart, xoffsets[i]);
    idy = f(ystart, yoffsets[i]);
    o[i] = select(x[idx], x[idy], select[i]);
}
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0])
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7])
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of a selection between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
select: The value of each bit selects the value to be placed in the corresponding vector position
xbuff: Input buffer of 16 elements with 32-bit precision
xstart: Starting position offset applied to all lanes of input from xbuff
xoffsets: 4b offset for each lane, applied to xbuff. The LSBs apply to lane 0
xoffsets_hi: 4b offset for each lane, applied to xbuff. The LSBs apply to lane 8
ystart: Starting position offset applied to all lanes of input from xbuff for the second input
yoffsets: 4b offset for each lane, applied to xbuff for the second input. The LSBs apply to lane 0
yoffsets_hi: 4b offset for each lane, applied to xbuff for the second input. The LSBs apply to lane 8
Note
  • For more information on how the function f() selects data from the buffers go here.
  • The LSB of 'select' starts at the lower lanes. Value of 0 selects value computed from xbuff using 'xstart','xoffsets(_hi)' params and 1 selects value computed using 'ystart','yoffsets(_hi)' params.
v16int32 select16 ( unsigned int  select,
v16int32  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
v16int32  ybuff,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi 
)

Performs a selection between lanes of xbuff and ybuff.

select(a, b, s)
{
    if (s)
        return b;
    else
        return a;
}
for (int i = 0; i < 16; i++)
{
    idx = f(xstart, xoffsets[i]);
    idy = f(ystart, yoffsets[i]);
    o[i] = select(x[idx], y[idy], select[i]);
}
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0])
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7])
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of a selection between lanes of xbuff and ybuff, where the result of lane 0 goes to lane 0 of the output.
Parameters
select: The value of each bit selects the value to be placed in the corresponding vector position
xbuff: Input buffer of 16 elements with 32-bit precision
xstart: Starting position offset applied to all lanes of input from xbuff
xoffsets: 4b offset for each lane, applied to xbuff. The LSBs apply to lane 0
xoffsets_hi: 4b offset for each lane, applied to xbuff. The LSBs apply to lane 8
ybuff: Input buffer of 16 elements with 32-bit precision
ystart: Starting position offset applied to all lanes of input from ybuff for the second input
yoffsets: 4b offset for each lane, applied to ybuff. The LSBs apply to lane 0
yoffsets_hi: 4b offset for each lane, applied to ybuff. The LSBs apply to lane 8
Note
  • For more information on how the function f() selects data from the buffers go here.
  • The LSB of 'select' starts at the lower lanes. Value of 0 selects from xbuff and 1 selects from ybuff.
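Because the LSB of 'select' corresponds to the lowest lane, a mask can be built from per-lane choices with a simple loop. The helper below is a hypothetical convenience in standard C++ and is not part of the intrinsics API; `take_y[i] == true` means lane i should come from ybuff.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Build a select mask from per-lane flags: bit i is set when lane i should
// take the second (y) input. Bit 0 corresponds to lane 0.
template <std::size_t N>
std::uint32_t make_select_mask(const std::array<bool, N>& take_y) {
    static_assert(N <= 32, "at most one bit per lane in a 32-bit mask");
    std::uint32_t mask = 0;
    for (std::size_t i = 0; i < N; ++i) {
        if (take_y[i]) {
            mask |= (std::uint32_t{1} << i);
        }
    }
    return mask;
}
```

For 16 lanes, taking every odd lane from ybuff and every even lane from xbuff gives a mask of 0xAAAA.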
v16cint16 select16 ( unsigned int  select,
v32cint16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi 
)

Performs a selection between lanes of xbuff.

select(a, b, s)
{
    if (s)
        return b;
    else
        return a;
}
for (int i = 0; i < 16; i++)
{
    idx = f(xstart, xoffsets[i]);
    idy = f(ystart, yoffsets[i]);
    o[i] = select(x[idx], x[idy], select[i]);
}
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0])
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7])
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of a selection between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
select: The value of each bit selects the value to be placed in the corresponding vector position
xbuff: Input buffer of 32 elements with 16-bit precision
xstart: Starting position offset applied to all lanes of input from xbuff
xoffsets: 4b offset for each lane, applied to xbuff. The LSBs apply to lane 0
xoffsets_hi: 4b offset for each lane, applied to xbuff. The LSBs apply to lane 8
ystart: Starting position offset applied to all lanes of input from xbuff for the second input
yoffsets: 4b offset for each lane, applied to xbuff for the second input. The LSBs apply to lane 0
yoffsets_hi: 4b offset for each lane, applied to xbuff for the second input. The LSBs apply to lane 8
Note
  • When xoffsets or yoffsets is a runtime parameter, it might be more efficient to use a non-complex select instruction and calculate the offsets accordingly. In that case, both the real and the imaginary (real+1) lanes must be considered in the offsets. The same goes for the select parameter.
  • For more information on how the function f() selects data from the buffers go here.
  • The LSB of 'select' starts at the lower lanes. Value of 0 selects value computed from xbuff using 'xstart','xoffsets(_hi)' params and 1 selects value computed using 'ystart','yoffsets(_hi)' params.
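The first note says that the select parameter must account for both the real and the imaginary lane when a non-complex select is used instead. One way to picture this is sketched below in standard C++, assuming that complex lane i occupies real lanes 2i (real part) and 2i+1 (imaginary part). `expand_complex_select` is a hypothetical helper, and the matching offset computation is not shown because it depends on the 16-bit form of `f()`.

```cpp
#include <cstdint>

// Expand a 16-lane complex select mask into a 32-lane real select mask by
// copying each bit to the two real lanes that make up one complex lane.
// ASSUMPTION: complex lane i maps to real lanes 2i and 2i+1.
inline std::uint32_t expand_complex_select(std::uint16_t select) {
    std::uint32_t out = 0;
    for (unsigned i = 0; i < 16; ++i) {
        if ((select >> i) & 1u) {
            out |= std::uint32_t{3} << (2u * i);
        }
    }
    return out;
}
```

Under that assumption, a complex mask of 0x0005 (lanes 0 and 2) becomes the real mask 0x33 (lanes 0, 1, 4, and 5).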
v16cint16 select16 ( unsigned int  select,
v16cint16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi 
)

Performs a selection between lanes of xbuff.

select(a, b, s)
{
    if (s)
        return b;
    else
        return a;
}
for (int i = 0; i < 16; i++)
{
    idx = f(xstart, xoffsets[i]);
    idy = f(ystart, yoffsets[i]);
    o[i] = select(x[idx], x[idy], select[i]);
}
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0])
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7])
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of a selection between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
select: The value of each bit selects the value to be placed in the corresponding vector position
xbuff: Input buffer of 16 elements with 16-bit precision
xstart: Starting position offset applied to all lanes of input from xbuff
xoffsets: 4b offset for each lane, applied to xbuff. The LSBs apply to lane 0
xoffsets_hi: 4b offset for each lane, applied to xbuff. The LSBs apply to lane 8
ystart: Starting position offset applied to all lanes of input from xbuff for the second input
yoffsets: 4b offset for each lane, applied to xbuff for the second input. The LSBs apply to lane 0
yoffsets_hi: 4b offset for each lane, applied to xbuff for the second input. The LSBs apply to lane 8
Note
  • When xoffsets or yoffsets is a runtime parameter, it might be more efficient to use a non-complex select instruction and calculate the offsets accordingly. In that case, both the real and the imaginary (real+1) lanes must be considered in the offsets. The same goes for the select parameter.
  • For more information on how the function f() selects data from the buffers go here.
  • The LSB of 'select' starts at the lower lanes. Value of 0 selects value computed from xbuff using 'xstart','xoffsets(_hi)' params and 1 selects value computed using 'ystart','yoffsets(_hi)' params.
v16cint16 select16 ( unsigned int  select,
v16cint16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
v16cint16  ybuff,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi 
)

Performs a selection between lanes of xbuff and ybuff.

select(a, b, s)
{
    if (s)
        return b;
    else
        return a;
}
for (int i = 0; i < 16; i++)
{
    idx = f(xstart, xoffsets[i]);
    idy = f(ystart, yoffsets[i]);
    o[i] = select(x[idx], y[idy], select[i]);
}
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0])
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7])
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of a selection between lanes of xbuff and ybuff, where the result of lane 0 goes to lane 0 of the output.
Parameters
select: The value of each bit selects the value to be placed in the corresponding vector position
xbuff: Input buffer of 16 elements with 16-bit precision
xstart: Starting position offset applied to all lanes of input from xbuff
xoffsets: 4b offset for each lane, applied to xbuff. The LSBs apply to lane 0
xoffsets_hi: 4b offset for each lane, applied to xbuff. The LSBs apply to lane 8
ybuff: Input buffer of 16 elements with 16-bit precision
ystart: Starting position offset applied to all lanes of input from ybuff for the second input
yoffsets: 4b offset for each lane, applied to ybuff. The LSBs apply to lane 0
yoffsets_hi: 4b offset for each lane, applied to ybuff. The LSBs apply to lane 8
Note
  • When xoffsets or yoffsets is a runtime parameter, it might be more efficient to use a non-complex select instruction and calculate the offsets accordingly. In that case, both the real and the imaginary (real+1) lanes must be considered in the offsets. The same goes for the select parameter.
  • For more information on how the function f() selects data from the buffers go here.
  • The LSB of 'select' starts at the lower lanes. Value of 0 selects from xbuff and 1 selects from ybuff.
v32int16 select32 ( unsigned int  select,
v64int16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
unsigned int  xsquare,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi,
unsigned int  ysquare 
)

Performs a selection between lanes of xbuff.

select(a, b, s)
{
    if (s)
        return b;
    else
        return a;
}
for (int i = 0; i < 32; i++)
{
    idx = f(xstart, xoffsets[i], xsquare);
    idy = f(ystart, yoffsets[i], ysquare);
    o[i] = select(x[idx], x[idy], select[i]);
}
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0],xsquare)
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7],xsquare)
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of a selection between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
select: The value of each bit selects the value to be placed in the corresponding vector position
xbuff: Input buffer of 64 elements with 16-bit precision
xstart: Starting position offset applied to all lanes of input from xbuff
xoffsets: 4b offset for each lane, applied to xbuff. The LSBs apply to lane 0
xoffsets_hi: 4b offset for each lane, applied to xbuff. The LSBs apply to lane 16
xsquare: Selection order of the mini-permute square (default = 0x3210). The LSBs apply to the first element. Each per-lane value must be less than 4, so the maximum value of this field is 0x3333
ystart: Starting position offset applied to all lanes of input from xbuff for the second input
yoffsets: 4b offset for each lane, applied to xbuff for the second input. The LSBs apply to lane 0
yoffsets_hi: 4b offset for each lane, applied to xbuff for the second input. The LSBs apply to lane 16
ysquare: Selection order of the mini-permute square (default = 0x3210). The LSBs apply to the first element. Each per-lane value must be less than 4, so the maximum value of this field is 0x3333
Note
  • This intrinsic uses the 'square' parameter. For more information on how to use it, go here.
  • For more information on how the function f() selects data from the buffers go here.
  • The LSB of 'select' starts at the lower lanes. Value of 0 selects value computed from xbuff using 'xstart','xoffsets(_hi)' params and 1 selects value computed using 'ystart','yoffsets(_hi)' params.
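The constraints on xsquare and ysquare can be checked before calling the intrinsic. The sketch below is standard C++ with hypothetical names (`kDefaultSquare`, `is_valid_square`, `square_element`). It assumes the field holds four entries of 4 bits each, as the 0x3210 default and the 0x3333 maximum suggest, and it says nothing about how the hardware applies the permutation.

```cpp
#include <cstdint>

// Default mini-permute square order, as stated in the parameter description.
constexpr std::uint32_t kDefaultSquare = 0x3210u;

// A square value is accepted when every 4-bit entry is less than 4 and no
// bits are set above the four entries, i.e. only bits covered by 0x3333.
inline bool is_valid_square(std::uint32_t square) {
    return (square & ~std::uint32_t{0x3333u}) == 0u;
}

// Entry k (0..3) of the square; k = 0 is held in the LSBs.
inline unsigned square_element(std::uint32_t square, unsigned k) {
    return (square >> (4u * k)) & 0x3u;
}
```

For example, 0x3210 and 0x3333 pass the check, while 0x4210 fails because its top entry is 4.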
v32int16 select32 ( unsigned int  select,
v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
unsigned int  xsquare,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi,
unsigned int  ysquare 
)

Performs a selection between lanes of xbuff.

select(a, b, s)
{
    if (s)
        return b;
    else
        return a;
}
for (int i = 0; i < 32; i++)
{
    idx = f(xstart, xoffsets[i], xsquare);
    idy = f(ystart, yoffsets[i], ysquare);
    o[i] = select(x[idx], x[idy], select[i]);
}
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0],xsquare)
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7],xsquare)
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of a selection between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
select: The value of each bit selects the value to be placed in the corresponding vector position
xbuff: Input buffer of 32 elements with 16-bit precision
xstart: Starting position offset applied to all lanes of input from xbuff
xoffsets: 4b offset for each lane, applied to xbuff. The LSBs apply to lane 0
xoffsets_hi: 4b offset for each lane, applied to xbuff. The LSBs apply to lane 16
xsquare: Selection order of the mini-permute square (default = 0x3210). The LSBs apply to the first element. Each per-lane value must be less than 4, so the maximum value of this field is 0x3333
ystart: Starting position offset applied to all lanes of input from xbuff for the second input
yoffsets: 4b offset for each lane, applied to xbuff for the second input. The LSBs apply to lane 0
yoffsets_hi: 4b offset for each lane, applied to xbuff for the second input. The LSBs apply to lane 16
ysquare: Selection order of the mini-permute square (default = 0x3210). The LSBs apply to the first element. Each per-lane value must be less than 4, so the maximum value of this field is 0x3333
Note
  • This intrinsic uses the 'square' parameter. For more information on how to use it, go here.
  • For more information on how the function f() selects data from the buffers go here.
  • The LSB of 'select' starts at the lower lanes. Value of 0 selects value computed from xbuff using 'xstart','xoffsets(_hi)' params and 1 selects value computed using 'ystart','yoffsets(_hi)' params.
v32int16 select32 ( unsigned int  select,
v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
unsigned int  xsquare,
v32int16  ybuff,
int  ystart,
unsigned int  yoffsets,
unsigned int  yoffsets_hi,
unsigned int  ysquare 
)

Performs a selection between lanes of xbuff and ybuff.

select(a, b, s)
{
    if (s)
        return b;
    else
        return a;
}
for (int i = 0; i < 32; i++)
{
    idx = f(xstart, xoffsets[i], xsquare);
    idy = f(ystart, yoffsets[i], ysquare);
    o[i] = select(x[idx], y[idy], select[i]);
}
xoffsets, xoffsets_hi, yoffsets, yoffsets_hi have 8 offset values each. 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0],xsquare)
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7],xsquare)
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of a selection between lanes of xbuff and ybuff, where the result of lane 0 goes to lane 0 of the output.
Parameters
select: The value of each bit selects the value to be placed in the corresponding vector position
xbuff: Input buffer of 32 elements with 16-bit precision
xstart: Starting position offset applied to all lanes of input from xbuff
xoffsets: 4b offset for each lane, applied to xbuff. The LSBs apply to lane 0
xoffsets_hi: 4b offset for each lane, applied to xbuff. The LSBs apply to lane 16
xsquare: Selection order of the mini-permute square (default = 0x3210). The LSBs apply to the first element. Each per-lane value must be less than 4, so the maximum value of this field is 0x3333
ybuff: Input buffer of 32 elements with 16-bit precision
ystart: Starting position offset applied to all lanes of input from ybuff for the second input
yoffsets: 4b offset for each lane, applied to ybuff. The LSBs apply to lane 0
yoffsets_hi: 4b offset for each lane, applied to ybuff. The LSBs apply to lane 16
ysquare: Selection order of the mini-permute square (default = 0x3210). The LSBs apply to the first element. Each per-lane value must be less than 4, so the maximum value of this field is 0x3333
Note
  • This intrinsic uses the 'square' parameter. For more information on how to use it, go here.
  • For more information on how the function f() selects data from the buffers go here.
  • The LSB of 'select' starts at the lower lanes. Value of 0 selects from xbuff and 1 selects from ybuff.
v8cint32 select8 ( unsigned int  select,
v16cint32  xbuff,
int  xstart,
unsigned int  xoffsets,
int  ystart,
unsigned int  yoffsets 
)

Performs a selection between lanes of xbuff.

select(a, b, s)
{
    if (s)
        return b;
    else
        return a;
}
for (int i = 0; i < 8; i++)
{
    idx = f(xstart, xoffsets[i]);
    idy = f(ystart, yoffsets[i]);
    o[i] = select(x[idx], x[idy], select[i]);
}
xoffsets and yoffsets have 8 offset values each, 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0])
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7])
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of a selection between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
select: The value of each bit selects the value to be placed in the corresponding vector position
xbuff: Input buffer of 16 elements with 32-bit precision
xstart: Starting position offset applied to all lanes of input from xbuff
xoffsets: 3b offset (aligned to 4b) for each lane, applied to xbuff. The LSBs apply to lane 0
ystart: Starting position offset applied to all lanes of input from xbuff for the second input
yoffsets: 3b offset (aligned to 4b) for each lane, applied to xbuff for the second input. The LSBs apply to lane 0
Note
  • When xoffsets or yoffsets is a runtime parameter, it might be more efficient to use a non-complex select instruction and calculate the offsets accordingly. In that case, both the real and the imaginary (real+1) lanes must be considered in the offsets. The same goes for the select parameter.
  • For more information on how the function f() selects data from the buffers go here.
  • The LSB of 'select' starts at the lower lanes. Value of 0 selects value computed from xbuff using 'xstart','xoffsets(_hi)' params and 1 selects value computed using 'ystart','yoffsets(_hi)' params.
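The "3b (aligned to 4b)" layout means each lane's offset is a 3-bit value stored in its own 4-bit field. A minimal sketch of packing such a word in standard C++ follows; `pack_offsets_3b` and `fits_3b_fields` are hypothetical helpers. Treating a set top bit in any field as invalid is an assumption of this model, since the text above does not state how the hardware handles that bit.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Pack eight 3-bit offsets into a 32-bit word, one offset per 4-bit field.
// Lane 0 is stored in the LSBs.
inline std::uint32_t pack_offsets_3b(const std::array<unsigned, 8>& off) {
    std::uint32_t word = 0;
    for (unsigned i = 0; i < 8; ++i) {
        assert(off[i] < 8u);  // each offset must fit in 3 bits
        word |= static_cast<std::uint32_t>(off[i]) << (4u * i);
    }
    return word;
}

// True when bit 3 of every 4-bit field is clear.
// ASSUMPTION: the padding bit of each field is expected to be zero.
inline bool fits_3b_fields(std::uint32_t word) {
    return (word & 0x88888888u) == 0u;
}
```

Packing the identity offsets 0 through 7 produces 0x76543210, the same pattern used for an in-order lane mapping.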
v8cint32 select8 ( unsigned int  select,
v8cint32  xbuff,
int  xstart,
unsigned int  xoffsets,
int  ystart,
unsigned int  yoffsets 
)

Performs a selection between lanes of xbuff.

select(a, b, s)
{
    if (s)
        return b;
    else
        return a;
}
for (int i = 0; i < 8; i++)
{
    idx = f(xstart, xoffsets[i]);
    idy = f(ystart, yoffsets[i]);
    o[i] = select(x[idx], x[idy], select[i]);
}
xoffsets and yoffsets have 8 offset values each, 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0])
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7])
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of a selection between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
select: The value of each bit selects the value to be placed in the corresponding vector position
xbuff: Input buffer of 8 elements with 32-bit precision
xstart: Starting position offset applied to all lanes of input from xbuff
xoffsets: 3b offset (aligned to 4b) for each lane, applied to xbuff. The LSBs apply to lane 0
ystart: Starting position offset applied to all lanes of input from xbuff for the second input
yoffsets: 3b offset (aligned to 4b) for each lane, applied to xbuff for the second input. The LSBs apply to lane 0
Note
  • When xoffsets or yoffsets is a runtime parameter, it might be more efficient to use a non-complex select instruction and calculate the offsets accordingly. In that case, both the real and the imaginary (real+1) lanes must be considered in the offsets. The same goes for the select parameter.
  • For more information on how the function f() selects data from the buffers go here.
  • The LSB of 'select' starts at the lower lanes. Value of 0 selects value computed from xbuff using 'xstart','xoffsets(_hi)' params and 1 selects value computed using 'ystart','yoffsets(_hi)' params.
v8cint32 select8 ( unsigned int  select,
v8cint32  xbuff,
int  xstart,
unsigned int  xoffsets,
v8cint32  ybuff,
int  ystart,
unsigned int  yoffsets 
)

Performs a selection between lanes of xbuff and ybuff.

select(a, b, s)
{
    if (s)
        return b;
    else
        return a;
}
for (int i = 0; i < 8; i++)
{
    idx = f(xstart, xoffsets[i]);
    idy = f(ystart, yoffsets[i]);
    o[i] = select(x[idx], y[idy], select[i]);
}
xoffsets and yoffsets have 8 offset values each, 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0])
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7])
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of a selection between lanes of xbuff and ybuff, where the result of lane 0 goes to lane 0 of the output.
Parameters
select: The value of each bit selects the value to be placed in the corresponding vector position
xbuff: Input buffer of 8 elements with 32-bit precision
xstart: Starting position offset applied to all lanes of input from xbuff
xoffsets: 3b offset (aligned to 4b) for each lane, applied to xbuff. The LSBs apply to lane 0
ybuff: Input buffer of 8 elements with 32-bit precision
ystart: Starting position offset applied to all lanes of input from ybuff for the second input
yoffsets: 3b offset (aligned to 4b) for each lane, applied to ybuff for the second input. The LSBs apply to lane 0
Note
  • When xoffsets or yoffsets is a runtime parameter, it might be more efficient to use a non-complex select instruction and calculate the offsets accordingly. In that case, both the real and the imaginary (real+1) lanes must be considered in the offsets. The same goes for the select parameter.
  • For more information on how the function f() selects data from the buffers go here.
  • The LSB of 'select' starts at the lower lanes. Value of 0 selects from xbuff and 1 selects from ybuff.
v16int32 shuffle16 ( v32int32  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi 
)

Performs a shuffle between lanes of xbuff.

for (int i = 0; i < 16; i++)
{
    idx = f(xstart, xoffsets[i]);
    o[i] = x[idx];
}
xoffsets and xoffsets_hi have 8 offset values each, 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0])
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7])
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of a shuffle between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuff: Input buffer of 32 elements with 32-bit precision
xstart: Starting position offset applied to all lanes of input from xbuff
xoffsets: 4b offset for each lane, applied to xbuff. The LSBs apply to lane 0
xoffsets_hi: 4b offset for each lane, applied to xbuff. The LSBs apply to lane 8
Note
  • For more information on how the function f() selects data from the buffers go here.
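To make the shuffle encoding concrete, the sketch below packs 16 per-lane offsets into the two offset words and applies them to a 32-element buffer in standard C++. `OffsetPair`, `pack_offsets16`, and `shuffle16_model` are hypothetical names, and `f(start, offset)` is assumed to be `(start + offset) mod 32`, which may not match the actual definition in the lane selection documentation.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct OffsetPair {
    std::uint32_t lo;  // offsets for lanes 0..7, lane 0 in the LSBs
    std::uint32_t hi;  // offsets for lanes 8..15, lane 8 in the LSBs
};

// Pack 16 per-lane offsets (each 0..15) into two 32-bit words.
inline OffsetPair pack_offsets16(const std::array<unsigned, 16>& off) {
    OffsetPair p{0u, 0u};
    for (unsigned i = 0; i < 16; ++i) {
        assert(off[i] < 16u);  // each offset must fit in 4 bits
        if (i < 8) {
            p.lo |= static_cast<std::uint32_t>(off[i]) << (4u * i);
        } else {
            p.hi |= static_cast<std::uint32_t>(off[i]) << (4u * (i - 8));
        }
    }
    return p;
}

// Hypothetical scalar model of shuffle16 on 32-bit lanes.
// ASSUMPTION: f(start, offset) = (start + offset) mod 32.
std::array<std::int32_t, 16> shuffle16_model(
    const std::array<std::int32_t, 32>& x, int xstart,
    std::uint32_t xoffsets, std::uint32_t xoffsets_hi) {
    std::array<std::int32_t, 16> o{};
    for (unsigned i = 0; i < 16; ++i) {
        const std::uint32_t word = i < 8 ? xoffsets : xoffsets_hi;
        const unsigned off = (word >> (4u * (i % 8))) & 0xFu;
        o[i] = x[static_cast<unsigned>(xstart + static_cast<int>(off)) % 32u];
    }
    return o;
}
```

Reversing 16 lanes in this model uses offsets 15 down to 0, which pack to `xoffsets = 0x89ABCDEF` and `xoffsets_hi = 0x01234567`.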
v16int32 shuffle16 ( v16int32  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi 
)

Performs a shuffle between lanes of xbuff.

for (int i = 0; i < 16; i++)
{
    idx = f(xstart, xoffsets[i]);
    o[i] = x[idx];
}
xoffsets and xoffsets_hi have 8 offset values each, 4 bits per offset.
For Example: for v16int32 output type, idx for output_lane_0 = f(xstart,xoffsets[0])
For Example: for v16int32 output type, idx for output_lane_15 = f(xstart,xoffsets_hi[7])
In case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers refer to Lane selection note below.
Returns
Value of each lane is the result of a shuffle between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuff        Input buffer of 16 elements with 32-bit precision
xstart       Starting position offset applied to all lanes of input from the X buffer
xoffsets     4b offset for each lane, applied to the X buffer. LSBs apply to the first lane
xoffsets_hi  4b offset for each lane, applied to the X buffer. LSBs apply to lane 8 (counting from lane 0)
Note
  • For more information on how the function f() selects data from the buffers go here.
v16cint16 shuffle16 ( v32cint16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi 
)

Performs a shuffle between lanes of xbuff.

for (int i = 0; i < 16; i++) {
    // for i >= 8 the offset is taken from xoffsets_hi[i - 8]
    idx = f(xstart, xoffsets[i]);
    o[i] = x[idx];
}
xoffsets, xoffsets_hi, yoffsets and yoffsets_hi hold 8 offset values each, with 4 bits per offset.
For example: for v16int32 output type, idx for output_lane_0 = f(xstart, xoffsets[0])
For example: for v16int32 output type, idx for output_lane_15 = f(xstart, xoffsets_hi[7])
In the case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers, refer to the Lane selection note below.
Returns
Value of each lane is the result of a shuffle between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuff        Input buffer of 32 elements with 16-bit precision
xstart       Starting position offset applied to all lanes of input from the X buffer
xoffsets     4b offset for each lane, applied to the X buffer. LSBs apply to the first lane
xoffsets_hi  4b offset for each lane, applied to the X buffer. LSBs apply to lane 8 (counting from lane 0)
Note
  • When xoffsets is a runtime parameter, it might be more efficient to use a non-complex shuffle instruction and calculate the offsets accordingly. In that case both the real and the imaginary (real+1) lane must be accounted for in the offsets.
  • For more information on how the function f() selects data from the buffers go here.
v16cint16 shuffle16 ( v16cint16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi 
)

Performs a shuffle between lanes of xbuff.

for (int i = 0; i < 16; i++) {
    // for i >= 8 the offset is taken from xoffsets_hi[i - 8]
    idx = f(xstart, xoffsets[i]);
    o[i] = x[idx];
}
xoffsets, xoffsets_hi, yoffsets and yoffsets_hi hold 8 offset values each, with 4 bits per offset.
For example: for v16int32 output type, idx for output_lane_0 = f(xstart, xoffsets[0])
For example: for v16int32 output type, idx for output_lane_15 = f(xstart, xoffsets_hi[7])
In the case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers, refer to the Lane selection note below.
Returns
Value of each lane is the result of a shuffle between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuff        Input buffer of 16 elements with 16-bit precision
xstart       Starting position offset applied to all lanes of input from the X buffer
xoffsets     4b offset for each lane, applied to the X buffer. LSBs apply to the first lane
xoffsets_hi  4b offset for each lane, applied to the X buffer. LSBs apply to lane 8 (counting from lane 0)
Note
  • When xoffsets is a runtime parameter, it might be more efficient to use a non-complex shuffle instruction and calculate the offsets accordingly. In that case both the real and the imaginary (real+1) lane must be accounted for in the offsets.
  • For more information on how the function f() selects data from the buffers go here.
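The remark about the real and imaginary (real+1) lanes can be made concrete with a scalar index computation. The sketch below assumes that complex element k occupies scalar lanes 2k (real) and 2k+1 (imaginary), which is what "real+1" suggests. expand_complex_indices is a hypothetical helper; it produces plain element indices and does not encode them into the xoffsets format of any particular intrinsic.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Expands a permutation of N complex lanes into the equivalent permutation of
// 2N scalar lanes, assuming complex element k is stored as scalar lanes
// 2k (real) and 2k + 1 (imaginary). Hypothetical helper for illustration.
template <std::size_t N>
std::array<int, 2 * N> expand_complex_indices(const std::array<int, N>& cidx) {
    std::array<int, 2 * N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        assert(cidx[i] >= 0);
        out[2 * i]     = 2 * cidx[i];      // real part of the source element
        out[2 * i + 1] = 2 * cidx[i] + 1;  // imaginary part = real + 1
    }
    return out;
}
```

Swapping two complex lanes, for instance, turns the complex index list {1, 0} into the scalar index list {2, 3, 0, 1}.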
v32int16 shuffle32 ( v64int16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
unsigned int  xsquare 
)

Performs a shuffle between lanes of xbuff.

for (int i = 0; i < 32; i++) {
    idx = f(xstart, xoffsets[i], xsquare);
    o[i] = x[idx];
}
xoffsets, xoffsets_hi, yoffsets and yoffsets_hi hold 8 offset values each, with 4 bits per offset.
For example: for v16int32 output type, idx for output_lane_0 = f(xstart, xoffsets[0], xsquare)
For example: for v16int32 output type, idx for output_lane_15 = f(xstart, xoffsets_hi[7], xsquare)
In the case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers, refer to the Lane selection note below.
Returns
Value of each lane is the result of a shuffle between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuff        Input buffer of 64 elements with 16-bit precision
xstart       Starting position offset applied to all lanes of input from the X buffer
xoffsets     4b offset for each lane, applied to the X buffer. LSBs apply to the first lane
xoffsets_hi  4b offset for each lane, applied to the X buffer. LSBs apply to lane 16 (counting from lane 0)
xsquare      Select order of the mini-permute square (default=0x3210). LSBs apply to the first element. The value per element must be less than 4, so the maximum value for this field is 0x3333
Note
  • This intrinsic uses the 'square' parameter. For more information on how to use it, please go here.
  • For more information on how the function f() selects data from the buffers go here.
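The constraints on xsquare stated above (four 4-bit fields, each less than 4, maximum 0x3333, least significant field first) can be checked with a few lines of scalar code. This is a sketch only: xsquare_is_valid and unpack_xsquare are hypothetical helper names, and the code does not model how the mini-permute square rearranges data.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// True when every 4-bit field of xsquare is less than 4 and no bits beyond the
// four fields are set, i.e. the value fits under the documented maximum 0x3333.
// Hypothetical helper for illustration.
inline bool xsquare_is_valid(std::uint32_t xsquare) {
    return (xsquare & ~std::uint32_t{0x3333u}) == 0;
}

// Splits xsquare into its four fields; element 0 comes from the least
// significant nibble, matching "LSB apply to first element".
inline std::array<unsigned, 4> unpack_xsquare(std::uint32_t xsquare) {
    assert(xsquare_is_valid(xsquare));
    std::array<unsigned, 4> fields{};
    for (int i = 0; i < 4; ++i) {
        fields[i] = (xsquare >> (4 * i)) & 0xFu;
    }
    return fields;
}
```

The default 0x3210 unpacks to {0, 1, 2, 3}, while a value such as 0x4210 is rejected because one field is not less than 4.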
v32int16 shuffle32 ( v32int16  xbuff,
int  xstart,
unsigned int  xoffsets,
unsigned int  xoffsets_hi,
unsigned int  xsquare 
)

Performs a shuffle between lanes of xbuff.

for (int i = 0; i < 32; i++) {
    idx = f(xstart, xoffsets[i], xsquare);
    o[i] = x[idx];
}
xoffsets, xoffsets_hi, yoffsets and yoffsets_hi hold 8 offset values each, with 4 bits per offset.
For example: for v16int32 output type, idx for output_lane_0 = f(xstart, xoffsets[0], xsquare)
For example: for v16int32 output type, idx for output_lane_15 = f(xstart, xoffsets_hi[7], xsquare)
In the case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers, refer to the Lane selection note below.
Returns
Value of each lane is the result of a shuffle between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuff        Input buffer of 32 elements with 16-bit precision
xstart       Starting position offset applied to all lanes of input from the X buffer
xoffsets     4b offset for each lane, applied to the X buffer. LSBs apply to the first lane
xoffsets_hi  4b offset for each lane, applied to the X buffer. LSBs apply to lane 16 (counting from lane 0)
xsquare      Select order of the mini-permute square (default=0x3210). LSBs apply to the first element. The value per element must be less than 4, so the maximum value for this field is 0x3333
Note
  • This intrinsic uses the 'square' parameter. For more information on how to use it, please go here.
  • For more information on how the function f() selects data from the buffers go here.
v8cint32 shuffle8 ( v16cint32  xbuff,
int  xstart,
unsigned int  xoffsets 
)

Performs a shuffle between lanes of xbuff.

for (int i = 0; i < 8; i++) {
    idx = f(xstart, xoffsets[i]);
    o[i] = x[idx];
}
xoffsets, xoffsets_hi, yoffsets and yoffsets_hi hold 8 offset values each, with 4 bits per offset.
For example: for v16int32 output type, idx for output_lane_0 = f(xstart, xoffsets[0])
For example: for v16int32 output type, idx for output_lane_15 = f(xstart, xoffsets_hi[7])
In the case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers, refer to the Lane selection note below.
Returns
Value of each lane is the result of a shuffle between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuff     Input buffer of 16 elements with 32-bit precision
xstart    Starting position offset applied to all lanes of input from the X buffer
xoffsets  3b (aligned to 4b) offset for each lane, applied to the X buffer. LSBs apply to the first lane
Note
  • When xoffsets is a runtime parameter, it might be more efficient to use a non-complex shuffle instruction and calculate the offsets accordingly. In that case both the real and the imaginary (real+1) lane must be accounted for in the offsets.
  • For more information on how the function f() selects data from the buffers go here.
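The "3b (aligned to 4b)" layout of xoffsets means each lane owns a 4-bit field of which only the low 3 bits carry the offset. A minimal sketch of building such a word from a list of per-lane offsets is shown below. pack_offsets8 is a hypothetical helper, and treating the unused top bit of each field as zero is an assumption of this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Packs eight per-lane offsets into one 32-bit word. Each offset uses 3 bits
// but sits in its own 4-bit field; lane 0 occupies the least significant
// field. Hypothetical helper for illustration, not part of the intrinsics API.
inline std::uint32_t pack_offsets8(const std::array<unsigned, 8>& offsets) {
    std::uint32_t word = 0;
    for (int lane = 0; lane < 8; ++lane) {
        assert(offsets[lane] < 8u);  // only 3 bits are available per lane
        word |= static_cast<std::uint32_t>(offsets[lane]) << (4 * lane);
    }
    return word;
}
```

Offsets {0, 1, 2, 3, 4, 5, 6, 7} pack to 0x76543210, and the reversed list packs to 0x01234567.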
v8cint32 shuffle8 ( v8cint32  xbuff,
int  xstart,
unsigned int  xoffsets 
)

Performs a shuffle between lanes of xbuff.

for (int i = 0; i < 8; i++) {
    idx = f(xstart, xoffsets[i]);
    o[i] = x[idx];
}
xoffsets, xoffsets_hi, yoffsets and yoffsets_hi hold 8 offset values each, with 4 bits per offset.
For example: for v16int32 output type, idx for output_lane_0 = f(xstart, xoffsets[0])
For example: for v16int32 output type, idx for output_lane_15 = f(xstart, xoffsets_hi[7])
In the case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers, refer to the Lane selection note below.
Returns
Value of each lane is the result of a shuffle between lanes of xbuff where the result of lane 0 goes to lane 0 of the output.
Parameters
xbuff     Input buffer of 8 elements with 32-bit precision
xstart    Starting position offset applied to all lanes of input from the X buffer
xoffsets  3b (aligned to 4b) offset for each lane, applied to the X buffer. LSBs apply to the first lane
Note
  • When xoffsets is a runtime parameter, it might be more efficient to use a non-complex shuffle instruction and calculate the offsets accordingly. In that case both the real and the imaginary (real+1) lane must be accounted for in the offsets.
  • For more information on how the function f() selects data from the buffers go here.