AI Engine Intrinsics User Guide  (v2023.2)

Overview

Advanced Floating-point Vector Lane Selection

Select: Selects between the first set of lanes and the second one according to the value in 'select'. If the bit in select that corresponds to a lane is 0, the value from the first set of lanes is returned; otherwise, if it is 1, the value from the second set of lanes is returned.

Shuffle: Selects lanes from a single input according to the start/offset computation.

Note
fpsel behaves as a "Shuffle" intrinsic.

For more information on lane selection, please refer to here.
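As a rough illustration of the two behaviors described above, the following scalar C++ sketch models them on plain arrays. The names select_lanes and shuffle_lanes are hypothetical and are not part of the intrinsics API. The sketch only mirrors the per-lane semantics and does not use the AI Engine vector types or the start/offset computation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of "Select" (hypothetical helper, not an intrinsic):
// bit i of `select` picks lane i from `second` (bit = 1) or `first` (bit = 0).
template <std::size_t N>
std::array<float, N> select_lanes(std::uint32_t select,
                                  const std::array<float, N>& first,
                                  const std::array<float, N>& second) {
    static_assert(N <= 32, "one select bit per lane");
    std::array<float, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = ((select >> i) & 1u) ? second[i] : first[i];
    }
    return out;
}

// Scalar model of "Shuffle" (hypothetical helper, not an intrinsic):
// output lane i is read from position idx[i] of a single input.
template <std::size_t N, std::size_t M>
std::array<float, N> shuffle_lanes(const std::array<float, M>& input,
                                   const std::array<std::size_t, N>& idx) {
    std::array<float, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        assert(idx[i] < M);  // every index must stay inside the input
        out[i] = input[idx[i]];
    }
    return out;
}
```

With a select value of 0x5, lanes 0 and 2 come from the second set and the remaining lanes from the first set. A shuffle with indices {3, 2, 1, 0} reverses a 4-lane input.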

Functions

v16float fpselect16 (unsigned int select, v32float xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi)
 Performs a floating point selection between lanes of xbuff.
 
v16float fpselect16 (unsigned int select, v16float xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi)
 Performs a floating point selection between lanes of xbuff.
 
v16float fpselect16 (unsigned int select, v16float xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, v16float ybuff, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi)
 Performs a floating point selection between lanes of xbuff and ybuff.
 
v8cfloat fpselect8 (unsigned int select, v16cfloat xbuff, int xstart, unsigned int xoffsets, int ystart, unsigned int yoffsets)
 Performs a floating point selection between lanes of xbuff.
 
v8cfloat fpselect8 (unsigned int select, v8cfloat xbuff, int xstart, unsigned int xoffsets, int ystart, unsigned int yoffsets)
 Performs a floating point selection between lanes of xbuff.
 
v8cfloat fpselect8 (unsigned int select, v8cfloat xbuff, int xstart, unsigned int xoffsets, v8cfloat ybuff, int ystart, unsigned int yoffsets)
 Performs a floating point selection between lanes of xbuff and ybuff.
 
v16float fpshuffle16 (v32float xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi)
 Performs a floating point shuffle between lanes of xbuff.
 
v16float fpshuffle16 (v16float xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi)
 Performs a floating point shuffle between lanes of xbuff.
 
v8cfloat fpshuffle8 (v16cfloat xbuff, int xstart, unsigned int xoffsets)
 Performs a floating point shuffle between lanes of xbuff.
 
v8cfloat fpshuffle8 (v8cfloat xbuff, int xstart, unsigned int xoffsets)
 Performs a floating point shuffle between lanes of xbuff.
 

Function Documentation

v16float fpselect16 (unsigned int select, v32float xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi)

Performs a floating point selection between lanes of xbuff.

    fpselect(a, b, s)
    {
        if (s)
            return b;
        else
            return a;
    }
    for (int i = 0; i < 16; i++) {
        // Lanes 0-7 take their offsets from xoffsets/yoffsets, lanes 8-15 from xoffsets_hi/yoffsets_hi.
        idx = f(xstart, xoffsets[i]);
        idy = f(ystart, yoffsets[i]);
        o[i] = fpselect(x[idx], x[idy], select[i]);
    }

xoffsets, xoffsets_hi, yoffsets, and yoffsets_hi hold 8 offset values each, 4 bits per offset.
For example, for a v16int32 output type, idx for output lane 0 = f(xstart, xoffsets[0]).
For example, for a v16int32 output type, idx for output lane 15 = f(xstart, xoffsets_hi[7]).
In the case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers, refer to the lane selection note below.
Returns
    Value of each lane is the result of a floating point selection between lanes of xbuff, where the result of lane 0 goes to lane 0 of the output.
Parameters
    select       Each bit selects the value to be placed in the corresponding vector position
    xbuff        Input buffer of 32 elements with single precision
    xstart       Starting position offset applied to all lanes of input from the X buffer
    xoffsets     4b offset for each lane, applied to the X buffer. LSBs apply to the first lane
    xoffsets_hi  4b offset for each lane, applied to the X buffer. LSBs apply to lane 8
    ystart       Starting position offset applied to all lanes of input from the X buffer for the second input
    yoffsets     4b offset for each lane, applied to the X buffer. LSBs apply to the first lane
    yoffsets_hi  4b offset for each lane, applied to the X buffer. LSBs apply to lane 8
Note
  • For more information on how the function f() selects data from the buffers go here.
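The pseudocode above can be modeled in scalar C++ to check a set of offsets before using them on hardware. This is only a sketch under stated assumptions: fpselect16_model, lane_offset, and f_model are hypothetical names, and f(start, offset) is assumed here to be (start + offset) modulo the buffer size, which may differ from the actual lane selection rule described in the lane selection documentation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Extract the 4-bit offset for output lane `lane` (0..15).
// Lanes 0..7 read from `lo` (xoffsets), lanes 8..15 from `hi` (xoffsets_hi), LSBs first.
inline unsigned lane_offset(std::uint32_t lo, std::uint32_t hi, unsigned lane) {
    assert(lane < 16);
    const std::uint32_t word = (lane < 8) ? lo : hi;
    return (word >> (4u * (lane % 8u))) & 0xFu;
}

// ASSUMPTION: f(start, offset) = (start + offset) modulo the buffer size.
inline std::size_t f_model(int start, unsigned offset, std::size_t size) {
    assert(start >= 0);
    return (static_cast<std::size_t>(start) + offset) % size;
}

// Scalar model of the single-buffer selection: both candidates come from x.
template <std::size_t XN>
std::array<float, 16> fpselect16_model(std::uint32_t select,
                                       const std::array<float, XN>& x,
                                       int xstart, std::uint32_t xoffsets, std::uint32_t xoffsets_hi,
                                       int ystart, std::uint32_t yoffsets, std::uint32_t yoffsets_hi) {
    std::array<float, 16> o{};
    for (unsigned i = 0; i < 16; ++i) {
        const std::size_t idx = f_model(xstart, lane_offset(xoffsets, xoffsets_hi, i), XN);
        const std::size_t idy = f_model(ystart, lane_offset(yoffsets, yoffsets_hi, i), XN);
        // select bit 0 -> first candidate x[idx], select bit 1 -> second candidate x[idy]
        o[i] = ((select >> i) & 1u) ? x[idy] : x[idx];
    }
    return o;
}
```

For instance, with a 32-element buffer, xstart = 0, ystart = 16, identity offsets (0x76543210 and 0xFEDCBA98) for both inputs, and select = 0xFF00, the model returns elements 0 to 7 in the lower lanes and elements 24 to 31 in the upper lanes.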
v16float fpselect16 (unsigned int select, v16float xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi)

Performs a floating point selection between lanes of xbuff.

    fpselect(a, b, s)
    {
        if (s)
            return b;
        else
            return a;
    }
    for (int i = 0; i < 16; i++) {
        // Lanes 0-7 take their offsets from xoffsets/yoffsets, lanes 8-15 from xoffsets_hi/yoffsets_hi.
        idx = f(xstart, xoffsets[i]);
        idy = f(ystart, yoffsets[i]);
        o[i] = fpselect(x[idx], x[idy], select[i]);
    }

xoffsets, xoffsets_hi, yoffsets, and yoffsets_hi hold 8 offset values each, 4 bits per offset.
For example, for a v16int32 output type, idx for output lane 0 = f(xstart, xoffsets[0]).
For example, for a v16int32 output type, idx for output lane 15 = f(xstart, xoffsets_hi[7]).
In the case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers, refer to the lane selection note below.
Returns
    Value of each lane is the result of a floating point selection between lanes of xbuff, where the result of lane 0 goes to lane 0 of the output.
Parameters
    select       Each bit selects the value to be placed in the corresponding vector position
    xbuff        Input buffer of 16 elements with single precision
    xstart       Starting position offset applied to all lanes of input from the X buffer
    xoffsets     4b offset for each lane, applied to the X buffer. LSBs apply to the first lane
    xoffsets_hi  4b offset for each lane, applied to the X buffer. LSBs apply to lane 8
    ystart       Starting position offset applied to all lanes of input from the X buffer for the second input
    yoffsets     4b offset for each lane, applied to the X buffer. LSBs apply to the first lane
    yoffsets_hi  4b offset for each lane, applied to the X buffer. LSBs apply to lane 8
Note
  • For more information on how the function f() selects data from the buffers go here.
v16float fpselect16 (unsigned int select, v16float xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, v16float ybuff, int ystart, unsigned int yoffsets, unsigned int yoffsets_hi)

Performs a floating point selection between lanes of xbuff and ybuff.

    fpselect(a, b, s)
    {
        if (s)
            return b;
        else
            return a;
    }
    for (int i = 0; i < 16; i++) {
        // Lanes 0-7 take their offsets from xoffsets/yoffsets, lanes 8-15 from xoffsets_hi/yoffsets_hi.
        idx = f(xstart, xoffsets[i]);
        idy = f(ystart, yoffsets[i]);
        o[i] = fpselect(x[idx], y[idy], select[i]);
    }

xoffsets, xoffsets_hi, yoffsets, and yoffsets_hi hold 8 offset values each, 4 bits per offset.
For example, for a v16int32 output type, idx for output lane 0 = f(xstart, xoffsets[0]).
For example, for a v16int32 output type, idx for output lane 15 = f(xstart, xoffsets_hi[7]).
In the case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers, refer to the lane selection note below.
Returns
    Value of each lane is the result of a floating point selection between lanes of xbuff and ybuff, where the result of lane 0 goes to lane 0 of the output.
Parameters
    select       Each bit selects the value to be placed in the corresponding vector position
    xbuff        Input buffer of 16 elements with single precision
    xstart       Starting position offset applied to all lanes of input from the X buffer
    xoffsets     4b offset for each lane, applied to the X buffer. LSBs apply to the first lane
    xoffsets_hi  4b offset for each lane, applied to the X buffer. LSBs apply to lane 8
    ybuff        Input buffer of 16 elements with single precision
    ystart       Starting position offset applied to all lanes of input from the Y buffer for the second input
    yoffsets     4b offset for each lane, applied to the Y buffer. LSBs apply to the first lane
    yoffsets_hi  4b offset for each lane, applied to the Y buffer. LSBs apply to lane 8
Note
  • For more information on how the function f() selects data from the buffers go here.
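One possible use of the two-buffer form is interleaving two vectors. The scalar sketch below models that case. The names fpselect16_xy_model, offset_of, and interleave_low are hypothetical, and f(start, offset) is assumed to be (start + offset) modulo 16, which should be checked against the lane selection documentation before relying on the offset values.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec16 = std::array<float, 16>;

// 4-bit offset for output lane `lane`: lanes 0..7 from `lo`, lanes 8..15 from `hi`.
inline unsigned offset_of(std::uint32_t lo, std::uint32_t hi, unsigned lane) {
    return (((lane < 8) ? lo : hi) >> (4u * (lane % 8u))) & 0xFu;
}

// Two-buffer model: select bit 0 takes x[idx], select bit 1 takes y[idy].
// ASSUMPTION: f(start, offset) = (start + offset) % 16.
inline Vec16 fpselect16_xy_model(std::uint32_t select,
                                 const Vec16& x, int xstart,
                                 std::uint32_t xoffsets, std::uint32_t xoffsets_hi,
                                 const Vec16& y, int ystart,
                                 std::uint32_t yoffsets, std::uint32_t yoffsets_hi) {
    assert(xstart >= 0 && ystart >= 0);
    Vec16 o{};
    for (unsigned i = 0; i < 16; ++i) {
        const unsigned idx = (static_cast<unsigned>(xstart) + offset_of(xoffsets, xoffsets_hi, i)) % 16u;
        const unsigned idy = (static_cast<unsigned>(ystart) + offset_of(yoffsets, yoffsets_hi, i)) % 16u;
        o[i] = ((select >> i) & 1u) ? y[idy] : x[idx];
    }
    return o;
}

// Interleave the low 8 lanes of x and y: {x0, y0, x1, y1, ..., x7, y7}.
inline Vec16 interleave_low(const Vec16& x, const Vec16& y) {
    // Output lane i reads input lane i/2 of either buffer; odd lanes (select = 0xAAAA) take y.
    return fpselect16_xy_model(0xAAAAu,
                               x, 0, 0x33221100u, 0x77665544u,
                               y, 0, 0x33221100u, 0x77665544u);
}
```

The same offset words are used for both buffers because each output pair (2k, 2k+1) reads element k. Only the select mask decides which buffer supplies the lane.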
v8cfloat fpselect8 (unsigned int select, v16cfloat xbuff, int xstart, unsigned int xoffsets, int ystart, unsigned int yoffsets)

Performs a floating point selection between lanes of xbuff.

    fpselect(a, b, s)
    {
        if (s)
            return b;
        else
            return a;
    }
    for (int i = 0; i < 8; i++) {
        idx = f(xstart, xoffsets[i]);
        idy = f(ystart, yoffsets[i]);
        o[i] = fpselect(x[idx], x[idy], select[i]);
    }

xoffsets and yoffsets hold 8 offset values each, 4 bits per offset.
For example, for a v16int32 output type, idx for output lane 0 = f(xstart, xoffsets[0]).
For example, for a v16int32 output type, idx for output lane 15 = f(xstart, xoffsets_hi[7]).
In the case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers, refer to the lane selection note below.
Returns
    Value of each lane is the result of a floating point selection between lanes of xbuff, where the result of lane 0 goes to lane 0 of the output.
Parameters
    select       Each bit selects the value to be placed in the corresponding vector position
    xbuff        Input buffer of 16 complex elements with single precision
    xstart       Starting position offset applied to all lanes of input from the X buffer
    xoffsets     3b (aligned to 4b) offset for each lane, applied to the X buffer. LSBs apply to the first lane
    ystart       Starting position offset applied to all lanes of input from the X buffer for the second input
    yoffsets     3b (aligned to 4b) offset for each lane in the X buffer for the second input. LSBs apply to the first lane
Note
  • When xoffsets or yoffsets is a runtime parameter, it might be more efficient to use a non-complex fpselect instruction and calculate the offsets accordingly. In that case, both the real and the imaginary (real+1) lanes must be considered in the offsets. The same goes for the select parameter.
  • For more information on how the function f() selects data from the buffers go here.
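The first note above suggests computing real-lane offsets when the complex offsets are only known at run time. A minimal sketch of that conversion follows. It is not taken from the guide: RealLaneParams and expand_complex_params are hypothetical names, complex element k is assumed to occupy real lanes 2k (real) and 2k+1 (imaginary), and f(start, offset) is assumed to be start + offset.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the parameters of a 16-lane non-complex call.
struct RealLaneParams {
    std::uint32_t select;      // one bit per real lane (16 lanes)
    int start;                 // start position in units of real lanes
    std::uint32_t offsets;     // 4-bit offsets for real lanes 0..7
    std::uint32_t offsets_hi;  // 4-bit offsets for real lanes 8..15
};

// Expand complex-lane parameters (8 lanes, 3-bit offsets aligned to 4 bits)
// into real-lane parameters (16 lanes, 4-bit offsets).
// ASSUMPTIONS: complex element k = real lanes 2k and 2k+1; f(start, offset) = start + offset.
inline RealLaneParams expand_complex_params(std::uint32_t cselect, int cstart,
                                            std::uint32_t coffsets) {
    RealLaneParams p{0u, 2 * cstart, 0u, 0u};
    for (unsigned i = 0; i < 8; ++i) {
        const std::uint32_t off = (coffsets >> (4u * i)) & 0x7u;  // 3-bit complex offset
        const std::uint32_t re = 2u * off;                        // offset of the real part
        const std::uint32_t im = re + 1u;                         // imaginary part = real + 1
        const std::uint32_t pair = re | (im << 4);                // two adjacent real-lane nibbles
        // Complex lanes 0..3 fill `offsets`, complex lanes 4..7 fill `offsets_hi`.
        if (i < 4) {
            p.offsets |= pair << (8u * i);
        } else {
            p.offsets_hi |= pair << (8u * (i - 4u));
        }
        // Each complex select bit is duplicated for the real and imaginary lanes.
        if ((cselect >> i) & 1u) {
            p.select |= 0x3u << (2u * i);
        }
    }
    return p;
}
```

Under these assumptions, the identity complex offsets 0x76543210 expand to 0x76543210 and 0xFEDCBA98, and a complex select of 0x81 becomes 0xC003. The largest real offset is 2*7+1 = 15, so the result still fits in 4 bits per lane.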
v8cfloat fpselect8 (unsigned int select, v8cfloat xbuff, int xstart, unsigned int xoffsets, int ystart, unsigned int yoffsets)

Performs a floating point selection between lanes of xbuff.

    fpselect(a, b, s)
    {
        if (s)
            return b;
        else
            return a;
    }
    for (int i = 0; i < 8; i++) {
        idx = f(xstart, xoffsets[i]);
        idy = f(ystart, yoffsets[i]);
        o[i] = fpselect(x[idx], x[idy], select[i]);
    }

xoffsets and yoffsets hold 8 offset values each, 4 bits per offset.
For example, for a v16int32 output type, idx for output lane 0 = f(xstart, xoffsets[0]).
For example, for a v16int32 output type, idx for output lane 15 = f(xstart, xoffsets_hi[7]).
In the case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers, refer to the lane selection note below.
Returns
    Value of each lane is the result of a floating point selection between lanes of xbuff, where the result of lane 0 goes to lane 0 of the output.
Parameters
    select       Each bit selects the value to be placed in the corresponding vector position
    xbuff        Input buffer of 8 complex elements with single precision
    xstart       Starting position offset applied to all lanes of input from the X buffer
    xoffsets     3b (aligned to 4b) offset for each lane, applied to the X buffer. LSBs apply to the first lane
    ystart       Starting position offset applied to all lanes of input from the X buffer for the second input
    yoffsets     3b (aligned to 4b) offset for each lane in the X buffer for the second input. LSBs apply to the first lane
Note
  • When xoffsets or yoffsets is a runtime parameter, it might be more efficient to use a non-complex fpselect instruction and calculate the offsets accordingly. In that case, both the real and the imaginary (real+1) lanes must be considered in the offsets. The same goes for the select parameter.
  • For more information on how the function f() selects data from the buffers go here.
v8cfloat fpselect8 (unsigned int select, v8cfloat xbuff, int xstart, unsigned int xoffsets, v8cfloat ybuff, int ystart, unsigned int yoffsets)

Performs a floating point selection between lanes of xbuff and ybuff.

    fpselect(a, b, s)
    {
        if (s)
            return b;
        else
            return a;
    }
    for (int i = 0; i < 8; i++) {
        idx = f(xstart, xoffsets[i]);
        idy = f(ystart, yoffsets[i]);
        o[i] = fpselect(x[idx], y[idy], select[i]);
    }

xoffsets and yoffsets hold 8 offset values each, 4 bits per offset.
For example, for a v16int32 output type, idx for output lane 0 = f(xstart, xoffsets[0]).
For example, for a v16int32 output type, idx for output lane 15 = f(xstart, xoffsets_hi[7]).
In the case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers, refer to the lane selection note below.
Returns
    Value of each lane is the result of a floating point selection between lanes of xbuff and ybuff, where the result of lane 0 goes to lane 0 of the output.
Parameters
    select       Each bit selects the value to be placed in the corresponding vector position
    xbuff        Input buffer of 8 complex elements with single precision
    xstart       Starting position offset applied to all lanes of input from the X buffer
    xoffsets     3b (aligned to 4b) offset for each lane, applied to the X buffer. LSBs apply to the first lane
    ybuff        Input buffer of 8 complex elements with single precision
    ystart       Starting position offset applied to all lanes of input from the Y buffer for the second input
    yoffsets     3b (aligned to 4b) offset for each lane in the Y buffer for the second input. LSBs apply to the first lane
Note
  • When xoffsets or yoffsets is a runtime parameter, it might be more efficient to use a non-complex fpselect instruction and calculate the offsets accordingly. In that case, both the real and the imaginary (real+1) lanes must be considered in the offsets. The same goes for the select parameter.
  • For more information on how the function f() selects data from the buffers go here.
v16float fpshuffle16 (v32float xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi)

Performs a floating point shuffle between lanes of xbuff.

    for (int i = 0; i < 16; i++) {
        // Lanes 0-7 take their offsets from xoffsets, lanes 8-15 from xoffsets_hi.
        idx = f(xstart, xoffsets[i]);
        o[i] = x[idx];
    }

xoffsets and xoffsets_hi hold 8 offset values each, 4 bits per offset.
For example, for a v16int32 output type, idx for output lane 0 = f(xstart, xoffsets[0]).
For example, for a v16int32 output type, idx for output lane 15 = f(xstart, xoffsets_hi[7]).
In the case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers, refer to the lane selection note below.
Returns
    Value of each lane is the result of a floating point shuffle between lanes of xbuff, where the result of lane 0 goes to lane 0 of the output.
Parameters
    xbuff        Input buffer of 32 elements with single precision
    xstart       Starting position offset applied to all lanes of input from the X buffer
    xoffsets     4b offset for each lane, applied to the X buffer. LSBs apply to the first lane
    xoffsets_hi  4b offset for each lane, applied to the X buffer. LSBs apply to lane 8
Note
  • For more information on how the function f() selects data from the buffers go here.
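Writing the packed offset words by hand is error-prone, so a small helper that builds them from a per-lane index list can be convenient. The sketch below shows one way to do this together with a scalar model of the shuffle. The names OffsetWords, pack_offsets, and fpshuffle16_model are hypothetical, and f(start, offset) is assumed to be (start + offset) modulo the buffer size.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

struct OffsetWords {
    std::uint32_t lo;  // offsets for output lanes 0..7  (xoffsets)
    std::uint32_t hi;  // offsets for output lanes 8..15 (xoffsets_hi)
};

// Pack 16 per-lane offsets (each 0..15) into two 32-bit words, 4 bits per lane, LSBs first.
inline OffsetWords pack_offsets(const std::array<unsigned, 16>& lanes) {
    OffsetWords w{0u, 0u};
    for (unsigned i = 0; i < 16; ++i) {
        assert(lanes[i] < 16u);  // each offset must fit in 4 bits
        const std::uint32_t nibble = static_cast<std::uint32_t>(lanes[i]) << (4u * (i % 8u));
        if (i < 8) {
            w.lo |= nibble;
        } else {
            w.hi |= nibble;
        }
    }
    return w;
}

// Scalar model of a 16-lane shuffle.
// ASSUMPTION: f(start, offset) = (start + offset) % N.
template <std::size_t N>
std::array<float, 16> fpshuffle16_model(const std::array<float, N>& x, int xstart,
                                        std::uint32_t xoffsets, std::uint32_t xoffsets_hi) {
    assert(xstart >= 0);
    std::array<float, 16> o{};
    for (unsigned i = 0; i < 16; ++i) {
        const std::uint32_t word = (i < 8) ? xoffsets : xoffsets_hi;
        const unsigned off = (word >> (4u * (i % 8u))) & 0xFu;
        o[i] = x[(static_cast<std::size_t>(xstart) + off) % N];
    }
    return o;
}
```

As a usage example, reversing a 16-lane vector corresponds to the lane list {15, 14, ..., 0}, which packs to xoffsets = 0x89ABCDEF and xoffsets_hi = 0x01234567 with xstart = 0.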
v16float fpshuffle16 (v16float xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi)

Performs a floating point shuffle between lanes of xbuff.

    for (int i = 0; i < 16; i++) {
        // Lanes 0-7 take their offsets from xoffsets, lanes 8-15 from xoffsets_hi.
        idx = f(xstart, xoffsets[i]);
        o[i] = x[idx];
    }

xoffsets and xoffsets_hi hold 8 offset values each, 4 bits per offset.
For example, for a v16int32 output type, idx for output lane 0 = f(xstart, xoffsets[0]).
For example, for a v16int32 output type, idx for output lane 15 = f(xstart, xoffsets_hi[7]).
In the case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers, refer to the lane selection note below.
Returns
    Value of each lane is the result of a floating point shuffle between lanes of xbuff, where the result of lane 0 goes to lane 0 of the output.
Parameters
    xbuff        Input buffer of 16 elements with single precision
    xstart       Starting position offset applied to all lanes of input from the X buffer
    xoffsets     4b offset for each lane, applied to the X buffer. LSBs apply to the first lane
    xoffsets_hi  4b offset for each lane, applied to the X buffer. LSBs apply to lane 8
Note
  • For more information on how the function f() selects data from the buffers go here.
v8cfloat fpshuffle8 (v16cfloat xbuff, int xstart, unsigned int xoffsets)

Performs a floating point shuffle between lanes of xbuff.

    for (int i = 0; i < 8; i++) {
        idx = f(xstart, xoffsets[i]);
        o[i] = x[idx];
    }

xoffsets holds 8 offset values, 4 bits per offset.
For example, for a v16int32 output type, idx for output lane 0 = f(xstart, xoffsets[0]).
For example, for a v16int32 output type, idx for output lane 15 = f(xstart, xoffsets_hi[7]).
In the case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers, refer to the lane selection note below.
Returns
    Value of each lane is the result of a floating point shuffle between lanes of xbuff, where the result of lane 0 goes to lane 0 of the output.
Parameters
    xbuff        Input buffer of 16 complex elements with single precision
    xstart       Starting position offset applied to all lanes of input from the X buffer
    xoffsets     3b (aligned to 4b) offset for each lane, applied to the X buffer. LSBs apply to the first lane
Note
  • When xoffsets is a runtime parameter, it might be more efficient to use a non-complex fpshuffle instruction and calculate the offsets accordingly. In that case, both the real and the imaginary (real+1) lanes must be considered in the offsets.
  • For more information on how the function f() selects data from the buffers go here.
v8cfloat fpshuffle8 (v8cfloat xbuff, int xstart, unsigned int xoffsets)

Performs a floating point shuffle between lanes of xbuff.

    for (int i = 0; i < 8; i++) {
        idx = f(xstart, xoffsets[i]);
        o[i] = x[idx];
    }

xoffsets holds 8 offset values, 4 bits per offset.
For example, for a v16int32 output type, idx for output lane 0 = f(xstart, xoffsets[0]).
For example, for a v16int32 output type, idx for output lane 15 = f(xstart, xoffsets_hi[7]).
In the case of v32int16, 1 offset is used for 2 adjacent lanes.
For more information on how the function f() selects data from the buffers, refer to the lane selection note below.
Returns
    Value of each lane is the result of a floating point shuffle between lanes of xbuff, where the result of lane 0 goes to lane 0 of the output.
Parameters
    xbuff        Input buffer of 8 complex elements with single precision
    xstart       Starting position offset applied to all lanes of input from the X buffer
    xoffsets     3b (aligned to 4b) offset for each lane, applied to the X buffer. LSBs apply to the first lane
Note
  • When xoffsets is a runtime parameter, it might be more efficient to use a non-complex fpshuffle instruction and calculate the offsets accordingly. In that case, both the real and the imaginary (real+1) lanes must be considered in the offsets.
  • For more information on how the function f() selects data from the buffers go here.