AI Engine-ML Intrinsics User Guide
(v2023.2)
These intrinsics allow shifting full vectors.
Special shift used to perform unaligned loads
All shiftx overloads share the same behavior. Shifts a by 2^step units of 16 bits for step >= 1, that is, by 32, 64, 128, or 256 bits for step = 1 to 4. Then it ORs the result with b right-shifted by shift bytes. Every operand except shift is treated as a vector of 64 bytes. When step is zero, it behaves like a regular shift.
v128int4 | shiftx (v128int4 a, v128int4 b, int step, int shift) |
v64int8 | shiftx (v64int8 a, v64int8 b, int step, int shift) |
v32int16 | shiftx (v32int16 a, v32int16 b, int step, int shift) |
v16int32 | shiftx (v16int32 a, v16int32 b, int step, int shift) |
v128uint4 | shiftx (v128uint4 a, v128uint4 b, int step, int shift) |
v64uint8 | shiftx (v64uint8 a, v64uint8 b, int step, int shift) |
v32uint16 | shiftx (v32uint16 a, v32uint16 b, int step, int shift) |
v16uint32 | shiftx (v16uint32 a, v16uint32 b, int step, int shift) |
v16cint16 | shiftx (v16cint16 a, v16cint16 b, int step, int shift) |
v8cint32 | shiftx (v8cint32 a, v8cint32 b, int step, int shift) |
v32bfloat16 | shiftx (v32bfloat16 a, v32bfloat16 b, int step, int shift) |
v16accfloat | shiftx (v16accfloat a, v16accfloat b, int step, int shift) |
v16float | shiftx (v16float a, v16float b, int step, int shift) |
v8cfloat | shiftx (v8cfloat a, v8cfloat b, int step, int shift) |
Lane-by-lane vector shift (in bytes)
All shift_bytes overloads share the same behavior. Concatenates a and b, interprets them as a vector of 128 bytes and returns a::b[shift:shift+64].
v128int4 | shift_bytes (v128int4 a, v128int4 b, int shift) |
v64int8 | shift_bytes (v64int8 a, v64int8 b, int shift) |
v32int16 | shift_bytes (v32int16 a, v32int16 b, int shift) |
v16int32 | shift_bytes (v16int32 a, v16int32 b, int shift) |
v128uint4 | shift_bytes (v128uint4 a, v128uint4 b, int shift) |
v64uint8 | shift_bytes (v64uint8 a, v64uint8 b, int shift) |
v32uint16 | shift_bytes (v32uint16 a, v32uint16 b, int shift) |
v16uint32 | shift_bytes (v16uint32 a, v16uint32 b, int shift) |
v16cint16 | shift_bytes (v16cint16 a, v16cint16 b, int shift) |
v8cint32 | shift_bytes (v8cint32 a, v8cint32 b, int shift) |
v32bfloat16 | shift_bytes (v32bfloat16 a, v32bfloat16 b, int shift) |
v16accfloat | shift_bytes (v16accfloat a, v16accfloat b, int shift) |
v16float | shift_bytes (v16float a, v16float b, int shift) |
v8cfloat | shift_bytes (v8cfloat a, v8cfloat b, int shift) |
Lane-by-lane vector shift (in elements)
All shift overloads share the same behavior. Concatenates a and b, interprets them as a vector of 128 bytes and returns a::b[shift*elem_size:shift*elem_size+64].
v64int8 | shift (v64int8 a, v64int8 b, int shift) |
v32int16 | shift (v32int16 a, v32int16 b, int shift) |
v16int32 | shift (v16int32 a, v16int32 b, int shift) |
v64uint8 | shift (v64uint8 a, v64uint8 b, int shift) |
v32uint16 | shift (v32uint16 a, v32uint16 b, int shift) |
v16uint32 | shift (v16uint32 a, v16uint32 b, int shift) |
v16cint16 | shift (v16cint16 a, v16cint16 b, int shift) |
v8cint32 | shift (v8cint32 a, v8cint32 b, int shift) |
v32bfloat16 | shift (v32bfloat16 a, v32bfloat16 b, int shift) |
v16accfloat | shift (v16accfloat a, v16accfloat b, int shift) |
v16float | shift (v16float a, v16float b, int shift) |
v8cfloat | shift (v8cfloat a, v8cfloat b, int shift) |
v16accfloat shift (v16accfloat a, v16accfloat b, int shift)
Concatenates a and b, interprets them as a vector of 128 bytes and returns a::b[shift*elem_size:shift*elem_size+64].
a | value to be concatenated |
b | value to be concatenated |
shift | number of elements to be shifted |
v32bfloat16 shift (v32bfloat16 a, v32bfloat16 b, int shift)
Concatenates a and b, interprets them as a vector of 128 bytes and returns a::b[shift*elem_size:shift*elem_size+64].
a | value to be concatenated |
b | value to be concatenated |
shift | number of elements to be shifted |
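The slice notation a::b[shift*elem_size:shift*elem_size+64] may be easier to follow on a concrete case. What follows is a minimal host-side sketch in plain C++, not the intrinsic itself. The names Vec, model_shift, and shift_example are hypothetical, and the sketch assumes that a occupies the low indices of the concatenation and that shift stays between 0 and the lane count.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Host-side reference model (hypothetical helper, not part of the intrinsics).
// A 64-byte vector of element type T holds 64 / sizeof(T) lanes.
template <typename T>
using Vec = std::array<T, 64 / sizeof(T)>;

// Returns lanes [shift, shift + n) of the concatenation a::b, where n is the
// lane count. Assumption: a comes first in a::b and 0 <= shift <= n.
template <typename T>
Vec<T> model_shift(const Vec<T>& a, const Vec<T>& b, std::size_t shift) {
    constexpr std::size_t n = 64 / sizeof(T);
    assert(shift <= n);
    Vec<T> out{};
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t src = shift + i;         // lane index into a::b
        out[i] = (src < n) ? a[src] : b[src - n];  // a first, then b
    }
    return out;
}

// Example: 16 lanes of int32, a = 0..15, b = 16..31, shifted by 3 elements.
inline Vec<std::int32_t> shift_example() {
    Vec<std::int32_t> a{}, b{};
    for (std::size_t i = 0; i < a.size(); ++i) {
        a[i] = static_cast<std::int32_t>(i);
        b[i] = static_cast<std::int32_t>(16 + i);
    }
    return model_shift<std::int32_t>(a, b, 3);  // lanes 3..18 of a::b
}
```

In this model, the result starts at lane 3 of a and its last three lanes come from the start of b.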
v16accfloat shift_bytes (v16accfloat a, v16accfloat b, int shift)
Concatenates a and b, interprets them as a vector of 128 bytes and returns a::b[shift:shift+64].
a | value to be concatenated |
b | value to be concatenated |
shift | number of bytes to be shifted |
v32bfloat16 shift_bytes (v32bfloat16 a, v32bfloat16 b, int shift)
Concatenates a and b, interprets them as a vector of 128 bytes and returns a::b[shift:shift+64].
a | value to be concatenated |
b | value to be concatenated |
shift | number of bytes to be shifted |
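The byte-granular variant can be modeled directly on raw bytes, independent of the element type. The sketch below is a hedged host-side illustration in plain C++, not the intrinsic. Bytes64, model_shift_bytes, and shift_bytes_example are hypothetical names, and it assumes that a sits at byte offsets 0..63 of the concatenation and that shift lies in 0..64.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Bytes64 = std::array<std::uint8_t, 64>;  // one 64-byte vector viewed as bytes

// Hypothetical host-side model of a::b[shift : shift + 64].
// Assumption: a is placed at byte offsets 0..63 and 0 <= shift <= 64.
inline Bytes64 model_shift_bytes(const Bytes64& a, const Bytes64& b,
                                 std::size_t shift) {
    assert(shift <= 64);
    std::array<std::uint8_t, 128> cat{};
    std::memcpy(cat.data(), a.data(), 64);       // low half of a::b
    std::memcpy(cat.data() + 64, b.data(), 64);  // high half of a::b
    Bytes64 out{};
    std::memcpy(out.data(), cat.data() + shift, 64);  // 64-byte window
    return out;
}

// Example: byte i of a holds i, byte i of b holds 64 + i; shift by 5 bytes.
inline Bytes64 shift_bytes_example() {
    Bytes64 a{}, b{};
    for (std::size_t i = 0; i < 64; ++i) {
        a[i] = static_cast<std::uint8_t>(i);
        b[i] = static_cast<std::uint8_t>(64 + i);
    }
    return model_shift_bytes(a, b, 5);
}
```

Because the offset is counted in bytes, the window may start in the middle of an element for types wider than one byte.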
v16accfloat shiftx (v16accfloat a, v16accfloat b, int step, int shift)
Shifts a by 2^step units of 16 bits for step >= 1 (see the step parameter), then ORs the result with b right-shifted by shift bytes. Every operand except shift is treated as a vector of 64 bytes. When step is zero, it behaves like a regular shift.
a | value to be concatenated |
b | value to be concatenated |
step | amount of preshift on a (0: 0 bits, 1: 32 bits, 2: 64 bits, 3: 128 bits, 4: 256 bits) |
shift | number of bytes to shift b |
v32bfloat16 shiftx (v32bfloat16 a, v32bfloat16 b, int step, int shift)
Shifts a by 2^step units of 16 bits for step >= 1 (see the step parameter), then ORs the result with b right-shifted by shift bytes. Every operand except shift is treated as a vector of 64 bytes. When step is zero, it behaves like a regular shift.
a | value to be concatenated |
b | value to be concatenated |
step | amount of preshift on a (0: 0 bits, 1: 32 bits, 2: 64 bits, 3: 128 bits, 4: 256 bits) |
shift | number of bytes to shift b |
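The step encoding listed in the parameter table (0: 0 bits, 1: 32 bits, 2: 64 bits, 3: 128 bits, 4: 256 bits) can be written as a small lookup. The sketch below covers only that mapping. It does not model how the pre-shifted a is combined with b, since that depends on details not reproduced here. The helper names preshift_bits and preshift_bytes are hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: pre-shift applied to a, in bits, for a given step.
// Encodes the documented table 0 -> 0, 1 -> 32, 2 -> 64, 3 -> 128, 4 -> 256.
// Values of step outside 0..4 are not covered by the table and are not handled.
constexpr int preshift_bits(int step) {
    return step == 0 ? 0 : (16 << step);  // 2^step units of 16 bits
}

// Same amount expressed in bytes.
constexpr int preshift_bytes(int step) {
    return preshift_bits(step) / 8;
}

// Compile-time checks against the documented table.
static_assert(preshift_bits(0) == 0, "step 0 applies no pre-shift");
static_assert(preshift_bits(1) == 32, "step 1 is 32 bits");
static_assert(preshift_bits(2) == 64, "step 2 is 64 bits");
static_assert(preshift_bits(3) == 128, "step 3 is 128 bits");
static_assert(preshift_bits(4) == 256, "step 4 is 256 bits");
```

Expressed in bytes, the non-zero steps correspond to 4, 8, 16, and 32 bytes.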