AI Engine Intrinsics User Guide
(AIE) r2p22
Vector MAC combined with vector comparisons with 16 bit real by 16 bit real
Functions | |
v16acc48 | mac16_abs (v16acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-accumulate intrinsic function after computing the absolute value in the selected lanes from the input buffer. | |
v16acc48 | mac16_abs (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-accumulate intrinsic function after computing the absolute value in the selected lanes from the input buffer using a small X input buffer. | |
v16acc48 | mac16_max (v16acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-accumulate intrinsic function after computing the maximum between the selected lanes from the input buffer. | |
v16acc48 | mac16_max (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-accumulate intrinsic function after computing the maximum between the selected lanes from the input buffer using a small X input buffer. | |
v16acc48 | mac16_max (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-accumulate intrinsic function after computing the maximum between the selected lanes from the input buffer using a small X input buffer. | |
v16acc48 | mac16_maxdiff (v16acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-accumulate intrinsic function after computing the maximum difference between the selected lanes from the input buffer. | |
v16acc48 | mac16_maxdiff (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-accumulate intrinsic function after computing the maximum difference between the selected lanes from the input buffer using a small X input buffer. | |
v16acc48 | mac16_maxdiff (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-accumulate intrinsic function after computing the maximum difference between the selected lanes from the input buffer using a small X input buffer. | |
v16acc48 | mac16_min (v16acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-accumulate intrinsic function after computing the minimum between the selected lanes from the input buffer. | |
v16acc48 | mac16_min (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-accumulate intrinsic function after computing the minimum between the selected lanes from the input buffer using a small X input buffer. | |
v16acc48 | mac16_min (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-accumulate intrinsic function after computing the minimum between the selected lanes from the input buffer using a small X input buffer. | |
v8acc48 | mac8_abs (v8acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-accumulate intrinsic function after computing the absolute value in the selected lanes from the input buffer. | |
v8acc48 | mac8_abs (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-accumulate intrinsic function after computing the absolute value in the selected lanes from the input buffer using a small X input buffer. | |
v8acc48 | mac8_max (v8acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-accumulate intrinsic function after computing the maximum between the selected lanes from the input buffer. | |
v8acc48 | mac8_max (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-accumulate intrinsic function after computing the maximum between the selected lanes from the input buffer using a small X input buffer. | |
v8acc48 | mac8_max (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-accumulate intrinsic function after computing the maximum between the selected lanes from the input buffer using a small X input buffer. | |
v8acc48 | mac8_maxdiff (v8acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-accumulate intrinsic function after computing the maximum difference between the selected lanes from the input buffer. | |
v8acc48 | mac8_maxdiff (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-accumulate intrinsic function after computing the maximum difference between the selected lanes from the input buffer using a small X input buffer. | |
v8acc48 | mac8_maxdiff (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-accumulate intrinsic function after computing the maximum difference between the selected lanes from the input buffer using a small X input buffer. | |
v8acc48 | mac8_min (v8acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-accumulate intrinsic function after computing the minimum between the selected lanes from the input buffer. | |
v8acc48 | mac8_min (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-accumulate intrinsic function after computing the minimum between the selected lanes from the input buffer using a small X input buffer. | |
v8acc48 | mac8_min (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-accumulate intrinsic function after computing the minimum between the selected lanes from the input buffer using a small X input buffer. | |
v16acc48 | msc16_abs (v16acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-subtract intrinsic function after computing the absolute value in the selected lanes from the input buffer. | |
v16acc48 | msc16_abs (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-subtract intrinsic function after computing the absolute value in the selected lanes from the input buffer using a small X input buffer. | |
v16acc48 | msc16_max (v16acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-subtract intrinsic function after computing the maximum between the selected lanes from the input buffer. | |
v16acc48 | msc16_max (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-subtract intrinsic function after computing the maximum between the selected lanes from the input buffer using a small X input buffer. | |
v16acc48 | msc16_max (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-subtract intrinsic function after computing the maximum between the selected lanes from the input buffer using a small X input buffer. | |
v16acc48 | msc16_maxdiff (v16acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-subtract intrinsic function after computing the maximum difference between the selected lanes from the input buffer. | |
v16acc48 | msc16_maxdiff (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-subtract intrinsic function after computing the maximum difference between the selected lanes from the input buffer using a small X input buffer. | |
v16acc48 | msc16_maxdiff (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-subtract intrinsic function after computing the maximum difference between the selected lanes from the input buffer using a small X input buffer. | |
v16acc48 | msc16_min (v16acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-subtract intrinsic function after computing the minimum between the selected lanes from the input buffer. | |
v16acc48 | msc16_min (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-subtract intrinsic function after computing the minimum between the selected lanes from the input buffer using a small X input buffer. | |
v16acc48 | msc16_min (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-subtract intrinsic function after computing the minimum between the selected lanes from the input buffer using a small X input buffer. | |
v8acc48 | msc8_abs (v8acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-subtract intrinsic function after computing the absolute value in the selected lanes from the input buffer. | |
v8acc48 | msc8_abs (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-subtract intrinsic function after computing the absolute value in the selected lanes from the input buffer using a small X input buffer. | |
v8acc48 | msc8_max (v8acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-subtract intrinsic function after computing the maximum between the selected lanes from the input buffer. | |
v8acc48 | msc8_max (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-subtract intrinsic function after computing the maximum between the selected lanes from the input buffer using a small X input buffer. | |
v8acc48 | msc8_max (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-subtract intrinsic function after computing the maximum between the selected lanes from the input buffer using a small X input buffer. | |
v8acc48 | msc8_maxdiff (v8acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-subtract intrinsic function after computing the maximum difference between the selected lanes from the input buffer. | |
v8acc48 | msc8_maxdiff (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-subtract intrinsic function after computing the maximum difference between the selected lanes from the input buffer using a small X input buffer. | |
v8acc48 | msc8_maxdiff (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-subtract intrinsic function after computing the maximum difference between the selected lanes from the input buffer using a small X input buffer. | |
v8acc48 | msc8_min (v8acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-subtract intrinsic function after computing the minimum between the selected lanes from the input buffer. | |
v8acc48 | msc8_min (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-subtract intrinsic function after computing the minimum between the selected lanes from the input buffer using a small X input buffer. | |
v8acc48 | msc8_min (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-subtract intrinsic function after computing the minimum between the selected lanes from the input buffer using a small X input buffer. | |
v16acc48 | mul16_abs (v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply intrinsic function after computing the absolute value in the selected lanes from the input buffer. | |
v16acc48 | mul16_abs (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply intrinsic function after computing the absolute value in the selected lanes from the input buffer using a small X input buffer. | |
v16acc48 | mul16_max (v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply intrinsic function after computing the maximum between the selected lanes from the input buffer. | |
v16acc48 | mul16_max (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply intrinsic function after computing the maximum between the selected lanes from the input buffer using a small X input buffer. | |
v16acc48 | mul16_max (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply intrinsic function after computing the maximum between the selected lanes from the input buffer using a small X input buffer. | |
v16acc48 | mul16_maxdiff (v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply intrinsic function after computing the maximum difference between the selected lanes from the input buffer. | |
v16acc48 | mul16_maxdiff (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply intrinsic function after computing the maximum difference between the selected lanes from the input buffer using a small X input buffer. | |
v16acc48 | mul16_maxdiff (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply intrinsic function after computing the maximum difference between the selected lanes from the input buffer using a small X input buffer. | |
v16acc48 | mul16_min (v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply intrinsic function after computing the minimum between the selected lanes from the input buffer. | |
v16acc48 | mul16_min (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply intrinsic function after computing the minimum between the selected lanes from the input buffer using a small X input buffer. | |
v16acc48 | mul16_min (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply intrinsic function after computing the minimum between the selected lanes from the input buffer using a small X input buffer. | |
v8acc48 | mul8_abs (v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply intrinsic function after computing the absolute value in the selected lanes from the input buffer. | |
v8acc48 | mul8_abs (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply intrinsic function after computing the absolute value in the selected lanes from the input buffer using a small X input buffer. | |
v8acc48 | mul8_max (v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply intrinsic function after computing the maximum between the selected lanes from the input buffer. | |
v8acc48 | mul8_max (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply intrinsic function after computing the maximum between the selected lanes from the input buffer using a small X input buffer. | |
v8acc48 | mul8_max (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply intrinsic function after computing the maximum between the selected lanes from the input buffer using a small X input buffer. | |
v8acc48 | mul8_maxdiff (v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply intrinsic function after computing the maximum difference between the selected lanes from the input buffer. | |
v8acc48 | mul8_maxdiff (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply intrinsic function after computing the maximum difference between the selected lanes from the input buffer using a small X input buffer. | |
v8acc48 | mul8_maxdiff (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply intrinsic function after computing the maximum difference between the selected lanes from the input buffer using a small X input buffer. | |
v8acc48 | mul8_min (v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply intrinsic function after computing the minimum between the selected lanes from the input buffer. | |
v8acc48 | mul8_min (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply intrinsic function after computing the minimum between the selected lanes from the input buffer using a small X input buffer. | |
v8acc48 | mul8_min (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply intrinsic function after computing the minimum between the selected lanes from the input buffer using a small X input buffer. | |
v16acc48 | negmul16_abs (v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-negate intrinsic function after computing the absolute value in the selected lanes from the input buffer. | |
v16acc48 | negmul16_abs (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-negate intrinsic function after computing the absolute value in the selected lanes from the input buffer using a small X input buffer. | |
v16acc48 | negmul16_max (v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-negate intrinsic function after computing the maximum between the selected lanes from the input buffer. | |
v16acc48 | negmul16_max (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-negate intrinsic function after computing the maximum between the selected lanes from the input buffer using a small X input buffer. | |
v16acc48 | negmul16_max (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-negate intrinsic function after computing the maximum between the selected lanes from the input buffer using a small X input buffer. | |
v16acc48 | negmul16_maxdiff (v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-negate intrinsic function after computing the maximum difference between the selected lanes from the input buffer. | |
v16acc48 | negmul16_maxdiff (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-negate intrinsic function after computing the maximum difference between the selected lanes from the input buffer using a small X input buffer. | |
v16acc48 | negmul16_maxdiff (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-negate intrinsic function after computing the maximum difference between the selected lanes from the input buffer using a small X input buffer. | |
v16acc48 | negmul16_min (v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-negate intrinsic function after computing the minimum between the selected lanes from the input buffer. | |
v16acc48 | negmul16_min (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-negate intrinsic function after computing the minimum between the selected lanes from the input buffer using a small X input buffer. | |
v16acc48 | negmul16_min (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep) |
Multiply-negate intrinsic function after computing the minimum between the selected lanes from the input buffer using a small X input buffer. | |
v8acc48 | negmul8_abs (v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-negate intrinsic function after computing the absolute value in the selected lanes from the input buffer. | |
v8acc48 | negmul8_abs (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-negate intrinsic function after computing the absolute value in the selected lanes from the input buffer using a small X input buffer. | |
v8acc48 | negmul8_max (v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-negate intrinsic function after computing the maximum between the selected lanes from the input buffer. | |
v8acc48 | negmul8_max (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-negate intrinsic function after computing the maximum between the selected lanes from the input buffer using a small X input buffer. | |
v8acc48 | negmul8_max (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-negate intrinsic function after computing the maximum between the selected lanes from the input buffer using a small X input buffer. | |
v8acc48 | negmul8_maxdiff (v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-negate intrinsic function after computing the maximum difference between the selected lanes from the input buffer. | |
v8acc48 | negmul8_maxdiff (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-negate intrinsic function after computing the maximum difference between the selected lanes from the input buffer using a small X input buffer. | |
v8acc48 | negmul8_maxdiff (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-negate intrinsic function after computing the maximum difference between the selected lanes from the input buffer using a small X input buffer. | |
v8acc48 | negmul8_min (v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-negate intrinsic function after computing the minimum between the selected lanes from the input buffer. | |
v8acc48 | negmul8_min (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-negate intrinsic function after computing the minimum between the selected lanes from the input buffer using small X input buffer. | |
v8acc48 | negmul8_min (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep) |
Multiply-negate intrinsic function after computing the minimum between the selected lanes from the input buffer using small X and Y input buffers. | |
v16acc48 mac16_abs (v16acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply-accumulate intrinsic function after computing the absolute value in the selected lanes from the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
acc | v16acc48 | Incoming accumulation vector (16 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
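The `zoffsets` and `zoffsets_hi` words each pack eight 4-bit fields, with the least significant nibble belonging to the first lane of its half. The following is a minimal host-side sketch of that unpacking, not part of the intrinsics API: the helper name `unpack_lane_offsets` is hypothetical, and "8th lane" is assumed to mean lane index 8 counting from zero. It extracts only the raw nibbles; the extra interpretation that the table describes for the X offsets (2x the value, pairing of every second lane) is not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an AIE intrinsic): expand the two 32-bit offset
// words of a 16-lane intrinsic into one raw 4-bit offset per lane.
std::array<unsigned, 16> unpack_lane_offsets(std::uint32_t offsets,
                                             std::uint32_t offsets_hi) {
    std::array<unsigned, 16> lanes{};
    for (unsigned lane = 0; lane < 8; ++lane) {
        // The least significant nibble belongs to the first lane (lane 0).
        lanes[lane] = (offsets >> (4 * lane)) & 0xFu;
        // Assumption: the least significant nibble of the *_hi word
        // belongs to lane 8 (zero-based).
        lanes[lane + 8] = (offsets_hi >> (4 * lane)) & 0xFu;
    }
    return lanes;
}
```

Under these assumptions, `unpack_lane_offsets(0x76543210, 0xFEDCBA98)` gives lane `i` the offset `i`, which is a convenient pattern to check an offset word against before passing it to an intrinsic.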
v16acc48 mac16_abs (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply-accumulate intrinsic function after computing the absolute value in the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
acc | v16acc48 | Incoming accumulation vector (16 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
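The arithmetic that `mac16_abs` applies to each lane can be illustrated with a scalar reference model. This is a sketch under stated assumptions rather than the hardware behaviour: the names `Acc16`, `Sel16` and `mac16_abs_model` are hypothetical, the 48-bit accumulator lanes are stood in for by `int64_t`, the X and Z values are assumed to have been selected already (the start, offsets, square and step parameters are not modelled), and only one multiply per lane is shown.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

using Acc16 = std::array<std::int64_t, 16>;  // stand-in for v16acc48
using Sel16 = std::array<std::int16_t, 16>;  // values already selected per lane

// Hypothetical scalar model: acc[lane] += |x[lane]| * z[lane].
Acc16 mac16_abs_model(Acc16 acc, const Sel16& x_sel, const Sel16& z_sel) {
    for (std::size_t lane = 0; lane < acc.size(); ++lane) {
        // Widen before taking the absolute value so that -32768 is safe.
        const std::int64_t ax =
            std::llabs(static_cast<long long>(x_sel[lane]));
        acc[lane] += ax * static_cast<std::int64_t>(z_sel[lane]);
    }
    return acc;
}
```

A model of this kind is mainly useful as a golden reference when checking kernel output on the host.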
v16acc48 mac16_max (v16acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply-accumulate intrinsic function after computing the maximum between the selected lanes from the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
acc | v16acc48 | Incoming accumulation vector (16 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
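The `xsquare` and `ysquare` words can be read as four 4-bit fields, one per element of the mini-permute square, with the least significant nibble applying to the first element. The sketch below decodes such a word and applies it to four values. It rests on an assumption that the table does not spell out: each nibble is taken to be the index (0 to 3) of the source element that lands in that position, which is consistent with the default `0x3210` being the identity. The function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: split a square selector such as 0x3210 into four
// nibbles, least significant nibble first.
std::array<unsigned, 4> decode_square(std::uint32_t square) {
    std::array<unsigned, 4> order{};
    for (unsigned e = 0; e < 4; ++e) {
        order[e] = (square >> (4 * e)) & 0xFu;
    }
    return order;
}

// Assumption: output element e takes input element order[e].
std::array<std::int16_t, 4> permute_square(
        const std::array<std::int16_t, 4>& in, std::uint32_t square) {
    const std::array<unsigned, 4> order = decode_square(square);
    std::array<std::int16_t, 4> out{};
    for (unsigned e = 0; e < 4; ++e) {
        assert(order[e] < 4);  // only indices 0..3 address the square
        out[e] = in[order[e]];
    }
    return out;
}
```

With this reading, `0x3210` leaves the four elements in place and `0x0123` reverses them.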
v16acc48 mac16_max (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply-accumulate intrinsic function after computing the maximum between the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
acc | v16acc48 | Incoming accumulation vector (16 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
v16acc48 mac16_max (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply-accumulate intrinsic function after computing the maximum between the selected lanes from the input buffer using small X and Y input buffers.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
acc | v16acc48 | Incoming accumulation vector (16 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ybuff | v32int16 | Right input buffer of 32 elements of type int16 |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. ystart is restricted to multiples of 2 as granularity for ybuff is 32-bit. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
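The constraints on the start parameters can be checked on the host before a kernel is built. The helpers below are hypothetical and only restate the table: `xstart` (and `ystart` when a separate `ybuff` is passed) must be a multiple of 2, and only the 4 least significant bits of `zstart` are used. How the hardware treats an odd start value is not stated here, so the sketch reports it rather than guessing.

```cpp
#include <cassert>

// Hypothetical helper: only the 4 LSBs of zstart take effect.
constexpr unsigned effective_zstart(int zstart) {
    return static_cast<unsigned>(zstart) & 0xFu;
}

// Hypothetical helper: X and Y starts are restricted to multiples of 2
// because the buffer granularity is 32 bits (two int16 elements).
constexpr bool is_valid_xy_start(int start) {
    return start % 2 == 0;
}

static_assert(effective_zstart(18) == 2u, "18 = 0x12, low nibble is 2");
static_assert(is_valid_xy_start(4), "even starts are accepted");
static_assert(!is_valid_xy_start(3), "odd starts are rejected");
```

Because `zstart` must be a compile-time constant, checks of this kind can be written as `static_assert` next to the call site.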
v16acc48 mac16_maxdiff (v16acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply-accumulate intrinsic function after computing the maximum difference between the selected lanes from the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
acc | v16acc48 | Incoming accumulation vector (16 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
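The max, maxdiff and min variants in this group differ only in the comparison applied to the selected X and Y values before the multiplication. The scalar sketch below shows one accumulation term for each. All names are hypothetical, and the maxdiff definition used here, `max(x - y, 0)`, is an assumption that this section does not state explicitly. Lane selection and any saturation of the intermediate result are not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

enum class PreOp { Max, Min, MaxDiff };

// Hypothetical scalar model of the comparison stage.
std::int32_t pre_op(PreOp op, std::int16_t x, std::int16_t y) {
    switch (op) {
        case PreOp::Max:
            return std::max(x, y);
        case PreOp::Min:
            return std::min(x, y);
        case PreOp::MaxDiff:
            // Assumption: maxdiff(x, y) = max(x - y, 0), computed in a
            // wider type so the subtraction cannot overflow.
            return std::max<std::int32_t>(static_cast<std::int32_t>(x) - y, 0);
    }
    return 0;
}

// One multiply-accumulate term: acc + pre_op(x, y) * z.
std::int64_t mac_term(std::int64_t acc, PreOp op, std::int16_t x,
                      std::int16_t y, std::int16_t z) {
    return acc + static_cast<std::int64_t>(pre_op(op, x, y)) * z;
}
```

For example, with `x = 3`, `y = 5` and `z = 2` this model adds 10 for max, 6 for min and 0 for maxdiff.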
v16acc48 mac16_maxdiff (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply-accumulate intrinsic function after computing the maximum difference between the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
acc | v16acc48 | Incoming accumulation vector (16 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
v16acc48 mac16_maxdiff (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply-accumulate intrinsic function after computing the maximum difference between the selected lanes from the input buffer using small X and Y input buffers.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
acc | v16acc48 | Incoming accumulation vector (16 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ybuff | v32int16 | Right input buffer of 32 elements of type int16 |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. ystart is restricted to multiples of 2 as granularity for ybuff is 32-bit. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
v16acc48 mac16_min (v16acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply-accumulate intrinsic function after computing the minimum between the selected lanes from the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
acc | v16acc48 | Incoming accumulation vector (16 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
v16acc48 mac16_min (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply-accumulate intrinsic function after computing the minimum between the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
acc | v16acc48 | Incoming accumulation vector (16 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
v16acc48 mac16_min (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply-accumulate intrinsic function after computing the minimum between the selected lanes from the input buffer using small X and Y input buffers.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
acc | v16acc48 | Incoming accumulation vector (16 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ybuff | v32int16 | Right input buffer of 32 elements of type int16 |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. ystart is restricted to multiples of 2 as granularity for ybuff is 32-bit. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
v8acc48 mac8_abs (v8acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply-accumulate intrinsic function after computing the absolute value in the selected lanes from the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
acc | v8acc48 | Incoming accumulation vector (8 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to the x buffer. LSB apply to first lane |
xstep | int | Step between each column for selection in the xbuffer. xstep is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |
v8acc48 mac8_abs (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply-accumulate intrinsic function after computing the absolute value in the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
acc | v8acc48 | Incoming accumulation vector (8 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to the x buffer. LSB apply to first lane |
xstep | int | Step between each column for selection in the xbuffer. xstep is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |
v8acc48 mac8_max (v8acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply-accumulate intrinsic function after computing the maximum between the selected lanes from the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
acc | v8acc48 | Incoming accumulation vector (8 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to both x and y buffers. LSB apply to first lane |
xstep | int | Step between each column for selection in the x and y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |
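The symmetric stepping described for `xstep` can be pictured as two cursors that move in opposite directions: the X position advances by `xstep` per column while the Y position moves back by the same amount. The sketch below lists the per-column base positions under that reading. The struct and function names are hypothetical, the number of columns is left as a caller-supplied assumption, and wrap-around at the buffer boundary as well as the per-lane offsets are not modelled.

```cpp
#include <cassert>
#include <vector>

struct ColumnStart {
    int x;  // base position in the X buffer for this column
    int y;  // base position in the Y buffer for this column
};

// Hypothetical helper: X advances by +xstep per column, Y by -xstep.
std::vector<ColumnStart> column_starts(int xstart, int ystart, int xstep,
                                       int columns) {
    // Both values are restricted to multiples of 2 (32-bit granularity).
    assert(xstart % 2 == 0 && xstep % 2 == 0);
    std::vector<ColumnStart> out;
    for (int c = 0; c < columns; ++c) {
        out.push_back({xstart + c * xstep, ystart - c * xstep});
    }
    return out;
}
```

With `xstart = 0`, `ystart = 6` and `xstep = 2`, four columns give X positions 0, 2, 4, 6 against Y positions 6, 4, 2, 0, which is the access pattern of a symmetric filter.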
v8acc48 mac8_max (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply-accumulate intrinsic function after computing the maximum between the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
acc | v8acc48 | Incoming accumulation vector (8 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to both x and y buffers. LSB apply to first lane |
xstep | int | Step between each column for selection in the x and y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |
v8acc48 mac8_max (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply-accumulate intrinsic function after computing the maximum between the selected lanes from the input buffer using small X and Y input buffers.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
acc | v8acc48 | Incoming accumulation vector (8 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to both x and y buffers. LSB apply to first lane |
xstep | int | Step between each column for selection in the x and y buffers. ystep mirrors xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 because the granularity for xbuff is 32-bit. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ybuff | v32int16 | Right input buffer of 32 elements of type int16 |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. ystart is restricted to multiples of 2 as granularity for ybuff is 32-bit. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |
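As a rough mental model of "maximum, then multiply-accumulate", the following scalar sketch may help. It is plain C++, not AIE code, and `mac8_max_model`, `Acc8` and `Sel8` are hypothetical names. It assumes the X, Y and Z operands have already been selected per lane (the `xstart`, `xoffsets`, `xsquare` and step arguments are not modelled), it covers a single column only, and it uses `int64_t` as a stand-in for the 48-bit accumulator lanes.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Acc8 = std::array<std::int64_t, 8>;  // stand-in for v8acc48 (assumption)
using Sel8 = std::array<std::int16_t, 8>;  // operands after lane selection

// Hypothetical scalar model of one column: acc[lane] += max(x, y) * z.
Acc8 mac8_max_model(Acc8 acc, const Sel8& x, const Sel8& y, const Sel8& z) {
    for (std::size_t lane = 0; lane < acc.size(); ++lane) {
        const std::int64_t larger = std::max(x[lane], y[lane]);
        acc[lane] += larger * z[lane];
    }
    return acc;
}
```

A model like this is mainly useful as a reference when checking kernel output against expected values on the host.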
v8acc48 mac8_maxdiff (v8acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply-accumulate intrinsic function after computing the maximum difference between the selected lanes from the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
acc | v8acc48 | Incoming accumulation vector (8 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to both x and y buffers. LSB apply to first lane |
xstep | int | Step between each column for selection in the x and y buffers. ystep mirrors xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 because the granularity for xbuff is 32-bit. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |
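The term "maximum difference" is not defined in this section. The sketch below assumes it means `max(x - y, 0)` per lane, which is how the operation is commonly described for AI Engine; treat that as an assumption to verify. The names `mac8_maxdiff_model`, `Acc8` and `Sel8` are hypothetical, lane selection is not modelled, and `int64_t` stands in for the 48-bit accumulator lanes.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Acc8 = std::array<std::int64_t, 8>;  // stand-in for v8acc48 (assumption)
using Sel8 = std::array<std::int16_t, 8>;  // operands after lane selection

// Hypothetical scalar model of one column, assuming
// maxdiff(x, y) = max(x - y, 0): acc[lane] += maxdiff(x, y) * z.
Acc8 mac8_maxdiff_model(Acc8 acc, const Sel8& x, const Sel8& y, const Sel8& z) {
    for (std::size_t lane = 0; lane < acc.size(); ++lane) {
        const int diff = x[lane] - y[lane];  // int16 operands widen to int
        acc[lane] += static_cast<std::int64_t>(std::max(diff, 0)) * z[lane];
    }
    return acc;
}
```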
v8acc48 mac8_maxdiff (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply-accumulate intrinsic function after computing the maximum difference between the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
acc | v8acc48 | Incoming accumulation vector (8 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to both x and y buffers. LSB apply to first lane |
xstep | int | Step between each column for selection in the x and y buffers. ystep mirrors xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 because the granularity for xbuff is 32-bit. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |
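The note that ystep advances by -xstep can be illustrated with a small host-side sketch. It is plain C++, and `column_starts` and `ColumnStart` are hypothetical names. The number of columns is not stated in this section, so it is a parameter here, and any wrap-around inside the buffers is deliberately not modelled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record of the per-column base positions in X and Y.
struct ColumnStart {
    int x;
    int y;
};

// Hypothetical helper: X moves forward by xstep per column while Y moves
// backward by the same amount (ystep = -xstep, as the table states).
std::vector<ColumnStart> column_starts(int xstart, int ystart, int xstep, int columns) {
    assert(xstep % 2 == 0);  // xstep is restricted to multiples of 2
    std::vector<ColumnStart> starts;
    for (int col = 0; col < columns; ++col) {
        starts.push_back({xstart + col * xstep, ystart - col * xstep});
    }
    return starts;
}
```

With `xstart = 0`, `ystart = 6` and `xstep = 2`, three columns would start at (0, 6), (2, 4) and (4, 2) under these assumptions.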
v8acc48 mac8_maxdiff (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply-accumulate intrinsic function after computing the maximum difference between the selected lanes from the input buffers, using a small X input buffer and a separate Y input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
acc | v8acc48 | Incoming accumulation vector (8 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to both x and y buffers. LSB apply to first lane |
xstep | int | Step between each column for selection in the x and y buffers. ystep mirrors xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 because the granularity for xbuff is 32-bit. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ybuff | v32int16 | Right input buffer of 32 elements of type int16 |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. ystart is restricted to multiples of 2 as granularity for ybuff is 32-bit. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |
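The alignment restrictions listed in the table above (xstart, ystart and xstep must be multiples of 2) can be checked on the host before calling the intrinsic. The following is a minimal sketch in plain C++; `selection_is_aligned` is a hypothetical helper, not something the intrinsics library provides.

```cpp
#include <cassert>

// Hypothetical pre-check for the separate-ybuff variants: the 32-bit
// granularity of xbuff and ybuff means int16 positions and steps must be even.
bool selection_is_aligned(int xstart, int ystart, int xstep) {
    const bool xstart_ok = (xstart % 2) == 0;
    const bool ystart_ok = (ystart % 2) == 0;
    const bool xstep_ok = (xstep % 2) == 0;
    return xstart_ok && ystart_ok && xstep_ok;
}
```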
v8acc48 mac8_min (v8acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply-accumulate intrinsic function after computing the minimum between the selected lanes from the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
acc | v8acc48 | Incoming accumulation vector (8 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to both x and y buffers. LSB apply to first lane |
xstep | int | Step between each column for selection in the x and y buffers. ystep mirrors xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 because the granularity for xbuff is 32-bit. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |
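The `xsquare` and `ysquare` arguments describe a reordering of four elements, with `0x3210` as the identity. The sketch below shows one way to read such a value on the host. It is plain C++, `apply_square` and `Square4` are hypothetical names, and it assumes that each 4-bit field (lowest field first) holds the index of the source element for that output position. How the four elements are gathered from the buffer is not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Square4 = std::array<std::int16_t, 4>;  // the four elements of one square

// Hypothetical model of the mini-permute: output element i takes the input
// element whose index is stored in the i-th 4-bit field of `square`.
Square4 apply_square(const Square4& in, unsigned square) {
    Square4 out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        const unsigned source = (square >> (4 * i)) & 0x3u;  // indices 0..3
        out[i] = in[source];
    }
    return out;
}
```

Under this reading, `0x3210` leaves the elements in place, `0x0123` reverses them, and `0x3120` swaps the two middle elements.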
v8acc48 mac8_min (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply-accumulate intrinsic function after computing the minimum between the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
acc | v8acc48 | Incoming accumulation vector (8 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to both x and y buffers. LSB apply to first lane |
xstep | int | Step between each column for selection in the x and y buffers. ystep mirrors xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 because the granularity for xbuff is 32-bit. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |
v8acc48 mac8_min (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply-accumulate intrinsic function after computing the minimum between the selected lanes from the input buffers, using a small X input buffer and a separate Y input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
acc | v8acc48 | Incoming accumulation vector (8 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to both x and y buffers. LSB apply to first lane |
xstep | int | Step between each column for selection in the x and y buffers. ystep mirrors xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 because the granularity for xbuff is 32-bit. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ybuff | v32int16 | Right input buffer of 32 elements of type int16 |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. ystart is restricted to multiples of 2 as granularity for ybuff is 32-bit. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |
v16acc48 msc16_abs (v16acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply-subtract intrinsic function after computing the absolute value in the selected lanes from the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
acc | v16acc48 | Incoming accumulation vector (16 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
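The multiply-subtract variants differ from the multiply-accumulate ones only in the sign of the update. A scalar sketch of "absolute value, then multiply-subtract" for 16 lanes might look as follows. It is plain C++, `msc16_abs_model`, `Acc16` and `Sel16` are hypothetical names, lane selection is not modelled, a single column is covered, and `int64_t` stands in for the 48-bit accumulator lanes. The behaviour for an input of -32768 is not stated in this section; the model simply widens before taking the absolute value.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

using Acc16 = std::array<std::int64_t, 16>;  // stand-in for v16acc48 (assumption)
using Sel16 = std::array<std::int16_t, 16>;  // operands after lane selection

// Hypothetical scalar model of one column: acc[lane] -= |x| * z.
Acc16 msc16_abs_model(Acc16 acc, const Sel16& x, const Sel16& z) {
    for (std::size_t lane = 0; lane < acc.size(); ++lane) {
        const int magnitude = std::abs(static_cast<int>(x[lane]));  // widen first
        acc[lane] -= static_cast<std::int64_t>(magnitude) * z[lane];
    }
    return acc;
}
```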
v16acc48 msc16_abs (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply-subtract intrinsic function after computing the absolute value in the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
acc | v16acc48 | Incoming accumulation vector (16 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
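For the 16-lane intrinsics the per-lane 4-bit fields are split across two 32-bit words (`zoffsets` and `zoffsets_hi`, and likewise for X). The sketch below shows one way to look up the field for a given lane on the host. It is plain C++, `offset_field16` is a hypothetical helper, and it assumes that the first word covers lanes 0 to 7 and the `_hi` word covers lanes 8 to 15, each starting at its least significant field.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: pick the 4-bit field for one of 16 lanes from the
// low word (lanes 0..7) or the high word (lanes 8..15).
unsigned offset_field16(std::uint32_t lo, std::uint32_t hi, int lane) {
    assert(lane >= 0 && lane < 16);
    const std::uint32_t word = (lane < 8) ? lo : hi;
    return (word >> (4 * (lane % 8))) & 0xFu;
}
```

With `zoffsets = 0x76543210` and `zoffsets_hi = 0xFEDCBA98`, lane 8 would read the field value 8 and lane 15 the value 15 under these assumptions.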
v16acc48 msc16_max (v16acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply-subtract intrinsic function after computing the maximum between the selected lanes from the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
acc | v16acc48 | Incoming accumulation vector (16 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
v16acc48 msc16_max (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply-subtract intrinsic function after computing the maximum between the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
acc | v16acc48 | Incoming accumulation vector (16 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
v16acc48 msc16_max (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply-subtract intrinsic function after computing the maximum between the selected lanes from the input buffers, using a small X input buffer and a separate Y input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
acc | v16acc48 | Incoming accumulation vector (16 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ybuff | v32int16 | Right input buffer of 32 elements of type int16 |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. ystart is restricted to multiples of 2 as granularity for ybuff is 32-bit. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
v16acc48 msc16_maxdiff (v16acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply-subtract intrinsic function after computing the maximum difference between the selected lanes from the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
acc | v16acc48 | Incoming accumulation vector (16 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
v16acc48 msc16_maxdiff (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply-subtract intrinsic function after computing the maximum difference between the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
acc | v16acc48 | Incoming accumulation vector (16 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
v16acc48 msc16_maxdiff (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply-subtract intrinsic function after computing the maximum difference between the selected lanes from the input buffers, using a small X input buffer and a separate Y input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
acc | v16acc48 | Incoming accumulation vector (16 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ybuff | v32int16 | Right input buffer of 32 elements of type int16 |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. ystart is restricted to multiples of 2 as granularity for ybuff is 32-bit. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
v16acc48 msc16_min (v16acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply-subtract intrinsic function after computing the minimum between the selected lanes from the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
acc | v16acc48 | Incoming accumulation vector (16 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
v16acc48 msc16_min (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply-subtract intrinsic function after computing the minimum between the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
acc | v16acc48 | Incoming accumulation vector (16 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
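
The offset arguments in the table above pack one 4-bit field per lane into a 32-bit word. The following scalar sketch shows how such a field could be extracted for a 16-lane intrinsic. `lane_offset_field` is a hypothetical helper, not part of the intrinsics API, and it assumes that "8th lane" means lane index 8, so that `xoffsets` covers lanes 0 to 7 and `xoffsets_hi` covers lanes 8 to 15. It only decodes the field; how the hardware turns the field into a buffer position is not modelled here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (illustration only, not an AIE intrinsic).
// Returns the 4-bit offset field assigned to one output lane.
// Assumption: lanes 0..7 read `offsets`, lanes 8..15 read `offsets_hi`,
// and the least significant nibble belongs to the lowest lane of each group.
constexpr unsigned lane_offset_field(std::uint32_t offsets,
                                     std::uint32_t offsets_hi,
                                     unsigned lane) {
    return (((lane < 8u) ? offsets : offsets_hi) >> (4u * (lane & 7u))) & 0xFu;
}

// With these two words every lane's field equals its own lane number.
static_assert(lane_offset_field(0x76543210u, 0xFEDCBA98u, 0) == 0x0u, "lane 0");
static_assert(lane_offset_field(0x76543210u, 0xFEDCBA98u, 7) == 0x7u, "lane 7");
static_assert(lane_offset_field(0x76543210u, 0xFEDCBA98u, 8) == 0x8u, "lane 8");
```

The same decoding would apply to `zoffsets` and `zoffsets_hi`, since the table describes them with the same 4-bit-per-lane layout.
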

v16acc48 msc16_min (v16acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply-subtract intrinsic function after computing the minimum between the selected lanes from the small X input buffer and the separate Y (right) input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
acc | v16acc48 | Incoming accumulation vector (16 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ybuff | v32int16 | Right input buffer of 32 elements of type int16 |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. ystart is restricted to multiples of 2 as granularity for ybuff is 32-bit. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
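
Two argument constraints recur in these tables: only the 4 LSBs of `zstart` are used, and `xstart` (and `ystart` when a separate `ybuff` is passed) must be a multiple of 2. A minimal sketch of both rules follows. The helper names are hypothetical and exist only for illustration; the intrinsics themselves do not expose such functions.

```cpp
#include <cassert>

// Hypothetical helpers (illustration only, not AIE intrinsics).

// "Only the 4 LSB of the argument are used": keep the low four bits.
constexpr unsigned effective_zstart(int zstart) {
    return static_cast<unsigned>(zstart) & 0xFu;
}

// "restricted to multiples of 2 as granularity ... is 32-bit":
// a start position is acceptable only when it is even.
constexpr bool is_valid_start(int start) {
    return (start % 2) == 0;
}

static_assert(effective_zstart(18) == 2u, "18 = 0b10010, low nibble is 2");
static_assert(is_valid_start(4), "even start is accepted");
static_assert(!is_valid_start(3), "odd start is rejected");
```

A check such as `is_valid_start` could be placed in a `static_assert` next to the call site when the start value is a compile-time constant.
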

v8acc48 msc8_abs (v8acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply-subtract intrinsic function after computing the absolute value in the selected lanes from the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
acc | v8acc48 | Incoming accumulation vector (8 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to the x buffer. LSB apply to first lane |
xstep | int | Step between each column for selection in the xbuffer. xstep is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |
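
As a rough mental model of `msc8_abs`, the scalar sketch below subtracts the products of `|x|` and `z` from each of the 8 accumulator lanes. It is not the hardware algorithm: it assumes the X and Z operands have already been selected per lane and per column (so `xstart`, `xoffsets`, `xstep`, `xsquare` and the Z selectors are not modelled), it leaves the number of columns as a template parameter because this section does not state it, and it uses `int64_t` as a stand-in for the 48-bit accumulator. All names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

constexpr std::size_t kLanes = 8;  // v8acc48 has 8 accumulator lanes

// Pre-selected 16-bit operands: one row per lane, one entry per column.
template <std::size_t Cols>
using Operands = std::array<std::array<std::int16_t, Cols>, kLanes>;

// Scalar sketch: out[lane] = acc[lane] - sum over columns of |x| * z.
// Assumption: abs is computed without saturation (|-32768| = 32768).
template <std::size_t Cols>
std::array<std::int64_t, kLanes> msc_abs_model(
    const std::array<std::int64_t, kLanes>& acc,
    const Operands<Cols>& x,
    const Operands<Cols>& z) {
    std::array<std::int64_t, kLanes> out = acc;
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        for (std::size_t col = 0; col < Cols; ++col) {
            const int magnitude = std::abs(static_cast<int>(x[lane][col]));
            out[lane] -= static_cast<std::int64_t>(magnitude) * z[lane][col];
        }
    }
    return out;
}

// Small usage example with two columns per lane.
inline void msc_abs_example() {
    std::array<std::int64_t, kLanes> acc;
    acc.fill(100);
    Operands<2> x{};
    Operands<2> z{};
    x[0] = {{-3, 2}};
    z[0] = {{4, 5}};
    const auto out = msc_abs_model<2>(acc, x, z);
    assert(out[0] == 100 - (3 * 4 + 2 * 5));  // 78
    assert(out[1] == 100);                    // untouched lane
}
```
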

v8acc48 msc8_abs (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply-subtract intrinsic function after computing the absolute value in the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
acc | v8acc48 | Incoming accumulation vector (8 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to the x buffer. LSB apply to first lane |
xstep | int | Step between each column for selection in the xbuffer. xstep is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |

v8acc48 msc8_max (v8acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply-subtract intrinsic function after computing the maximum between the selected lanes from the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
acc | v8acc48 | Incoming accumulation vector (8 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to both x and y buffers. LSB apply to first lane |
xstep | int | Step between each column for selection in the x and y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |
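
The `xstep` row says the Y selection steps symmetrically to the X selection, advancing by `-xstep` per column. One way to read that statement is sketched below: for column `col`, the X base position moves forward by `col * xstep` while the Y base position moves backward by the same amount. `ColumnBase` and `column_base` are hypothetical names, and the sketch deliberately ignores the per-lane offsets, the mini-permute squares and any wrap-around at the buffer boundary.

```cpp
#include <cassert>

// Hypothetical illustration only, not an AIE intrinsic.
struct ColumnBase {
    int x;  // base position in the X selection for this column
    int y;  // base position in the Y selection for this column
};

// Assumed reading of "ystep advances by -xstep":
// X walks forward from xstart, Y walks backward from ystart.
constexpr ColumnBase column_base(int xstart, int ystart, int xstep, int col) {
    return ColumnBase{xstart + col * xstep, ystart - col * xstep};
}

static_assert(column_base(0, 6, 2, 0).x == 0, "column 0 starts at xstart");
static_assert(column_base(0, 6, 2, 0).y == 6, "column 0 starts at ystart");
static_assert(column_base(0, 6, 2, 1).x == 2, "X advances by +xstep");
static_assert(column_base(0, 6, 2, 1).y == 4, "Y advances by -xstep");
```

This mirrored stepping is the access pattern a symmetric filter would need, where one tap is paired with its mirror image before the multiplication.
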

v8acc48 msc8_max (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply-subtract intrinsic function after computing the maximum between the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
acc | v8acc48 | Incoming accumulation vector (8 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to both x and y buffers. LSB apply to first lane |
xstep | int | Step between each column for selection in the x and y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |

v8acc48 msc8_max (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply-subtract intrinsic function after computing the maximum between the selected lanes from the small X input buffer and the separate Y (right) input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
acc | v8acc48 | Incoming accumulation vector (8 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to both x and y buffers. LSB apply to first lane |
xstep | int | Step between each column for selection in the x and y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ybuff | v32int16 | Right input buffer of 32 elements of type int16 |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. ystart is restricted to multiples of 2 as granularity for ybuff is 32-bit. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |

v8acc48 msc8_maxdiff (v8acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply-subtract intrinsic function after computing the maximum difference between the selected lanes from the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
acc | v8acc48 | Incoming accumulation vector (8 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to both x and y buffers. LSB apply to first lane |
xstep | int | Step between each column for selection in the x and y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |
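
The `_max`, `_min` and `_maxdiff` variants differ only in the comparison applied to the selected X and Y values before the multiplication. The scalar sketch below gives one plausible per-element reading of the three comparisons. The function names are hypothetical, and the definition of "maximum difference" as `max(x - y, 0)` is an assumption that this section does not spell out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-element models (illustration only, not AIE intrinsics).
// Results are returned as int so that x - y cannot overflow 16 bits.

constexpr int lane_max(std::int16_t x, std::int16_t y) {
    return x > y ? x : y;
}

constexpr int lane_min(std::int16_t x, std::int16_t y) {
    return x < y ? x : y;
}

// Assumption: "maximum difference" means the difference clamped at zero.
constexpr int lane_maxdiff(std::int16_t x, std::int16_t y) {
    return (x - y) > 0 ? (x - y) : 0;
}

static_assert(lane_max(5, -2) == 5, "larger operand wins");
static_assert(lane_min(5, -2) == -2, "smaller operand wins");
static_assert(lane_maxdiff(5, 2) == 3, "positive difference is kept");
static_assert(lane_maxdiff(2, 5) == 0, "negative difference clamps to 0");
```
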

v8acc48 msc8_maxdiff (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply-subtract intrinsic function after computing the maximum difference between the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
acc | v8acc48 | Incoming accumulation vector (8 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to both x and y buffers. LSB apply to first lane |
xstep | int | Step between each column for selection in the x and y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |

v8acc48 msc8_maxdiff (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply-subtract intrinsic function after computing the maximum difference between the selected lanes from the small X input buffer and the separate Y (right) input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
acc | v8acc48 | Incoming accumulation vector (8 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to both x and y buffers. LSB apply to first lane |
xstep | int | Step between each column for selection in the x and y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ybuff | v32int16 | Right input buffer of 32 elements of type int16 |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. ystart is restricted to multiples of 2 as granularity for ybuff is 32-bit. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |

v8acc48 msc8_min (v8acc48 acc, v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply-subtract intrinsic function after computing the minimum between the selected lanes from the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
acc | v8acc48 | Incoming accumulation vector (8 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to both x and y buffers. LSB apply to first lane |
xstep | int | Step between each column for selection in the x and y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |

v8acc48 msc8_min (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply-subtract intrinsic function after computing the minimum between the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
acc | v8acc48 | Incoming accumulation vector (8 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to both x and y buffers. LSB apply to first lane |
xstep | int | Step between each column for selection in the x and y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |

v8acc48 msc8_min (v8acc48 acc, v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply-subtract intrinsic function after computing the minimum between the selected lanes from the small X input buffer and the separate Y (right) input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
acc | v8acc48 | Incoming accumulation vector (8 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to both x and y buffers. LSB apply to first lane |
xstep | int | Step between each column for selection in the x and y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ybuff | v32int16 | Right input buffer of 32 elements of type int16 |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. ystart is restricted to multiples of 2 as granularity for ybuff is 32-bit. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |

v16acc48 mul16_abs (v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply intrinsic function after computing the absolute value in the selected lanes from the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
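
The `mul`, `mac` and `msc` families share the same pre-processing and differ only in how the product meets the accumulator: `mul` takes no `acc` argument, `mac` adds the product and `msc` subtracts it. The single-term sketch below illustrates that relationship for the `_abs` variants. The function names are hypothetical, `int64_t` stands in for a 48-bit accumulator lane, and the real intrinsics may combine more than one product per lane, which is not modelled here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical single-term models (illustration only, not AIE intrinsics).

// mul: product of |x| and z, no incoming accumulator.
constexpr std::int64_t mul_abs_term(std::int16_t x, std::int16_t z) {
    return static_cast<std::int64_t>(x < 0 ? -static_cast<int>(x)
                                           : static_cast<int>(x)) * z;
}

// mac: the product is added to the incoming accumulator.
constexpr std::int64_t mac_abs_term(std::int64_t acc, std::int16_t x,
                                    std::int16_t z) {
    return acc + mul_abs_term(x, z);
}

// msc: the product is subtracted from the incoming accumulator.
constexpr std::int64_t msc_abs_term(std::int64_t acc, std::int16_t x,
                                    std::int16_t z) {
    return acc - mul_abs_term(x, z);
}

static_assert(mul_abs_term(-3, 4) == 12, "|-3| * 4");
static_assert(mac_abs_term(100, -3, 4) == 112, "100 + 12");
static_assert(msc_abs_term(100, -3, 4) == 88, "100 - 12");
```

In this model a `mul` result equals a `mac` applied to a zero accumulator, which is a convenient way to start an accumulation chain.
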

v16acc48 mul16_abs (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply intrinsic function after computing the absolute value in the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
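
The `xsquare` argument is described as an ordering of a four-element "mini-permute square", with `0x3210` as the default and the least significant field applying to the first element. The sketch below shows one reading of that encoding: output element `i` takes the input element named by the `i`-th 4-bit field. `square_selector` and `apply_square` are hypothetical helpers; which buffer elements make up a square, and the masking of each field to two bits, are assumptions not stated in this section.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helpers (illustration only, not AIE intrinsics).

// Extract the selector for one output element of the square.
// Assumption: only the low 2 bits of each 4-bit field matter, since a
// square holds four elements.
constexpr unsigned square_selector(unsigned square, unsigned element) {
    return (square >> (4u * element)) & 0x3u;
}

// Reorder four elements according to the square value.
inline std::array<std::int16_t, 4> apply_square(
    const std::array<std::int16_t, 4>& in, unsigned square) {
    std::array<std::int16_t, 4> out{};
    for (unsigned i = 0; i < 4; ++i) {
        out[i] = in[square_selector(square, i)];
    }
    return out;
}

// The default value selects elements in their original order.
static_assert(square_selector(0x3210u, 0) == 0u, "first element");
static_assert(square_selector(0x3210u, 3) == 3u, "last element");

inline void square_example() {
    const std::array<std::int16_t, 4> in = {{10, 11, 12, 13}};
    assert(apply_square(in, 0x3210u) == in);  // identity ordering
}
```

Under this reading, `0x3120` would swap the two middle elements and `0x0123` would reverse the square.
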

v16acc48 mul16_max (v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply intrinsic function after computing the maximum between the selected lanes from the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |

v16acc48 mul16_max (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply intrinsic function after computing the maximum between the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
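
The xoffsets/xoffsets_hi (and zoffsets/zoffsets_hi) arguments pack one 4-bit field per lane into two 32-bit words. The following host-side sketch only shows how such a field could be read back for a given lane; it is not part of the intrinsics. The helper name `lane_offset_nibble` is hypothetical, and it assumes lanes are numbered from 0, so that the low word covers lanes 0 to 7 and the `_hi` word covers lanes 8 to 15.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns the 4-bit offset field for `lane` (0..15).
// Assumption: `offsets` holds lanes 0..7 and `offsets_hi` holds lanes 8..15,
// lowest nibble first ("LSB apply to first lane" / "to 8th lane").
inline unsigned lane_offset_nibble(std::uint32_t offsets,
                                   std::uint32_t offsets_hi,
                                   unsigned lane) {
    assert(lane < 16);
    const std::uint32_t word = (lane < 8) ? offsets : offsets_hi;
    return (word >> (4u * (lane % 8u))) & 0xFu;
}
```

How the hardware turns this field into a buffer index (for example the "2x the lane number" rule for X) is not modelled here.
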
v16acc48 mul16_max (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply intrinsic function that first computes the maximum between the selected lanes of the X and Y input buffers, using small (v32int16) X and Y input buffers.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ybuff | v32int16 | Right input buffer of 32 elements of type int16 |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. ystart is restricted to multiples of 2 as granularity for ybuff is 32-bit. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
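
As a rough illustration of the arithmetic only, the sketch below models "maximum, then multiply" on operands that have already been selected for each lane. It runs on a host compiler and does not call the intrinsic. The function name `mul_max_model` is hypothetical, and the start/offset/square selection logic is deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference model: x[i], y[i] and z[i] are the int16 operands
// already selected for lane i. Each product fits easily in a 48-bit
// accumulator lane, so int64_t is used to hold it.
inline std::vector<std::int64_t> mul_max_model(const std::vector<std::int16_t>& x,
                                               const std::vector<std::int16_t>& y,
                                               const std::vector<std::int16_t>& z) {
    assert(x.size() == y.size() && x.size() == z.size());
    std::vector<std::int64_t> acc(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        acc[i] = static_cast<std::int64_t>(std::max(x[i], y[i])) * z[i];
    }
    return acc;
}
```

Replacing `std::max` with `std::min` would give the corresponding model for the mul16_min and mul8_min variants.
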
v16acc48 mul16_maxdiff (v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply intrinsic function that first computes the maximum difference between the selected lanes of the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
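
This page does not spell out what "maximum difference" means. The sketch below assumes the common reading max(x - y, 0), that is, the difference clipped at zero, followed by the multiplication with the selected Z operand. Both the helper name `mul_maxdiff_lane` and that interpretation are assumptions; whether the hardware saturates the intermediate difference to 16 bits is not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical single-lane model, assuming maxdiff(x, y) = max(x - y, 0).
inline std::int64_t mul_maxdiff_lane(std::int16_t x, std::int16_t y, std::int16_t z) {
    // Widen before subtracting so the difference cannot overflow.
    const std::int32_t diff = static_cast<std::int32_t>(x) - static_cast<std::int32_t>(y);
    const std::int32_t clipped = std::max<std::int32_t>(diff, 0);
    assert(clipped >= 0 && clipped <= 65535);
    return static_cast<std::int64_t>(clipped) * z;
}
```
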
v16acc48 mul16_maxdiff (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply intrinsic function that first computes the maximum difference between the selected lanes of the input buffer, using a small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
v16acc48 mul16_maxdiff (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply intrinsic function that first computes the maximum difference between the selected lanes of the X and Y input buffers, using small (v32int16) X and Y input buffers.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ybuff | v32int16 | Right input buffer of 32 elements of type int16 |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. ystart is restricted to multiples of 2 as granularity for ybuff is 32-bit. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
v16acc48 mul16_min (v64int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply intrinsic function that first computes the minimum between the selected lanes of the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
v16acc48 mul16_min (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply intrinsic function that first computes the minimum between the selected lanes of the input buffer, using a small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
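
The xsquare and ysquare arguments encode a reordering of a four-element "square", one nibble per output element, with 0x3210 leaving the order unchanged. The host-side sketch below shows one way to decode such a value. The helper name `permute_square` is hypothetical, and it assumes that nibble i (lowest first) names the input element placed at output position i and that each nibble is in the range 0 to 3.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of the mini-permute on one 4-element square.
// Assumption: nibble i of `square` selects which input element goes to
// output position i; 0x3210 is therefore the identity.
inline std::array<std::int16_t, 4> permute_square(const std::array<std::int16_t, 4>& in,
                                                  unsigned square) {
    std::array<std::int16_t, 4> out{};
    for (unsigned i = 0; i < 4; ++i) {
        const unsigned sel = (square >> (4u * i)) & 0xFu;
        assert(sel < 4);
        out[i] = in[sel];
    }
    return out;
}
```

Under these assumptions, 0x0123 reverses the four elements and 0x2301 swaps the two halves of the square.
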
v16acc48 mul16_min (v32int16 xbuff, int xstart, unsigned int xoffsets, unsigned int xoffsets_hi, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, unsigned int zoffsets_hi, int zstep)
Multiply intrinsic function that first computes the minimum between the selected lanes of the X and Y input buffers, using small (v32int16) X and Y input buffers.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ybuff | v32int16 | Right input buffer of 32 elements of type int16 |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. ystart is restricted to multiples of 2 as granularity for ybuff is 32-bit. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
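
Because only the 4 LSBs of zstart are used, values that differ by a multiple of 16 select the same starting position in the 16-element Z buffer. A minimal host-side sketch of that masking follows; the helper name `effective_zstart` is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: keeps only the 4 least significant bits of zstart,
// as the parameter description states.
inline unsigned effective_zstart(int zstart) {
    const unsigned masked = static_cast<unsigned>(zstart) & 0xFu;
    assert(masked < 16);
    return masked;
}
```

For example, zstart values of 3 and 19 both map to position 3.
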
v8acc48 mul8_abs (v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply intrinsic function that first computes the absolute value of the selected lanes of the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to the x buffer. LSB apply to first lane |
xstep | int | Step between each column for selection in the xbuffer. xstep is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |
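
The arithmetic of "absolute value, then multiply" can be sketched for a single lane as follows. This is a host-side model under stated assumptions, not the intrinsic: the name `mul_abs_lane` is hypothetical, operand selection is omitted, and the model widens before taking the absolute value so that -32768 yields 32768. How the hardware treats that corner case is not stated on this page.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical single-lane model: |x| * z on already-selected operands.
inline std::int64_t mul_abs_lane(std::int16_t x, std::int16_t z) {
    // Widen first so that the magnitude of -32768 is representable.
    const std::int32_t mag = std::abs(static_cast<std::int32_t>(x));
    assert(mag >= 0 && mag <= 32768);
    return static_cast<std::int64_t>(mag) * z;
}
```
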
v8acc48 mul8_abs (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply intrinsic function that first computes the absolute value of the selected lanes of the input buffer, using a small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to the x buffer. LSB apply to first lane |
xstep | int | Step between each column for selection in the xbuffer. xstep is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |
v8acc48 mul8_max (v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply intrinsic function that first computes the maximum between the selected lanes of the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to both x and y buffers. LSB apply to first lane |
xstep | int | Step between each column for selection in the x and y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |
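
The symmetric stepping described for xstep can be pictured as X moving forward and Y moving backward by the same amount from one column to the next. The sketch below computes only those per-column base positions. The names `ColumnStarts` and `column_starts` and the column index `col` are hypothetical; per-lane offsets, the mini-permute and any wrap-around inside the buffer are not modelled.

```cpp
#include <cassert>

// Hypothetical pair of base positions for one column.
struct ColumnStarts {
    int x;  // base position in the X selection
    int y;  // base position in the Y selection
};

// Assumption: column `col` starts at xstart + col * xstep for X and at
// ystart - col * xstep for Y ("ystep advances by -xstep").
inline ColumnStarts column_starts(int xstart, int ystart, int xstep, int col) {
    assert(col >= 0);
    return ColumnStarts{xstart + col * xstep, ystart - col * xstep};
}
```

With xstart = 0, ystart = 6 and xstep = 2, columns 0 to 3 would pair X positions 0, 2, 4, 6 with Y positions 6, 4, 2, 0 under this model.
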
v8acc48 mul8_max (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply intrinsic function that first computes the maximum between the selected lanes of the input buffer, using a small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to both x and y buffers. LSB apply to first lane |
xstep | int | Step between each column for selection in the x and y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |
v8acc48 mul8_max (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply intrinsic function that first computes the maximum between the selected lanes of the X and Y input buffers, using small (v32int16) X and Y input buffers.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to both x and y buffers. LSB apply to first lane |
xstep | int | Step between each column for selection in the x and y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ybuff | v32int16 | Right input buffer of 32 elements of type int16 |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. ystart is restricted to multiples of 2 as granularity for ybuff is 32-bit. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |
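
Since xstart, xstep and (for the overloads with a separate ybuff) ystart are restricted to multiples of 2, a caller may want to check such values before passing them on. The following host-side sketch shows one way to do that; `is_even_start` and `checked_start` are hypothetical helpers, not part of the intrinsics.

```cpp
#include <cassert>

// Hypothetical check: int16 elements are addressed with 32-bit granularity,
// so a valid start or step is a multiple of 2.
inline bool is_even_start(int start) {
    return (start % 2) == 0;
}

// Returns the value unchanged, asserting in debug builds that it is valid.
inline int checked_start(int start) {
    assert(is_even_start(start));
    return start;
}
```
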
v8acc48 mul8_maxdiff (v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply intrinsic function that first computes the maximum difference between the selected lanes of the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to both x and y buffers. LSB apply to first lane |
xstep | int | Step between each column for selection in the x and y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |
v8acc48 mul8_maxdiff (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply intrinsic function that first computes the maximum difference between the selected lanes of the input buffer, using a small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to both x and y buffers. LSB apply to first lane |
xstep | int | Step between each column for selection in the x and y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |
v8acc48 mul8_maxdiff (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply intrinsic function that first computes the maximum difference between the selected lanes of the X and Y input buffers, using small (v32int16) X and Y input buffers.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to both x and y buffers. LSB apply to first lane |
xstep | int | Step between each column for selection in the x and y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ybuff | v32int16 | Right input buffer of 32 elements of type int16 |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. ystart is restricted to multiples of 2 as granularity for ybuff is 32-bit. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |
v8acc48 mul8_min (v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply intrinsic function that first computes the minimum between the selected lanes of the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to both x and y buffers. LSB apply to first lane |
xstep | int | Step between each column for selection in the x and y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |
v8acc48 mul8_min (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply intrinsic function that first computes the minimum between the selected lanes of the input buffer, using a small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to both x and y buffers. LSB apply to first lane |
xstep | int | Step between each column for selection in the x and y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 because the granularity of xbuff is 32 bits. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |
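The `xoffsets` and `zoffsets` arguments pack one 4-bit field per lane into a 32-bit word, with the least significant bits belonging to the first lane. The following is a minimal sketch in plain C++ (host code, not an AIE intrinsic) of how such a word can be split into per-lane fields. The helper name `offset_field` is hypothetical, and the sketch only extracts the field; it does not model how the hardware turns the field into a buffer index.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the intrinsics API): extract the 4-bit
// field that belongs to `lane` from a packed offsets word. Lane 0 uses
// bits 3..0, lane 1 uses bits 7..4, and so on up to lane 7 in bits 31..28.
constexpr unsigned offset_field(std::uint32_t offsets, unsigned lane) {
    return (offsets >> (4u * (lane & 7u))) & 0xFu;
}
```

With `offsets = 0x76543210`, lane 0 reads the field 0 and lane 7 reads the field 7.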
v8acc48 mul8_min | ( | v32int16 | xbuff, |
int | xstart, | ||
unsigned int | xoffsets, | ||
int | xstep, | ||
unsigned int | xsquare, | ||
v32int16 | ybuff, | ||
int | ystart, | ||
unsigned int | ysquare, | ||
v16int16 | zbuff, | ||
int | zstart, | ||
unsigned int | zoffsets, | ||
int | zstep | ||
) |
Multiply intrinsic function after computing the minimum between the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to both x and y buffers. LSB apply to first lane |
xstep | int | Step between each column for selection in the x and y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 because the granularity of xbuff is 32 bits. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ybuff | v32int16 | Right input buffer of 32 elements of type int16 |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. ystart is restricted to multiples of 2 as granularity for ybuff is 32-bit. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |
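The `zstart` row states that only the 4 LSB of the argument are used. A small host-side sketch of that truncation, assuming it behaves like a plain bit mask, is shown here; `effective_zstart` is a hypothetical name used only for illustration.

```cpp
// Hypothetical helper: the value of zstart that remains once only the
// 4 least significant bits are kept (assumed to act as a bit mask).
// Converting to unsigned first makes the mask well defined for negatives.
constexpr unsigned effective_zstart(int zstart) {
    return static_cast<unsigned>(zstart) & 0xFu;
}
```

Under this assumption, a `zstart` of 19 would select the same position as a `zstart` of 3.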
v16acc48 negmul16_abs | ( | v64int16 | xbuff, |
int | xstart, | ||
unsigned int | xoffsets, | ||
unsigned int | xoffsets_hi, | ||
unsigned int | xsquare, | ||
v16int16 | zbuff, | ||
int | zstart, | ||
unsigned int | zoffsets, | ||
unsigned int | zoffsets_hi, | ||
int | zstep | ||
) |
Multiply-negate intrinsic function after computing the absolute value in the selected lanes from the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
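The 16-lane variants split the per-lane fields across two words: `xoffsets`/`zoffsets` for the lower lanes and `xoffsets_hi`/`zoffsets_hi` for the upper lanes. The sketch below is plain C++ rather than AIE code and assumes zero-based lane numbering, so that lanes 0..7 read the first word and lanes 8..15 read the `_hi` word. The name `offset_field16` is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: 4-bit field for `lane` (0..15) of a 16-lane call.
// Assumption: lanes 0..7 read `offsets`, lanes 8..15 read `offsets_hi`,
// each word holding its lowest lane in the least significant nibble.
constexpr unsigned offset_field16(std::uint32_t offsets,
                                  std::uint32_t offsets_hi,
                                  unsigned lane) {
    const std::uint32_t word = (lane & 8u) ? offsets_hi : offsets;
    return (word >> (4u * (lane & 7u))) & 0xFu;
}
```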
v16acc48 negmul16_abs | ( | v32int16 | xbuff, |
int | xstart, | ||
unsigned int | xoffsets, | ||
unsigned int | xoffsets_hi, | ||
unsigned int | xsquare, | ||
v16int16 | zbuff, | ||
int | zstart, | ||
unsigned int | zoffsets, | ||
unsigned int | zoffsets_hi, | ||
int | zstep | ||
) |
Multiply-negate intrinsic function after computing the absolute value in the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
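As a rough scalar reference for the `negmul16_abs` description (absolute value, then multiply, then negate), a single product term can be modeled on the host as shown here. This is an illustrative sketch under stated assumptions, not the hardware definition: `negmul_abs_term` is a hypothetical name, lane selection and column accumulation are left out, and the handling of `x == -32768` is an assumption of the model.

```cpp
#include <cstdint>

// Scalar model of one product term: -(|x| * z).
// Widening to 64 bits first keeps |x| exact even for x == -32768;
// how the hardware treats that corner case is not covered by this sketch.
constexpr std::int64_t negmul_abs_term(std::int16_t x, std::int16_t z) {
    const std::int64_t wide_x = x;
    const std::int64_t abs_x = wide_x < 0 ? -wide_x : wide_x;
    return -(abs_x * static_cast<std::int64_t>(z));
}
```

In this model, `x = -3` and `x = 3` both give -12 when `z = 4`, since the sign of `x` is discarded before the multiply.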
v16acc48 negmul16_max | ( | v64int16 | xbuff, |
int | xstart, | ||
unsigned int | xoffsets, | ||
unsigned int | xoffsets_hi, | ||
unsigned int | xsquare, | ||
int | ystart, | ||
unsigned int | ysquare, | ||
v16int16 | zbuff, | ||
int | zstart, | ||
unsigned int | zoffsets, | ||
unsigned int | zoffsets_hi, | ||
int | zstep | ||
) |
Multiply-negate intrinsic function after computing the maximum between the selected lanes from the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
v16acc48 negmul16_max | ( | v32int16 | xbuff, |
int | xstart, | ||
unsigned int | xoffsets, | ||
unsigned int | xoffsets_hi, | ||
unsigned int | xsquare, | ||
int | ystart, | ||
unsigned int | ysquare, | ||
v16int16 | zbuff, | ||
int | zstart, | ||
unsigned int | zoffsets, | ||
unsigned int | zoffsets_hi, | ||
int | zstep | ||
) |
Multiply-negate intrinsic function after computing the maximum between the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
v16acc48 negmul16_max | ( | v32int16 | xbuff, |
int | xstart, | ||
unsigned int | xoffsets, | ||
unsigned int | xoffsets_hi, | ||
unsigned int | xsquare, | ||
v32int16 | ybuff, | ||
int | ystart, | ||
unsigned int | ysquare, | ||
v16int16 | zbuff, | ||
int | zstart, | ||
unsigned int | zoffsets, | ||
unsigned int | zoffsets_hi, | ||
int | zstep | ||
) |
Multiply-negate intrinsic function after computing the maximum between the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ybuff | v32int16 | Right input buffer of 32 elements of type int16 |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. ystart is restricted to multiples of 2 as granularity for ybuff is 32-bit. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
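The `negmul16_max` overloads compare a value selected through the X parameters with one selected through the Y parameters, then multiply by a Z value and negate. A hedged scalar sketch of one such product term follows. `negmul_max_term` is a hypothetical host-side function; it takes the already selected `x`, `y` and `z` values and does not model the permute network or the accumulation across columns.

```cpp
#include <algorithm>
#include <cstdint>

// Scalar model of one product term: -(max(x, y) * z).
// x and y stand for the two values already selected by the X and Y
// addressing parameters; z is the selected coefficient.
constexpr std::int64_t negmul_max_term(std::int16_t x, std::int16_t y,
                                       std::int16_t z) {
    return -(static_cast<std::int64_t>(std::max(x, y)) * z);
}
```

A model for the `_min` variants would presumably differ only in using `std::min` in place of `std::max`.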
v16acc48 negmul16_maxdiff | ( | v64int16 | xbuff, |
int | xstart, | ||
unsigned int | xoffsets, | ||
unsigned int | xoffsets_hi, | ||
unsigned int | xsquare, | ||
int | ystart, | ||
unsigned int | ysquare, | ||
v16int16 | zbuff, | ||
int | zstart, | ||
unsigned int | zoffsets, | ||
unsigned int | zoffsets_hi, | ||
int | zstep | ||
) |
Multiply-negate intrinsic function after computing the maximum difference between the selected lanes from the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
v16acc48 negmul16_maxdiff | ( | v32int16 | xbuff, |
int | xstart, | ||
unsigned int | xoffsets, | ||
unsigned int | xoffsets_hi, | ||
unsigned int | xsquare, | ||
int | ystart, | ||
unsigned int | ysquare, | ||
v16int16 | zbuff, | ||
int | zstart, | ||
unsigned int | zoffsets, | ||
unsigned int | zoffsets_hi, | ||
int | zstep | ||
) |
Multiply-negate intrinsic function after computing the maximum difference between the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
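The `xsquare` and `ysquare` arguments hold one 4-bit field per element of the mini-permute square, with the default `0x3210` and the LSBs applying to the first element. The sketch below shows one plausible reading on the host: nibble `i` names the source position whose value ends up at position `i`, so `0x3210` is the identity. Both this interpretation and the name `apply_square` are assumptions made for illustration.

```cpp
#include <array>
#include <cstdint>

// Assumption: nibble i of `square` names the source position (0..3) whose
// value lands at position i of a 4-element square. Only the low 2 bits of
// each nibble are used here so the index always stays in range.
inline std::array<std::int16_t, 4> apply_square(
        const std::array<std::int16_t, 4>& in, unsigned square) {
    std::array<std::int16_t, 4> out{};
    for (unsigned i = 0; i < 4; ++i) {
        out[i] = in[(square >> (4u * i)) & 0x3u];
    }
    return out;
}
```

Under this reading, `0x3210` leaves the four elements in place and `0x0123` reverses them.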
v16acc48 negmul16_maxdiff | ( | v32int16 | xbuff, |
int | xstart, | ||
unsigned int | xoffsets, | ||
unsigned int | xoffsets_hi, | ||
unsigned int | xsquare, | ||
v32int16 | ybuff, | ||
int | ystart, | ||
unsigned int | ysquare, | ||
v16int16 | zbuff, | ||
int | zstart, | ||
unsigned int | zoffsets, | ||
unsigned int | zoffsets_hi, | ||
int | zstep | ||
) |
Multiply-negate intrinsic function after computing the maximum difference between the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ybuff | v32int16 | Right input buffer of 32 elements of type int16 |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. ystart is restricted to multiples of 2 as granularity for ybuff is 32-bit. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
v16acc48 negmul16_min | ( | v64int16 | xbuff, |
int | xstart, | ||
unsigned int | xoffsets, | ||
unsigned int | xoffsets_hi, | ||
unsigned int | xsquare, | ||
int | ystart, | ||
unsigned int | ysquare, | ||
v16int16 | zbuff, | ||
int | zstart, | ||
unsigned int | zoffsets, | ||
unsigned int | zoffsets_hi, | ||
int | zstep | ||
) |
Multiply-negate intrinsic function after computing the minimum between the selected lanes from the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
v16acc48 negmul16_min | ( | v32int16 | xbuff, |
int | xstart, | ||
unsigned int | xoffsets, | ||
unsigned int | xoffsets_hi, | ||
unsigned int | xsquare, | ||
int | ystart, | ||
unsigned int | ysquare, | ||
v16int16 | zbuff, | ||
int | zstart, | ||
unsigned int | zoffsets, | ||
unsigned int | zoffsets_hi, | ||
int | zstep | ||
) |
Multiply-negate intrinsic function after computing the minimum between the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
v16acc48 negmul16_min | ( | v32int16 | xbuff, |
int | xstart, | ||
unsigned int | xoffsets, | ||
unsigned int | xoffsets_hi, | ||
unsigned int | xsquare, | ||
v32int16 | ybuff, | ||
int | ystart, | ||
unsigned int | ysquare, | ||
v16int16 | zbuff, | ||
int | zstart, | ||
unsigned int | zoffsets, | ||
unsigned int | zoffsets_hi, | ||
int | zstep | ||
) |
Multiply-negate intrinsic function after computing the minimum between the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v16acc48 | Returned accumulation vector (16 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to first lane |
xoffsets_hi | unsigned int | 4b offset for each lane, corresponds to 2x the lane number and each second lane is an offset to the lane before + 1. LSB apply to 8th lane |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ybuff | v32int16 | Right input buffer of 32 elements of type int16 |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. ystart is restricted to multiples of 2 as granularity for ybuff is 32-bit. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zoffsets_hi | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to 8th lane |
zstep | int | Step between each column for selection in the zbuffer. |
v8acc48 negmul8_abs | ( | v64int16 | xbuff, |
int | xstart, | ||
unsigned int | xoffsets, | ||
int | xstep, | ||
unsigned int | xsquare, | ||
v16int16 | zbuff, | ||
int | zstart, | ||
unsigned int | zoffsets, | ||
int | zstep | ||
) |
Multiply-negate intrinsic function after computing the absolute value in the selected lanes from the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to the x buffer. LSB apply to first lane |
xstep | int | Step between each column for selection in the xbuffer. xstep is restricted to multiples of 2 because the granularity of xbuff is 32 bits. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |
v8acc48 negmul8_abs | ( | v32int16 | xbuff, |
int | xstart, | ||
unsigned int | xoffsets, | ||
int | xstep, | ||
unsigned int | xsquare, | ||
v16int16 | zbuff, | ||
int | zstart, | ||
unsigned int | zoffsets, | ||
int | zstep | ||
) |
Multiply-negate intrinsic function after computing the absolute value in the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to the x buffer. LSB apply to first lane |
xstep | int | Step between each column for selection in the xbuffer. xstep is restricted to multiples of 2 because the granularity of xbuff is 32 bits. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |
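Because `xbuff` is addressed with 32-bit granularity while its elements are 16 bits wide, `xstart` and `xstep` are restricted to multiples of 2. A host-side guard for that restriction could look like the sketch below; `is_multiple_of_two` is a hypothetical helper, and what the compiler or hardware does with an odd value is not modeled.

```cpp
// Hypothetical guard: true when an int16 element index falls on a 32-bit
// boundary, that is, when it is a multiple of 2 (negative values included).
constexpr bool is_multiple_of_two(int index) {
    return (index % 2) == 0;
}
```

Such a check can be placed in a `static_assert` when the arguments are compile-time constants.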
v8acc48 negmul8_max | ( | v64int16 | xbuff, |
int | xstart, | ||
unsigned int | xoffsets, | ||
int | xstep, | ||
unsigned int | xsquare, | ||
int | ystart, | ||
unsigned int | ysquare, | ||
v16int16 | zbuff, | ||
int | zstart, | ||
unsigned int | zoffsets, | ||
int | zstep | ||
) |
Multiply-negate intrinsic function after computing the maximum between the selected lanes from the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to both x and y buffers. LSB apply to first lane |
xstep | int | Step between each column for selection in the x and y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 because the granularity of xbuff is 32 bits. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |
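The `xstep` row says that the Y side steps symmetrically, advancing by `-xstep` per column. One way to picture this, ignoring the per-lane offsets and the permute squares, is the host-side sketch below. The struct `ColumnBase` and the function `column_base` are hypothetical, and the formula is an interpretation of the table text rather than a statement of the exact hardware addressing.

```cpp
// Hypothetical model of the per-column base positions only.
struct ColumnBase {
    int x;  // base position on the X side for this column
    int y;  // base position on the Y side for this column
};

// Assumption: the X side advances by +xstep per column while the Y side
// advances by -xstep, both starting from their respective start values.
constexpr ColumnBase column_base(int xstart, int ystart, int xstep,
                                 int column) {
    return ColumnBase{xstart + column * xstep, ystart - column * xstep};
}
```

For `xstart = 0`, `ystart = 8` and `xstep = 2`, column 1 would read around position 2 on the X side and position 6 on the Y side in this model.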
v8acc48 negmul8_max | ( | v32int16 | xbuff, |
int | xstart, | ||
unsigned int | xoffsets, | ||
int | xstep, | ||
unsigned int | xsquare, | ||
int | ystart, | ||
unsigned int | ysquare, | ||
v16int16 | zbuff, | ||
int | zstart, | ||
unsigned int | zoffsets, | ||
int | zstep | ||
) |
Multiply-negate intrinsic function after computing the maximum between the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from X buffer. xstart is restricted to multiples of 2 as granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4b offset for each lane, while each second lane is an offset to the lane before + 1, applied to both x and y buffers. LSB apply to first lane |
xstep | int | Step between each column for selection in the x and y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 because the granularity of xbuff is 32 bits. |
xsquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
ystart | int | Starting position offset applied to all lanes for input from Y buffer. |
ysquare | unsigned int | Select order of the mini-permute square (default=0x3210). LSB apply to first element |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes for input from Z buffer. This must be a compile time constant. Only the 4 LSB of the argument are used. |
zoffsets | unsigned int | 4b offset for each lane, applied to input from Z buffer. LSB apply to first lane |
zstep | int | Step between each column for selection in the zbuffer. |
v8acc48 negmul8_max | ( | v32int16 | xbuff, |
int | xstart, | ||
unsigned int | xoffsets, | ||
int | xstep, | ||
unsigned int | xsquare, | ||
v32int16 | ybuff, | ||
int | ystart, | ||
unsigned int | ysquare, | ||
v16int16 | zbuff, | ||
int | zstart, | ||
unsigned int | zoffsets, | ||
int | zstep | ||
) |
Multiply-negate intrinsic function after computing the maximum between the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from the X buffer. xstart is restricted to multiples of 2 because the granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4-bit offset for each lane; the offset of every second lane is relative to the lane before it, plus 1. Applied to both the X and Y buffers. The LSBs apply to the first lane. |
xstep | int | Step between each column for selection in the X and Y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 because the granularity for xbuff is 32-bit. |
xsquare | unsigned int | Selects the order of the mini-permute square (default = 0x3210). The LSBs apply to the first element. |
ybuff | v32int16 | Right input buffer of 32 elements of type int16 |
ystart | int | Starting position offset applied to all lanes of input from the Y buffer. ystart is restricted to multiples of 2 because the granularity for ybuff is 32-bit. |
ysquare | unsigned int | Selects the order of the mini-permute square (default = 0x3210). The LSBs apply to the first element. |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes of input from the Z buffer. Must be a compile-time constant. Only the 4 LSBs of the argument are used. |
zoffsets | unsigned int | 4-bit offset for each lane, applied to input from the Z buffer. The LSBs apply to the first lane. |
zstep | int | Step between each column for selection in the Z buffer. |
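To make the "multiply-negate after maximum" wording concrete, here is a scalar reference sketch in plain C++ for a host compiler. It is not the hardware data path: it assumes the X, Y and Z operands have already been selected for each of the 8 lanes, it models a single column, and it uses `int64_t` as a stand-in for a 48-bit accumulator lane. The name `negmul8_max_ref` is hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar reference sketch (assumption: operands are already lane-selected and
// only one column contributes). int64_t stands in for a 48-bit accumulator.
inline std::array<std::int64_t, 8> negmul8_max_ref(
        const std::array<std::int16_t, 8>& x,
        const std::array<std::int16_t, 8>& y,
        const std::array<std::int16_t, 8>& z) {
    std::array<std::int64_t, 8> acc{};
    for (std::size_t lane = 0; lane < acc.size(); ++lane) {
        // Comparison stage: keep the larger of the two selected inputs.
        const std::int64_t m = std::max(x[lane], y[lane]);
        // Multiply stage: negated product with the Z operand.
        acc[lane] = -(m * static_cast<std::int64_t>(z[lane]));
    }
    return acc;
}
```

Under these assumptions, a lane with x = 3, y = 2 and z = 10 yields -30.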
v8acc48 negmul8_maxdiff (v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply-negate intrinsic function after computing the maximum difference between the selected lanes from the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from the X buffer. xstart is restricted to multiples of 2 because the granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4-bit offset for each lane; the offset of every second lane is relative to the lane before it, plus 1. Applied to both the X and Y buffers. The LSBs apply to the first lane. |
xstep | int | Step between each column for selection in the X and Y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 because the granularity for xbuff is 32-bit. |
xsquare | unsigned int | Selects the order of the mini-permute square (default = 0x3210). The LSBs apply to the first element. |
ystart | int | Starting position offset applied to all lanes of input from the Y buffer. |
ysquare | unsigned int | Selects the order of the mini-permute square (default = 0x3210). The LSBs apply to the first element. |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes of input from the Z buffer. Must be a compile-time constant. Only the 4 LSBs of the argument are used. |
zoffsets | unsigned int | 4-bit offset for each lane, applied to input from the Z buffer. The LSBs apply to the first lane. |
zstep | int | Step between each column for selection in the Z buffer. |
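The tables do not spell out what "maximum difference" computes. The scalar sketch below assumes the common reading of maxdiff, namely x - y when x is greater than y and 0 otherwise. That reading is an assumption, not something this section confirms. Both function names are hypothetical, the difference is kept in a wider type rather than modelling any 16-bit wrap or saturation, and only one already-selected lane is modelled.

```cpp
#include <cassert>
#include <cstdint>

// Assumed meaning of "maximum difference" (not confirmed by the tables):
// x - y when x > y, otherwise 0. Kept in 32 bits to avoid overflow here.
inline std::int32_t maxdiff_ref(std::int16_t x, std::int16_t y) {
    const std::int32_t d =
        static_cast<std::int32_t>(x) - static_cast<std::int32_t>(y);
    return d > 0 ? d : 0;
}

// One lane of the multiply-negate step applied to the assumed maxdiff result.
// int64_t stands in for a 48-bit accumulator lane.
inline std::int64_t negmul_maxdiff_lane_ref(std::int16_t x, std::int16_t y,
                                            std::int16_t z) {
    return -(static_cast<std::int64_t>(maxdiff_ref(x, y)) *
             static_cast<std::int64_t>(z));
}
```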
v8acc48 negmul8_maxdiff (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply-negate intrinsic function after computing the maximum difference between the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from the X buffer. xstart is restricted to multiples of 2 because the granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4-bit offset for each lane; the offset of every second lane is relative to the lane before it, plus 1. Applied to both the X and Y buffers. The LSBs apply to the first lane. |
xstep | int | Step between each column for selection in the X and Y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 because the granularity for xbuff is 32-bit. |
xsquare | unsigned int | Selects the order of the mini-permute square (default = 0x3210). The LSBs apply to the first element. |
ystart | int | Starting position offset applied to all lanes of input from the Y buffer. |
ysquare | unsigned int | Selects the order of the mini-permute square (default = 0x3210). The LSBs apply to the first element. |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes of input from the Z buffer. Must be a compile-time constant. Only the 4 LSBs of the argument are used. |
zoffsets | unsigned int | 4-bit offset for each lane, applied to input from the Z buffer. The LSBs apply to the first lane. |
zstep | int | Step between each column for selection in the Z buffer. |
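The xsquare and ysquare selectors can be read as four 4-bit fields, one per element of a 2x2 square, with the default 0x3210 leaving the order unchanged. The sketch below illustrates that reading on a 4-element array. `permute_square` is a hypothetical helper, and keeping only the low 2 bits of each field is an assumption of this sketch rather than documented behaviour.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: reorders a 4-element square according to a selector
// such as 0x3210. Field i (4 bits, LSB first) names the source element for
// output i. Only the low 2 bits of each field are used here (assumption).
inline std::array<std::int16_t, 4> permute_square(
        const std::array<std::int16_t, 4>& in, std::uint32_t square) {
    std::array<std::int16_t, 4> out{};
    for (unsigned i = 0; i < 4; ++i) {
        const unsigned src = (square >> (4u * i)) & 0x3u;
        out[i] = in[src];
    }
    return out;
}
```

With this reading, 0x3210 is the identity and 0x0123 reverses the four elements.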
v8acc48 negmul8_maxdiff (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply-negate intrinsic function after computing the maximum difference between the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from the X buffer. xstart is restricted to multiples of 2 because the granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4-bit offset for each lane; the offset of every second lane is relative to the lane before it, plus 1. Applied to both the X and Y buffers. The LSBs apply to the first lane. |
xstep | int | Step between each column for selection in the X and Y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 because the granularity for xbuff is 32-bit. |
xsquare | unsigned int | Selects the order of the mini-permute square (default = 0x3210). The LSBs apply to the first element. |
ybuff | v32int16 | Right input buffer of 32 elements of type int16 |
ystart | int | Starting position offset applied to all lanes of input from the Y buffer. ystart is restricted to multiples of 2 because the granularity for ybuff is 32-bit. |
ysquare | unsigned int | Selects the order of the mini-permute square (default = 0x3210). The LSBs apply to the first element. |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes of input from the Z buffer. Must be a compile-time constant. Only the 4 LSBs of the argument are used. |
zoffsets | unsigned int | 4-bit offset for each lane, applied to input from the Z buffer. The LSBs apply to the first lane. |
zstep | int | Step between each column for selection in the Z buffer. |
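The restrictions in the table above (xstart and ystart must be multiples of 2; only the 4 LSBs of zstart are used) can be checked on the host before calling the intrinsic. The two helpers below are a minimal sketch with hypothetical names. They only restate the documented rules and do not call any AIE API.

```cpp
#include <cassert>

// Hypothetical check: xstart (and ystart for the ybuff variant) must be a
// multiple of 2 because the buffer is addressed with 32-bit granularity.
constexpr bool is_valid_xy_start(int start) { return start % 2 == 0; }

// Hypothetical helper: the value the intrinsic effectively sees for zstart,
// since only the 4 LSBs of the argument are used.
constexpr int effective_zstart(int zstart) { return zstart & 0xF; }
```

For instance, a zstart of 18 behaves like a zstart of 2 under the documented masking.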
v8acc48 negmul8_min (v64int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply-negate intrinsic function after computing the minimum between the selected lanes from the input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
xbuff | v64int16 | Input buffer of 64 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from the X buffer. xstart is restricted to multiples of 2 because the granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4-bit offset for each lane; the offset of every second lane is relative to the lane before it, plus 1. Applied to both the X and Y buffers. The LSBs apply to the first lane. |
xstep | int | Step between each column for selection in the X and Y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 because the granularity for xbuff is 32-bit. |
xsquare | unsigned int | Selects the order of the mini-permute square (default = 0x3210). The LSBs apply to the first element. |
ystart | int | Starting position offset applied to all lanes of input from the Y buffer. |
ysquare | unsigned int | Selects the order of the mini-permute square (default = 0x3210). The LSBs apply to the first element. |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes of input from the Z buffer. Must be a compile-time constant. Only the 4 LSBs of the argument are used. |
zoffsets | unsigned int | 4-bit offset for each lane, applied to input from the Z buffer. The LSBs apply to the first lane. |
zstep | int | Step between each column for selection in the Z buffer. |
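The xstep row says that the Y selection steps symmetrically to X, with ystep advancing by -xstep. One way to picture this is to list the base position of each column before any per-lane offset is added. The sketch below does that under the assumption that column c starts at xstart + c * xstep for X and ystart - c * xstep for Y. The `ColumnStart` type, the `column_starts` function and the `columns` parameter are hypothetical, and wrap-around at the end of the buffer is not modelled.

```cpp
#include <cassert>
#include <vector>

// Base positions of one column in the X and Y selections (hypothetical type).
struct ColumnStart {
    int x;
    int y;
};

// Sketch: per-column base positions, before per-lane offsets are applied.
// Assumes X advances by +xstep and Y by -xstep for each successive column.
inline std::vector<ColumnStart> column_starts(int xstart, int ystart,
                                              int xstep, int columns) {
    // Documented restriction: xstart and xstep are multiples of 2.
    assert(xstart % 2 == 0 && xstep % 2 == 0);
    std::vector<ColumnStart> out;
    for (int c = 0; c < columns; ++c) {
        out.push_back(ColumnStart{xstart + c * xstep, ystart - c * xstep});
    }
    return out;
}
```

With xstart = 0, ystart = 6 and xstep = 2, the X positions walk upward while the Y positions walk downward, which is the access pattern a symmetric filter would use.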
v8acc48 negmul8_min (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply-negate intrinsic function after computing the minimum between the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from the X buffer. xstart is restricted to multiples of 2 because the granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4-bit offset for each lane; the offset of every second lane is relative to the lane before it, plus 1. Applied to both the X and Y buffers. The LSBs apply to the first lane. |
xstep | int | Step between each column for selection in the X and Y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 because the granularity for xbuff is 32-bit. |
xsquare | unsigned int | Selects the order of the mini-permute square (default = 0x3210). The LSBs apply to the first element. |
ystart | int | Starting position offset applied to all lanes of input from the Y buffer. |
ysquare | unsigned int | Selects the order of the mini-permute square (default = 0x3210). The LSBs apply to the first element. |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes of input from the Z buffer. Must be a compile-time constant. Only the 4 LSBs of the argument are used. |
zoffsets | unsigned int | 4-bit offset for each lane, applied to input from the Z buffer. The LSBs apply to the first lane. |
zstep | int | Step between each column for selection in the Z buffer. |
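The three Z parameters in the table above combine into one element index per lane and column. The sketch below shows one plausible combination: the masked zstart, plus the lane's 4-bit zoffsets field, plus the column number times zstep, wrapped to the 16 elements of zbuff. The additive formula and the modulo-16 wrap are assumptions made for illustration, and `z_index` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of a Z element index for one lane and column. Assumed formula:
// (zstart & 0xF) + zoffsets field for the lane + column * zstep, wrapped to
// the 16 int16 elements of zbuff.
inline unsigned z_index(int zstart, std::uint32_t zoffsets, int zstep,
                        unsigned lane, int column) {
    assert(lane < 8);  // eight 4-bit fields fit in the 32-bit zoffsets word
    const int base = zstart & 0xF;
    const int off = static_cast<int>((zoffsets >> (4u * lane)) & 0xFu);
    // Conversion to unsigned is modular, so the mask also wraps negative sums.
    return static_cast<unsigned>(base + off + column * zstep) & 0xFu;
}
```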
v8acc48 negmul8_min (v32int16 xbuff, int xstart, unsigned int xoffsets, int xstep, unsigned int xsquare, v32int16 ybuff, int ystart, unsigned int ysquare, v16int16 zbuff, int zstart, unsigned int zoffsets, int zstep)
Multiply-negate intrinsic function after computing the minimum between the selected lanes from the input buffer using small X input buffer.
Input/Output | Type | Comments |
---|---|---|
return | v8acc48 | Returned accumulation vector (8 x int48 lanes) |
xbuff | v32int16 | Input buffer of 32 elements of type int16 |
xstart | int | Starting position offset applied to all lanes of input from the X buffer. xstart is restricted to multiples of 2 because the granularity for xbuff is 32-bit. |
xoffsets | unsigned int | 4-bit offset for each lane; the offset of every second lane is relative to the lane before it, plus 1. Applied to both the X and Y buffers. The LSBs apply to the first lane. |
xstep | int | Step between each column for selection in the X and Y buffers. ystep is symmetric to xstep (ystep advances by -xstep). xstep is restricted to multiples of 2 because the granularity for xbuff is 32-bit. |
xsquare | unsigned int | Selects the order of the mini-permute square (default = 0x3210). The LSBs apply to the first element. |
ybuff | v32int16 | Right input buffer of 32 elements of type int16 |
ystart | int | Starting position offset applied to all lanes of input from the Y buffer. ystart is restricted to multiples of 2 because the granularity for ybuff is 32-bit. |
ysquare | unsigned int | Selects the order of the mini-permute square (default = 0x3210). The LSBs apply to the first element. |
zbuff | v16int16 | Input buffer of 16 elements of type int16 |
zstart | int | Starting position offset applied to all lanes of input from the Z buffer. Must be a compile-time constant. Only the 4 LSBs of the argument are used. |
zoffsets | unsigned int | 4-bit offset for each lane, applied to input from the Z buffer. The LSBs apply to the first lane. |
zstep | int | Step between each column for selection in the Z buffer. |