AI Engine Intrinsics User Guide  (AIE) v(2024.1)
Datatype Conversions

Overview

Support for converting floating-point numbers to fixed-point and fixed-point numbers to floating-point.

These intrinsics can generate exceptions. For more information, refer to the exception documentation in this guide.

Note
These intrinsics do real conversions. No bit reinterpretation takes place.
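The distinction drawn in the note above can be illustrated on a host machine in plain C++. The sketch below is not AIE code and does not call the intrinsics: `convert_value`, `reinterpret_bits`, and `check_difference` are hypothetical helper names, and the bit pattern shown assumes an IEEE 754 binary32 `float`. The intrinsics on this page behave like the first helper (the numeric value is converted), not like the second (the raw bits are copied).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Value conversion: the numeric value is carried over.
// (The input used below is exactly representable, so rounding plays no role here.)
inline std::int32_t convert_value(float x) {
    return static_cast<std::int32_t>(x);
}

// Bit reinterpretation: the 32 raw bits of the float are copied unchanged.
inline std::int32_t reinterpret_bits(float x) {
    static_assert(sizeof(float) == sizeof(std::int32_t), "expects a 32-bit float");
    std::int32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return bits;
}

// Hypothetical self-check: the same input gives very different integers.
inline void check_difference() {
    assert(convert_value(1.0f) == 1);              // numeric value preserved
    assert(reinterpret_bits(1.0f) == 0x3F800000);  // IEEE 754 encoding of 1.0f
}
```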

Functions

float fix2float (int n, int sft)
 Fixed-point to Floating-point conversion with scaling.
 
float fix2float (int n)
 Fixed-point to Floating-point conversion without scaling.
 
int float2fix (float n, int sft)
 Floating-point to Fixed-point conversion with scaling.
 
int float2fix (float n)
 Floating-point to Fixed-point conversion without scaling.
 

Function Documentation

float fix2float (int n, int sft)

Fixed-point to Floating-point conversion with scaling.

Parameters
n    Integer input value.
sft  Binary point of input value. Range [-32:31].

Example:

If the input value has 8 fractional bits, then 0.5 in fixed point will be expressed as 128. To convert this value to float use the following code, which stores 0.5 in floating-point variable b:

int   n = 128;
float b = fix2float(n,8);
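
For checking expected results off-target, the scaling in the example above can be modeled in portable C++. This is a minimal sketch, assuming the result equals n * 2^(-sft), which matches the example (128 with 8 fractional bits gives 0.5) but is not stated as a formula on this page. `ref_fix2float` is a hypothetical name, the range assertion is a choice of the model rather than documented hardware behavior, and exceptions are not modeled.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical host-side reference model; this is NOT the AIE intrinsic.
// Treats n as a fixed-point number whose binary point sits at bit sft,
// i.e. the returned value is n * 2^(-sft).
inline float ref_fix2float(int n, int sft) {
    assert(sft >= -32 && sft <= 31);  // documented range of sft
    // The int-to-float cast rounds when |n| needs more than 24 significant bits.
    return std::ldexp(static_cast<float>(n), -sft);
}
```

With this model, `ref_fix2float(128, 8)` evaluates to 0.5f, and an `sft` of 0 returns the integer value unchanged as a float, which corresponds to the conversion without scaling.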

float fix2float (int n)

Fixed-point to Floating-point conversion without scaling.

Parameters
n    Integer input value.

int float2fix (float n, int sft)

Floating-point to Fixed-point conversion with scaling.

Parameters
n    Floating-point input value.
sft  Binary point of output value. Range [-32:31].
Returns
round(n * pow(2.0, sft)); // round to nearest integer

Example:

The input value is 0.5 and the output is in Q24.8 format (8 fractional bits in a 32-bit integer output word). To convert this value to fixed point, use the following code, which stores 128 in fixed-point variable a:

float n = 0.5f;
int   a = float2fix(n,8);
Note
To achieve better performance, consider using float2fix(v8float v, int sft) instead.
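
The documented return expression, round(n * pow(2.0, sft)), can be written out as a host-side reference for testing. This is a sketch under stated assumptions, not the intrinsic: `ref_float2fix` is a hypothetical name, `std::round` breaks ties away from zero while this page only says "round to nearest integer", and overflow handling and exceptions are not modeled, so inputs must keep the scaled value within the range of `int`.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical host-side reference model; this is NOT the AIE intrinsic.
// Mirrors the documented expression round(n * pow(2.0, sft)).
inline int ref_float2fix(float n, int sft) {
    assert(sft >= -32 && sft <= 31);  // documented range of sft
    // Tie-breaking follows std::round (half away from zero); the hardware
    // rule for exact ties is an assumption here.
    const double scaled = n * std::pow(2.0, sft);
    return static_cast<int>(std::round(scaled));  // caller keeps this in int range
}
```

With this model, `ref_float2fix(0.5f, 8)` evaluates to 128, matching the Q24.8 example above.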

int float2fix (float n)

Floating-point to Fixed-point conversion without scaling.

Parameters
n    Floating-point input value.