IEEE Standard Floating-Point Number Representation
Representing real numbers in binary requires a standard so that different computers produce consistent results for the same calculation. Thus, the Institute of Electrical and Electronics Engineers (IEEE) developed the IEEE Standard for Floating-Point Arithmetic (IEEE 754).
An IEEE 754 floating-point number has three components:
- The sign bit - 0 represents a positive number; 1 represents a negative number.
- The biased exponent - The exponent field must encode both positive and negative exponents, so a fixed bias is added to the actual exponent to get the stored value.
- The mantissa - Also known as the significand, the mantissa represents the precision bits of the number.
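The three components above can be pulled apart with a few lines of Python. This is an illustrative sketch (the helper name `decompose` is ours, not part of any standard library), using the `struct` module to view a Python float, which is an IEEE 754 double, as its raw 64-bit pattern:

```python
import struct

def decompose(x: float):
    """Split a Python float (an IEEE 754 double) into its three fields."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]  # raw 64-bit pattern
    sign = bits >> 63                       # 1 sign bit
    biased_exponent = (bits >> 52) & 0x7FF  # 11 exponent bits
    mantissa = bits & ((1 << 52) - 1)       # 52 mantissa bits
    return sign, biased_exponent, mantissa

sign, exp, frac = decompose(-2.718281828459045)
print(sign)        # 1, because the number is negative
print(exp - 1023)  # 1, the actual exponent after removing the bias of 1023
```

Note how recovering the true exponent means subtracting the bias (1023 for doubles), exactly as described above.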
Using these components, IEEE 754 defines several formats; the two most widely used are single precision and double precision. While there are other ways to represent floating-point numbers, IEEE 754 is the most common because it is implemented directly in nearly all modern hardware.
What Is Single-Precision Floating-Point Format?
Single-precision floating-point format uses 32 bits of computer memory and can represent a wide range of numerical values. Often referred to as FP32, this format is best suited to calculations that can tolerate a small amount of rounding error.
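A quick way to see that rounding in action is to round-trip a value through the 32-bit format. The sketch below (the helper name `to_fp32` is ours) packs a Python float into 4 bytes and unpacks it again, which rounds it to the nearest FP32 value:

```python
import struct

def to_fp32(x: float) -> float:
    """Round a Python float (FP64) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

e = 2.718281828459045
print(to_fp32(e))      # ~2.7182817 -- only about 7 decimal digits survive
print(to_fp32(e) == e) # False: some precision was lost in the conversion
```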
What Is Double-Precision Floating-Point Format?
Double-precision floating-point format, on the other hand, occupies 64 bits of computer memory and is far more precise than the single-precision format. This format is often referred to as FP64 and is used to represent values that require a larger range or a more precise calculation.
Although double precision allows for more accuracy, it also requires more computational resources, memory storage, and data transfer. The cost of using this format doesn’t always make sense for every calculation.
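The trade-off is easy to measure: a single-precision value occupies half the storage of a double-precision one, and rounding a value such as 1/3 down to 32 bits introduces a small but nonzero error. A minimal sketch using the `struct` module:

```python
import struct

x = 1.0 / 3.0  # stored as an FP64 double in Python
fp32 = struct.unpack(">f", struct.pack(">f", x))[0]  # rounded to FP32

print(struct.calcsize(">f"), struct.calcsize(">d"))  # 4 8  (bytes per value)
print(abs(fp32 - x))  # rounding error introduced by halving the storage
```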
The Difference Between Single and Double Precision
The simplest way to distinguish between single- and double-precision computing is to look at how many bits represent the floating-point number. For single precision, 32 bits are used to represent the floating-point number. For double precision, 64 bits are used to represent the floating-point number.
Take Euler’s number (e), for example. Here are the first 50 decimal digits of e: 2.7182818284590452353602874713526624977572470936999.
Here’s Euler’s number in binary, converted to single precision:
0 10000000 01011011111100001010100
Here’s Euler’s number in binary, converted to double precision:
0 10000000000 0101101111110000101010001011000101000101011101101001
The first bit represents the sign. The next set of bits (eight for single precision and eleven for double precision) represents the biased exponent. The final set of bits (23 for single precision and 52 for double precision) represents the mantissa.
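These bit strings can be reproduced directly in Python. The sketch below (the helper name `bit_string` is ours) packs Euler's number into each format and prints the raw bit pattern, so the field widths can be checked by slicing:

```python
import math
import struct

def bit_string(x: float, double: bool) -> str:
    """Return the raw IEEE 754 bit pattern of x as a string of 0s and 1s."""
    fmt_in, fmt_out, width = (">d", ">Q", 64) if double else (">f", ">I", 32)
    raw = struct.unpack(fmt_out, struct.pack(fmt_in, x))[0]
    return format(raw, f"0{width}b")

single = bit_string(math.e, double=False)
print(single)                              # 01000000001011011111100001010100
print(single[0], single[1:9], single[9:])  # sign, 8-bit exponent, 23-bit mantissa

double = bit_string(math.e, double=True)
print(double[0], double[1:12], double[12:])  # sign, 11-bit exponent, 52-bit mantissa
```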
Comparison Chart: Single Precision vs Double Precision
| | Single Precision | Double Precision |
| --- | --- | --- |
| Overview | Uses 32 bits of memory to represent a numerical value, with one bit representing the sign | Uses 64 bits of memory to represent a numerical value, with one bit representing the sign |
| Biased exponent | 8 bits used for the exponent | 11 bits used for the exponent |
| Mantissa | 23 bits used for the mantissa (to represent the fractional part) | 52 bits used for the mantissa (to represent the fractional part) |
| Real-world application | Often used for games or any program that requires a wide range of values without a high level of precision | Often used for scientific calculations and complex programs that require a high level of precision |