Bertolami.com

IEEE 754 16-bit Float Format
Personal Project
Completed 2009

This is a simple 16-bit floating point storage interface. It is intended as a learning aid for students and is not optimized.

I created this during development of my Vision 3D engine in order to reduce the file size of terrain data that consisted of 3D position, normal, texture coordinate, and color information. This format reduced terrain file sizes by 50% and enabled Vision to take advantage of support for this half format on NVIDIA and AMD GPUs.

This format is also used in Imagine to support 16-bit float image formats.

16-bit Format

Converting between 32-bit and 16-bit floats is straightforward: we map the bit pattern of one format onto the corresponding (most appropriate) bit pattern in the other. The following table compares the layouts of each format:

	32-bit Float	16-bit Float
Sign	1 bit	1 bit
Exponent	8 bits	5 bits
Mantissa	23 bits	10 bits
Bias	127	15
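
As a rough illustration of the table, the sketch below splits a 32-bit pattern into its fields and rebuilds a 16-bit pattern for the simplest case, a value that is normalized in both formats. The helper names (float32_bits, split_fields, normalized_f32_to_f16) are hypothetical and are not taken from the project's source. The sketch truncates the mantissa rather than rounding, and it deliberately rejects everything outside the normalized 16-bit range.

```python
import struct

# Field widths and biases, taken from the table above.
F32_EXP_BITS, F32_MAN_BITS, F32_BIAS = 8, 23, 127
F16_EXP_BITS, F16_MAN_BITS, F16_BIAS = 5, 10, 15


def float32_bits(value):
    """Return the 32-bit pattern of value stored as a single-precision float."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def split_fields(bits, exp_bits, man_bits):
    """Split a bit pattern into (sign, exponent, mantissa) integers."""
    sign = bits >> (exp_bits + man_bits)
    exponent = (bits >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = bits & ((1 << man_bits) - 1)
    return sign, exponent, mantissa


def normalized_f32_to_f16(bits32):
    """Rebias the exponent and truncate the mantissa (hypothetical helper).

    Assumption: only values that are normalized in both formats are handled;
    zero, denormalized values, infinities and NaNs raise ValueError.
    """
    sign, exponent, mantissa = split_fields(bits32, F32_EXP_BITS, F32_MAN_BITS)
    new_exponent = exponent - F32_BIAS + F16_BIAS
    if not 1 <= new_exponent <= 30:
        raise ValueError("value is not a normalized 16-bit float")
    # Keep the top 10 of the 23 mantissa bits; the low 13 bits are dropped.
    new_mantissa = mantissa >> (F32_MAN_BITS - F16_MAN_BITS)
    sign_shift = F16_EXP_BITS + F16_MAN_BITS
    return (sign << sign_shift) | (new_exponent << F16_MAN_BITS) | new_mantissa
```

For example, 1.0 is 0x3F800000 as a 32-bit float. Its stored exponent is 127, which rebiases to 15, so the 16-bit pattern is 0x3C00.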

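Both formats also reserve certain bit patterns for special values, and a conversion must map each of these onto its counterpart in the other format. In the listing below, (s) represents the sign bit, (e) the exponent bits, and (m) the mantissa bits. An arrow denotes an inclusive range of field values, max is the field with all bits set, and high_bit is the most significant mantissa bit.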

   +-Zero:              s,  0e, 0m
   +-Denormalized:      s,  0e, (1 -> max)m
   +-Normalized:        s, (1 -> [max-1])e, m
   +-Infinity:          s, (all 1)e, (all 0s)m
   +-SNaN:              s, (all 1)e, (1 -> [max-high_bit])m
   +-QNaN:              s, (all 1)e, (high_bit -> all 1s)m
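
The listing above can be read as a series of tests on the exponent and mantissa fields. A minimal sketch for the 16-bit format follows. The name classify_f16 is hypothetical, and the returned labels simply mirror the listing.

```python
def classify_f16(bits):
    """Return the category of a 16-bit float pattern, prefixed with its sign."""
    sign = (bits >> 15) & 0x1       # 1 sign bit
    exponent = (bits >> 10) & 0x1F  # 5 exponent bits
    mantissa = bits & 0x3FF         # 10 mantissa bits

    if exponent == 0:
        kind = "Zero" if mantissa == 0 else "Denormalized"
    elif exponent < 0x1F:
        kind = "Normalized"
    elif mantissa == 0:
        kind = "Infinity"
    elif mantissa & 0x200:          # high mantissa bit set
        kind = "QNaN"
    else:
        kind = "SNaN"
    return ("-" if sign else "+") + kind
```

For instance, classify_f16(0x7C00) yields "+Infinity", because the exponent is all ones and the mantissa is zero.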

Source Code

IEEE Float 16 Source Code (GitHub)
