• IEEE 754 16-bit Float Format

  • Completed 2009

This is a simple 16-bit floating point storage interface. It is intended as a learning aid for students and is not optimized.

I created this during development of my Vision 3D engine in order to reduce the file size of terrain data that consisted of 3D position, normal, texture coordinate, and color information. This format reduced terrain file sizes by 50% and enabled Vision to take advantage of the half-precision float support on NVIDIA and AMD GPUs.

This format is also used in Imagine to support 16-bit float image formats.

16-bit Format

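Converting between 32-bit and 16-bit floats is straightforward: we map the bit pattern of one format to the corresponding (most appropriate) bit pattern in the other. Widening from 16 to 32 bits is exact, while narrowing from 32 to 16 bits can lose both range and precision because fewer exponent and mantissa bits are available. The following table compares the layouts of the two formats: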

              32-bit Float   16-bit Float
   Sign       1 bit          1 bit
   Exponent   8 bits         5 bits
   Mantissa   23 bits        10 bits
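
As an illustration of the layouts in the table, here is a minimal C++ sketch that splits each bit pattern into its sign, exponent, and mantissa fields. The names Fields, split_float32, and split_float16 are hypothetical and are not taken from the linked source; the sketch also assumes that float is an IEEE 754 32-bit type on the target platform.

```cpp
#include <cstdint>
#include <cstring>

// Assumption: float uses the IEEE 754 32-bit layout (true on mainstream platforms).
static_assert(sizeof(float) == 4, "this sketch expects a 32-bit float");

// Hypothetical helper type holding the three fields of either format.
struct Fields {
    std::uint32_t sign;      // 1 bit in both formats
    std::uint32_t exponent;  // 8 bits (32-bit float) or 5 bits (16-bit float)
    std::uint32_t mantissa;  // 23 bits (32-bit float) or 10 bits (16-bit float)
};

// Split a 32-bit float into sign (1), exponent (8), and mantissa (23) fields.
inline Fields split_float32(float value) {
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);  // read the bit pattern without aliasing issues
    Fields f;
    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}

// Split a 16-bit float pattern into sign (1), exponent (5), and mantissa (10) fields.
inline Fields split_float16(std::uint16_t half_bits) {
    const std::uint32_t bits = half_bits;
    Fields f;
    f.sign     = bits >> 15;
    f.exponent = (bits >> 10) & 0x1Fu;
    f.mantissa = bits & 0x3FFu;
    return f;
}
```

The exponent fields are stored with a bias (127 for the 32-bit format, 15 for the 16-bit format), so the value 1.0 has a stored exponent of 127 as a 32-bit float and 15 as a 16-bit float.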

Special care must also be taken to handle the following classes of values, which exist in both formats. In the listing below, (s) represents the sign bit, (e) represents the exponent bits, and (m) represents the mantissa bits.

   +-Zero:              s,  0e, 0m
   +-Denormalized:      s,  0e, (1 -> max)m
   +-Normalized:        s, (1 -> [max-1])e, m
   +-Infinity:          s, (all 1)e, (all 0s)m
   +-SNaN:              s, (all 1)e, (1 -> [max-high_bit])m
   +-QNaN:              s, (all 1)e, (high_bit -> all 1s)m 
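
The listing above can be sketched in C++ as a classifier for 16-bit patterns, followed by a widening conversion that treats each class separately. This is a minimal sketch rather than the code from the linked source: the names HalfClass, classify_half, and half_to_float_bits are hypothetical, and the result is returned as a raw 32-bit pattern so that NaN payloads stay visible.

```cpp
#include <cstdint>

// Hypothetical enum mirroring the classes in the listing above.
enum class HalfClass { Zero, Denormalized, Normalized, Infinity, SNaN, QNaN };

// Classify a 16-bit float pattern by its exponent (e) and mantissa (m) bits.
inline HalfClass classify_half(std::uint16_t h) {
    const std::uint32_t e = (h >> 10) & 0x1Fu;
    const std::uint32_t m = h & 0x3FFu;
    if (e == 0)     return m == 0 ? HalfClass::Zero : HalfClass::Denormalized;
    if (e != 0x1Fu) return HalfClass::Normalized;
    if (m == 0)     return HalfClass::Infinity;
    // The high mantissa bit separates quiet NaNs from signaling NaNs.
    return (m & 0x200u) ? HalfClass::QNaN : HalfClass::SNaN;
}

// Widen a 16-bit float pattern to the equivalent 32-bit float pattern.
inline std::uint32_t half_to_float_bits(std::uint16_t h) {
    const std::uint32_t sign = (static_cast<std::uint32_t>(h) & 0x8000u) << 16;
    const std::uint32_t e = (h >> 10) & 0x1Fu;
    std::uint32_t m = h & 0x3FFu;

    switch (classify_half(h)) {
    case HalfClass::Zero:
        return sign;  // keep only the sign, so -0 stays -0
    case HalfClass::Denormalized: {
        // A 16-bit denormal is a normal 32-bit value: shift the mantissa left
        // until the implicit leading 1 appears in bit 10, lowering the
        // exponent once per shift.
        std::uint32_t e32 = 127 - 15 + 1;  // 32-bit exponent for 2^-14
        while ((m & 0x400u) == 0) {
            m <<= 1;
            --e32;
        }
        m &= 0x3FFu;  // drop the now-implicit leading 1
        return sign | (e32 << 23) | (m << 13);
    }
    case HalfClass::Normalized:
        // Re-bias the exponent (127 - 15 = 112) and left-align the mantissa.
        return sign | ((e + 112u) << 23) | (m << 13);
    default:
        // Infinity and NaNs: all exponent bits set; the mantissa shift keeps
        // the quiet/signaling bit in the high mantissa position.
        return sign | 0x7F800000u | (m << 13);
    }
}
```

For example, half_to_float_bits(0x3C00) yields 0x3F800000, which is the 32-bit pattern for 1.0. The reverse direction needs additional decisions (rounding, overflow to infinity, underflow to denormals or zero) that this sketch does not cover.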

Source Code

  IEEE Float 16 Source Code (GitHub)