This is a simple 16-bit floating point storage interface. It is intended to serve as a learning aid for students, and is not in an optimized form.
I created this during development of my Vision 3D engine in order to reduce the file size of terrain data that consisted of 3D position, normal, texture coordinate, and color information. This format reduced terrain file sizes by 50% and enabled Vision to take advantage of support for this half format on NVIDIA and AMD GPUs.
This format is also used in Imagine to support 16-bit float image formats.
Converting between 32- and 16-bit floats is a straightforward process: we map the bit pattern of one format onto the closest corresponding bit pattern of the other. The following table compares the layouts of the two formats:
| Field    | 32-bit Float | 16-bit Float |
|----------|--------------|--------------|
| Sign     | 1 bit        | 1 bit        |
| Exponent | 8 bits       | 5 bits       |
| Mantissa | 23 bits      | 10 bits      |
| Bias     | 127          | 15           |
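As a rough sketch of the core rebiasing step, the conversion of a normalized value might look like the following C function. The name `float_to_half` is mine, not taken from the library; rounding is omitted (the mantissa is simply truncated), and zeros, denormals, infinities, and NaNs are assumed to be handled separately.

```c
#include <stdint.h>
#include <string.h>

/* Sketch only: convert a normalized 32-bit float to a 16-bit half.
   Assumes the value fits the half's normal range; special values and
   round-to-nearest are omitted for clarity (mantissa is truncated). */
static uint16_t float_to_half(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* reinterpret the bit pattern */

    uint32_t sign     = (bits >> 16) & 0x8000u;            /* bit 31 -> bit 15 */
    uint32_t exponent = ((bits >> 23) & 0xFFu) - 127 + 15; /* rebias 127 -> 15 */
    uint32_t mantissa = (bits >> 13) & 0x3FFu;             /* top 10 of 23 bits */

    return (uint16_t)(sign | (exponent << 10) | mantissa);
}
```

For example, 1.0f (bit pattern 0x3F800000) maps to the half pattern 0x3C00: sign 0, exponent 127 rebias­ed to 15, mantissa 0.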
Additionally, care must be taken to account for the following special values, which exist in both formats. In the listing below, (s) represents the sign bit, (e) represents the exponent bits, and (m) represents the mantissa bits.
- +-Zero: s, 0e, 0m
- +-Denormalized: s, 0e, (1 -> max)m
- +-Normalized: s, (1 -> [max-1])e, m
- +-Infinity: s, (all 1)e, (all 0s)m
- +-SNaN: s, (all 1)e, (1 -> [max-high_bit])m
- +-QNaN: s, (all 1)e, (high_bit -> all 1s)m
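To show how each of these cases maps in practice, here is a sketch of the reverse direction (half to 32-bit float) with the special cases spelled out. The function name `half_to_float` is illustrative, not the library's API; this is an unoptimized walk through the cases above.

```c
#include <stdint.h>
#include <string.h>

/* Sketch only: widen a 16-bit half to a 32-bit float, handling each
   special case from the listing above explicitly. */
static float half_to_float(uint16_t h)
{
    uint32_t sign     = (uint32_t)(h & 0x8000u) << 16;  /* bit 15 -> bit 31 */
    uint32_t exponent = (h >> 10) & 0x1Fu;
    uint32_t mantissa = h & 0x3FFu;
    uint32_t bits;

    if (exponent == 0) {
        if (mantissa == 0) {
            bits = sign;                                 /* +-zero */
        } else {
            /* denormal: shift left until the implicit leading 1 appears,
               adjusting the exponent for each shift */
            int shifts = -1;
            do { shifts++; mantissa <<= 1; } while ((mantissa & 0x400u) == 0);
            mantissa &= 0x3FFu;
            bits = sign | ((uint32_t)(127 - 15 - shifts) << 23)
                        | (mantissa << 13);
        }
    } else if (exponent == 0x1F) {
        /* infinity (mantissa 0) or NaN (mantissa nonzero):
           keep exponent all 1s, widen the mantissa */
        bits = sign | 0x7F800000u | (mantissa << 13);
    } else {
        /* normalized: rebias 15 -> 127, widen mantissa 10 -> 23 bits */
        bits = sign | ((exponent - 15 + 127) << 23) | (mantissa << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Note that every half denormal is representable as a 32-bit normal, which is why the denormal branch can always renormalize; the smallest half denormal (bit pattern 0x0001) becomes the 32-bit normal value 2^-24.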
IEEE Float 16 Source Code (Github)