EVX1: Cairo is a simple streaming video codec designed as a testbed for experimentation. It contains several common features of modern codecs (e.g. H.264), but has not been tuned for production. Instead, Cairo maintains a flexible implementation that makes it easy to modify and test new features or ideas.
Like most modern codecs, Cairo uses a block-based encoding scheme to transform, quantize, and entropy encode the video data. Cairo supports self-referencing intra frames as well as neighbor-referencing inter frames (known as predicted frames in H.264 parlance).
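To make the transform and quantize steps concrete, here is a minimal sketch in Python. It assumes a textbook 2D DCT-II and a single uniform quantizer step, which is how many block-based codecs work. Cairo's actual transform, block size, and quantization matrices are not described here, so the function names and the step size of 8 are purely illustrative.

```python
import math

def dct_2d(block):
    """Textbook (orthonormal) 2D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for v in range(n):
        for u in range(n):
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[v][u] = scale(u) * scale(v) * total
    return out

def quantize(coeffs, step=8):
    """Uniform quantization: divide each coefficient by the step and round."""
    return [[round(c / step) for c in row] for row in coeffs]

# A flat block carries all of its energy in the DC coefficient (top-left),
# so every other quantized coefficient becomes zero and is cheap to entropy code.
flat_block = [[100] * 4 for _ in range(4)]
levels = quantize(dct_2d(flat_block))
```

The quantized levels are what an entropy coder would then compress. Larger step sizes zero out more coefficients at the cost of more distortion.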
The design of this codec borrows heavily from my previous codec, P.264, but with several notable improvements. The basic flow of P.264 and Cairo is depicted in the following image.
For a complete walkthrough of this diagram, check out my P.264 overview paper.
Cairo includes support for most P.264 features, plus the following improvements:
Flexible pixel formats
Cairo contains a full implementation of Imagine, so it supports a wide array of pixel formats including 128 bit HDR, 16 bit depth, volumetric 3D, and alpha formats. Imagine also makes it easy to define new formats that are easily supported within Cairo.
This codec was designed to serve as a base for experimentation. As a result, Cairo contains a number of facilities that make it easy to configure, tweak, and change the process of the pipeline. All of the compression efficiency improvements in Cairo were derived through more than 30 experiments.
Psychovisual Detail Retention
Using the human psychovisual system as a model, Cairo includes optimizations to aggressively reduce unseen structures while preserving human-visible detail.
Variance Adaptive Jayant Quantization
Cairo uses variance adaptive quantization to selectively lower the per-block data cost while preserving the overall visual quality of the image. Blocks with higher variance are quantized more heavily than those with low variance.
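As a rough illustration of the idea, the sketch below maps a block's pixel variance to a quantizer setting. It is not Cairo's implementation: the logarithmic mapping, the `pivot` and `strength` parameters, and the 0 to 51 QP range (borrowed from H.264 convention) are all assumptions chosen for the example.

```python
import math
from statistics import pvariance

def block_variance(block):
    """Population variance of every sample in a 2D block (list of rows)."""
    return pvariance([p for row in block for p in row])

def adaptive_qp(block, base_qp=26, strength=1.0, pivot=100.0, qp_min=0, qp_max=51):
    """Raise QP for high-variance blocks and lower it for smooth ones.

    base_qp, strength, and pivot are hypothetical tuning knobs: a block whose
    variance equals `pivot` keeps the base QP unchanged.
    """
    variance = block_variance(block)
    offset = strength * math.log2((variance + 1.0) / (pivot + 1.0))
    return max(qp_min, min(qp_max, round(base_qp + offset)))

smooth = [[128] * 4 for _ in range(4)]            # zero variance
busy = [[0, 255, 0, 255], [255, 0, 255, 0]] * 2   # checkerboard, high variance
smooth_qp = adaptive_qp(smooth)
busy_qp = adaptive_qp(busy)
```

Here `busy_qp` comes out higher than `smooth_qp`, which matches the behavior described above: detail-heavy blocks hide quantization error better, so they can be quantized more coarsely.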
Advanced Motion Prediction
P.264 used a simple and efficient motion prediction model that searched for similar blocks within a local proximity. Cairo uses an updated motion prediction system that includes a wide area block search plus half and quarter pixel interpolation.
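The following sketch shows the general shape of such a search: an exhaustive whole-pixel block search scored by sum of absolute differences (SAD), plus a helper that samples the reference at half-pixel positions. The window radius, block size, and bilinear interpolation are assumptions made for brevity. Production codecs such as H.264 use longer interpolation filters, and Cairo's own search strategy may differ.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between a block in `cur` and one in `ref`."""
    return sum(
        abs(cur[cy + y][cx + x] - ref[ry + y][rx + x])
        for y in range(size)
        for x in range(size)
    )

def full_search(cur, ref, bx, by, size=4, radius=4):
    """Try every whole-pixel offset within `radius` and keep the cheapest one.

    Returns ((dx, dy), cost) for the block whose top-left corner is (bx, by).
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost

def sample_half_pel(ref, hx, hy):
    """Sample `ref` at non-negative coordinates given in half-pixel units.

    Odd coordinates land between two pixels and are bilinearly averaged,
    with positions past the frame edge clamped to the last row or column.
    """
    x0, y0 = hx // 2, hy // 2
    x1 = min(x0 + (hx % 2), len(ref[0]) - 1)
    y1 = min(y0 + (hy % 2), len(ref) - 1)
    return (ref[y0][x0] + ref[y0][x1] + ref[y1][x0] + ref[y1][x1] + 2) // 4
```

A sub-pixel refinement would typically start from the best whole-pixel vector and compare the neighboring half-pixel positions, then repeat at quarter-pixel precision.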
Context Adaptive Binary Arithmetic Coding
Cairo includes improvements to the CABAC module of P.264. For more information about CABAC, check out this blog post.
Chrominance Super Subsampling
Cairo uses 16 bpc YCbCr 4:2:0 as its intermediary image format throughout the pipeline. This format uses chrominance subsampling to reduce the information load of the chrominance channels. During the backend serialization phase of the codec, the chrominance may be further subsampled down an additional 25% in areas where the codec detects relatively consistent color patterns. This process further reduces the size of the overall image, without any noticeable loss in quality.
Whenever possible, blocks are encoded as a relative change from some other previously observed block. P.264 was only capable of analyzing a local set of similar blocks and using them to perform simple differential coding on the DC coefficient. Cairo uses a more sophisticated process for differential coding that enables it to sample farther blocks and perform specific mathematical operations on block data to minimize data cost.
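For readers unfamiliar with 4:2:0, the sketch below shows the standard base step only: each 2x2 group of chroma samples is averaged into one, so each chroma plane shrinks to a quarter of its original sample count. The additional adaptive subsampling described above is specific to Cairo and is not reproduced here. The function name and the even-dimension assumption are illustrative.

```python
def subsample_420(plane):
    """Halve a chroma plane in both directions by averaging 2x2 groups.

    Assumes the plane (a list of rows) has even width and height.
    """
    out = []
    for y in range(0, len(plane), 2):
        row = []
        for x in range(0, len(plane[0]), 2):
            total = (plane[y][x] + plane[y][x + 1]
                     + plane[y + 1][x] + plane[y + 1][x + 1])
            row.append((total + 2) // 4)  # rounded average
        out.append(row)
    return out

cb = [
    [10, 20, 30, 40],
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [50, 60, 70, 80],
]
cb_420 = subsample_420(cb)  # 4x4 plane becomes 2x2
```

With one full-resolution luma plane and two quarter-size chroma planes, 4:2:0 stores 1.5 samples per pixel instead of the 3 needed for 4:4:4.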
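The simple form of differential coding, in the spirit of what is described for P.264, can be sketched as follows. The sketch assumes each block's DC coefficient is predicted from the block coded immediately before it. The real predictor selection in P.264 and the more sophisticated process in Cairo are not shown.

```python
def encode_dc(dc_values):
    """Replace each DC coefficient with its difference from the previous one."""
    residuals = []
    previous = 0  # the first block is predicted from zero
    for dc in dc_values:
        residuals.append(dc - previous)
        previous = dc
    return residuals

def decode_dc(residuals):
    """Invert encode_dc by accumulating the differences."""
    dc_values = []
    previous = 0
    for residual in residuals:
        previous += residual
        dc_values.append(previous)
    return dc_values

dc = [100, 102, 101, 105]
residuals = encode_dc(dc)  # [100, 2, -1, 4]
```

Neighboring blocks tend to have similar average brightness, so the residuals cluster near zero and cost fewer bits to entropy code than the raw coefficients.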
Periodic Intra-frame Refresh
First described by x264, this feature allows the codec to transmit intra frame data over a span of successive inter frames by designating specific regions (usually columns) of the frames as intra only. Rather than transmit a full intra frame followed by several inter frames, the codec will essentially amortize the intra frame over the inter ones. This improves streaming compatibility and rate control with network routers, alleviates the strain of resending or re-coordinating dropped frames, and enables seekability.
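One way to picture the column schedule is the sketch below, which assigns each frame in a refresh period its own slice of macroblock columns to code as intra only. The left-to-right sweep and the parameter names are assumptions for illustration. How Cairo actually chooses its refresh regions is not specified here.

```python
def intra_columns(frame_index, mb_columns, refresh_period):
    """Macroblock columns coded intra-only in the given frame.

    The frame width (in macroblocks) is split into `refresh_period` slices,
    and the sweep restarts from the left edge once every slice has been sent.
    """
    step = frame_index % refresh_period
    start = step * mb_columns // refresh_period
    end = (step + 1) * mb_columns // refresh_period
    return range(start, end)

# A 1280-pixel-wide frame has 80 columns of 16x16 macroblocks. With a
# 10-frame period, each frame refreshes 8 columns.
schedule = [list(intra_columns(frame, 80, 10)) for frame in range(10)]
```

After one full period every column has been refreshed exactly once, so a decoder that joins mid-stream has a clean picture within `refresh_period` frames without ever receiving a full-size intra frame.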
Adaptive Spatial Symbol Reordering
Cairo includes a pre-process to P.264's serialization backend that reorders symbols for more efficient entropy encoding. This applies to virtually every portion of the frame, including quantized coefficients, motion vectors, quantization matrices, and block descriptors.
In-loop Deblocking Filter
Cairo includes an in-loop deblocking filter that reduces the visual appearance of macroblock boundary artifacts. This filter adapts based on the type of block and surrounding pixel data to provide relatively artifact-free transitions over block boundaries without losing crisp detail and hard edges in the source.
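The adaptive behavior can be illustrated with a very small sketch that filters the two pixels on either side of a block boundary. It assumes a simple threshold test and a 3-tap smoothing filter. The `alpha` and `beta` thresholds are hypothetical values, and Cairo's actual filter also takes the block type into account, which this example ignores.

```python
def deblock_edge(p1, p0, q0, q1, alpha=12, beta=4):
    """Smooth the boundary pixels p0 | q0 unless the step looks like a real edge.

    p1, p0 lie on one side of the block boundary and q0, q1 on the other.
    Returns the (possibly filtered) values of p0 and q0.
    """
    # A large jump across the boundary, or strong gradients on either side,
    # suggest genuine image detail, so the pixels are left untouched.
    if abs(p0 - q0) >= alpha or abs(p1 - p0) >= beta or abs(q1 - q0) >= beta:
        return p0, q0
    new_p0 = (p1 + 2 * p0 + q0 + 2) // 4
    new_q0 = (p0 + 2 * q0 + q1 + 2) // 4
    return new_p0, new_q0

soft = deblock_edge(100, 100, 108, 108)  # small step: smoothed to (102, 106)
hard = deblock_edge(50, 50, 200, 200)    # strong edge: left as (50, 200)
```

Because the filter runs inside the coding loop, the encoder and decoder both apply it to reconstructed frames before those frames are used for prediction.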
It's notoriously difficult to compare video codecs, but for those interested, check out this blog post for a deeper discussion that roughly compares Cairo with H.264.
Codec Tool was heavily used during the development and testing of Cairo. For more information about Codec Tool, visit its project page.
The left video view in this screenshot shows the current source frame, which in this case is an H.264 encoded 1280x720 frame. The center image is togglable, and currently shows the state of Cairo's most recent prediction frame. The rightmost image is the Cairo output of the source frame.
In this example, the Cairo pipeline produced a re-encoded image with a PSNR of 39.33, a mean squared error of 7.58, and a data size of about 20 KB. Note that while our source is in a compressed format for this example, in practice only raw content formats are used for experimentation.
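The two quality figures are related by the standard PSNR formula, PSNR = 10 * log10(peak^2 / MSE). The sketch below assumes an 8-bit peak value of 255, which is an assumption about how Codec Tool reports the metric. Under that assumption, an MSE of 7.58 works out to about 39.33 dB, consistent with the numbers quoted above.

```python
import math

def mean_squared_error(original, decoded):
    """Average squared difference between two equally sized frames (lists of rows)."""
    squared = [
        (a - b) ** 2
        for row_o, row_d in zip(original, decoded)
        for a, b in zip(row_o, row_d)
    ]
    return sum(squared) / len(squared)

def psnr(mse, peak=255.0):
    """Peak signal-to-noise ratio in decibels for a given mean squared error."""
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak * peak / mse)

quality_db = psnr(7.58)  # roughly 39.33 dB with an 8-bit peak
```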
It's difficult to compare video codecs because the measurement of image quality is heavily influenced by our subjective perception, and it's difficult to create fair and unbiased test cases.
In my own experiments Cairo achieves roughly 100:1 compression of raw image data with a PSNR of at least 39.00, and is generally about 15% to 20% worse than H.264 main profile.
EVX is a collection of five evolutionary codecs that are used for experimentation and education. They are not designed for productization and lack many important features and optimizations. Nonetheless, they are an extremely useful tool for testing out new compression theories and formats. The following diagram illustrates the high level progression of each codec in the family.
virtual & augmented reality
3D video format support
large macroblock support
full rate YCbCr 4:4:4 mode
video analysis & debugging tool
Check out the source code for an early version of Cairo at this GitHub repo! I eventually plan on also releasing the source code for Codec Tool, but other priorities are blocking this work at the moment.
For more information about Cairo, check out my blog for articles related to this codec and general video compression.