Saturday, June 27th, 2015
Cairo is a simple video codec that I developed in early 2011 during the later stages of everyAir.
The purpose of this codec was to experiment with compression features and learn about their individual
impact on efficiency while developing a simple framework for future work.
Video codecs contain a lot of features such as subpixel motion estimation, differential coding, quantization, deblocking, rate control, entropy coding, and many more. Each feature must be adaptively configured to perform well despite abrupt changes in the source video. Balancing these features with overall frame quality and size can be a daunting task, and codecs that do it well will outperform those that don't.
It's been over four years since Cairo was completed, and I thought it might be interesting to dust it off and visualize the benefits of some of its features. Specifically, I was interested in comparing the effects of its generic motion compensation, sub-pixel motion compensation, differential coding, and quality-controlled quantization.
For more general information about Cairo, check out its project page.
For this test, I used the first 1,000 frames of the Big Buck Bunny trailer, resampled to a resolution of 640x368. I chose this video because it contains several interesting characteristics including abrupt transitions, cross fades, slow camera pans, and vibrant colors. Plus, I've used this video for other tests in the past. View both the full film and the trailer here.
Cairo uses motion compensation to significantly reduce the encoded size of each frame by taking advantage of the spatial and temporal locality
of block data. We can examine the usefulness of this feature by encoding our test video with and without it enabled and comparing the results.
The screenshot above shows how motion compensation is applied to a particular frame, as viewed in Codec Tool. The middle image shows the motion type for each block in the frame. Red blocks are intra encoded, green are skip, blue are motion predicted, and teal are motion-skip. We can see that the majority of the blocks involve some degree of motion compensation, which suggests that it plays an important role in the Cairo pipeline.
This graph identifies an area for improvement between frames 80 and 220. For these frames, Cairo enabled motion compensation when it would have been more effective to disable it. The most common cause of this type of discrepancy in Cairo is a distant motion match. Cairo does not factor distance into its scoring of motion matches, so it cannot decide whether a distant block will ultimately encode better than a closer but perhaps less similar one.
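To make the distance issue concrete, here is a minimal sketch of block matching with an optional distance penalty. It is not Cairo's code: the function names, the sum-of-absolute-differences (SAD) metric, the exhaustive search, and the `distance_weight` parameter are all assumptions chosen for illustration. With a weight of zero the search picks the most similar block no matter how far away it is, which is the behavior described above. A non-zero weight makes far matches pay for their longer vectors.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in `cur` and a
    displaced block of the same size in `ref`."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def motion_search(cur, ref, bx, by, size=4, radius=2, distance_weight=0):
    """Exhaustive full-pixel search around (bx, by).

    Returns (dx, dy, cost) for the cheapest candidate. `distance_weight`
    is a hypothetical knob that charges each candidate for the length
    of its motion vector (Manhattan distance).
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidates that fall outside the reference frame.
            inside_x = 0 <= bx + dx and bx + dx + size <= width
            inside_y = 0 <= by + dy and by + dy + size <= height
            if not (inside_x and inside_y):
                continue
            penalty = distance_weight * (abs(dx) + abs(dy))
            cost = sad(cur, ref, bx, by, dx, dy, size) + penalty
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Synthetic 8x8 frames: `cur` is `ref` with its content moved right by one pixel.
ref = [[x * x + 3 * y * y for x in range(8)] for y in range(8)]
cur = [[ref[y][max(x - 1, 0)] for x in range(8)] for y in range(8)]

unweighted = motion_search(cur, ref, 2, 2)
weighted = motion_search(cur, ref, 2, 2, distance_weight=1000)
print(unweighted)  # (-1, 0, 0)
print(weighted)    # (0, 0, 96)
```

The deliberately huge weight in the second call shows the trade-off in its extreme form: the encoder prefers the zero vector with a larger residual over an exact match one pixel away. A practical weight would be much smaller and tuned against the cost of coding the vector.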
Although not shown here, the PSNR for each test pass was roughly the same.
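For reference, PSNR is conventionally computed from the mean squared error between the source and decoded frames. The sketch below assumes 8-bit samples (peak value 255) stored as nested lists. It illustrates the standard formula and says nothing about how Codec Tool measures it.

```python
import math


def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in decibels between two equally sized frames."""
    total = 0
    count = 0
    for row_o, row_d in zip(original, decoded):
        for a, b in zip(row_o, row_d):
            total += (a - b) ** 2
            count += 1
    mse = total / count
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak * peak / mse)


identical = psnr([[10, 20], [30, 40]], [[10, 20], [30, 40]])
off_by_one = psnr([[100, 100]], [[101, 99]])
print(identical)              # inf
print(round(off_by_one, 2))   # 48.13
```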
Generic motion prediction compares blocks within a local vicinity to find similarities within the video. If this
process is performed only at whole-pixel precision, then it may not adequately detect similarities caused by slow-moving or distant objects.
To compensate for this, Cairo performs sub-pixel motion prediction that detects movement at half or quarter pixel increments.
This feature has the biggest impact during slow camera zooms or pans, where objects move by less than a pixel per frame. As a result, spikes in this graph that correspond to abrupt scene changes are relatively unaffected by sub-pixel precision, but the valleys of the graph that correspond to slow title zooms and camera pans see significant improvement.
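A half-pixel refinement step can be sketched as follows. This is an illustration under stated assumptions rather than Cairo's implementation: the function names are hypothetical, bilinear interpolation stands in for whatever filter Cairo uses, and only half-pixel (not quarter-pixel) positions are tested. Motion vectors are expressed in half-pixel units, so a vector of (-1, 0) means half a pixel to the left.

```python
def sample_half_pel(ref, hx, hy):
    """Sample `ref` at (hx / 2, hy / 2) with bilinear interpolation.

    Even coordinates land exactly on a pixel; odd coordinates average
    the two (or four) surrounding pixels. The caller must keep the
    coordinates inside the frame.
    """
    x0, y0 = hx // 2, hy // 2
    x1 = x0 + (hx % 2)
    y1 = y0 + (hy % 2)
    return (ref[y0][x0] + ref[y0][x1] + ref[y1][x0] + ref[y1][x1]) / 4


def refine_half_pel(cur, ref, bx, by, dx, dy, size=4):
    """Test the nine half-pixel positions around the full-pixel vector (dx, dy).

    Returns (mvx, mvy, cost) with the vector in half-pixel units.
    """
    best = None
    for hdy in (-1, 0, 1):
        for hdx in (-1, 0, 1):
            mvx, mvy = 2 * dx + hdx, 2 * dy + hdy
            cost = 0.0
            for y in range(size):
                for x in range(size):
                    hx = 2 * (bx + x) + mvx
                    hy = 2 * (by + y) + mvy
                    cost += abs(cur[by + y][bx + x] - sample_half_pel(ref, hx, hy))
            if best is None or cost < best[2]:
                best = (mvx, mvy, cost)
    return best


# Synthetic gradient that slides right by half a pixel between frames.
ref = [[20 + 4 * x + 16 * y for x in range(8)] for y in range(8)]
cur = [[18 + 4 * x + 16 * y for x in range(8)] for y in range(8)]

refined = refine_half_pel(cur, ref, 2, 2, 0, 0)
print(refined)  # (-1, 0, 0.0)
```

In this example the best full-pixel vector (0, 0) leaves a residual of 2 per pixel, while the half-pixel vector matches exactly. That mirrors why slow pans and zooms benefit most from the extra precision.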
Another significant improvement in Cairo is its adaptive spatial reordering. Using this feature, Cairo detects whether
rearrangements of the block data would enable it to encode more efficiently, and applies the necessary transform if so.
As a result, the process may significantly alter the order and contents of the block data, but it will never reduce its information content.
This feature enables Cairo to submit block data to the entropy encoder in an optimal order, which results in a significant improvement in overall coding efficiency.
Next we examine the effect of Cairo's quality setting on the overall frame PSNR.
Cairo quality levels range from 0 to 100, with higher levels resulting in higher visual quality, but lower compression ratios. Quality level 20 is
generally a fair default, quality 0 is designed to be used for low bitrate or suffering network connections,
and quality 100 enables Cairo's near-lossless encoding mode.
The quality level of the encoder controls several important factors including the quantization range and the arithmetic precision of the pipeline. Cairo uses an adaptive quantizer that will increase or decrease the quantization factor based on the source data, but these values are centered about the global quality level.
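The relationship between a global quality level and a per-block quantizer can be sketched like this. Everything specific here is an assumption made for illustration: the linear mapping from quality to step size, the step range of 1 to 32, and the activity heuristic that nudges the step up for busy blocks and down for flat ones. The source only states that the adaptive values are centered about the global quality level.

```python
def base_step(quality, min_step=1, max_step=32):
    """Hypothetical linear mapping: quality 100 -> min_step, quality 0 -> max_step."""
    quality = max(0, min(100, quality))
    return max_step - (max_step - min_step) * quality / 100


def adaptive_step(block, quality, strength=0.25):
    """Scale the base step by up to +/- `strength` based on block activity.

    Activity is the mean absolute deviation of the block, capped at 32
    (an assumed heuristic, not Cairo's actual measure).
    """
    mean = sum(block) / len(block)
    activity = sum(abs(v - mean) for v in block) / len(block)
    scale = 1 + strength * (min(activity, 32) / 32 * 2 - 1)
    return max(1.0, base_step(quality) * scale)


def quantize(block, step):
    """Map each value to the nearest multiple of `step` (this is the lossy part)."""
    return [round(v / step) for v in block]


def dequantize(levels, step):
    """Reconstruct approximate values from quantized levels."""
    return [q * step for q in levels]


flat_block = [10] * 16
busy_block = [0, 100] * 8

flat_step = adaptive_step(flat_block, quality=20)
busy_step = adaptive_step(busy_block, quality=20)
restored = dequantize(quantize(busy_block, busy_step), busy_step)
print(flat_step, base_step(20), busy_step)
```

Whatever the mapping, the reconstruction error of each value is bounded by half the step size, which is why a larger step trades quality for smaller coded values.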
Our graph above shows the PSNR at various quality levels across the 1,000 frames of our trailer video. Notice that quality and PSNR have a non-linear relationship, as the gains in PSNR diminish with increasing quality levels. This data suggests the following quality levels for certain use cases:
Near-Lossless / Archival
High Quality Stream
Low Quality Stream
Suffering Network Stream
Quality also has a significant impact on the data size, with quality between 10 and 20 generally producing a low bitrate with an acceptable loss in quality.
The following images demonstrate the visual quality of Cairo across a few quality levels. Quality 100 contains only imperceptible differences from the raw source video, while quality 5 contains obvious visual artifacts. When the video is viewed in motion, it becomes difficult to detect any artifacts at quality 20 or above.
Differential coding is another feature that helps Cairo lower per-frame sizes. Unlike motion compensation,
differential coding is a lossless operation and does not have as dramatic an effect on overall frame size.
This is a backend process that analyzes neighboring blocks prior to serialization into the bitstream. If Cairo detects that it would be cheaper to encode the delta between two neighbors than the regular block data, it encodes that information instead.
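The decision described above can be sketched as a per-block choice between two modes. This is a simplified illustration, not Cairo's bitstream logic: the function names are hypothetical, and the sum of absolute values is only a crude stand-in for the real cost of entropy coding a block.

```python
def cost(values):
    """Crude proxy for coded size: smaller magnitudes are assumed cheaper."""
    return sum(abs(v) for v in values)


def encode_block(block, neighbor):
    """Return ("delta", residual) if the delta looks cheaper, else ("raw", block)."""
    if neighbor is not None:
        delta = [b - n for b, n in zip(block, neighbor)]
        if cost(delta) < cost(block):
            return ("delta", delta)
    return ("raw", list(block))


def decode_block(mode, payload, neighbor):
    """Invert encode_block. The operation is lossless in both modes."""
    if mode == "delta":
        return [p + n for p, n in zip(payload, neighbor)]
    return list(payload)


neighbor = [50, 52, 48, 51]
similar = [51, 52, 47, 53]
different = [1, 0, -2, 1]

similar_coded = encode_block(similar, neighbor)
different_coded = encode_block(different, neighbor)
print(similar_coded)    # ('delta', [1, 0, -1, 2])
print(different_coded)  # ('raw', [1, 0, -2, 1])
```

Because the decoder adds the same neighbor back, the round trip reproduces the block exactly, which is what makes the step lossless.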
Cairo is a fairly simple transform video codec without many modern bells and whistles. It performs only light frame analysis but still manages to produce significant cost savings. While conventional wisdom predicts the benefits of motion compensation, differential coding, and quality control, it's interesting to see their relative impact on real data. Thanks for reading!