Sunday, July 3rd, 2011
Cairo is a simple video codec that I developed in early 2011 during the later stages of everyAir. The purpose of this codec was to experiment with compression features and learn about their individual impact on efficiency while developing a simple framework for future work. Cairo was also designed to be easy to understand, and serves as a great learning aid for students.
Throughout the development of Cairo I was greatly assisted by the availability of several world-class H.264 encoders. These encoders provided a benchmark against which Cairo was tuned and re-tuned. Now that Cairo is complete, I thought it might be interesting to see how it performs against the best H.264 coder around: x264.
Note that this is not a rigorous comparison by any means. Further, it is notoriously difficult to compare video codecs due to the subjective nature of quality, inherent sampling error, operator bias, etc. This comparison is no doubt biased in favor of Cairo and should be taken with a large grain of salt.
For this test, I used the first 1,000 frames of the Big Buck Bunny trailer, resampled to a resolution of 640x368. The H.264 codec was represented by x264, which is inarguably the gold standard for H.264 encoders. With x264 running the H.264 main profile at default settings, I configured Cairo to produce roughly similar PSNR values for the sample video. Given the similarity of Cairo's architecture and quality settings to x264, this setup was fairly straightforward.
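Since the comparison hinges on matching PSNR, here is a minimal sketch of how PSNR is typically computed for a pair of 8-bit frames. The function name `psnr` and the flat-sequence frame representation are assumptions made for illustration; this is not Cairo's or x264's own measurement code.

```python
import math

def psnr(reference, decoded, peak=255):
    """Return PSNR in dB between two equal-length sequences of samples.

    Assumes 8-bit samples (peak = 255) stored as flat sequences,
    for example the luma plane of a single frame.
    """
    if len(reference) != len(decoded) or len(reference) == 0:
        raise ValueError("frames must be non-empty and the same size")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames have unbounded PSNR
    return 10 * math.log10(peak ** 2 / mse)
```

A higher value means the decoded frame is closer to the source. Matching two encoders at roughly the same PSNR lets their output sizes be compared on a more equal footing.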
In this chart we see that x264 consistently outperforms Cairo (no surprise), but that Cairo tracks it fairly closely for the majority of the video. This trailer contains many abrupt scene cuts, fades from white, cross fades, and vibrant high-frequency data that challenge both codecs.
The two frames above are virtually identical: they exhibit a very large number of pixel differences, but each difference is extremely small (only 1-2 pixel values apart). These images are from frame 151 in the trailer, which is towards the end of the initial cross fade from the titles, so it's no surprise that our frame sizes are quite large for a 640x368 inter-predicted frame.
This image shows the difference between our x264 and Cairo images, but multiplied by 128 to make the differences much more visible.
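A difference image of this kind can be sketched as follows. Only the gain of 128 comes from the description above. Taking the absolute difference and clamping to the 8-bit range are assumptions, as is the hypothetical name `amplified_difference`.

```python
def amplified_difference(frame_a, frame_b, gain=128):
    """Return per-sample |a - b| * gain, clamped to the 8-bit range.

    Frames are flat sequences of 8-bit samples of equal length.
    The absolute value and the clamp at 255 are assumptions; the
    source only states that the difference is multiplied by 128.
    """
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must be the same size")
    return [min(255, abs(a - b) * gain) for a, b in zip(frame_a, frame_b)]
```

With this gain, a difference of a single pixel value maps to mid-gray (128), and anything larger saturates to white, which is why even tiny discrepancies become clearly visible.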
You may have noticed that the Cairo frame is significantly smaller than the x264 frame, and yet our graph above doesn't seem to reflect this around frame 151. This is because the graph has been smoothed with a moving average to show the general trend in frame sizes.
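To illustrate why a single frame can disappear in the smoothed graph, here is a small sketch of a trailing moving average over per-frame sizes. The window length and the trailing (rather than centered) window are assumptions, since the smoothing parameters used for the chart are not stated.

```python
def moving_average(values, window):
    """Smooth a sequence with a trailing moving average.

    Each output is the mean of the current value and up to
    window - 1 preceding values, so early outputs average
    fewer samples.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running = 0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the sample leaving the window
        smoothed.append(running / min(i + 1, window))
    return smoothed
```

A one-frame spike or dip is spread across the whole window, so its height in the smoothed curve shrinks roughly in proportion to the window length.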
In reality, both Cairo and x264 vary their per-frame sizes as they determine how best to allocate their overall data budgets. As a result, there are many individual frames where Cairo outperforms x264, but these are often followed by frames where x264 beats Cairo by a larger margin, and the overall result is a smaller data size for x264.
The following table lists the respective file sizes for the Big Buck Bunny trailer in raw and compressed formats. Both Cairo and x264 provide significant savings over the raw file size, with a compression ratio of about 293:1.
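For reference, a compression ratio is simply the raw size divided by the compressed size. The sketch below assumes the raw video is stored as 8-bit YUV 4:2:0 (1.5 bytes per pixel), which is a common raw format but is not confirmed by the table. Both function names are hypothetical.

```python
def raw_yuv420_bytes(width, height, frames):
    """Size in bytes of raw 8-bit YUV 4:2:0 video (assumed format).

    4:2:0 stores one full-resolution luma plane plus two
    quarter-resolution chroma planes: 1.5 bytes per pixel.
    """
    return width * height * 3 // 2 * frames

def compression_ratio(raw_bytes, compressed_bytes):
    """Return raw size divided by compressed size (e.g. 293.0 for 293:1)."""
    if compressed_bytes <= 0:
        raise ValueError("compressed size must be positive")
    return raw_bytes / compressed_bytes
```

Under that assumption, `raw_yuv420_bytes(640, 368, 1000)` gives 353,280,000 bytes for the 1,000-frame sample, and dividing by an encoder's output size yields its ratio.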
Cairo is a fairly simple transform video codec without many modern bells and whistles. It performs only light frame analysis but still manages to produce significant size savings. When compared with the industry-leading x264, it tracks relatively closely in several areas, but could use additional improvements in a few others. As an experimental testbed, Cairo is a great resource for future learning and experimentation.