Sunday, October 11th, 2015
Several of the most popular virtual and augmented reality experiences involve video.
These videos come in a variety of formats including combinations of stereoscopic 3D,
360° panoramas and spherical views. Unfortunately, these formats place significant
strain on our processors, memory, and network bandwidth because they demand higher
resolution, higher framerates, and lower latency than conventional video.
To cope with this trifecta of video compression challenges, we need a video codec that exploits
the increased self-similarity and inter-frame correlation present in these kinds of videos.
The following is a list of basic mixed-reality centric features that could be integrated into a modern
video codec to help achieve higher compression efficiency for VR content, while lowering processor and
bandwidth costs.
Foveated Coding
The human visual system offers us crisp central vision about a point of focus, but significantly
diminished visual acuity as we move towards the periphery of our field of view. The fovea is the
central region of the retina that is responsible for this crisp central focus. A codec that knows
where the viewer is looking can therefore spend most of its bits near the point of focus and encode
the periphery more coarsely with little perceptible loss in quality.
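Here's a minimal sketch of how this could plug into an encoder, assuming the headset reports a gaze direction and the encoder can adjust the quantizer per block (the thresholds and offsets are illustrative placeholders, not tuned values):

```c
#include <math.h>

/*
 * Hypothetical sketch: derive a quantizer (QP) offset for a block from its
 * angular distance to the viewer's gaze direction. Blocks far from the point
 * of focus get a larger offset, i.e. coarser quantization and fewer bits.
 */

/* Angle in degrees between the gaze direction and the direction of the
 * block's center, both given as unit vectors in view space. */
static float eccentricity_deg(const float gaze[3], const float block_dir[3])
{
    float dot = gaze[0] * block_dir[0] + gaze[1] * block_dir[1] + gaze[2] * block_dir[2];
    if (dot > 1.0f)  dot = 1.0f;    /* guard against rounding error */
    if (dot < -1.0f) dot = -1.0f;
    return acosf(dot) * (180.0f / 3.14159265f);
}

int foveated_qp_offset(const float gaze[3], const float block_dir[3])
{
    float ecc = eccentricity_deg(gaze, block_dir);
    if (ecc < 5.0f)  return 0;      /* foveal region: full quality      */
    if (ecc < 20.0f) return 4;      /* parafoveal: mild coarsening      */
    if (ecc < 40.0f) return 8;      /* near periphery                   */
    return 12;                      /* far periphery: heavy bit savings */
}
```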
Improved Motion Compensation
Motion compensation is a compression optimization that allows us to leverage self-similarities within a frame as well as
similarities between neighboring frames. Modern codecs use a variety of algorithms to quickly and efficiently detect these
similarities, but they are usually tailored to 2D content. In stereoscopic video, for example, the left and right views are
nearly identical apart from a horizontal disparity, so the search could be greatly assisted by knowledge of the physical
left and right camera separation as well as the shape of the lenses (field of view, offset, etc.).
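As a rough illustration, assuming a rectified stereo rig and a plausible range of scene depths, the camera baseline and focal length bound the horizontal disparity between the two views, so the encoder can search a narrow window instead of the whole frame (the names and numbers here are hypothetical):

```c
/*
 * Hypothetical sketch: bound the horizontal search window when predicting a
 * block in the right-eye view from the already-coded left-eye view. For a
 * rectified stereo pair, disparity = focal_px * baseline_m / depth_m, so a
 * plausible depth range yields a narrow disparity window instead of a
 * full-frame motion search.
 */
typedef struct {
    float focal_px;     /* focal length in pixels                      */
    float baseline_m;   /* physical separation between the two cameras */
    float min_depth_m;  /* nearest depth expected in the scene         */
    float max_depth_m;  /* farthest depth expected in the scene        */
} StereoRig;

/* Fills [min_dx, max_dx], the horizontal disparity range (in pixels) to search. */
void disparity_search_window(const StereoRig *rig, int *min_dx, int *max_dx)
{
    /* Nearer objects produce larger disparities. */
    *max_dx = (int)(rig->focal_px * rig->baseline_m / rig->min_depth_m + 0.5f);
    *min_dx = (int)(rig->focal_px * rig->baseline_m / rig->max_depth_m);
}
```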
Wrap-around for 360° Videos
Panoramic 360 degree videos wrap the frame at its edges. This means that a given
edge block may have neighbors on the opposite side of the frame, and a codec that is aware of this
can use those wrapped neighbors for prediction instead of treating the edge as unavailable.
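A minimal sketch of the neighbor lookup, assuming an equirectangular layout where only the horizontal axis wraps (the function name is illustrative):

```c
/*
 * Hypothetical sketch: resolve the horizontal neighbor of a block in an
 * equirectangular panorama. Instead of marking the neighbor unavailable at
 * the left or right frame edge, wrap around to the opposite side.
 */
int wrapped_neighbor_col(int block_col, int delta, int blocks_per_row)
{
    /* C's % operator may return a negative value; normalize into [0, blocks_per_row). */
    int col = (block_col + delta) % blocks_per_row;
    if (col < 0)
        col += blocks_per_row;
    return col;
}
```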
Cropped Frame Encode
When streaming VR, the encoder
may have knowledge of the viewer's orientation and can crop a full spherical view down
to the portion that is actually visible to the user. This would reduce encoding and decoding
work, bandwidth, and end-to-end latency.
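As a sketch, assuming an equirectangular source and a client that reports its yaw and horizontal field of view, the encoder could compute which columns to keep (vertical cropping and motion-search padding are omitted, and the names are illustrative):

```c
#include <math.h>

/*
 * Hypothetical sketch: for an equirectangular source frame, compute the range
 * of pixel columns covered by the viewer's current yaw and horizontal field of
 * view. A real implementation would also crop vertically, pad the region for
 * motion search, and handle the wrap-around at the frame edge.
 */
typedef struct { int x0, x1; } ColRange;

ColRange visible_columns(float yaw_deg, float hfov_deg, int frame_width)
{
    float px_per_deg = frame_width / 360.0f;
    float center     = (yaw_deg + 180.0f) * px_per_deg;  /* yaw 0 maps to the frame center */
    float half       = 0.5f * hfov_deg * px_per_deg;

    ColRange r;
    r.x0 = (int)floorf(center - half);
    r.x1 = (int)ceilf(center + half);
    return r;   /* may extend past [0, frame_width); callers wrap as needed */
}
```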
HDR and Non-Pixel Formats
High dynamic range formats require greater bit depths, usually 10 to 16 bits per color channel. Filming
an HDR video is fairly straightforward, but most codecs and video players do not support HDR content
because its usefulness is far more limited in 2D.
Graceful Degradation
Abrupt drops in video quality are especially jarring in VR. Check out this other post for a discussion of mitigation strategies.
Partial Frame Decode
Traditional video decoders process a stream in a fixed order that is determined by the encoder.
In order for a decoder to decompress a specific region of a frame, it may need access to multiple
neighboring regions of the current frame or of previously decoded reference frames. If the bitstream
is organized into independently decodable tiles, a VR player could decode only the tiles that cover
the region the user can actually see.
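Here's a minimal sketch of selecting which tiles to decode, assuming a frame split into fixed-size, independently decodable tiles and a visible rectangle supplied by the player (the tiling scheme and names are assumptions, not how any particular codec does it):

```c
#include <stdbool.h>

/*
 * Hypothetical sketch: mark which tiles of a tiled frame intersect the
 * viewer's visible rectangle so that only those tiles are decoded. Assumes
 * fixed-size, independently decodable tiles; a real decoder must also ensure
 * that each selected tile's reference data is available.
 */
typedef struct { int x, y, w, h; } Rect;

static bool rects_overlap(Rect a, Rect b)
{
    return a.x < b.x + b.w && b.x < a.x + a.w &&
           a.y < b.y + b.h && b.y < a.y + a.h;
}

/* decode[] holds tiles_x * tiles_y flags, one per tile in raster order. */
void select_visible_tiles(Rect visible, int tile_size,
                          int tiles_x, int tiles_y, bool *decode)
{
    for (int ty = 0; ty < tiles_y; ty++) {
        for (int tx = 0; tx < tiles_x; tx++) {
            Rect tile = { tx * tile_size, ty * tile_size, tile_size, tile_size };
            decode[ty * tiles_x + tx] = rects_overlap(visible, tile);
        }
    }
}
```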
This is a short list of ideas that I've been testing in the Cannes video codec. Look for my follow-up post where I'll discuss some of the results. Got an idea for a VR video compression feature that you'd like to share? Send me a note and I'll add it to the list.