Virtual Reality Video Compression

Sunday, October 11th, 2015

Several of the most popular virtual and augmented reality experiences involve video. These videos come in a variety of formats including combinations of stereoscopic 3D, 360° panoramas and spherical views. Unfortunately, these formats place significant strain on our processors, memory, and network bandwidth due to their increased requirements for resolution, framerate and latency.

To cope with this trifecta of video compression challenges, we need a video codec that capitalizes on the potential for increased levels of self similarity and inter-frame correlation within these kinds of videos.

The following is a list of basic mixed-reality centric features that could be integrated into a modern video codec to help achieve higher compression efficiency for VR content, while lowering processor and bandwidth costs.


 

Foveated Coding

The human visual system offers us crisp central vision about a point of focus, but significantly diminished visual acuity as we move towards the periphery of our field of view. The fovea is the central region of the retina that is responsible for this crisp central focus.

While wearing a head-mounted display, the display surface sits close to the viewer's eyes, so the fovea covers only a small fraction of the screen's pixels. In practical terms, only a small portion of the screen is in sharp focus at any moment. By tracking the viewer's gaze we can dedicate higher bitrates to pixels inside the foveal region while downsampling pixels outside it.

This feature requires eye tracking, knowledge of the display size, and the distance between the viewer and the display. Our visual perception degrades nonlinearly with distance from the fovea, so the quantization step size is smoothly adjusted to take this into account.
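The quantizer adjustment above can be sketched as a function of angular distance from the gaze point. This is a minimal illustration, not a production acuity model: the function name, the 2° foveal radius, the quadratic ramp, and the offset cap are all assumptions chosen for clarity.

```python
import math

def foveated_qp(base_qp, block_center, gaze, px_per_degree, max_qp=51):
    """Hypothetical sketch: raise the quantization parameter (QP) for
    blocks far from the viewer's gaze. Coordinates are in pixels;
    px_per_degree converts screen distance to visual angle for this
    particular display size and viewing distance."""
    dx = block_center[0] - gaze[0]
    dy = block_center[1] - gaze[1]
    eccentricity = math.hypot(dx, dy) / px_per_degree  # degrees from fovea

    # Acuity degrades nonlinearly with eccentricity, so ramp the QP
    # offset smoothly (here: quadratically) outside a ~2 degree foveal
    # radius, capped so peripheral blocks are never destroyed entirely.
    foveal_radius = 2.0
    excess = max(0.0, eccentricity - foveal_radius)
    qp_offset = min(12.0, 0.05 * excess * excess)
    return min(max_qp, round(base_qp + qp_offset))
```

Because the ramp is smooth rather than a hard foveal/peripheral boundary, the viewer sees no visible quality seam as their gaze moves.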

As a poor man's test we can render a small red dot on the image and instruct the viewer to follow it as it moves around the screen.

 

Improved Motion Compensation

Motion compensation is a compression optimization that allows us to leverage self similarities within a frame as well as similarities between neighboring frames. Modern codecs use a variety of algorithms to quickly and efficiently detect these similarities, but they are usually tailored to 2D content.

Additionally, conventional virtual reality compression solutions usually squeeze stereoscopic content into existing 2D video formats, interleaving the left and right views frame by frame to capitalize on existing inter-frame motion compensation. While this already significantly improves efficiency, we can further augment the process if we:


  • Perform a more exhaustive motion search for the left eye and leverage the left view motion vectors to optimize the right view motion search.

  • Adjust the search algorithm based on whether the inter-frame search candidate is from the same or opposite view.

  • Transform one block into the 3D space of the other when comparing blocks from two different views.


This feature may be greatly assisted by knowledge of the physical left and right camera separation as well as the shape of the lenses (field of view, offset, etc.).
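The first bullet above can be sketched as seeding the right-view motion search from the co-located left-view result. This is an illustrative sketch, not a codec implementation; the function names and the fixed search radius are assumptions.

```python
def right_view_search_center(left_mv, disparity_px):
    """Hypothetical sketch: a block that moved (dx, dy) in the left view
    usually moves almost identically in the right view; the main
    difference is a horizontal shift set by the camera baseline. Seeding
    the right-view search here replaces a full-range exhaustive search."""
    dx, dy = left_mv
    return (dx + disparity_px, dy)

def refinement_window(center, radius=4):
    """Candidate motion vectors in a small square window around the seed,
    covering (2*radius + 1)^2 positions instead of the full search range."""
    cx, cy = center
    return [(cx + i, cy + j)
            for i in range(-radius, radius + 1)
            for j in range(-radius, radius + 1)]
```

A radius-4 window evaluates 81 candidates; an exhaustive ±64-pixel search would evaluate over 16,000, which is where the savings come from.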

 

Wrap-around for 360° Videos

Panoramic 360 degree videos wrap the frame at its edges. This means that a given edge block may have neighbors on the opposite side of the frame.

A VR codec should incorporate these kinds of characteristics so that it considers the full and proper set of neighbors when performing prediction, differential coding, or other spatial optimizations.
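The wrap-around neighbor lookup reduces to modular arithmetic on the block's column index. A minimal sketch, assuming an equirectangular layout where the left and right frame edges are continuous:

```python
def horizontal_neighbors(block_col, blocks_per_row):
    """Hypothetical sketch: in an equirectangular 360 frame, the left
    edge wraps to the right edge, so neighbor lookups use modular
    arithmetic instead of being clipped at the frame border."""
    left = (block_col - 1) % blocks_per_row
    right = (block_col + 1) % blocks_per_row
    return left, right
```

A conventional 2D codec would mark the left neighbor of column 0 as unavailable; here it correctly resolves to the last column, so edge blocks get the same prediction candidates as interior blocks.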

 

Cropped Frame Encode

When streaming VR, the encoder may have knowledge of the viewer's orientation and can crop a full spherical view down to the portion that is actually visible to the user. This would reduce encoder, decoder, bandwidth and network latency costs.

A caveat is that this feature could also restrict rapid viewer movement and orientation changes. Since the client only receives the minimum data necessary to render the screen, it is unable to support movements that are faster than the round-trip latency between the client and server allows.

A common workaround for this shortcoming is to have the server transmit a greater frame size than the client will initially display. Then, if the user rotates the view within a small enough arc, the client can quickly update the frame using the additional buffered pixels while waiting for a new frame to arrive.
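The guard band described above can be sized from the round-trip latency and a bound on head rotation speed. This is a back-of-the-envelope sketch under assumed inputs, not a tuned heuristic:

```python
def transmitted_fov(visible_fov_deg, max_rotation_dps, rtt_ms):
    """Hypothetical sketch: size the transmitted field of view so that a
    head rotation at up to max_rotation_dps (degrees/second) stays inside
    the buffered pixels for one round trip. The server sends the visible
    FOV plus a margin on each side, capped at the full sphere."""
    margin = max_rotation_dps * (rtt_ms / 1000.0)
    return min(360.0, visible_fov_deg + 2.0 * margin)
```

For example, a 90° viewport with a 50 ms round trip and rotation capped at 120°/s needs roughly 102° transmitted; very high latencies or very fast rotation degenerate to sending the full sphere, i.e. no cropping benefit at all.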

 

HDR and Non-Pixel Formats

High dynamic range formats require greater bit depths, usually 10 to 16 bits per channel. Filming an HDR video is fairly straightforward, but most codecs and video players do not support HDR content because its usefulness is far more limited in 2D.

In VR, HDR would allow the viewer to experience different exposure levels based on their orientation. Imagine a scene where you're standing at the entrance to a tunnel. When you look inward at the dark expanse of the tunnel you see the detail on the cars, pavement and dimly lit signs. As you turn around to face outward the exposure begins to shift and you can clearly see the details of trees, the sky, and other brightly lit objects.
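The gradual exposure shift in the tunnel scene can be modeled as a simple exponential adaptation toward the exposure value of the region the viewer is facing. A minimal sketch, with the function name and the 1.5 s time constant as assumptions:

```python
import math

def adapt_exposure(current_ev, scene_ev, dt, time_constant=1.5):
    """Hypothetical sketch: smoothly shift the displayed exposure value
    (EV) toward that of the region the viewer faces, mimicking the eye's
    gradual light adaptation as you turn from the dark tunnel toward
    daylight. dt is the elapsed time in seconds since the last update."""
    alpha = 1.0 - math.exp(-dt / time_constant)
    return current_ev + alpha * (scene_ev - current_ev)
```

Called once per frame with the average EV of the visible pixels, this produces the slow brightening described above instead of an abrupt exposure jump.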

In addition to HDR there are other non-pixel data formats for light fields, depth information, and more. Although it is probably less feasible, a modern VR video codec could be flexible enough to support these types of alternative formats.

 

Graceful Degradation

Abrupt drops in video quality are especially jarring in VR. Check out this other post for a discussion of mitigation strategies.

 

Partial Frame Decode

Traditional video decoders process a stream in a fixed order that is determined by the encoder. In order for a decoder to decompress a specific region of a frame, it may need access to multiple neighboring regions of the current frame or of previously decoded reference frames.

In VR, the decoder may only be interested in a subset of the current frame, and should be able to decode only the portion that is visible to the user.

This feature would likely increase bandwidth costs, since independently decodable regions restrict prediction and entropy-coding context across their boundaries, while decreasing client-side decode processing costs.
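With a tiled bitstream this reduces to finding which tiles intersect the viewport. A minimal sketch, assuming square tiles and a viewport rectangle in pixel coordinates:

```python
def visible_tiles(viewport, tile_size, frame_tiles):
    """Hypothetical sketch: given a viewport rectangle (x, y, w, h) in
    pixels, return the (col, row) indices of the tiles the decoder must
    process, skipping everything outside the user's view."""
    x, y, w, h = viewport
    cols, rows = frame_tiles
    first_col = max(0, x // tile_size)
    last_col = min(cols - 1, (x + w - 1) // tile_size)
    first_row = max(0, y // tile_size)
    last_row = min(rows - 1, (y + h - 1) // tile_size)
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]
```

A viewport covering a quarter of the frame then costs roughly a quarter of the decode work, which is the client-side saving described above.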


This is a short list of ideas that I've been testing in the Cannes video codec. Look for my follow-up post where I'll discuss some of the results. Got an idea for a VR video compression feature that you'd like to share? Send me a note and I'll add it to the list.