• FinalStage Path Tracer 1.0

  • Completed 2011

Path tracing is a straightforward technique for rendering computer-generated images. This research concluded with a functional GPU-based path tracer, called FinalStage, that is designed to generate images in real time (dependent upon device capabilities). The tracer is bundled into a library that provides several simple, abstracted interfaces for scene description that closely resemble other popular graphics rendering APIs (e.g. OpenGL, Direct3D). Accommodations are also made for the retained nature of the renderer.


The tracer was designed to be cross-platform with logical abstractions for scene management, tracing, and shading. As a result, the tracer was quickly ported to several different platforms relatively early in development, and now supports the following:

  • HTML5 Capable Browsers

  • Windows (all versions)

  • Mac OS X 10.4+

  • iOS 4.0+

Additionally, the engine supports optimized compute paths for the following APIs:

  • NVIDIA CUDA 3.0+

  • OpenCL 1.0+


Performance is significantly improved through a combination of two features: a custom PVT hashing algorithm that greatly reduces scene traversal cost, and the renderer's full GPU implementation(s). As a result, scene complexity is largely decoupled from rendering time: adding objects does not reduce rendering performance, but it does increase pre-computation time. A separate lightweight thread keeps the hash up to date for dynamic objects.
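The PVT hashing algorithm itself is not described here, so the following is only a generic stand-in: a uniform spatial hash grid, one common way to make lookup cost depend on local occupancy rather than on the total number of objects. All names (`SpatialHash`, `cell_size`, `insert`, `query_point`) are hypothetical and are not part of FinalStage.

```python
from collections import defaultdict
from math import floor


class SpatialHash:
    """Uniform grid that maps integer cell coordinates to object ids."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)

    def _cell(self, point):
        # Quantize a 3D point to the integer coordinates of its cell.
        return tuple(floor(c / self.cell_size) for c in point)

    def insert(self, obj_id, box_min, box_max):
        # Pre-computation: register the object in every cell its box overlaps.
        lo, hi = self._cell(box_min), self._cell(box_max)
        for x in range(lo[0], hi[0] + 1):
            for y in range(lo[1], hi[1] + 1):
                for z in range(lo[2], hi[2] + 1):
                    self.cells[(x, y, z)].add(obj_id)

    def query_point(self, point):
        # Lookup touches a single cell, regardless of total object count.
        return self.cells.get(self._cell(point), set())


grid = SpatialHash(cell_size=1.0)
grid.insert("sphere", (0.2, 0.2, 0.2), (0.8, 0.8, 0.8))
grid.insert("wall", (0.0, 0.0, 0.0), (3.5, 0.5, 0.5))
near_origin = grid.query_point((0.5, 0.5, 0.5))
far_along_wall = grid.query_point((3.1, 0.1, 0.1))
```

The trade-off matches the one described above: `insert` does more work as objects are added, while `query_point` stays a single dictionary lookup.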

This rendering engine supports the following features on each platform (except HTML5):

  1. Physically based rendering: the renderer uses an unbiased simulation of light transport that incorporates both surface and subsurface interactions with materials. This system enables reflections, refractions, and subsurface scatter effects.

  2. Optimized scene traversal: a special hashing algorithm is used to efficiently cache the intermediary results of a pass. The renderer analyzes this information to shortcut the traversal and significantly reduce render times.

  3. Full 128-bit pipeline: the entire pipeline supports pixel formats from 32 up through 128 bits per pixel in order to support high dynamic range scenes and effects.

  4. Common object and material formats: the renderer relies upon common and well-documented content formats such as obj, mtl, bmp, tga, jpg, png, OpenEXR, and more.

  5. Large Material Library: a large selection of materials is available that demonstrate surface and subsurface light interactions. Example materials include: skin, glass, foil, anisotropic reflectance, etc.

  6. Animation and Instancing: animation would typically be considered outside the jurisdiction of this renderer, but due to the optimizing potential of instancing, some limited animation support is available. The renderer supports skeletal and key-frame animation, and will optimize the content for faster rendering.

  7. Image-Based Effects: depth of field, heat transfer, anamorphic glare, blur, flares, god-rays, and more are enabled via a configurable post-processing effects pass.

  8. Physics Modelling: certain effects can be difficult to achieve through conventional animation, so the renderer supports a lightweight physics simulator that focuses on modelling clouds, water, fur, and fog. This process is currently not GPU accelerated and may be optionally disabled.

  9. Offline Processing: the renderer expects optimized volume hierarchies for 3D meshes. Generating these hierarchies can be performed either at load time, dynamically during runtime (in the case of dynamic objects), or offline using a tool.

  10. Path Shaders: greater rendering flexibility is achieved through path shaders (see the section below for more detail).

  11. Visual memory management system: a common hurdle for renderers that execute on a GPU is the limited capacity of video memory that is not virtualized, or only partially virtualized. The goal of the visual memory manager is to manage access to resources in a way that relieves the burden of packing entire scenes onto a GPU. See the section below for more detail.
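As an illustration of the hierarchy generation mentioned in item 9, here is a minimal sketch of a median-split bounding volume hierarchy build. Which construction method FinalStage uses is not stated; the median split, the `build_bvh` name, and the dictionary node layout are all assumptions chosen for brevity.

```python
def bounds(boxes):
    # Union of axis-aligned boxes, each given as (min_xyz, max_xyz).
    lo = tuple(min(b[0][i] for b in boxes) for i in range(3))
    hi = tuple(max(b[1][i] for b in boxes) for i in range(3))
    return lo, hi


def centroid(box, axis):
    return 0.5 * (box[0][axis] + box[1][axis])


def build_bvh(boxes, leaf_size=1):
    """Recursively split boxes at the median centroid of the widest axis."""
    lo, hi = bounds(boxes)
    node = {"bounds": (lo, hi)}
    if len(boxes) <= leaf_size:
        node["leaf"] = boxes
        return node
    # Split along the axis where the node is widest.
    axis = max(range(3), key=lambda i: hi[i] - lo[i])
    ordered = sorted(boxes, key=lambda b: centroid(b, axis))
    mid = len(ordered) // 2
    node["left"] = build_bvh(ordered[:mid], leaf_size)
    node["right"] = build_bvh(ordered[mid:], leaf_size)
    return node


def depth(node):
    # Number of levels from this node down to its deepest leaf.
    if "leaf" in node:
        return 1
    return 1 + max(depth(node["left"]), depth(node["right"]))


# Four unit boxes spaced along the x axis.
boxes = [((float(x), 0.0, 0.0), (x + 1.0, 1.0, 1.0)) for x in (0, 2, 4, 6)]
tree = build_bvh(boxes)
```

A build like this can run at load time or in an offline tool; a production renderer would more likely use a cost-based split, which this sketch leaves out.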

Path Shading

Path shaders are small programs written by developers that are executed at each step (or bounce) of the path tracer. Shaders are provided with important information about the current step including the incident ray, material properties, and global information. During execution, each shader is expected to queue additional rays (as necessary), accumulate data (handled automatically by the system), and return a measure of the perceived light to the caller.

Example shader:

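float4 main( Ray input, Object source )
{
    // incident ray mirrored about the surface normal
    float3 reflected = reflect( input, source.normal );

    // gather will automatically trace 'reflected' into the scene
    // and return the accumulated light value
    float3 indirect = gather( reflected );

    // the cosine term is a scalar
    float lambert = max( 0, dot( source.normal, reflected ) );

    // assumes source.diffuse is an RGB float3; alpha is fixed at 1
    return float4( lambert * indirect * source.diffuse, 1 );
}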

Path shaders are currently interpreted at runtime by the renderer. In the future, I hope to migrate this to a JIT-compiled approach. Given the current limitations, the renderer also provides a fallback that disables path shading and instead relies upon a generic, configurable, and natively compiled path shader.
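To make the per-bounce model concrete, the following is a rough sketch of how a host could drive a shader like the example above. It is not FinalStage's interpreter: the `trace` function, the `MAX_BOUNCES` limit, the single-floor scene with a constant sky, and the Python `shader` function are all assumptions made for illustration.

```python
from math import sqrt

MAX_BOUNCES = 4            # hypothetical recursion limit
SKY = (1.0, 1.0, 1.0)      # constant environment light
FLOOR_NORMAL = (0.0, 1.0, 0.0)
FLOOR_DIFFUSE = (0.5, 0.5, 0.5)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def reflect(d, n):
    # Mirror direction d about the unit normal n.
    k = 2.0 * dot(d, n)
    return tuple(di - k * ni for di, ni in zip(d, n))


def shader(direction, normal, diffuse, gather):
    # Python analogue of the example shader: one gathered bounce,
    # weighted by the cosine term and the surface color.
    reflected = reflect(direction, normal)
    indirect = gather(reflected)
    lambert = max(0.0, dot(normal, reflected))
    return tuple(lambert * i * c for i, c in zip(indirect, diffuse))


def trace(origin, direction, depth=0):
    # The only geometry is an infinite floor at y = 0.
    if depth >= MAX_BOUNCES:
        return (0.0, 0.0, 0.0)
    if origin[1] <= 0.0 or direction[1] >= 0.0:
        return SKY  # the ray escapes to the environment
    t = -origin[1] / direction[1]
    # Offset the hit point slightly along the normal to avoid self-hits.
    hit = tuple(o + t * d + 1e-4 * n
                for o, d, n in zip(origin, direction, FLOOR_NORMAL))

    def gather(new_direction):
        # Queue the next step of the path and return its light.
        return trace(hit, new_direction, depth + 1)

    return shader(direction, FLOOR_NORMAL, FLOOR_DIFFUSE, gather)


s = 1.0 / sqrt(2.0)
color = trace((0.0, 1.0, 0.0), (s, -s, 0.0))
```

Passing `gather` into the shader as a callback keeps the shader unaware of recursion depth and scene structure, which is the separation the description above implies.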

Memory Management

GPUs today typically have on the order of 8 GB of video memory available for storing textures, materials, geometry, and scene hierarchies. Unfortunately, this is often inadequate for very large or detailed scenes (where a single character can require 25 GB+ of memory). To manage this, the renderer uses a visual memory management system.

The goal of the visual memory manager is twofold:

  1. to determine the minimum set of data that is necessary to render the next frame

  2. to manage movement of data between larger system caches and the more limited video memory

The first goal is accomplished through a feedback system between the memory manager and the renderer. Each frame, the renderer reports what was actually rendered, and the memory manager translates that into a memory map that prioritizes keeping active objects resident in video memory. The second goal is met with a simple on-demand paging policy that gives the renderer access to the data it needs. Paging is slow but (ideally) unnecessary when the memory manager is able to predict the active portions of the scene and fit them into video memory in advance.
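The two goals can be sketched as a small residency cache. This is an illustration under heavy assumptions rather than FinalStage's implementation: the `VideoMemoryManager` class, its least-recently-rendered eviction policy, and the unit-less `capacity` are all hypothetical.

```python
class VideoMemoryManager:
    """Tracks which resources are resident in a fixed-size video memory."""

    def __init__(self, capacity, sizes):
        self.capacity = capacity   # total video memory, arbitrary units
        self.sizes = sizes         # resource id -> size
        self.last_used = {}        # resident id -> last frame it was rendered
        self.page_faults = 0

    def used(self):
        return sum(self.sizes[r] for r in self.last_used)

    def feedback(self, frame, rendered_ids):
        # Goal 1: record what the renderer actually drew so those
        # resources rank highest when deciding what stays resident.
        for rid in rendered_ids:
            if rid in self.last_used:
                self.last_used[rid] = frame

    def request(self, rid, frame):
        # Goal 2: on-demand paging for data that is not resident.
        if rid in self.last_used:
            return False           # already resident, no transfer needed
        self.page_faults += 1
        # Evict least recently rendered resources until the new one fits.
        # A resource larger than the whole capacity is not handled here.
        while self.last_used and self.used() + self.sizes[rid] > self.capacity:
            victim = min(self.last_used, key=self.last_used.get)
            del self.last_used[victim]
        self.last_used[rid] = frame
        return True                # a (slow) upload was required


vram = VideoMemoryManager(capacity=4, sizes={"hero": 2, "floor": 1, "tree": 2})
vram.request("hero", 0)
vram.request("floor", 0)
vram.feedback(0, ["hero", "floor"])
vram.request("hero", 1)            # resident, so no page fault
vram.feedback(1, ["hero"])
vram.request("tree", 2)            # evicts "floor", the stalest resource
```

In this run the feedback from frame 1 marks "hero" as more recently rendered than "floor", so "floor" is the resource evicted when "tree" is paged in.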



Several builds are available, each with very different performance characteristics. If you have an NVIDIA GPU, I strongly suggest trying the CUDA build below. Otherwise, you should try the (much slower) CPU build.

If none of these builds matches your hardware and operating system configuration, check out the HTML5 version, but be forewarned: it is quite slow due to its reliance on JavaScript.

  FinalStage Path Tracer Demo (Windows, CUDA - NVIDIA GPUs ONLY)
  FinalStage Path Tracer Demo (Windows, CPU)
  FinalStage Path Tracer Demo (HTML5)