This page is incomplete and, for the moment, is just a placeholder describing the OIT implementations on GitHub.
Rendering transparency is a non-trivial problem in computer graphics because colour has to be blended in order. See opengl.org’s Blending and Transparency Sorting.
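To see why order matters, here is a minimal sketch of standard alpha blending (the equation that `glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)` selects), applied to two half-transparent surfaces in both orders. The helper name `over` and the colour values are made up for illustration.

```python
def over(src_rgb, src_alpha, dst_rgb):
    """Blend a source colour over a destination colour (non-premultiplied alpha)."""
    return tuple(src_alpha * s + (1.0 - src_alpha) * d
                 for s, d in zip(src_rgb, dst_rgb))

background = (0.0, 0.0, 0.0)
red = ((1.0, 0.0, 0.0), 0.5)   # (rgb, alpha)
blue = ((0.0, 0.0, 1.0), 0.5)

# Red is assumed farther away: blend it first, then blue on top.
red_then_blue = over(*blue, over(*red, background))

# Same two surfaces, blended in the opposite order.
blue_then_red = over(*red, over(*blue, background))

print(red_then_blue, blue_then_red)
```

The two results differ, so the final colour depends on the order in which the surfaces are blended.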
It’s common to just render transparent geometry such as particles after opaque geometry with depth writing disabled (using glDepthMask), ignoring sorting. Depending on the application, the artefacts may not be that noticeable.
For applications where correct transparency is necessary, surfaces must be rendered in sorted order. Sorting per-polygon can be expensive and in some cases requires subdividing triangles. An alternative is capturing all fragments and sorting them per-pixel, post-rasterization. This is order-independent transparency (OIT).
The first OIT methods either rendered the scene many times or accepted fragment collisions (race conditions) caused by fragments being processed in parallel. More recently (around 2009-10), atomic operations and arbitrary global memory writes from shaders made possible a new class of OIT techniques that accurately capture all fragments in a single pass. These are the techniques discussed here. Single-pass OIT needs the following:
- Atomic operations in the fragment shader to get a unique index for writing fragment data.
- A mechanism for storing per-pixel lists of fragments.
- A fast way to sort per-pixel lists of fragments.
- An iteration through sorted fragments, blending in order for final per-pixel colour.
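The four requirements above can be sketched on the CPU as follows. This is only an illustration, not the shader code from the repositories: `itertools.count` stands in for a GPU atomic counter, per-pixel linked lists are the storage mechanism, and Python's built-in sort replaces a shader sort. All names, buffer sizes, and the convention that a larger depth is farther away are assumptions.

```python
import itertools

WIDTH, HEIGHT = 2, 1
MAX_FRAGMENTS = 16   # hypothetical capacity of the global fragment buffer
NULL = -1

# Stand-in for a GPU atomic counter: every next() call returns a unique slot.
alloc_counter = itertools.count()

head = [NULL] * (WIDTH * HEIGHT)    # per-pixel index of the newest fragment
next_ptr = [NULL] * MAX_FRAGMENTS   # link to the previous fragment of the same pixel
frag_data = [None] * MAX_FRAGMENTS  # (depth, rgb, alpha) per fragment

def capture(x, y, depth, rgb, alpha):
    """Rendering pass: store one fragment in the pixel's linked list."""
    slot = next(alloc_counter)
    if slot >= MAX_FRAGMENTS:
        return  # buffer full, fragment dropped
    pixel = y * WIDTH + x
    frag_data[slot] = (depth, rgb, alpha)
    next_ptr[slot] = head[pixel]    # on a GPU this pair would be one atomic exchange
    head[pixel] = slot

def resolve(x, y, background):
    """Full-screen pass: walk the list, sort by depth, blend back to front."""
    frags = []
    node = head[y * WIDTH + x]
    while node != NULL:
        frags.append(frag_data[node])
        node = next_ptr[node]
    frags.sort(key=lambda f: f[0], reverse=True)  # farthest fragment first
    colour = background
    for _, rgb, alpha in frags:
        colour = tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(rgb, colour))
    return colour

# Fragments arrive nearest-first, i.e. in the "wrong" order for blending.
capture(0, 0, 0.3, (0.0, 0.0, 1.0), 0.5)
capture(0, 0, 0.7, (1.0, 0.0, 0.0), 0.5)
covered = resolve(0, 0, (0.0, 0.0, 0.0))
empty = resolve(1, 0, (0.0, 0.0, 0.0))
```

Because sorting happens per pixel after capture, the submission order of the geometry does not affect the result.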
Capturing and storing fragments happen together in the rendering pass, which constructs a deep image. This is described in Efficient Layered Fragment Buffer (LFB) Techniques.
The LFB code is at github.com/pknowles/lfb and depends on pyarlib. It implements four methods, abstracting them behind a common API:
- Basic 3D array (a fixed-sized list per pixel).
- Per-pixel linked lists.
- Linked list of fragment pages.
- Linearized per-pixel arrays (using a prefix sum scan).
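As an illustration of the last method, the sketch below packs fragments into one contiguous array using per-pixel counts and a prefix sum. It is a CPU approximation of the idea, not the API of the lfb repository; the function name `linearize` and the data layout are hypothetical.

```python
from itertools import accumulate

def linearize(num_pixels, fragments):
    """Pack (pixel_index, payload) pairs into one array, grouped by pixel.

    Returns (offsets, packed), where the fragments of pixel p are
    packed[offsets[p]:offsets[p + 1]].
    """
    # Pass 1: count the fragments that land on each pixel.
    counts = [0] * num_pixels
    for pixel, _ in fragments:
        counts[pixel] += 1

    # The prefix sum of the counts gives each pixel's start offset.
    offsets = [0] + list(accumulate(counts))

    # Pass 2: write each fragment into the next free slot of its pixel.
    packed = [None] * offsets[-1]
    cursor = offsets[:-1]  # slicing copies, so offsets is left untouched
    for pixel, payload in fragments:
        packed[cursor[pixel]] = payload
        cursor[pixel] += 1
    return offsets, packed

offsets, packed = linearize(3, [(2, "a"), (0, "b"), (2, "c"), (0, "d"), (0, "e")])
```

Each pixel's fragments end up contiguous in memory, at the cost of processing the fragments twice (once to count, once to write).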
Next, the deep image is sorted and composited in the same full-screen pass. Copying fragments to a conservatively sized local array and using insertion sort is standard, but performance can be greatly improved with Backwards Memory Allocation (BMA) and Register-based Block Sort (RBS).
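The standard baseline mentioned above (copy to a conservatively sized local array, insertion sort, then blend) might look like the sketch below. It does not attempt BMA or RBS. `MAX_FRAGS`, the fragment tuple layout, and the convention that a larger depth is farther away are assumptions for illustration.

```python
MAX_FRAGS = 8  # hypothetical conservative per-pixel capacity

def sort_and_composite(pixel_frags, background):
    """Sort one pixel's (depth, rgb, alpha) fragments and blend back to front."""
    # Copy into a fixed-size local array, as a shader would.
    local = [None] * MAX_FRAGS
    n = min(len(pixel_frags), MAX_FRAGS)  # extra fragments are dropped
    for i in range(n):
        local[i] = pixel_frags[i]

    # Insertion sort by descending depth, so the farthest fragment comes first.
    for i in range(1, n):
        key = local[i]
        j = i - 1
        while j >= 0 and local[j][0] < key[0]:
            local[j + 1] = local[j]
            j -= 1
        local[j + 1] = key

    # Composite in sorted order with standard alpha blending.
    colour = background
    for i in range(n):
        _, rgb, alpha = local[i]
        colour = tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(rgb, colour))
    return colour

frags = [
    (0.3, (0.0, 0.0, 1.0), 0.5),  # near, half-transparent blue
    (0.7, (1.0, 0.0, 0.0), 0.5),  # far, half-transparent red
    (0.5, (0.0, 1.0, 0.0), 1.0),  # middle, opaque green
]
result = sort_and_composite(frags, (0.0, 0.0, 0.0))
```

Insertion sort suits short lists, but the fixed local array must be sized for the worst case.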
Code for sorting and compositing, using the above LFB code, is available here: github.com/pknowles/oit
This application includes BMA, RBS and CUDA sorting and compositing, and a few combinations with a benchmarking framework.