Voxel Engines and the Surprising Utility of Software Rendering Knowledge

This project was aimed at representing large volumetric spaces. Written in Rust, my intention was to see how easy it would be to write a high-performance voxel-rendering engine from the ground up.

The type of game I’ve been imagining for this engine is a Descent-like underground shooter with full 6-degrees-of-freedom motion. (Descent was named PC Magazine’s Game of the Year way back in 1994, around the same time as Quake.)

One of the target platform types is virtual-reality (VR) headsets. The need for stable frame rates, low-latency responses, and cross-platform support pushed me in the direction of Rust. C++ is technically another option, but although I used to write a lot of C++, I have no desire to start a fresh project with it. Compared with newer languages, C++ lags in features, its build systems are crufty, it doesn’t handle modules well, and memory management is a real pain. Rust claims to address all of these problems (and being built on LLVM, its runtime performance should be comparable to C++), so I decided to try it out for this project.

[Note: If you’ve been writing code in C or C++, I’m interested in knowing what you think about Rust. It’s got an object ownership and borrowing system that offers thread-safety and automatic memory deallocation without the performance and timing problems of a garbage collector.]

But this post isn’t really about programming languages. This post is about how representing and manipulating volumetric spaces.

Target Use Cases

These days, I approach every project as a design problem. As a design problem, I have to ask some questions: What kinds of things do I want to be able to do with this code? What are its use cases?

The basic cases I have in mind are the following:

  • Easy editing of volumetric spaces. Unlike most editing tools, which focus on defining surfaces and placing objects, I want to be able to “carve” interesting spaces out of a solid volume that extends “infinitely” in all directions. Minecraft allows something a bit like this, but its voxels are quite large and it is focused on one-voxel-at-a-time manipulations. I want tools to add, remove, or alter voxels en-masse.
  • Voxels that are freely destructible at run-time. Destructibility implies the ability to re-compute/re-light selected surfaces at runtime, which implies a fast caching strategy and a fast occlusion-culling mechanism. Again, Minecraft does this to an extent, but it would be interesting to see if a higher-resolution voxel map can be made to perform well under destructibility.

Basic Representational Structure

Here’s the basic volumetric representation:

  • Span has an integer start position and a positive-integer length.
  • Row is an ordered sequence of non-overlapping spans.
  • 2D Grid is a dequeue of Rows with an integer starting offset. It is a dense axis-aligned collection that allows rows to be added quickly to either end of the grid.
  • 3D Grid is a dequeue of 2D grids with an integer starting offset. It is a dense axis-aligned collection that allows 2D grids to be added quickly to either end of the 3D grid.

The row/span structure is essentially a run-length-compressed representation of voxels. It assumes there will be long sequences of consecutive voxels that are either all empty or all solid, which I think is a fair assumption to make (especially if the voxel grid is very high-resolution).

Run-length compression makes it possible to store very large or high-resolution voxel maps in RAM. As a thought experiment, say we wanted to represent a volume that is bounded by a 10,000 x 10,000 x 10,000 cube. Furthermore, let’s say that the volume is not too complex, maybe an average of 3 spans per row (some rows may have substantially more spans, most rows will have fewer). 10,000 2D grids times 10,000 rows times 3 spans times 2 integers times 4 bytes = 2.4 GB, plus some bookkeeping and caching overhead. This number is well within the capacity of current hardware.

Queries, Manipulation, and an Unexpected Application of Software Rasterization Knowledge

All manipulation operations maintain the representational invariants of the row/grid structure. Notably, since a row must be maintained as an ordered sequence of non-overlapping spans, adding or removing voxels may cause spans to split, joined, or removed altogether.

As an implementation detail, a Row structure maintains span start offsets in one array and span lengths in another array. Querying for the presence of a voxel can then become quite a fast operation; finding the right row involves 2 memory accesses, finding the right span is a simple search of an ordered array of integers, and checking for voxel presence is a single span-length comparison.

Now: regarding the manipulation of voxels en masse: ideally, I want to add or remove large numbers of voxels at once. I want to be able to use tools to carve or extrude shapes into the voxel volume, which would allow me to quickly lay out interesting 3-dimensional spaces. It occurred to me that realizing such shapes in an axis-aligned voxel grid is a form of rasterization — 3D shapes would be discretized first as a set of 2D shapes, and these 2D shapes in turn would be decomposed into sets of spans, which would be “rendered” into the voxel row structure.

There are at least 2 major differences between the type of “rendering” I propose here and the more commonly-understood form of rendering: first, 3D shapes are rendered as volumes, not reduced to surfaces; and second, rendering into a voxel row is an operation on a compressed structure, not an uncompressed row of pixels. However, the basic operations of fragment decomposition and scanline determination seem virtually identical.


However, there is another step. Rending into a voxel space is not the same as representing that space on the player’s screen. Before the player can see the voxel space, it must be converted to triangle meshes that can be rendered by more ordinary GPU hardware.

I implemented a very basic voxel-to-surface conversion. This conversion doesn’t try to smooth the mesh (which is an interesting and possibly complex problem in itself), but leaves the voxels looking like cubes.

Essentially, the only parts of a solid voxel that must be rendered are those faces that are adjacent to empty voxels. These faces are assembled into triangle meshes, then cached. As the voxel grid is modified (either in an editor or at runtime), only those meshes that touch a modified voxel must be re-built.

Visible Surface Determination

I have not yet built an occlusion-culling mechanism. In a brief search for occlusion culling methods, I did not see any approach that would clearly work well, so I suppose I’m on my own for this problem.

The original Descent game had a portal-and-segment-based culling approach, which may make sense to adapt here. Essentially (if I recall this correctly), it worked like this: every segment is rendered with knowledge of the viewport extents (in view space, the extents are just a minimum and maximum x and y). Segment faces are traversed, and a segment face is rendered only if some part of that face is visible inside the current viewport extent. If that face is non-opaque (i.e., an adjacent segment should be visible through that face), then a new viewport extent is created for the adjacent segment: the new viewport is the intersection of the current viewport and the view extents of the face. Essentially, each non-opaque face is a portal into an adjoining segment (Descent‘s scanline renderer clips against the computed viewport extent to avoid rendering too many unnecessary pixels). Segments that never fall within the computed viewport extents of any of their faces are effectively culled.

In voxel space, it is possible to treat each face like a portal. Although this approach would work, it seems like it would be rather slow if the voxel grid has a high resolution. One possibility here is to do occlusion culling on a low-resolution approximation of the visible grid. For example, if the occlusion grid had one voxel for every 16x16x16 block of voxels in the original grid, there would probably be a substantial reduction in CPU-time cost for a probably very small increase in GPU-time cost.

Such an approach is convenient for mesh caching: each 16x16x16 block would be a single mesh, which would either be rendered in its entirety or not at all. So the new effect with respect to coding is that two voxel grids would be maintained simultaneously: one at a high resolution and one at a low resolution. GPU-renderable meshes would be generated with respect to the high-resolution mesh, but cached and culled with respect to the low-resolution mesh.

[Dear reader, what do you think? Is this reasonable?]


The voxel grid I’ve described here is logically a 1-bit-per-voxel representation. In a real game I would certainly want to store more than 1 bit of information per voxel — things like material type, level of destructibility, etc. It would also be interesting to allow level editors to “paint” voxels with different properties, or support layers, kind of like a Photoshop for voxel spaces. Perhaps the easiest way to accomplish this is to add a per-span index number, each of which would reference a specific combination of properties, materials, layers, etc. This would come at a moderate increase in storage cost and complexity of grid-manipulation code.

Finally, dear reader, I have a question for you: Is supporting voxel technology worth writing a new engine and tools?


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s