Alex Nankervis's Stuff

My programming stuff:

I've been interested in programming (and computer graphics in particular) ever since I got my first 286 and the explosive banana throwing gorillas that came with it, but it didn't really take off until many years later, when I found myself messing around with Dark Basic, trying to force the game maker's scripting engine into doing all sorts of sadistic things it wasn't meant for, like rendering 1996-era game levels triangle by triangle and doing manual collision detection. Realizing that this wasn't cutting the mustard in terms of performance, I made the jump to C++ and OpenGL and haven't looked back.

Along the way, I've come up with more than a few demos that are just collecting dust on my hard drive, so I figured I'd put some of the more interesting ones up for people to see. A large number of these require pixel shader 2 capable graphics cards (my older demos have long since died horrible deaths in hard drive crashes and I don't need to be inflicting register combiners on anyone these days), and some of them need SSE capable CPUs.

Contact: anankervis at gmail

Most of the demos include source code and a readme with instructions. The source code was originally never meant to see the light of day, but it shouldn't be too hard to follow - I figure it beats the heck out of just a binary, at least. If you find something scary in the code for the older demos, I warned you!



January 21, 2013 - updated Voxel Cone Tracing Global Illumination demo
December 17, 2012 - updated recent demos to include full source, Windows 8 fix
November 24, 2012 - added Voxel Cone Tracing Global Illumination demo
September 30, 2012 - added Pixel Dead Reckoning demo
September 25, 2012 - added Linked List Antialiasing and Edge Resampling demo
March 31, 2011 - added Super Frontier Wars
December 9, 2010 - added Straight Aces, Gammon Trigger demo, Astronomo Rex video
September 20, 2009 - added CUDA 3D Perlin Noise demo
July 15, 2009 - embedded Felwyrld tech slides
July 14, 2009 - added Air Master 3D, added Gammon Trigger video
June 21, 2008 - fixed GS Painterly demo under NVIDIA 177 drivers
April 10, 2008 - added code samples
March 23, 2008 - added new Felwyrld screenshots, Gammon Trigger
July 1, 2007 - added Geometry Shader Painterly Rendering demo
June 30, 2007 - added DXT GPU Compression demo, Geometry Shader Tessellation demo
October 23, 2006 - added Light Beams and GPU Clouds demo, Soft Particles demo
August 30, 2006 - added GPU Ray-Triangle Intersection demo


In no particular order:



voxel cone tracing global illumination (127mb)

An implementation of global illumination using voxel cone tracing, as described by Crassin et al. in Interactive Indirect Illumination Using Voxel Cone Tracing, with the Crytek Sponza model used for content.

This demo served both as a means to familiarize myself with voxel cone tracing and as a testbed for performance experiments with the voxel storage: plain 3D textures, real-time compressed 3D textures, and 3D textures aligned with the diffuse sample rays were tested. Sparse voxel octrees were not implemented due to time constraints, but would have been nice to have as a baseline reference. Compared to SVO in the context of voxel cone tracing (as opposed to ray casting, where SVO is a clear winner), 3D textures allow for easier filtering, direct lookups without evaluating the octree structure, and potentially better cache and memory bandwidth utilization (depending on cone size and scene density). The clear downside is the space requirement: 3D textures can't scale to larger scenes or smaller, more detailed voxels. There may be ways to work around this deficiency: sparse textures (GL_AMD_sparse_texture), compression, or hybrid schemes that mix tree structures with 3D textures.
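
As a rough illustration of why plain 3D textures make cone tracing convenient, here is a minimal CPU-side sketch of marching a single cone through a mipmapped voxel volume. It is not the demo's shader code: `sampleVoxels` is a hypothetical stand-in for a filtered 3D texture fetch, and the names, the step size, and the 0.99 opacity cutoff are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>

struct Rgba { float r, g, b, a; };

// Mip level whose voxel size roughly matches the cone diameter at distance t.
inline float coneMipLevel(float t, float halfAngle, float voxelSize) {
    assert(voxelSize > 0.0f);
    const float diameter = 2.0f * t * std::tan(halfAngle);
    return std::max(0.0f, std::log2(diameter / voxelSize));
}

// Marches one cone front to back. sampleVoxels(t, mip) is a hypothetical
// stand-in for a filtered, mipmapped 3D texture fetch at distance t along
// the cone axis; it is assumed to return premultiplied color plus opacity.
inline Rgba traceCone(const std::function<Rgba(float, float)>& sampleVoxels,
                      float halfAngle, float voxelSize, float maxDistance) {
    Rgba acc{0.0f, 0.0f, 0.0f, 0.0f};
    float t = voxelSize;  // start one voxel out to avoid sampling the surface itself
    while (t < maxDistance && acc.a < 0.99f) {
        const Rgba s = sampleVoxels(t, coneMipLevel(t, halfAngle, voxelSize));
        const float remaining = 1.0f - acc.a;  // front-to-back compositing
        acc.r += remaining * s.r;
        acc.g += remaining * s.g;
        acc.b += remaining * s.b;
        acc.a += remaining * s.a;
        // Step by half the cone diameter so the step grows with the cone.
        const float diameter = std::max(voxelSize, 2.0f * t * std::tan(halfAngle));
        t += 0.5f * diameter;
    }
    return acc;
}
```

With a 3D texture, the `sampleVoxels` call maps to a single hardware-filtered fetch at a fractional mip level, which is the "direct lookup" advantage over walking an octree.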

Real-time DXT compression is fast enough to convert the 3D voxel textures on the fly. However, API and driver limitations prevent this from being an effective choice: there is no way to write directly to tiled texture memory, and CPU fallbacks get triggered when trying to populate a compressed 3D texture from GPU memory. The potential memory bandwidth savings did not result in a performance advantage - it seems that the cone tracing is limited by texture filtering and ALU on the hardware tested. This approach may still be worth considering for the reduced memory footprint alone.

Aligning the 3D textures with the diffuse sample cone directions simplifies cone tracing significantly (removing the need to manually filter the directionally-dependent voxels), allowing the diffuse cones to be traced much faster. Unfortunately, this also requires that the cone directions be uniform for all fragments, which in turn requires more cones to maintain quality, giving a net loss.

Requires OpenGL 4.3. Tested on an NVIDIA GeForce GTX 680 with the 310.54 beta drivers.

January 21, 2013 update: improved performance (voxel clear and mipmap steps), corrected voxel mipmap blending equation, changed camera start location



pixel dead reckoning (frame reprojection) (15.4mb)

Extrapolates new frames using color, velocity, and transform data from a previous frame. Can be used to upsample framerate (for example, from something unsteady and below 120hz to a steady 120hz), reduce perceived lag/delay in the input-process-render-display loop, align images with different delays or transforms (such as video and a rendered overlay), and generate stereo pairs. Just like with networking dead reckoning, there will be artifacts when the new frame cannot be extrapolated entirely from previous values. This demo shows frame rate upsampling, delay compensation, and anaglyph stereo pair generation.

A keyframe is generated every N frames with color, depth, and velocity, and is tagged with the transform matrix and current time used to render the frame. Subsequent frames skip rendering the scene, and instead jump directly to the extrapolation pass, providing a new time and transform for the keyframe to be projected to. The reprojection algorithm first scans over the keyframe, determining which points are static between the keyframe and extrapolated frame (outputting these directly). Points which are not static are appended to a buffer with color, position, and size information (for points undergoing rotation, size must be adjusted to maintain area and prevent gaps). A second pass scatters these points to their final location in the extrapolated image.
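
The static/moving classification in the first pass can be sketched on the CPU roughly as follows. This is a simplified illustration, not the demo's shader: it assumes each keyframe pixel carries a world-space position and velocity, uses a row-major 4x4 matrix, and the names `extrapolate`, `projectToPixel`, and `isStaticPoint` are made up for the example.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Pixel { int x, y; };
using Mat4 = std::array<float, 16>;  // row-major view-projection matrix (assumption)

// Moves a keyframe point forward in time using its stored velocity.
inline Vec3 extrapolate(Vec3 pos, Vec3 vel, float dt) {
    return {pos.x + vel.x * dt, pos.y + vel.y * dt, pos.z + vel.z * dt};
}

// Projects a point to integer pixel coordinates.
inline Pixel projectToPixel(const Mat4& m, Vec3 p, int width, int height) {
    const float cx = m[0] * p.x + m[1] * p.y + m[2] * p.z + m[3];
    const float cy = m[4] * p.x + m[5] * p.y + m[6] * p.z + m[7];
    const float cw = m[12] * p.x + m[13] * p.y + m[14] * p.z + m[15];
    assert(cw > 0.0f);  // the point must be in front of the camera
    const float u = (cx / cw) * 0.5f + 0.5f;  // NDC [-1, 1] -> [0, 1]
    const float v = (cy / cw) * 0.5f + 0.5f;
    return {static_cast<int>(std::floor(u * width)),
            static_cast<int>(std::floor(v * height))};
}

// A point is "static" if it lands on the same pixel in the extrapolated
// frame; static points can be output directly, while the rest would be
// appended to a buffer and scattered in a second pass.
inline bool isStaticPoint(const Mat4& keyTransform, const Mat4& newTransform,
                          Vec3 pos, Vec3 vel, float dt, int width, int height) {
    const Pixel a = projectToPixel(keyTransform, pos, width, height);
    const Pixel b = projectToPixel(newTransform, extrapolate(pos, vel, dt), width, height);
    return a.x == b.x && a.y == b.y;
}
```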

Delay compensation simply projects the keyframe ahead in time by some number of milliseconds, and makes use of the velocity buffer. Framerate upsampling does the same, but also applies a new transform matrix to take into account user input between keyframes. Stereo pair generation reprojects a left and right pair from the keyframe (a possible optimization would be to generate the left or right side as the keyframe, and only do one reprojection to create the other half of the pair). The black regions around objects are areas of disocclusion where there is not enough information to reconstruct the frame. These could be filled with some sort of hole-filling algorithm or with data from older keyframes.

Requires OpenGL 4.3. Tested on an NVIDIA GeForce GTX 680 with the 306.63 GL 4.3 beta drivers.



linked list antialiasing and edge resampling (15.3mb)

Renders the scene at a resolution lower than the screen resolution (1/10th x 1/10th resolution, in this case) and stores a linked list of fragments that touch each pixel, along with enough information about each fragment to reconstruct the edges of the triangle that produced it. A custom resolve shader then traverses the linked list of fragments for each pixel, and calculates coverage at a higher resolution. The end result combines the lower-resolution shading with full-resolution, antialiased edges (3x3 antialiasing in this example). Conservative rasterization of the scene geometry is used to ensure adequate coverage of fragments to produce the final, upscaled image.
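
The coverage step of the resolve can be illustrated with a small CPU sketch. It assumes each stored fragment lets you rebuild three edge functions of the form a*x + b*y + c (non-negative inside the triangle) in output-pixel coordinates; the `Edge` layout and the `coverage3x3` name are assumptions for this example, not the demo's actual data format.

```cpp
#include <array>
#include <cassert>

// Edge function a*x + b*y + c, non-negative on the inside of the triangle.
struct Edge { float a, b, c; };

// Counts how many of the 3x3 sub-samples of the output pixel whose corner
// is at (px, py) fall inside the triangle described by the three edges.
// Dividing the result by 9 gives the coverage weight for that fragment.
inline int coverage3x3(const std::array<Edge, 3>& edges, float px, float py) {
    int covered = 0;
    for (int j = 0; j < 3; ++j) {
        for (int i = 0; i < 3; ++i) {
            // Sample at the center of each sub-pixel cell.
            const float x = px + (i + 0.5f) / 3.0f;
            const float y = py + (j + 0.5f) / 3.0f;
            bool inside = true;
            for (const Edge& e : edges) {
                inside = inside && (e.a * x + e.b * y + e.c >= 0.0f);
            }
            if (inside) ++covered;
        }
    }
    return covered;
}
```

In a resolve pass, a loop like this would run once per fragment in the pixel's linked list, blending each fragment's low-resolution shading result by its coverage.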

Compared to image post-processing techniques such as FXAA and MLAA, the performance leaves a bit to be desired and the implementation is fairly intrusive (requiring modifications to fragment and geometry shaders to implement conservative rasterization and the linked list of fragment data). However, this technique does have advantages: it separates the shading resolution from the output geometry resolution (for shading-heavy applications, you could reduce the shading workload while still keeping sharp geometry edges), allows for higher quality antialiasing than MSAA or image post-processing AA, and provides a straightforward avenue to implement order-independent transparency using the existing linked list and resolve shader infrastructure.

Requires OpenGL 4.3. Tested on an NVIDIA GeForce GTX 680 with the 305.67 GL 4.3 beta drivers.



Air Master 3D

Air Master 3D is an arcade flying sim for the iPhone and iPod Touch. Click on the link above to see the official website with screenshots and more details. Air Master 3D features procedural terrain, a high speed OpenGL ES renderer, and a cross-platform code base that allows simultaneous development on both the Win32 and iPhone platforms. The latest version adds a completely new shader-based render code path for OpenGL ES 2.0 devices, enabling effects such as fully-reflective, rippling water and dynamically lit volume clouds.



Super Frontier Wars

Super Frontier Wars merges the team-based multiplayer and jetpacks from Tribes with the ability to quickly modify (construct, harvest, and destroy) the surrounding environment to your advantage. The premise is a future where mining corporations battle over the resources of alien planets, using re-purposed mining technology as weapons to alter the environment. Rendering features include dynamically lit volume clouds, real-time ambient occlusion, procedural trees, detail geometry (grass, etc.) generated via geometry shader, and displacement mapping using parallax mapping, relief mapping, or tessellation depending on hardware capabilities. The game is playable over the internet or against AI.



Gammon Trigger (19.3mb)

Gammon Trigger is a top-down space adventure game. On the rendering side, it features normal-mapped, specular-mapped, environment-mapped, emissive-mapped, and pretty much every-other-mapped sprites, which allows an otherwise 2D game to take advantage of dynamic per-pixel lighting, image based lighting, and all sorts of other fancy effects. Amazingly, it manages to run quite well even on most integrated graphics chips that support fragment programs. There's plenty more besides graphics, though - a randomly generated galaxy for each game; a dialogue system and snazzy UI; physics, AI, and audio; a mission system - and anything else that comes up along the way.



felwyrld (terrain generation/rendering/occlusion, procedural trees, sky rendering, client/server)

Felwyrld is a prototype of a graphical MUD, and includes a client/server setup designed for minimal bandwidth usage through means such as server prediction of clients' extrapolated data state (updates are sent only when client-side extrapolated data will become out of sync), client views, compression, and quantization. Server CPU load is kept low through dynamic spawning around clients, and logins are handled through an encrypted system.

The Felwyrld client features generated terrain (about 50km x 50km, requiring relative coordinates for accurate rendering) with disk paging, procedural trees with wind animation, and sky rendering with day/night cycles. The terrain and all models are fogged using the sky color (including sunset variations) and lit with the sun color and ambient directional sky color. Trees are procedurally generated using a method similar to the one described in 'Creation and Rendering of Realistic Trees' by Weber and Penn. Vertex programs are used to make tree branches bend and leaves twist based on wind direction. Sky rendering is performed similarly to bump mapping and allows for overly dramatic sunsets, cloud linings, and sun bleed-through based on cloud depth. Occlusion queries are used extensively in conjunction with a quad tree to cull as much as possible against the terrain and other occluders. (Update: the newest Felwyrld screenshots display improved terrain texturing and lighting, high quality dynamically lit volume clouds, and dense grass and undergrowth; the engine is also now using deferred rendering.)



geometry shader painterly rendering (3.2mb)

Renders a scene into color, position, and normal textures, and then outputs the scene as a large number of brush strokes covering the screen using the geometry shader. Brush strokes can be applied from arbitrary triangle meshes or screen-space meshes, and adjust size and quantity to fill the source triangles. Strokes rotate to follow the curvature of objects in the scene, and are placed using random numbers generated in the geometry shader. Requires a shader model 4 card.



dxt gpu compression (2.1mb)

Implements DXT1 texture compression entirely on the GPU for shader model 3 cards. Lookup tables are used to get FP16 values containing the correct bit values needed for compression. Pixel buffer objects are then used to perform a GPU copy from the compression results into a compressed texture. Useful for when the results of a render to texture are reused multiple times before being rendered again, or when compressing a large number of textures.
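
For readers unfamiliar with the block format, here is a minimal CPU sketch of encoding one 4x4 block to DXT1. It is not the GPU method used in the demo (no lookup tables or FP16 tricks): it simply takes the per-channel min and max colors as endpoints and picks the nearest of the four palette entries for each texel. The struct and function names are made up for the example.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Rgb { std::uint8_t r, g, b; };

// One 8-byte DXT1 block: two RGB565 endpoints plus 16 two-bit indices
// (texel 0 lives in the lowest two bits).
struct Dxt1Block { std::uint16_t color0, color1; std::uint32_t indices; };

inline std::uint16_t toRgb565(Rgb c) {
    return static_cast<std::uint16_t>(((c.r >> 3) << 11) | ((c.g >> 2) << 5) | (c.b >> 3));
}

inline Dxt1Block encodeDxt1(const std::array<Rgb, 16>& texels) {
    // Endpoints: corners of the block's color bounding box (a simple heuristic).
    Rgb lo{255, 255, 255}, hi{0, 0, 0};
    for (const Rgb& t : texels) {
        lo = {std::min(lo.r, t.r), std::min(lo.g, t.g), std::min(lo.b, t.b)};
        hi = {std::max(hi.r, t.r), std::max(hi.g, t.g), std::max(hi.b, t.b)};
    }
    Dxt1Block block{toRgb565(hi), toRgb565(lo), 0};
    if (block.color0 == block.color1) return block;  // flat block: every index is 0
    assert(block.color0 > block.color1);             // selects the four-color mode

    const int dr = hi.r - lo.r, dg = hi.g - lo.g, db = hi.b - lo.b;
    const int lenSq = dr * dr + dg * dg + db * db;
    // Position along lo->hi in thirds (0..3) mapped to DXT1 palette indices:
    // index 0 = color0, 1 = color1, 2 = 2/3 color0 + 1/3 color1, 3 = 1/3 color0 + 2/3 color1.
    static const std::uint32_t remap[4] = {1u, 3u, 2u, 0u};
    for (std::size_t i = 0; i < 16; ++i) {
        const Rgb& t = texels[i];
        const int dot = (t.r - lo.r) * dr + (t.g - lo.g) * dg + (t.b - lo.b) * db;
        const int third = (3 * dot + lenSq / 2) / lenSq;  // rounded, always 0..3
        block.indices |= remap[third] << (2 * i);
    }
    return block;
}
```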



CUDA 3D Perlin Noise (.2mb)

Implements 3D Perlin Noise on the GPU in a CUDA kernel. Two samples of 3D noise are taken per vertex and used to animate a 1024x1024 heightfield. The resulting vertex buffer is filled directly by the CUDA kernel, avoiding the need for readbacks or copies.



astronomo rex (per-pixel lit particles)

Astronomo Rex is a side-scrolling shooter (or close enough) that was created for a game competition. It features per-pixel dynamically lit particles, which makes for a pretty spectacular addition to otherwise boring particle-based smoke. Other effects are put to good use, such as screen displacement and 3d textures that make models appear to burn through and disintegrate over time (with a little fragment program assistance). Particles are already fill-intensive, and adding per-pixel lighting certainly doesn't help, but the game manages a good framerate on most graphics cards and performance is easily scaled by changing screen resolution, as well as limiting the number of contributing lights on less capable cards.



geometry shader tessellation (0.5mb)

Tessellates a heightfield based on distance to the viewer using the geometry shader, with smooth transitions between each tessellation level. The geometry shader isn't really meant for tessellation, but it works well when combined with transform feedback to store the results for later reuse. Requires a shader model 4 card.



gpu ray-triangle intersection testing (3.8mb)

Finds the closest triangle that intersects a ray, using the GPU. A GLSL shader is used to calculate the ray-triangle intersection for each triangle and then another shader is used to progressively downsample the set of results into a 1x1 texture containing the nearest intersection. This demo probably requires a shader model 3 card.
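
The same two steps can be sketched on the CPU. The per-triangle test below uses the well-known Möller-Trumbore algorithm, which is an assumption on my part for this example (the demo's shader may use a different formulation), and a plain loop stands in for the progressive downsampling pass that reduces the results to the nearest hit.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

struct Vec3 { float x, y, z; };
struct Triangle { Vec3 a, b, c; };

inline Vec3 sub(Vec3 u, Vec3 v) { return {u.x - v.x, u.y - v.y, u.z - v.z}; }
inline float dot(Vec3 u, Vec3 v) { return u.x * v.x + u.y * v.y + u.z * v.z; }
inline Vec3 cross(Vec3 u, Vec3 v) {
    return {u.y * v.z - u.z * v.y, u.z * v.x - u.x * v.z, u.x * v.y - u.y * v.x};
}

// Moller-Trumbore: returns the ray parameter t of the hit, or infinity on a miss.
inline float intersect(Vec3 origin, Vec3 dir, const Triangle& tri) {
    const float miss = std::numeric_limits<float>::infinity();
    const float eps = 1e-7f;
    const Vec3 e1 = sub(tri.b, tri.a), e2 = sub(tri.c, tri.a);
    const Vec3 p = cross(dir, e2);
    const float det = dot(e1, p);
    if (std::fabs(det) < eps) return miss;  // ray is parallel to the triangle
    const float inv = 1.0f / det;
    const Vec3 s = sub(origin, tri.a);
    const float u = dot(s, p) * inv;
    if (u < 0.0f || u > 1.0f) return miss;
    const Vec3 q = cross(s, e1);
    const float v = dot(dir, q) * inv;
    if (v < 0.0f || u + v > 1.0f) return miss;
    const float t = dot(e2, q) * inv;
    return t > eps ? t : miss;              // ignore hits behind the origin
}

// Reduction step: index of the nearest hit triangle, or -1 if nothing is hit.
inline int closestTriangle(Vec3 origin, Vec3 dir, const std::vector<Triangle>& tris, float& tHit) {
    int best = -1;
    tHit = std::numeric_limits<float>::infinity();
    for (int i = 0; i < static_cast<int>(tris.size()); ++i) {
        const float t = intersect(origin, dir, tris[static_cast<std::size_t>(i)]);
        if (t < tHit) { tHit = t; best = i; }
    }
    return best;
}
```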



per-pixel lighting, bump/parallax mapping, and stencil accelerated shadow mapping (7.6mb)

Shows off bump mapping with shadow maps, making use of the stencil buffer and alpha testing to give a huge performance boost when looking at shadowed pixels or pixels out of a light's range. This goes a long way towards having game levels with a large number of dynamic or static per-pixel lights in the same room. Parallax mapping gives a nice sense of depth to the bump maps. The shadow maps use packed RGBA8 cube map textures, so there's none of the headaches associated with float or depth textures. Stencil shadow volumes can also provide the same shadow masking acceleration, though the light range optimization is much more difficult.



ambient occlusion/bent normals (5.5mb)

Uses the GPU to accelerate calculation of an ambient occlusion and bent normal texture map. Variance shadow mapping is used to determine visibility from a particular direction, and a large number of passes from random angles are summed into a floating point texture. After all passes are completed, the texture is downloaded to system memory and edges inside the texture map are bled outwards to avoid artifacts when mipmaps of the texture are sampled. A source model with unique texture mapping works best. Variance shadow mapping seems to produce better results than standard shadow mapping, despite precision issues, though this does have an impact on the program's ability to catch small details in the model. A small change to the fragment program can also enable cosine weighting.



displacement modeler (9.1mb)

Allows editing of high resolution models by using 'push' and 'pull' tools. Geometry is manipulated and rendered quickly through spatial partitioning and the construction of a topology map. Multiple tool shapes are supported by placing grayscale targa files in the brushes folder, and tools can operate in either the normal push/pull mode or in a smooth/sharpen mode. Models with several million triangles can be edited easily, provided your graphics card is up to the task and you have enough system memory. Rendering of several million triangle models in real time works well enough, though it requires splitting the model into a large number of chunks to avoid issues with driver/hardware vertex buffer size limitations. A nice extension to this demo would be to only re-render altered sections of the model when using the tools.



motion blur (31.5mb)

Implements motion blur by combining previous and current frame transform matrices. Vertices are stretched along the direction of motion, and the per-pixel velocity is stored into a 2nd render target, which is then used to blur the color output. The blur takes into account both object and camera movement because calculations are done using post-transform positions and screen space velocity.



red: space trucker

Red is a completed top-down shooter. It will work on pretty much anything that supports OpenGL 1.1 and OpenAL, and features a variety of generated content, namely space backdrops and planet sprites.



procedural planets (1.9mb)

Creates several procedural planets using SSE-optimized 3d Perlin noise. Extra detail is generated as you get closer to the planets, and a cube-based warped mesh is used to facilitate GPU-friendly level of detail and texture mapping. The planets even sport cheesy atmospheres that are calculated inside a fragment program by intersecting the view vector against the near and far sides of the atmosphere shell and calculating the distance between the two intersections, then passing that and the distance to the planet's bounding sphere (altitude) into a texture lookup. Two versions exist, one using multithreading for detail generation and the other not, in case of performance issues on some processors.
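
The atmosphere distance calculation boils down to a ray-sphere intersection. A minimal CPU sketch is below, assuming a normalized view direction and treating the atmosphere as a simple sphere; the function names are hypothetical and the texture lookup that consumes the result is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 sub(Vec3 u, Vec3 v) { return {u.x - v.x, u.y - v.y, u.z - v.z}; }
inline float dot(Vec3 u, Vec3 v) { return u.x * v.x + u.y * v.y + u.z * v.z; }

// Intersects a ray (normalized dir) with a sphere. On a hit, tNear and tFar
// receive the entry and exit distances along the ray (tNear may be negative
// when the origin is inside the sphere).
inline bool raySphere(Vec3 origin, Vec3 dir, Vec3 center, float radius,
                      float& tNear, float& tFar) {
    const Vec3 oc = sub(origin, center);
    const float b = dot(oc, dir);
    const float c = dot(oc, oc) - radius * radius;
    const float disc = b * b - c;
    if (disc < 0.0f) return false;
    const float s = std::sqrt(disc);
    tNear = -b - s;
    tFar = -b + s;
    return true;
}

// Distance the view ray travels inside the atmosphere shell, i.e. the
// distance between the near and far intersections, clipped to the part
// in front of the viewer.
inline float atmospherePathLength(Vec3 eye, Vec3 viewDir, Vec3 planetCenter,
                                  float atmosphereRadius) {
    float tNear = 0.0f, tFar = 0.0f;
    if (!raySphere(eye, viewDir, planetCenter, atmosphereRadius, tNear, tFar)) return 0.0f;
    tNear = std::max(tNear, 0.0f);
    return std::max(0.0f, tFar - tNear);
}
```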



light beams and gpu clouds (8.8mb)

Implements radial light beams from a point light source (in screen space). Light emitters are rendered into a texture, with the depth buffer primed with occluders that block light sources. An image processing step similar to a radial blur is performed and the image is then added on top of the scene. This approach could be modified to create shadows as well. Clouds are generated using a GPU perlin noise implementation, which prevents stutters when creating the next cloud animation target.



soft particles (.9mb)

Large particles typically clip noticeably against other geometry. 'Soft particles' get around this problem by scaling alpha when the distance between the particle and the nearest solid object behind it is less than the particle's size. A texture with RGBA8 encoded depth is used for the scene depth lookup in the particle's fragment shader. Proper truncation of values is important when encoding floats into an RGBA8 texture, as the results of the encoding process could otherwise be rounded, producing an incorrect depth when unpacked. This effect is from a Crytek paper about screen depth effects.
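
Both halves of the effect can be sketched on the CPU. The fade below is a plain linear ramp over the particle's size, and the depth packing uses base-256 digits with explicit truncation; both are illustrative assumptions rather than the demo's exact shader math, and the function names are made up.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Scales alpha down as the particle approaches the solid surface behind it.
inline float softParticleAlpha(float alpha, float sceneDepth, float particleDepth,
                               float particleSize) {
    assert(particleSize > 0.0f);
    const float fade = (sceneDepth - particleDepth) / particleSize;
    return alpha * std::min(1.0f, std::max(0.0f, fade));
}

// Packs a depth in [0, 1) into four 8-bit channels, most significant first.
// Each digit is truncated with floor; rounding up here would carry into a
// channel that has already been written and corrupt the unpacked value.
inline std::array<std::uint8_t, 4> packDepth(float depth) {
    assert(depth >= 0.0f && depth < 1.0f);
    std::array<std::uint8_t, 4> bytes{};
    double rest = depth;
    for (std::uint8_t& b : bytes) {
        rest *= 256.0;
        const double whole = std::floor(rest);
        b = static_cast<std::uint8_t>(whole);
        rest -= whole;
    }
    return bytes;
}

inline float unpackDepth(const std::array<std::uint8_t, 4>& bytes) {
    double depth = 0.0, scale = 1.0;
    for (std::uint8_t b : bytes) {
        scale /= 256.0;
        depth += b * scale;
    }
    return static_cast<float>(depth);
}
```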



Straight Aces

Straight Aces for the iPhone is a poker-themed puzzle game, created in collaboration with designer Aaron Calta and artist Zack Wallig. The objective of the game is to recognize and select high-value poker hands by choosing five adjacent cards from the randomly dealt game grid. Bonus time is awarded for valuable hands, allowing for games that range from a minute in length all the way up to 10-minute-long epic play sessions by chaining multiple high-value hands in a row. High score and in-depth statistic tracking shows your improvement over time.



gpu bicubic filtering (.6mb)

Creates Perlin noise by summing a grid of 2d noise at different resolutions using a bicubic filter implemented in a fragment program.
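
A CPU sketch of the idea, assuming a cubic B-spline kernel (a common choice for GPU bicubic filtering, but an assumption here since the demo's exact kernel isn't stated) and wrap-around addressing. All function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cubic B-spline weights for the four neighbors around fractional offset t.
inline std::array<float, 4> cubicBSplineWeights(float t) {
    assert(t >= 0.0f && t <= 1.0f);
    const float t2 = t * t, t3 = t2 * t, omt = 1.0f - t;
    return {{omt * omt * omt / 6.0f,
             (3.0f * t3 - 6.0f * t2 + 4.0f) / 6.0f,
             (-3.0f * t3 + 3.0f * t2 + 3.0f * t + 1.0f) / 6.0f,
             t3 / 6.0f}};
}

// Bicubic lookup into a width x height grid of random values, with wrapping.
inline float sampleBicubic(const std::vector<float>& grid, int width, int height,
                           float x, float y) {
    assert(width > 0 && height > 0);
    assert(grid.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    const float fx = std::floor(x), fy = std::floor(y);
    const std::array<float, 4> wx = cubicBSplineWeights(x - fx);
    const std::array<float, 4> wy = cubicBSplineWeights(y - fy);
    const int ix = static_cast<int>(fx), iy = static_cast<int>(fy);
    float sum = 0.0f;
    for (int j = 0; j < 4; ++j) {
        for (int i = 0; i < 4; ++i) {
            const int sx = (((ix + i - 1) % width) + width) % width;
            const int sy = (((iy + j - 1) % height) + height) % height;
            const std::size_t idx = static_cast<std::size_t>(sy) * static_cast<std::size_t>(width)
                                  + static_cast<std::size_t>(sx);
            sum += wx[static_cast<std::size_t>(i)] * wy[static_cast<std::size_t>(j)] * grid[idx];
        }
    }
    return sum;
}

// Sums the filtered grid at doubling frequencies and halving amplitudes.
inline float fractalNoise(const std::vector<float>& grid, int width, int height,
                          float x, float y, int octaves) {
    float sum = 0.0f, frequency = 1.0f, amplitude = 1.0f;
    for (int o = 0; o < octaves; ++o) {
        sum += amplitude * sampleBicubic(grid, width, height, x * frequency, y * frequency);
        frequency *= 2.0f;
        amplitude *= 0.5f;
    }
    return sum;
}
```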



neural network (.6mb)

A simple neural network that performs image compression and visually displays the results of learning. Contains C++, C++ with x86 assembly, and C++ with x86 + SSE assembly implementations.



lorenz (.1mb)

A 3d viewer for the Lorenz attractor that lets you pause, adjust evaluation speed, and create multiple simultaneous instances. The differential equations are adaptively evaluated to avoid creating excessively dense line segments while retaining detail where needed and maintaining a good rendering speed on older graphics cards.
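
One simple way to get that kind of adaptive evaluation is to pick each time step so the resulting line segment has roughly constant length. The sketch below does this with a midpoint (RK2) integrator and the classic parameters (sigma = 10, rho = 28, beta = 8/3); the stepping rule, integrator, and names are assumptions for illustration, not necessarily what the viewer does.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Right-hand side of the Lorenz system with the classic parameters.
inline Vec3 lorenzDerivative(Vec3 p) {
    const float sigma = 10.0f, rho = 28.0f, beta = 8.0f / 3.0f;
    return {sigma * (p.y - p.x), p.x * (rho - p.z) - p.y, p.x * p.y - beta * p.z};
}

// Traces the attractor as a polyline. The time step shrinks where the
// system moves quickly and grows (up to maxDt) where it moves slowly, so
// segments stay close to targetLength instead of bunching up.
inline std::vector<Vec3> traceLorenz(Vec3 start, int segments, float targetLength, float maxDt) {
    assert(segments >= 0 && targetLength > 0.0f && maxDt > 0.0f);
    std::vector<Vec3> points;
    points.reserve(static_cast<std::size_t>(segments) + 1);
    points.push_back(start);
    Vec3 p = start;
    for (int i = 0; i < segments; ++i) {
        const Vec3 v = lorenzDerivative(p);
        const float speed = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
        const float dt = speed > 0.0f ? std::min(maxDt, targetLength / speed) : maxDt;
        // Midpoint method: evaluate the derivative half a step ahead.
        const Vec3 mid{p.x + 0.5f * dt * v.x, p.y + 0.5f * dt * v.y, p.z + 0.5f * dt * v.z};
        const Vec3 vm = lorenzDerivative(mid);
        p = {p.x + dt * vm.x, p.y + dt * vm.y, p.z + dt * vm.z};
        points.push_back(p);
    }
    return points;
}
```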



water simulation (.3mb)

An ocean water simulation with FFT-animated heightfields, reflections, and fresnel effect. Began as an attempt to implement a projected grid, where all vertices are equally spaced in screen space, but this resulted in a disturbing swimming effect on the height of vertices, and so it was dropped in favor of a traditional heightfield grid.


code samples (.1mb)

Contains code samples from my current project. Many of the demos on this site were originally written for my personal learning and haven't been touched in quite a while, so I've uploaded some newer code demonstrating a few engine utilities and components from my current project. The code is mostly portable thanks to cross-platform libraries, but does have some non-essential dependencies on the Win32 API or MSVC (directory monitoring and debug support). Included are:
  • A fixed-size string implementation that emulates STL string while avoiding allocations
  • A texture loader supporting PSD and TGA files, and runtime reloading of modified texture files
  • A shader loader for ARB_fragment_program and ARB_vertex_program which supports runtime reloading of modified shaders
  • An OpenAL wrapper which can load and play .ogg files, including music streamed from file
  • Win32/MSVC Debug support - console creation and output, memory allocation tracking
  • A simple allocator for reusing a block of scratch memory, good for avoiding allocations and fragmentation
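
To give a flavor of the last item, here is a minimal sketch of a scratch ("bump") allocator. It is written fresh for this page as an illustration, not copied from the download, so the class name and interface are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hands out slices of one pre-allocated block and frees them all at once.
class ScratchAllocator {
public:
    explicit ScratchAllocator(std::size_t bytes) : buffer_(bytes), offset_(0) {}

    // Returns aligned memory from the block, or nullptr if it doesn't fit.
    void* allocate(std::size_t bytes, std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);  // power of two
        const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(buffer_.data());
        const std::uintptr_t mask = static_cast<std::uintptr_t>(align) - 1;
        const std::uintptr_t aligned = (base + offset_ + mask) & ~mask;
        const std::size_t start = static_cast<std::size_t>(aligned - base);
        if (start > buffer_.size() || bytes > buffer_.size() - start) return nullptr;
        offset_ = start + bytes;
        return buffer_.data() + start;
    }

    // Makes the whole block available again; no per-allocation bookkeeping.
    void reset() { offset_ = 0; }

    std::size_t used() const { return offset_; }

private:
    std::vector<unsigned char> buffer_;
    std::size_t offset_;
};
```

Typical use is to allocate temporary buffers during a frame or a load step and call `reset()` at the end, which avoids heap traffic and fragmentation.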


c-- compiler (.1mb)

A compiler for a stripped down version of C, written for a university compilers class. Outputs MIPS assembly.



hair tool

This was created as a styling tool to explore the possibility of character models with individual animated hairs, but mostly resulted in comical Sean Connery look-alikes and power mullets.



3d clouds (1.5mb)

An implementation of Mark Harris's 3d clouds. By using dynamic billboards, a large number of these can be rendered. They require a preprocessing step any time the light source changes, which does not take excessively long, but a different approach borrowing from volume fog might be more suitable for use with dynamic light sources. (Update: though not shown in this demo, using a simple software rasterizer instead of GPU readback gives interactive performance with dynamic lights. See Felwyrld above for a large scale implementation of this.)


Jedi Knight modifications and reverse engineering:

Jedi Knight is an old LucasArts game released in 1997. It presents a number of unique challenges for the editors that still work with it, mostly due to the software rendering-era portal engine that it is built around. For more information see jkhub.net.


The Sith2 Engine: this was a short-lived open source project that made quite a bit of progress towards cloning the original game engine. My contributions included an optimized renderer, portal visibility determination, audio, input, and some collision detection/response coding.

Soft Shadows: a plugin for the game's level editor that creates static soft shadows by exploiting vertex lighting: surfaces are cut up to create areas of blending between lit and unlit vertices. Efficient culling of a light's view frustum against successive level portals makes this a relatively quick operation on levels of any complexity.

Scripting Extensions and Rendering Improvements: by reverse engineering and debugging the game, I was able to produce a series of patches that removed limitations in the game's rendering and resource management, as well as inserting an extendable DLL addon into the game's built-in scripting engine that provides new scripting functions.

DirectX Wrappers: a pair of wrappers, one for DirectInput which adds mouse wheel support to the game, and another for DirectDraw which converts all rendering to OpenGL. The DirectDraw -> OpenGL wrapper has to properly handle pre-transformed vertices that lack perspective texture correction (the game engine supplies transformed vertices, but expects DirectX to handle perspective correction).