
Dynamic Particle Systems


Particle systems have long served as a technique for implementing environmentally based effects. A particle system consists of many sub-elements (the particles) that when considered together (the system) appear to represent a more complex entity. The particle system technique provides a useful and intuitive method for rendering complex phenomena by accumulating the results of many simple rendering operations. By using a cumulative rendering process, the particle system provides a rough simulation of a significantly more complex object with relatively low cost rendering. They are commonly used for things like smoke, clouds, dust, fire, rain, and snow.

However, even though particle systems can contribute to the general impression of realism in a game world, they have historically been restricted to graphical effects and have generally not been capable of interacting with the player or other entities within the game world due to the large number of particles that are normally used. Consider trying to perform collision detection and response of 100,000 particles with all of the objects in the game world – this would be extremely difficult to implement in an efficient manner!

In this chapter, we will investigate how to implement a particle system on the GPU that provides interaction with the objects inhabiting the game world. This system will include the physical simulation of the particles as well as the rendering process used to present the results of the simulation. The implementation operates on the GPU with minimal intervention from the CPU, and utilizes some of the API features that are new to Direct3D 10.

Figure 1: Sample fireworks particle system.

Particle Systems Background

A typical particle system consists of two basic elements: a simulation step to update the state of the particle system and a rendering step to present the results of the simulation. This separation allows the system to be updated independently of the rendering sequence, and allows the concept of a particle system to be generalized into a series of 'rules'. Each system has a set of rules used to update each particle, and a set of rules used to render the particle.

In this chapter we will be adding an intermediate step between these two to allow interaction with the particle system: a collision detection and response step that allows the particles to respond to their surroundings. This step also follows the 'rules' generalization – we will specify how the particle system detects and responds to a collision with a game object.

Each particle in a particle system is normally represented by a single point in world space. For every frame, the simulation phase updates the position and velocity information of each particle according to the simulation rules of the system being modeled. For example, in a smoke particle system the particles will drift upward in a semi-uniform manner, while a spark particle system would move the particles very quickly in a small cone around the direction of the spark spray.
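As a sketch of what such a simulation rule can look like, here is a hedged C++ update for a smoke-style particle; the buoyancy and damping constants are illustrative assumptions, not values from any particular system:

```cpp
#include <cassert>

// A hedged example of one simulation rule: smoke particles drift upward
// in a semi-uniform manner. The constants are illustrative assumptions.
struct SmokeParticle
{
    float pos[3];
    float vel[3];
};

void UpdateSmoke(SmokeParticle& p, float dt)
{
    const float buoyancy = 0.5f;  // upward acceleration from hot air (assumed)
    const float damping  = 0.99f; // crude air resistance (assumed)
    for (int i = 0; i < 3; ++i)
        p.vel[i] *= damping;      // damp existing motion
    p.vel[1] += buoyancy * dt;    // drift upward along +Y
    for (int i = 0; i < 3; ++i)
        p.pos[i] += p.vel[i] * dt; // integrate position
}
```

A spark system would swap this rule for one that accelerates particles quickly within a small cone; the framework stays the same.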

Once the particles have been updated, their new states are then used to determine if any collisions have occurred between the previous time step and the current time step. If a collision has occurred, then a collision response can be applied to that particle. This collision detection and response is used to model some physical properties of the phenomena that the particle system is representing. An example of this would be the difference between smoke wrapping around an object as opposed to a spark bouncing off of an object and flying in the opposite direction. This could also be specialized based on the object that the particle collides with as well. For example, if a spark hits a wall it would bounce off, but if it hits a bucket of water it would be extinguished and cease to exist.

Specialty state information specific to a given particle system can also be stored and updated for each particle. Examples of this type of specialty information could be an animation state, transparency levels, or particle color. More sophisticated rendering techniques can be implemented using these additional state data.

Once a particle has been updated, it is typically rendered as a more complex primitive than a simple point. A screen aligned quad (a pair of triangles as shown in Figure 2) with a suitable texture applied for the given particle system is often used to provide additional visual complexity without incurring a large rendering speed penalty. Of course, this is not the only representation available – the particles could also be rendered as individual point primitives, more complex geometry, or even animated meshes. However, most systems use screen aligned quads or a similar technique due to the large number of rendering operations that need to be performed. The expansion of particles into textured quads is visualized in Figure 2.

Figure 2: Various particle representations: points, screen aligned quads, and textured quads.

When using screen aligned quads, the rules of the system for rendering amount to designating a texture to be applied to them. This allows flexibility for implementing different particle systems by simply changing the texture used during the rendering process. For example, using a sphere texture to simulate round particles or using a smoke puff texture to simulate portions of a cloud or a smoke plume. Other potential rules for rendering a particle system can include the use of transparency, depth sorting, and various alpha blending modes to achieve special effects.

All of the example particle systems mentioned above (such as smoke systems and spark systems) can be represented in the same framework by simply changing the rules that are used to update the particle's state, how it responds to collisions, and how it is rendered.

Implementation Difficulty

The main difficulty with implementing this type of interactive particle system is the large amount of CPU processing that is required. For each frame to be rendered, the CPU must iterate through an array of particles, test each particle for collisions with the other objects in the game world, and then update the particle state accordingly. This type of operation may seem suited to modern multi-core CPUs, and indeed it is: each particle can be updated independently of the others, so particles can be processed in separate threads running on multiple processors in isolation. However, modern CPUs typically provide only two or four cores, with eight-core CPUs likely to arrive sometime in the future. Even if the particles do not interact with any objects in the environment, the CPU would still be required to perform a significant number of operations every frame – essentially monopolizing the processing time available for other operations.

In contrast to a general purpose CPU's serialized performance in these types of tasks, modern GPUs have been specifically designed to perform many similar computations at the same time on different data. A modern GPU can have several hundred simple processing elements, which is precisely the computational model required to update a particle system. With this in mind, if we move a particle system onto the GPU we can leverage its parallel processing power to create more attractive and realistic effects that can interact with the game world.

GPU Based Particle Systems (D3D9)

In fact, this type of system has been implemented on Direct3D 9 level hardware. The typical implementation uses a pair of render targets to store particle state information. Each frame, a full screen quad is rendered and the particle information is read from the first render target with a texture read in the pixel shader. That information is used to calculate the new state of each particle, which is then written to the second render target. Next, a set of four vertices is rendered for each pixel (usually using instancing) to read the information from the render target and draw the particle at the appropriate position on screen. On the next frame, the two render targets are swapped and the process is repeated. The flowchart for this updating and rendering process is depicted in Figure 3 below.

Figure 3: Particle system using render targets for particle storage.

This technique does indeed work well and moves the computational load off of the CPU and onto the GPU, but is somewhat limited with respect to creating new particles or removing old ones from the system. Since the particles are stored in a render target they can only be accessed through a texture sampler or by manually locking and editing the render target data, which can become a performance bottleneck. In addition, the amount of information available to store per particle is limited to a single four component vector unless multiple render targets are used. Finally, since each pixel of a render target is considered a particle, it is necessary to update all pixels of the render target with a full screen quad since the CPU doesn't know how many of the particles are currently active. This process can be optimized somewhat with a stencil pre-pass to eliminate the unused pixels early on in the simulation step. The inactive pixels can be culled away, but it still requires additional processing to determine that they aren't active.

Particle systems of this era have also included some forms of interactivity [Latta04], typically using scene depth information to approximate collision detection. Such a system is functional, but it would be desirable to have one that can produce collisions from any direction around an object in the scene.

GPU Based Particle Systems (D3D10)

Even though the D3D9 technique was a huge improvement over CPU-only particle systems, its somewhat inefficient update mechanism and its limited collision detection restrict the amount of work that can be done while processing each particle. With D3D10 hardware, however, we can use a more efficient method of storing/retrieving, updating, and rendering the particles in our particle system.

The additional pipeline stages added in Direct3D 10, specifically the geometry shader and the stream output stage, provide the tools to implement an efficient technique to manage the lifetime of a particle. Since the geometry shader and stream output stages work with vertex information prior to the rasterization stage, the particle state information can be stored as a vertex instead of a pixel/texel. This provides the nice benefit that you can store as much per particle data as can fit within a single vertex declaration, which is up to 16 four component vectors per vertex! Figure 4 provides a flow chart demonstrating how this updated technique operates.

Figure 4: Particle system using vertex buffers for particle storage.
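To make the per-vertex storage budget concrete, here is a hedged C++ mirror of a possible particle vertex; the field names are assumptions for illustration rather than the SDK's exact declaration:

```cpp
#include <cassert>
#include <cstddef>

// A hedged CPU-side mirror of a per-particle vertex layout in the spirit
// of the ParticlesGS sample. Field names are illustrative assumptions.
struct ParticleVertex
{
    float pos[3];  // world space position
    float vel[3];  // current velocity
    float timer;   // remaining lifetime
    float type;    // particle type identifier
};
```

At 8 floats (32 bytes), this layout sits well under the ceiling of 16 four-component vectors (256 bytes) per vertex, leaving plenty of room for extra per-particle state such as color or animation data.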

With the efficient updating mechanism of this technique, additional work can be performed during the particle update phase at a relatively low performance cost. In particular, it is possible to make the particle system interact with the other objects that inhabit the game world, represented in parametric form. This capability adds significantly to the realism of the scene. For example, it becomes feasible to have a smoke plume wrap around an animated character or snow fall appropriately onto a landscape entirely in real time on the GPU.

To demonstrate how to implement an interactive particle system, we will extend one of the samples provided in the DirectX SDK. In the installation directory of the DirectX SDK, there is a Direct3D 10 sample program named ParticlesGS that demonstrates how to implement a fireworks particle system using stream output and the geometry shader to control the update and lifespan of each particle in the system.

The particles are represented as vertices in two vertex buffers, which alternate between frames as the source and destination buffers. For each particle system update, the current source vertex buffer is rendered to the geometry shader, which determines how to update each type of particle. Once a vertex is updated, the geometry shader emits the updated particle along with an appropriate number of new particles; if the particle is considered 'dead', the geometry shader does not emit a vertex, effectively removing that particle from the system.

In this chapter, we will modify this sample to make the particles in the system interact with a series of application specified and controlled shapes. Multiple shape types can be supported in a single pass, while still retaining operation entirely on the GPU with only CPU guidance. Figure 5 shows the modified algorithm flow chart for the interactive particle system.

Figure 5: Modified algorithm flow chart for this chapter's demo.

Algorithm Theory

To create an interactive particle system, we first need to define how we will represent the world objects while performing collision detection with the particle system. The most accurate solution would be to use the full triangle meshes of each of the game objects for all of the collision tests with each particle. Even though this would be the most accurate, it would also be the slowest technique at run time due to the huge number of particle-to-triangle collision tests carried out for each particle.

As a side note, in many cases physics engines provide these types of collision primitives for use in their physics simulation. These primitives could be read from the physics API objects and fed directly into this particle system for a seamless integration of the rendering effects and the true game entity states.

Instead, we can approximate our complex game objects with simpler representations that are easier to perform collision detection against. The simplest objects with respect to collision detection are the plane and the sphere, but other objects could be used as well, such as boxes, ellipsoids, or other sphere swept volumes like lozenges or capsules. For the purposes of this sample, we will implement spheres and planes as our collision models and provide information for how to extend the system later on with additional shapes.

The representation of a sphere in these samples will consist of a three dimensional position and a radius for a total of four floating point values. Similarly, we can also represent a plane by a four component vector, where each component represents one of the coefficients of the plane's equation. These representations will allow for a compact shape description that will fit into a single four component shader constant parameter for use in the geometry shader. Utilizing these compact representations means that the maximum number of collision primitives that can be used in a single geometry shader invocation is the maximum number of constants in a constant buffer in Direct3D 10 – allowing for a whopping 4,096 collision objects!
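The application-side packing of these shapes can be sketched as follows; this is a hedged illustration (the helper names are hypothetical), matching the shader convention used later in this chapter where a plane is (a, b, c, d) and a sphere is (center.xyz, radius):

```cpp
#include <cassert>

// Hypothetical CPU-side packing of collision shapes into four-component
// shader constants. A plane is stored as (a, b, c, d), a sphere as
// (center.xyz, radius).
struct Float4 { float x, y, z, w; };

// Build a plane constant from a unit normal n and a point p0 on the plane,
// so that d = dot(n, p0); a particle is behind the plane when
// dot(n, pos) < d, matching the shader test shown later.
Float4 MakePlane(const float n[3], const float p0[3])
{
    float d = n[0] * p0[0] + n[1] * p0[1] + n[2] * p0[2];
    return { n[0], n[1], n[2], d };
}

Float4 MakeSphere(const float center[3], float radius)
{
    return { center[0], center[1], center[2], radius };
}
```

Each shape then occupies exactly one constant buffer slot, which is what makes the 4,096-object upper bound possible.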

For each update sequence, the geometry shader will use these shader parameters to check each of the incoming particles to see if it has collided with the corresponding game object. If so, it will modify the particle's state according to the system rules (by modifying its position and velocity) and then emit the particle and move on to the next one. Utilizing shader constants to represent the game world objects will allow for multiple objects to be tested with a single invocation of the geometry shader, which provides an efficient manner for updating the particles even with several objects in the nearby scene.


Implementation

As discussed in the background section, this algorithm utilizes the geometry shader and stream output functionality to update the state of the particle system. In order to use the stream output stage for this task, Direct3D requires the following items:

  • A Geometry Shader for outputting vertices to the Stream Output stage
  • A source vertex buffer to store current particle system state
  • A destination vertex buffer to receive updated particle system state

When using effect files, the geometry shader must be compiled using a different function than the normal 'compile' directive within a technique definition. Instead, the 'ConstructGSWithSO' directive must be used. This function specifies the compilation target, the geometry shader name, and the output format to be streamed out as shown here:

GeometryShader gsStreamOut = ConstructGSWithSO(
    CompileShader( gs_4_0, GSAdvanceParticlesMain() ),
    "POSITION.xyz; NORMAL.xyz; TIMER.x; TYPE.x" );

The two vertex buffers used to hold the intermediate results of the simulation in alternating frames must also be created with the stream output bind flag in their buffer description as shown here:

vbdesc.BindFlags |= D3D10_BIND_STREAM_OUTPUT;

Once these items have been created, updating the state of the particle system is performed by rendering the current source vertex buffer into the current destination vertex buffer. The source vertex buffer (which is the result of the previous update) is set in the input assembler stage to be sent to the rendering pipeline. To stream these updated vertices out to the destination buffer, we simply need to set the appropriate destination buffer in the stream output stage as shown here:

pd3dDevice->SOSetTargets( 1, pBuffers, offset );

The current number of vertices available in the source buffer is unknown to the application since particles are both created and destroyed in the geometry shader according to the rules of the particle system (although there is a query available to obtain the number of vertices streamed out: D3D10_QUERY_SO_STATISTICS). Because of this, we need to use the 'DrawAuto' device member function instead of the normal 'DrawXXX' calls. The 'DrawAuto' function utilizes internal vertex buffer state to determine the number of vertices in the vertex buffer, allowing the stream output results to be rendered without any intervention from the CPU.

With the basic framework set up, we can now focus on making this particle system interact with the other game objects in the scene. In the particle system update geometry shader, there is a generic particle handler function (named GSGenericHandler) that updates the motion of all of the different particle types. We are going to intervene prior to this update process to determine if the current particle has collided with one of the game objects specified by the application.

To perform these collision tests, the application must set the shader constants for this update function to use. Depending on the number of collision objects that you want to use, you can either set these items individually or as an entire array of shader constants. The shader constants are declared within the effect file in a single constant buffer as shown here:

cbuffer cbShapes
{
    float4 g_vPlanes[3];
    float4 g_vSpheres[1];
};

As mentioned previously, the number of each of these objects can be changed as needed. The geometry shader uses these object definitions in the collision tests. This process is shown here first for the plane objects:

// Test against plane objects
for ( int i = 0; i < 3; i++ )
{
    float fDistToPlane = dot( g_vPlanes[i].xyz, input.pos );
    if ( g_vPlanes[i].w > fDistToPlane )
    {
        input.vel = reflect( input.vel, g_vPlanes[i].xyz );
    }
}

First the signed distance from the particle to the plane is calculated by taking the dot product of the particle's position with the first three coefficients (a, b, and c) of the plane equation. This is equivalent to projecting the particle's position onto the normal vector of the plane. This distance is then compared to the fourth coefficient of the plane equation (d). If the distance is smaller than d, then the particle has crossed the plane. The intersection test can be thought of as testing which side of the plane the particle is currently on; if it is on the negative side, the velocity of the particle is reflected about the plane's normal vector. Next, the particle is tested against each of the loaded sphere objects as shown here:

// Test against sphere objects
for ( int i = 0; i < 1; i++ )
{
    float fDistToSphere = distance( g_vSpheres[i].xyz, input.pos );
    if ( g_vSpheres[i].w > fDistToSphere )
    {
        input.vel = reflect( input.vel,
            normalize( input.pos - g_vSpheres[i].xyz ) );
    }
}

Here we calculate the distance of the particle from the center of the sphere and compare it to the radius of the sphere. If the radius is larger than the distance, the particle is intersecting the sphere, and its velocity is reflected about the normalized vector from the sphere center to the particle.
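Both responses rely on the HLSL reflect intrinsic, which for a unit-length normal n computes v - 2 * dot(n, v) * n; a minimal C++ sketch of the same math:

```cpp
#include <cassert>

// A minimal C++ equivalent of the HLSL reflect intrinsic used in the
// collision responses above: reflect(v, n) = v - 2 * dot(n, v) * n
// for a unit-length normal n.
struct Vec3 { float x, y, z; };

float Dot(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

Vec3 Reflect(const Vec3& v, const Vec3& n)
{
    float k = 2.0f * Dot(v, n); // twice the component of v along n
    return { v.x - k * n.x, v.y - k * n.y, v.z - k * n.z };
}
```

The component of the velocity along the normal is negated while the tangential component is preserved, which is exactly the mirror-bounce behavior described for sparks.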

In both of these cases, the collision reactions only modify the direction of the velocity of the particle. This is not a physically true collision reaction since the position is not modified, but for the purposes of a particle system it is sufficient to simply modify the direction of the velocity and allow the following update to move the particle into the correct position.
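A one-dimensional sketch, under the simplifying assumption of a ground plane at y = 0 and simple Euler integration, shows why reflecting the velocity alone is enough:

```cpp
#include <cassert>

// A 1D sketch of the velocity-only response described above: when the
// particle penetrates the ground plane y = 0, only its velocity is
// reflected; the next regular integration step moves it back out.
struct Particle1D
{
    float posY;
    float velY;
};

void Step(Particle1D& p, float dt)
{
    if (p.posY < 0.0f)      // particle has crossed the plane
        p.velY = -p.velY;   // reflect velocity only; position untouched
    p.posY += p.velY * dt;  // normal update carries it above the plane
}
```

Because particles move a small distance per frame, the one-frame penetration is rarely visible, and skipping the positional correction keeps the shader cheap.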

Once the particle system has been updated and the collision detection and response has been applied, we can finally render the particles. The rendering is carried out in all three programmable stages of the rendering pipeline. The vertex shader sets the color of the particle based on what type of particle is currently passing through it. Next, the expansion of the particle into a quad is carried out in the geometry shader. This is shown below:

// Emit two new triangles as a four vertex strip
for ( int i = 0; i < 4; i++ )
{
    float3 position = g_positions[i] * input[0].radius;
    position = mul( position, (float3x3)g_mInvView ) + input[0].pos;
    output.pos = mul( float4( position, 1.0 ), g_mWorldViewProj );
    output.color = input[0].color;
    output.tex = g_texcoords[i];
    SpriteStream.Append( output );  // append vertex to the output stream
}
SpriteStream.RestartStrip();

The quad is constructed by generating four new vertices. The position of each is calculated from a constant array of offsets that are scaled by the input vertex radius and then transformed by the inverse of the view matrix, orienting the vertices in world space so that they are aligned with the viewing plane. Finally, the world space particle position is added to the transformed vertices, and the final position is found by transforming them to clip space. The end result is four vertices per particle, aligned with the camera and arranged as a triangle strip. The triangle strip is then passed on to the rasterizer and ultimately to the pixel shader, where a standard texture lookup is performed.
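The corner math can be sketched in C++ as follows, taking the inverse-view rotation as the identity for simplicity; the offset values stand in for the g_positions array and are assumptions for this sketch:

```cpp
#include <cassert>

// A hedged sketch of the billboard corner math described above, with the
// inverse-view rotation taken as identity: each corner is a unit-quad
// offset scaled by the particle radius plus the particle's world position.
// The offsets are assumptions standing in for the g_positions array.
struct Point3 { float x, y, z; };

static const Point3 kCorners[4] = {
    { -1.0f,  1.0f, 0.0f }, { 1.0f,  1.0f, 0.0f },
    { -1.0f, -1.0f, 0.0f }, { 1.0f, -1.0f, 0.0f },
};

Point3 BillboardCorner(int i, float radius, const Point3& particlePos)
{
    return { kCorners[i].x * radius + particlePos.x,
             kCorners[i].y * radius + particlePos.y,
             kCorners[i].z * radius + particlePos.z };
}
```

In the real shader the offset is rotated by the inverse view matrix before the particle position is added, which is what keeps the quad facing the camera from any viewpoint.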

This sample implementation provides the sphere and plane collision objects, but it could also be easily extended to include additional shape types. The process would involve performing the actual collision test and providing the corresponding collision response once the collision has been detected. The sample could be further extended with special effects by changing the rules used to update the particle states. It would be possible to implement special collision responses that depend on the particle type as well as the shape object that is collided with. For instance, a particle could be terminated after hitting a shape instead of reflecting off of it.

Alternatively, modifying the color of the particle or the opacity/transparency of the particle based on the number of collisions could provide some interesting effects as well. There is a significant amount of flexibility in how to make the particle react. The responses are ultimately only limited to actions that can alter the state information stored for each particle.


Demo Download:

The modified SDK sample provided with this article can be used as an indication of the level of performance that can be attained for a given number of collision objects and particles. The performance of the particle system can be scaled to the GPU of the current system by using a larger or smaller number of particles. The application simply needs to load the appropriate shader constants, configure the rendering pipeline with the correct vertex buffers, and execute the 'DrawAuto' function, and the particle system is updated for one frame. Since CPU usage is minimized, the system is more or less dependent only on the GPU.

Another very interesting aspect of this particle system is that the particles are not homogeneous. The 'ParticlesGS' sample provides several different particle types, identified by a type identifier variable stored in each vertex. With this identifier, a specialized update, collision detection and response, and rendering sequence can be performed for each particle type. The implication is that all particle systems in a given scene can be stored in a single pair of vertex buffers and updated all at once in a single rendering sequence!

This means that several different smoke, explosion, spark, cloud, and dust particle systems could all exist and operate within the same particle system – the only requirement is that there is an appropriate handler function defined for each particle type and each update sequence. This could lead to very efficient 'global' particle systems that are updated once per frame, with very little interaction from the CPU.


Conclusion

This chapter has examined and demonstrated how interactive particle systems can be implemented primarily on the GPU, freeing the CPU to perform other activities. The system is provided with the capability of using multiple collision shapes and can easily be extended to incorporate additional shapes. The particle update paradigm used in this chapter (from the ParticlesGS sample in the DirectX SDK) provides an efficient and useful technique suitable for most types of particle systems.