From GPWiki
Jump to: navigation, search

Dynamic Patch Tessellation

In real-time 3D simulations, meshes with low triangle counts are often used to gain performance by not taxing the vertex processing units and the data bus to their limits. However, cutting on triangle count will make the meshes look coarse in their edges, which is a visual artifact that strikes the casual observer’s eye as one of the first things to distinguish a computer-rendered image from a photograph.

To enhance the apparent fidelity of a mesh with a low triangle count, a technique called “tessellation” can be utilized. Tessellation is defined as an algorithm that splits the original geometry primitives into smaller primitives and applies a curve algorithm to them that better approximate the curvature of the original primitive as defined by its normal and the normal of its original adjacent primitives.

The curve algorithm can be linear (defined by first-degree polynomial), cubic(second-degree polynomial), quadratic (third-degree polynomial) or some other curve as long as the evaluation behaves in the same way – that is, the function gives a point based on the control points, given a parameter in 0…1 domain.

A position relative to a triangle can be specified using so-called barycentric coordinates, which are coordinates in a space formed by the triangle edges. The first vertex of a triangle corresponds to barycentric coordinate [1,0,0] since the weight of the first vertex to the value is 1 and the other vertices’ weights are 0. In a same fashion, a barycentric coordinate of [0.5,0,0.5] is situated between the vertices 0 and 2, since their weights are of equal importance to the return value.

A given barycentric coordinate is inside a triangle if its components summed equal to 1. Therefore, if we were to define a barycentric coordinate as two components, U an V, the third weight W would have to be equal to 1-U-V.

First, let’s take a look at a simple tessellator that takes in a triangle with positions and normals defined in the vertices. The following geometry shader code splits the input triangle into four parts – one triangle for each corner, and a center triangle, vertices of which are situated between the original triangle vertices:

//Flat tessellation geometry shader.
//This shader will subdivide the original triangle to 
//4 smaller triangles by using linear subdivision.
//Do note that we interpolate the coordinates in world space,
//and transform them to view*projection space after.
//The vertex input is assumed to be in world space.
 
[maxvertexcount(12)]
void FlatTessGS( triangle GSPS_INPUT input[3], inout TriangleStream<GSPS_INPUT> TriStream )
{
    GSPS_INPUT output = (GSPS_INPUT)0;
 
    //precalculate the edge median vertices, since they get 
    //used in more than one subtriangle:
    float3 medPosv0v1 = (input[0].Pos.xyz + input[1].Pos.xyz) / 2.0;
    float3 medPosv0v2 = (input[0].Pos.xyz + input[2].Pos.xyz) / 2.0;
    float3 medPosv1v2 = (input[1].Pos.xyz + input[2].Pos.xyz) / 2.0;
 
    //also interpolate normals and normalize them:
    float3 medNormv0v1 = normalize(input[0].Norm.xyz + input[1].Norm.xyz);
    float3 medNormv0v2 = normalize(input[0].Norm.xyz + input[2].Norm.xyz);
    float3 medNormv1v2 = normalize(input[1].Norm.xyz + input[2].Norm.xyz);
 
    // Now, we'll construct the new triangles in
    // the following pattern:
 
    /*    V1
         /\
        /  \
    M01/____\M12
      / \  / \
     /___\/___\
    V0   M02   V2
 
    */
 
    //The M01, M02 and M12 stand at the midpoints
    // of the edges.
 
 
    //Output first subtriangle. This is the triangle 
    //that touches the v0 vertex and reaches the other 
    // vertices half-way: 
    output.Pos = input[0].Pos;
    //transform to view space:
    output.Pos = mul(output.Pos, View ); 
    //and to projection space:
    output.Pos = mul( output.Pos, Projection);
    output.Norm = input[0].Norm;
    TriStream.Append(output);
 
    output.Pos = float4(medPosv0v1, 1.0);
    output.Pos = mul(output.Pos, View );
    output.Pos = mul( output.Pos, Projection);
    output.Norm = medNormv0v1;
    TriStream.Append(output);
 
    output.Pos = float4(medPosv0v2, 1.0);
    output.Pos = mul(output.Pos, View );
    output.Pos = mul( output.Pos, Projection);
    output.Norm = medNormv0v2;
    TriStream.Append(output);
 
    TriStream.RestartStrip();
 
 
    //Output second subtriangle. This is the triangle
    // that touches v1 and reaches halfway to v0 and v2. 
    output.Pos = float4(medPosv0v1, 1.0);
    output.Pos = mul(output.Pos, View );
    output.Pos = mul( output.Pos, Projection);	
    output.Norm = medNormv0v1;
    TriStream.Append(output);
 
    output.Pos = input[1].Pos;
    output.Pos = mul(output.Pos, View );
    output.Pos = mul( output.Pos, Projection);
    output.Norm = input[1].Norm;
    TriStream.Append(output);
 
    output.Pos = float4(medPosv1v2, 1.0);
    output.Pos = mul(output.Pos, View );
    output.Pos = mul( output.Pos, Projection);
    output.Norm = medNormv1v2;
    TriStream.Append(output);
 
    TriStream.RestartStrip();
 
 
    //third subtriangle:
    output.Pos = float4(medPosv0v1, 1.0);
    output.Pos = mul(output.Pos, View );
    output.Pos = mul( output.Pos, Projection);	
    output.Norm = medNormv0v1;
    TriStream.Append(output);
 
    output.Pos = float4(medPosv1v2, 1.0);
    output.Pos = mul(output.Pos, View );
    output.Pos = mul( output.Pos, Projection);
    output.Norm = medNormv1v2;
    TriStream.Append(output);
 
    output.Pos = float4(medPosv0v2, 1.0);
    output.Pos = mul(output.Pos, View );
    output.Pos = mul( output.Pos, Projection);
    output.Norm = medNormv0v2;
    TriStream.Append(output);
 
    TriStream.RestartStrip();
 
 
    //fourth subtriangle:
    output.Pos = float4(medPosv1v2, 1.0);
    output.Pos = mul(output.Pos, View );
    output.Pos = mul( output.Pos, Projection);
    output.Norm = medNormv1v2;
    TriStream.Append(output);
 
    output.Pos = input[2].Pos;
    output.Pos = mul(output.Pos, View );
    output.Pos = mul( output.Pos, Projection);
    output.Norm = input[2].Norm;
    TriStream.Append(output);
 
    output.Pos = float4(medPosv0v2, 1.0);
    output.Pos = mul(output.Pos, View );
    output.Pos = mul( output.Pos, Projection);
    output.Norm = medNormv0v2;
    TriStream.Append(output);
 
    TriStream.RestartStrip();
}

While the linear subdivision can already help achieving smoother lighting by introducing more normals, the resulting positions are left essentially untouched. This means that the geometry edges are still just as coarse as they were before the subdivision.

We can alleviate this problem by using a more sophisticated interpolation method when generating the mid-edge vertices. A quadratic Bézier curve is ideal for our purpose, since we have per-vertex data for two curve endpoints and we can calculate two tangents of the edge from the vertex normals.

Bézier curve is a commonly-used curve type in graphics programming. A quadratic Bézier curve takes four points; start and end, and two tangent points. In addition, it takes a parameter varying from 0 to 1, which specifies the position on the curve that we want. The following formula describes a third-degree curve, where v0 and v1 are the curve endpoints, t0 and t1 are the tangent points and p is the parameter across the curve:

B(v0, t0, t1, v1, p) = (1-p)3v0 + p3v1 + ((1-p)2p)t0 + (p2(1-p))t1

The function is equivalent of performing a linear interpolation of two linear interpolations:

B2(v0, t0, t1, v1, p) = lerp(lerp(v0, t0, p), lerp(t1, v1, p), p)
Pvgps6 4 quadbezier.png

The tangents are illustrated here as lines from the vertices. However, in reality the tangent points are ordinary points in space just like the vertices.

A lot more information about Bézier curves can be found on the Internet; in particular, Wolfram MathWorld web site (mathworld.wolfram.com) is an excellent place to study formal descriptions of mathematical concepts.

For the curvature reconstruction in a triangle domain, an algorithm called N-Patches [Vlachos01] is commonly used. A N-patch is a Bezier surface of the third degree, as it has four control points affecting its shape along each edge and one point to control the midpoint of the patch parameter domain. That is, the edges are interpolated as quadratic Bézier curves.

A triangular N-patch is formally defined as ten control points; one for each of the original vertices of the triangle, six to represent the tangent vectors of the edges, and one for the center of the patch. Two N-patches calculated from adjacent triangles have a closed surface across their shared edge, as their tangents are continuous on the edge.

In the implementation presented in this chapter, the patch center isn’t evaluated so that leaves us with the vertices and their tangents. In the following illustration, the vertices are marked V0, V1 and V2, and the tangents Txx where xx denotes the tangent direction; for example, T01 is from vertex 0 towards vertex 1:

Pvgps6 5 patchtan.png

A simple Bézier curve evaluation function can be implemented as follows. It also constructs the tangent points based on the normals – a handy way to handle our vertex data:

//Evaluate a Bezier curve given two endpoints and two corresponding normals.
//This is a helper function used in the Bezier 
//tessellation shader.
//This is a general method; since in the parent GS we only use the
// value 0.5 as the parameter, we could pre-calculate the 
//weights in advance for optimization.
 
float3 EvalBezierN(float3 v0, float3 v1, float3 n0, float3 n1, float p)
{
    float oneMinusP = 1.0f-p;
 
    //Calculate the tangents:
    float3 t0 = 2 * v0 + v1 - dot(v1 - v0, n0)* n0; 
    float3 t1 = 2 * v1 + v0 - dot(v0 - v1, n1)* n1; 
 
    //We weight the positions and tangents with the
    //Bezier basis coefficients multiplied by powers of p:
 
    return oneMinusP * oneMinusP * oneMinusP 
	* v0 + 
	p * p * p * v1 +
	oneMinusP * oneMinusP * p * t0 +
	oneMinusP * p * p * t1;
 
    //It is possible to modify the weights dynamically so 
    //that the curve profile would change. 
    //This would enable per-vertex modification of the
    //smoothness of the resulting surface, and could be 
    //used for crease modeling. However, extra edge 
    //profile weights are omitted here for clarity.
    //Any weights of which sum equals 1 given p = 0...1 
    //(at least) could be used. 
}

Equipped with this function, we can interpolate the new vertices for the patch in such way that the curvature of the triangle, implied by the normals of the vertices, is taken into account. This results in much more accurate geometric edges than linear interpolation, and is visually more pleasing.

The following snippet shows how we can generate the midpoint vertices using Bézier interpolation:

//Bezier patch tessellation shader.
//The output is structured exactly the same way as in the flat 
//tessellation shader, but the median points
//are interpolated quadratically over a n-patch rather than 
//linearly.
[maxvertexcount(12)]
void BezierTessGS( triangle GSPS_INPUT input[3], inout LineStream<GSPS_INPUT> TriStream )
{
    GSPS_INPUT output = (GSPS_INPUT)0;
 
    //We evaluate positions along the Bezier patch edges 
    //by using our helper function defined earlier:
    float3 medPosv0v1 = EvalBezierN(input[0].Pos.xyz, input[1].Pos.xyz, input[0].Norm, input[1].Norm, 0.5f);
    float3 medPosv0v2 = EvalBezierN(input[0].Pos.xyz, input[2].Pos.xyz, input[0].Norm, input[2].Norm, 0.5f);
    float3 medPosv1v2 = EvalBezierN(input[1].Pos.xyz, input[2].Pos.xyz, input[1].Norm, input[2].Norm, 0.5f);
 
    //We'll also interpolate the normals along the 
    //patch edges for better quality:
    float3 medNormv0v1 = normalize(EvalBezierN(input[0].Norm, input[1].Norm, input[0].Norm, input[1].Norm, 0.5f));
    float3 medNormv0v2 = normalize(EvalBezierN(input[0].Norm, input[2].Norm, input[0].Norm, input[2].Norm, 0.5f));
    float3 medNormv1v2 = normalize(EvalBezierN(input[1].Norm, input[2].Norm, input[1].Norm, input[2].Norm, 0.5f));
 
    //In a similar way, we can interpolate any input data,
    //including texture coordinates.
 
 
    //From now on, exactly the same code as in the flat 
    //tessellation shader. We just output the new triangles.
 
    //... omitted for clarity ...
}

If we would deem individual edges to be straight, they wouldn't need to be tessellated. Therefore, we could change the tessellation pattern in per-triangle basis to conditionally output less triangles. Additional tessellation patterns are omitted here for brevity, but could be implemented as a switch statement with four distinct cases - "all edges need split", "one edge needs split", "two edges need split", and "no edges need split" which would be the ideal case.

The edge straightness determination is trivial to implement by observing the input positions, normals and their relations, and applying some desired treshold to the results for quality control. We could also measure the lengths of the edges, and if below a certain quality treshold, leave them as they are.

The following code performs a simple straightness check on a Bézier curve:

//this will return the maximum deviation from straightness of
// an edge.
//if the return value is 1, the endpoints and tangents form a 
//straight curve. if 0, the endpoints are oriented at a 
//right angle with regard to each other.
//if -1, the endpoints are oriented exactly opposite of 
//each other (which is rare in real geometry).
float Straightness(float3 v0, float3 v1, float3 t0, float3 t1)
{
    //take the directions of the tangents with regard to 
    //the endpoints:
    float3 t01 = normalize(t0 - v0);
    float3 t10 = normalize(t1 - v1);
 
    //then take the deviations from a straight line 
    //separately from both ends by taking the angles to
    // opposite endpoints:
    float dev01 = dot(t01, normalize(v1-v0));
    float dev10 = dot(t10, normalize(v0-v1));
 
    //and take the minimum curvature (or inverse 
    //maximum straightness):
    return min(dev01, dev10);
}

The tessellation pattern for the current triangle could then be selected by observing the straightness and length of its edges. This, in effect, is adaptive real-time patch tessellation – if the triangle is straight in curvature or sufficiently small already, it doesn’t need extra vertices.

The following illustration shows the four unique cases for the edge split pattern. Note that the actual number of cases is 2+6, since cases 1 and 2 create different effective output based on the combination of input edge characteristics:

Pvgps6 6 patchsubdiv.png

For the dynamic tessellation to be effective, the vertices should be transformed to projection space by the vertex shader in order to determine the edges’ final (that is, screen-space length and straightness) by the time the tessellation criteria is evaluated. However, the transformation of new vertices is still best left to the geometry shader as it is much easier to interpolate the positions before the perspective transform.

As we can amplify primitives by generating more primitives to represent the original geometry, we can do the tessellation fully in the graphics hardware without CPU intervention. This, again, translates to more performance as the extra data resulting from the tessellation doesn’t have to be stored in intermediate system memory buffers or sent across the graphics bus every time the geometry is updated.

It is worth noting that there is a hard limit on how much data can be output by a single geometry shader invocation. The previous patch tessellation code outputs a maximum of four triangles per one input triangle (or 12 vertices), to adapt to this situation.

There is a way to recursively run the geometry shader by using a concept called Stream Out that can be used to loop the execution of the shader stages for re-processing of the data as needed. The workflow of using SO consists of creating an extra buffer for output geometry, creating the proper shader object with the ID3D10Device::CreateGeometryShaderWithStreamOutput method, binding the output buffer to the device by calling D3D10Device::SOSetTargets method and doing the data processing by drawing the geometry as normal or calling the ID3D10Device::DrawAuto method to use previously filled SO buffer without having to know how many primitives were generated in the previous pass of the shader.

Note that in the case of geometry amplification (such as tessellation), you must generally allocate more space to the output buffer than the amount of data in the input buffer. In our tessellation case, we need potentially 4 times as much memory for the output buffer than the input data size per pass. For two passes, this would be 16 times as much as the original geometry, and so on. However, the memory is allocated and managed on the graphics card so it doesn’t need to be downloaded and uploaded via the graphics bus.

Our tessellation shader would need a small modification to be usable in the passes between SO rendering. We would need to remove the transformations to world, view and projection spaces in the in-between shaders and apply them at the final pass only, just before the pixel shader that actually draws the stuff out, or only at the first vertex shader. It is trivial to derive the in-between passes from the current geometry shader (just remove the world, view and projection transforms from the intermediate VS and GS).


Geometry Displacement

To put the cherry on the top of dynamic tessellation cake, we could modify the tessellated geometry by sampling height values from a texture inside the vertex shader executed between the tessellation passes and translating the vertex positions with those values. This technique is often called “displacement mapping” because it displaces the geometry by the values specified in a texture map.

A simple way to implement the displacement is to apply the height values to the vertex positions at the final tessellation pass. However, this is inaccurate because the normals do not get modified to match the new positions due to the surface height change not being communicated between the inner tessellation passes. Hence, there is a need for a more accurate, mip-map based solution.

To create an optimal mip chain for the height map used in displacement mapping, we would need to separate the frequencies of the adjacent mip levels from each other. This prevents the same height values from being applied in more than one tessellation pass, and gives expected values for the geometry displacement. The process of extracting the frequencies from an ordinary, already filled mip chain would go something like this (somewhat metacode):

image mipLevels[numMips]; 
//it is assumed that “image” is a type that supports arithmetic
// operations for combining pixels with other image instances.
//Also, it is assumed that the range of values in the texture is
// centered at 0.0. If this is not the case, a simple biasing after the sampling is needed.
for (int currentMip = numMips; currentMip > 0; currentMip--)
{
    mipLevels[currentMip] -= mipLevels[currentMip - 1];
}
 
// now, the mip levels do not contain the higher-frequency data
// stored in the next-lower (in index) levels, and at usage time,
// no height value biasing is needed.

Alternatively to frequency separation, we could bias the height sampling by -0.5 (assuming the original height value range is 0 to 1) and scale the height contribution down with each pass so as to make the 0.5 texel value the neutral height - in which the original geometry wouldn’t change. However, the lower-frequency data would be applied (and over-emphasized) in each pass, thus resulting in less accurate reproduction of the height values in the result.

The frequency separation will give more accurate results, but simple biasing is more efficient and convenient. Also, depending on the height map being used, it is difficult to distinguish the appearances of these two methods. To alleviate the problem of the frequency bleeding, we could apply a gamma curve of 0.25 to the height values so that the higher mip levels (with low frequencies and values tending towards the center) would have less contribution to the final values overall.

To actually select the appropriate mip levels to sample from, we can use the current pass index multiplied by the minimum edge length treshold used in the geometry subdivision. Remember that our tessellation algorithm subdivides the triangles by two on two dimensions. Thus, the ratio of number of pixels in adjacent mip levels of a 2-dimensional texture is exactly the same as the ratio of our worst case (finest) tessellation, 1/4.


Patch Tessellation Wrap-Up

All in all, the technique of dynamic patch tessellation discussed herein is a very good way to increase the apparent fidelity of the geometry without the considerable overhead associated with CPU-based techniques. The programmer doesn’t need to feed any extra data to the pipeline in addition to the ordinary positions and normals (and other data as needed). When used with reasonable maximum per-patch recursion step counts and careful edge quality tresholds, this method of dynamic tessellation could be implemented in all key geometry of an application using Direct3D 10, without causing noticeable performance hit – providing greatly increased visual fidelity for relatively little cost.

As the original geometry can potentially be saved with less fidelity than usual, the savings in system memory and graphics bus bandwidth usage may quite possibly tip the performance balance to favor dynamic patching, too.

AUTHOR'S NOTE - Niko

To conclude this chapter, I would like to mention that the majority of it was written before there was concrete performance data available on geometry shaders on actual hardware. Since then, it has been determined by many independent parties (including Microsoft's DirectX team) that while tessellation is perfectly doable the way it is presented here, the performance and flexibility is not exactly good because the geometry shader introduces a relatively heavy additional overhead to the gpu's pipeline. To counter the situation, Direct3D 11 introduces a generalized tessellation model based on dedicated shader stages; our next revision of this book will reflect that and the description of "legacy" method of tessellation herein will be replaced by a practical description and sample code for the new architecture.