In this article I will try to explain how you can animate and render your meshes using a vertex shader, on the GPU of your videocard. You might be wondering what the benefit of this method is. Let me explain.

## Skeletal animations

On my site I have downloadable code to display Milkshape3D meshes, using DirectX8. The Milkshape meshes I use have bones. This is also called skeletal animation, or skinned meshes. Those bones are connected to the 'skin' (faces) of the mesh, just like your bones are inside your body. When you bend your arm, your skin will follow your bone (luckily :P).

Every bone has its own transformation. These transformations are also cumulative. If you have a mesh with an upper-arm bone, a lower-arm bone and a bone attached to the hand, the hand will be placed at the offset calculated by using those three matrices. The lower arm will depend on two matrices, and the upper arm will only use its own matrix.
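To make the cumulative part concrete, here is a minimal sketch in plain C++ (no DirectX needed). The `Mat4` and `Bone` types and the `accumulate` function are made up for this illustration and are not taken from my loader. It assumes the Direct3D row-vector convention, and that parent bones are stored before their children.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Minimal 4x4 matrix, row-major, row-vector convention (v' = v * M).
struct Mat4 {
    std::array<float, 16> m{};  // zero-initialised
    float& at(int r, int c) { return m[r * 4 + c]; }
    float at(int r, int c) const { return m[r * 4 + c]; }
};

Mat4 identity() {
    Mat4 r;
    for (int i = 0; i < 4; ++i) r.at(i, i) = 1.0f;
    return r;
}

// Translation matrix; with row vectors the offset lives in the fourth row.
Mat4 translation(float x, float y, float z) {
    Mat4 r = identity();
    r.at(3, 0) = x;
    r.at(3, 1) = y;
    r.at(3, 2) = z;
    return r;
}

Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.at(i, j) += a.at(i, k) * b.at(k, j);
    return r;
}

// Hypothetical bone record: a local transform plus the index of the
// parent bone (-1 for the root).
struct Bone {
    Mat4 local;
    int parent;
};

// World matrix of every bone: its own matrix times the parent's world matrix.
std::vector<Mat4> accumulate(const std::vector<Bone>& bones) {
    std::vector<Mat4> world(bones.size());
    for (std::size_t i = 0; i < bones.size(); ++i) {
        if (bones[i].parent < 0) {
            world[i] = bones[i].local;
        } else {
            world[i] = multiply(bones[i].local, world[bones[i].parent]);
        }
    }
    return world;
}
```

With an upper arm, lower arm and hand that are each offset along Y, the hand's world matrix ends up containing the sum of all three offsets, while the upper arm only contains its own.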

You can easily google a bit more about the theory behind skeletal animation. I won't explain that in detail here.

## Animating the mesh

Now, to animate our mesh we have to loop through all vertices and apply the (calculated) matrix transform belonging to each of them. Assume you have a mesh of 1500 polys. Every time you update this mesh the game will recalculate the vertex positions based on the "initial" vertex positions, and copy those updated vertices to a new vertexbuffer (locking, unlocking, copying, uploading to the videocard...). We can't overwrite our original vertices of course!
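The loop described above looks roughly like the sketch below. It is a simplified stand-in in plain C++, not the code from my loader: `SourceVertex`, `transformPoint` and `skinOnCpu` are names invented for the illustration, and the vertex buffer copy is only hinted at in a comment.

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Row-major 4x4 matrix, row-vector convention (v' = v * M) as in Direct3D.
struct Mat4 { float m[4][4]; };

// Hypothetical source vertex: rest-pose position plus the bone it follows.
struct SourceVertex {
    Vec3 position;
    int boneId;
};

// Transform a point (w = 1) by a matrix.
Vec3 transformPoint(const Vec3& p, const Mat4& mat) {
    return {
        p.x * mat.m[0][0] + p.y * mat.m[1][0] + p.z * mat.m[2][0] + mat.m[3][0],
        p.x * mat.m[0][1] + p.y * mat.m[1][1] + p.z * mat.m[2][1] + mat.m[3][1],
        p.x * mat.m[0][2] + p.y * mat.m[1][2] + p.z * mat.m[2][2] + mat.m[3][2],
    };
}

// The per-update work on the CPU: every vertex is re-derived from its rest
// pose, so the originals stay intact and a second array is filled.
void skinOnCpu(const std::vector<SourceVertex>& rest,
               const std::vector<Mat4>& boneMatrices,
               std::vector<Vec3>& animated) {
    animated.resize(rest.size());
    for (std::size_t i = 0; i < rest.size(); ++i) {
        animated[i] = transformPoint(rest[i].position,
                                     boneMatrices[rest[i].boneId]);
    }
    // In a real loader this array would now be copied into a locked
    // vertex buffer and uploaded to the videocard.
}
```

All of this runs for every vertex, every time the animation is updated, and that is before the buffer upload.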

Needless to say, this is slow. I had about 5 birds of 600 polys each in my game, and the FPS went from 500-600 to 35! I realized afterwards that the code could be improved a bit on some points, but oh well: even 100 fps would be low on my machine. Calculating the new vertex positions simply shouldn't be done on the CPU. A GPU is made to do that work!

To solve this problem, we will use a different type of vertex format. I won't simply paste full code here. My code is way too big and too integrated to simply be posted here. I'll explain what I changed, compared to the MS3D loader on my site.

## Using vertex blending

First, we have to change our FVF type. Note that this is a step that might be skipped. Vertex shaders don't use the FVF format anymore. I always thought you still needed one to create the vertexbuffer. WRONG! Simply pass 0 as the FVF, and make sure your stream declaration (see below) is accurate. If you do specify an FVF, the DirectX debug output (in C++) will tell you every frame you're doing something unnecessary.

```
static const int MAX_MATRIX_INDEX = 2;  //Maximum number of bones per vertex.
typedef struct
{
    D3DXVECTOR3 p;
    float       fWeights[MAX_MATRIX_INDEX];            //Weights
    float       matrixBoneIndicies[MAX_MATRIX_INDEX];  //Bone (matrix) indices
    D3DXVECTOR3 n;
    D3DXVECTOR2 uv;
} D3DBlendVertex_t;
#define D3DFVF_BLENDVERTEX (D3DFVF_XYZ | D3DFVF_TEX3 | D3DFVF_TEXCOORDSIZE2(2) | D3DFVF_TEXCOORDSIZE2(1) \
    | D3DFVF_NORMAL | D3DFVF_TEXCOORDSIZE2(0))
```

I will explain this structure first. A bone has a weight, which 'says' how much the bone affects the vertices around it. You can have multiple bones affecting a vertex (a maximum of 4 in DirectX I believe). The Milkshape3D format only supports 1 bone per vertex, I believe. I just set it to two for an extra margin. It's probably something I will change back to one though.

Notice that the order is of importance here. You can do it differently if you're only using the vertex shader declaration, but I prefer to stick with the FVF order, as given in the SDK documentation.

The vertex structure above has a vertex position, space for 2 bone weights, 2 bone indices (I will get back to those later), a normal, and one set of texture coordinates. We are cheating a bit with the floats of the bones, by using texture coordinates for them in the FVF. Again, this is an FVF approach. You do not need this. If you reduce/increase the max bones, be sure to adjust the TEXCOORDSIZE too.
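If you want to double-check the layout, a sketch like the following can help. It is a stand-in, not the real structure: the D3DX vector types are replaced by plain float arrays so it compiles without the DirectX SDK, and the names (`BlendVertex`, `kMaxBones`, the offset constants) are invented for the example. It assumes 4-byte floats and no padding, which is what I would expect on the usual PC compilers.

```cpp
#include <cstddef>

// Plain-float stand-in for the blend vertex described above.
const int kMaxBones = 2;

struct BlendVertex {
    float position[3];
    float weights[kMaxBones];
    float boneIndices[kMaxBones];
    float normal[3];
    float uv[2];
};

// Byte offsets that the stream declaration has to agree with.
const std::size_t kWeightsOffset     = offsetof(BlendVertex, weights);
const std::size_t kBoneIndicesOffset = offsetof(BlendVertex, boneIndices);
const std::size_t kNormalOffset      = offsetof(BlendVertex, normal);
const std::size_t kUvOffset          = offsetof(BlendVertex, uv);

// 3 + 2 + 2 + 3 + 2 = 12 floats per vertex.
static_assert(sizeof(BlendVertex) == 12 * sizeof(float),
              "unexpected padding in BlendVertex");
```

Changing `kMaxBones` shifts every offset after the weights, which is why the declaration (and the TEXCOORDSIZE flags, if you keep the FVF) must change along with it.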

Right now I will show you the proper shader declaration for this:

```
HRESULT hr;
LPD3DXBUFFER pCode = NULL;
dwBlendingShader = 0; //Global variable, just to make it easy to use.

DWORD dwVertexDecl[] =
{
    D3DVSD_STREAM( 0 ),
    D3DVSD_REG(D3DVSDE_POSITION,     D3DVSDT_FLOAT3),
    D3DVSD_REG(D3DVSDE_BLENDWEIGHT,  D3DVSDT_FLOAT2), //blending weight
    D3DVSD_REG(D3DVSDE_BLENDINDICES, D3DVSDT_FLOAT2), //matrices
    D3DVSD_REG(D3DVSDE_NORMAL,       D3DVSDT_FLOAT3),
    D3DVSD_REG(D3DVSDE_TEXCOORD0,    D3DVSDT_FLOAT2),
    D3DVSD_END()
};

// Assemble the vertex shader from the file
if( SUCCEEDED( hr = D3DXAssembleShaderFromFile( "vertexskinning.vsh", D3DXASM_DEBUG, NULL, &pCode, NULL ) ) )
{
    // Create the shader; its handle ends up in dwBlendingShader
    hr = m_pDevice->CreateVertexShader( dwVertexDecl, (DWORD*)pCode->GetBufferPointer(), &dwBlendingShader, 0 );
    pCode->Release(); //Only release the buffer if assembling succeeded
}
```

This is just some general code for making the shader. I might write an article about setting up shaders soon, as well.

The important part for this article is the dwVertexDecl declaration. This is the vertex shader version of the FVF. If this declaration does not match your shader code, it might error out. DirectX8 in C++, when using debug mode with debug information enabled, will give you the error messages and the line on which they occur. I believe D3DXAssembleShaderFromFile can also return the errors as text for you.

As you can see, the setup is quite similar. Every D3DVSD_REG simply says: location, size. Just like an FVF. Also note that it simply matches the vertex structure I started this section with!

Note: I've heard on GameDev.net that some videocards expect the registers (D3DVSDE_POSITION, D3DVSDE_BLENDWEIGHT, ...) to be sequential. Right now D3DVSDE_TEXCOORD0 equals 7. In the vertex shader the order would now be:

```
v0 = position
v1 = blending weights
v2 = blending indices
v3 = normal
v7 = texture coordinates
```

You might want to change TEXCOORD0 to v4. I probably will.

Okay, we got our shader handle. This is just a DWORD: a low number starting at 1, telling DirectX which stored vertex shader to use.

## Back to the bones

Back to our bones. I assume you know how to get the bones from your 3D model, whatever format it is. I'm using the Milkshape3D loader on my site as reference.

Here is the code that will store the proper information in the new format. I'm using the vertex structure given earlier.

```
/* Create Vertex blending vertices */
/* which we can use in our shader. */
m_arrBlendVertices[dwNumVertices].p[0] = pVertex->vertex[0];
m_arrBlendVertices[dwNumVertices].p[1] = pVertex->vertex[1];
m_arrBlendVertices[dwNumVertices].p[2] = pVertex->vertex[2];
m_arrBlendVertices[dwNumVertices].n[0] = pTriangle->vertexNormals[k][0];
m_arrBlendVertices[dwNumVertices].n[1] = pTriangle->vertexNormals[k][1];
m_arrBlendVertices[dwNumVertices].n[2] = pTriangle->vertexNormals[k][2];
m_arrBlendVertices[dwNumVertices].uv[0] = pTriangle->s[k];
m_arrBlendVertices[dwNumVertices].uv[1] = pTriangle->t[k];
//New:
m_arrBlendVertices[dwNumVertices].fWeights[0] = 1.0f;
m_arrBlendVertices[dwNumVertices].fWeights[1] = 0.0f;           //Second slot is unused
m_arrBlendVertices[dwNumVertices].matrixBoneIndicies[0] = (float) pVertex->boneId;
m_arrBlendVertices[dwNumVertices].matrixBoneIndicies[1] = 0.0f; //Second slot is unused
/* end */
```

fWeights is just 1.0f. This means that this bone fully influences the vertex, and there is no other one. Using more bones per vertex will give better results (quite a bit better!) but it also adds more calculations. And your modeler needs to spend more time on weighting too. Although I've also read that it is theoretically possible to calculate the weights yourself. If someone wants to write a nice article about that, you'd certainly be welcome!

pVertex->boneId is just the ID of the bone that the vertex we are storing is attached to. You could easily put this code in my loader. Don't forget to add these two members then, though:
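For reference, this is what using a second weight would boil down to. It is only a sketch of the blending step in plain C++, assuming the vertex has already been transformed by each of its two bones; `blendTwoBones` is a name invented for the example and nothing like it exists in my loader yet.

```cpp
struct Vec3 { float x, y, z; };

// Blend one vertex position as transformed by two different bones.
// weight0 belongs to the first bone; the second bone gets the remainder,
// so the two weights always sum to 1.
Vec3 blendTwoBones(const Vec3& byBone0, const Vec3& byBone1, float weight0) {
    const float weight1 = 1.0f - weight0;
    return {
        byBone0.x * weight0 + byBone1.x * weight1,
        byBone0.y * weight0 + byBone1.y * weight1,
        byBone0.z * weight0 + byBone1.z * weight1,
    };
}
```

With a weight of 1.0f the second bone drops out completely, which is exactly the single-bone case used in this article.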

```
/* Blending purposes : */
LPDIRECT3DVERTEXBUFFER8 m_pBlendVertexBuffer;
std::vector<D3DBlendVertex_t> m_arrBlendVertices;
/* End blending  */
```

Note: Don't delete your original vertexbuffer. We still need it for static geometry, since static meshes do not have bones and need a different shader to work with. My method is currently to determine if a mesh has bones: if yes, load the model into the blend vertexbuffer; if not, load it into the static buffer.

To create your new vertexbuffer, simply use code like this:

```
dwVertexSize = D3DXGetFVFVertexSize(D3DFVF_BLENDVERTEX);
hr = m_pDevice->CreateVertexBuffer(dwVertexSize * dwNumVertices, D3DUSAGE_WRITEONLY, 0, D3DPOOL_DEFAULT,
                                   &m_pBlendVertexBuffer);
//We have to copy all the verts in here now..
D3DBlendVertex_t *pVertices = NULL;
hr = m_pBlendVertexBuffer->Lock(0, 0, (BYTE **) &pVertices, D3DLOCK_NOSYSLOCK );
if (SUCCEEDED(hr)) {
    memcpy(pVertices, &m_arrBlendVertices[0], (sizeof(D3DBlendVertex_t) * dwNumVertices));
    hr = m_pBlendVertexBuffer->Unlock();
}
```

Note: I specified 0 as the FVF; no FVF is used to create this vertex buffer.

To update the model animation, I have a function called SetTime(time). This function only has to calculate the correct matrices now, which it does already. Have a look at it, if you are interested. It will correctly calculate the right transforms, taking parent bones into account as well. I've added a few extra lines, to store the complete transformation in the bone structure. This way, we don't have to recalculate the transform every time the frame is rendered.

Remember: Vertex shaders do not change the data stored in memory, everything is happening on the videocard!

```
//Find:
D3DXMatrixMultiply(&m_arrBones[i].matWorldAnim, &m_arrBones[i].matObjectAnim,
                   &m_arrBones[nParentJoint].matWorldAnim);

//Add below it:
temp = m_arrBones[i].matWorldInv;
D3DXMatrixMultiply(&temp, &temp, &m_arrBones[i].matWorldAnim);
D3DXMatrixTranspose(&matTrans, &temp);
m_arrBones[i].matWorldInvTransposed = matTrans;
```

As you see, we store the matrices in a new member of the bone structure (matWorldInvTransposed). Just add it manually yourself. Also note that you can comment out the "updatevertexbuffer" call. We don't need to update our vertexbuffer anymore. The shader will nicely calculate our vertices for us.
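Why multiply the inverse of the bone's rest-pose world matrix with the animated world matrix? The sketch below tries to show it with plain C++ and simple translations. The `Mat4` type and the `skinningConstant` function are invented for this illustration, and the inverse is passed in by hand instead of being computed.

```cpp
#include <array>

// Minimal 4x4 matrix, row-major, row-vector convention (v' = v * M).
struct Mat4 {
    std::array<float, 16> m{};  // zero-initialised
    float& at(int r, int c) { return m[r * 4 + c]; }
    float at(int r, int c) const { return m[r * 4 + c]; }
};

Mat4 translation(float x, float y, float z) {
    Mat4 r;
    for (int i = 0; i < 4; ++i) r.at(i, i) = 1.0f;
    r.at(3, 0) = x;
    r.at(3, 1) = y;
    r.at(3, 2) = z;
    return r;
}

Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.at(i, j) += a.at(i, k) * b.at(k, j);
    return r;
}

Mat4 transpose(const Mat4& a) {
    Mat4 r;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            r.at(j, i) = a.at(i, j);
    return r;
}

// Same order as the listing above: first undo the rest pose, then apply the
// animated pose, then transpose so each register holds one matrix column.
Mat4 skinningConstant(const Mat4& inverseRestWorld, const Mat4& animatedWorld) {
    return transpose(multiply(inverseRestWorld, animatedWorld));
}
```

If the bone has not moved, the two matrices cancel out and the vertices stay where they were modeled. If the bone sits at height 2 in the rest pose and at height 5 in the animation, the result is a plain "move up by 3".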

Rendering our vertices:

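```
D3DXMATRIX matWorld;
m_pDevice->GetTransform(D3DTS_WORLD, &matWorld);

D3DXMATRIX modelViewProjection = matWorld * matView * matProjection;
D3DXMatrixTranspose( &modelViewProjection, &modelViewProjection );
//Store the combined matrix in c0 - c3
m_pDevice->SetVertexShaderConstant(0, &modelViewProjection, 4);

//Do we have bones?
if (m_arrBones.size() > 0) {

    D3DXVECTOR4 vRequired = D3DXVECTOR4(4.0f, 8.0f, 0.0f, 0.0f);
    D3DXVECTOR4 vZero = D3DXVECTOR4(0.0f, 0.0f, 0.0f, 0.0f);
    m_pDevice->SetVertexShaderConstant(4, &vRequired, 1); //c4
    m_pDevice->SetVertexShaderConstant(5, &vZero, 1);     //c5

    //Loop through all bones and set the constant in the shader:
    //every matrix takes 4 registers, starting at c8.
    for (size_t i = 0; i < m_arrBones.size(); i++)
    {
        m_pDevice->SetVertexShaderConstant((DWORD)(8 + i * 4), &m_arrBones[i].matWorldInvTransposed, 4);
    }

    m_pDevice->SetStreamSource(0, m_pBlendVertexBuffer, D3DXGetFVFVertexSize(D3DFVF_BLENDVERTEX));
} else {
    m_pDevice->SetStreamSource(0, m_pVertexBuffer, D3DXGetFVFVertexSize(D3DFVF_VERTEX));
}
```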

Okay. Quite some code. I'm using GetTransform to get the current world matrix. This isn't very nice, and could easily be improved. Then I calculate the modelViewProjection matrix, which we need in our shader to transform our vertices to clip space (the screen).

Improvement: You only have to calculate the view matrix * projection matrix once per frame. It should only change during camera movement (input). We set this matrix in our shader using SetVertexShaderConstant. It will be stored in registers c0 - c3.

Now comes the 'if'. Check if we render a model with bones, or not. The one without bones simply uses a shader which only transforms the coordinates, and sets the texture coords (you could add lighting calculations in the shader). The version with bones adds a few more constants. The vRequired one is used inside the shader. I'll explain it later. It is stored in c4. Same for vZero (stored in c5).

Then we loop through ALL our bones and put each matrix in the right position. For simplicity, I start at position 8 (could start at 6) and increase with steps of 4 (this is needed because every matrix takes 4 constant entries). This is the array that the bone indices in the vertices will point to!
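The register layout is easy to get wrong, so here it is as a tiny sketch in plain C++. The function and constant names are invented for the example. The number 96 used below is the amount of constant registers I understand vs.1.1 to offer as a minimum; treat it as an assumption and check the caps of your own device.

```cpp
// Layout used in this article: bones start at c8, 4 registers per matrix.
const int kFirstBoneRegister = 8;
const int kRegistersPerMatrix = 4;

// First constant register of the matrix that belongs to a bone.
int boneRegister(int boneIndex) {
    return kFirstBoneRegister + boneIndex * kRegistersPerMatrix;
}

// How many bone matrices fit when the shader has this many constant registers.
int maxBones(int totalConstantRegisters) {
    return (totalConstantRegisters - kFirstBoneRegister) / kRegistersPerMatrix;
}
```

Bone 0 lands in c8 - c11, bone 1 in c12 - c15, and so on. With 96 registers that leaves room for 22 bones, which is something to keep in mind for models with a big skeleton.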

Then we set the stream source:

` m_pDevice->SetStreamSource(0, m_pBlendVertexBuffer, D3DXGetFVFVertexSize(D3DFVF_BLENDVERTEX));`

I'm not sure if the last one (size) is necessary. I might change this later in this article.

Then we simply use the original rendering code from the MS3D loader (in DrawSubset, no changes there):

```
DWORD dwVertexCount = m_arrSubsets[dwSubset].VertexCount / 3;
m_pDevice->DrawPrimitive(D3DPT_TRIANGLELIST, m_arrSubsets[dwSubset].VertexStart, dwVertexCount);
```

This should render the animated model for you right away, once we get the shader ready.

Before we can put something on the screen, we still need to create our shaders.

```
// vertexskinning.vsh

vs.1.1

// c0-c3 = combined model-view-projection matrix
// c4    = vRequired (x = 4.0, y = 8.0)
// c5    = vZero
// c8-c? = bones

// v0 = vertex
// v1 = bone weight
// v2 = bone index (zero based, the offset is added below)
// v3 = normal
// v7 = tex

//Calculate appropriate index:
//Start index (zero based, as stored in the vertex)
mov r0, v2
//Calculate real position: multiply with the increments of the matrix (*4) + add start offset (+8)
mad r0.x, r0.x, c4.x, c4.y

//Move r0.x into the address register we use to index the constants
mov a0.x, r0.x

// Transform position by the bone matrix
dp4 r1.x, v0, c[a0.x+0]
dp4 r1.y, v0, c[a0.x+1]
dp4 r1.z, v0, c[a0.x+2]
dp4 r1.w, v0, c[a0.x+3]

//Now multiply with world*view*projection
m4x4 oPos, r1, c0

; Send the tex coords out unmodified
; ----------------------------------
mov oT0.xy, v7
```

This code will first calculate our appropriate index. As you have seen, our own bone array uses a starting index of zero. But the bones in our constants start at 8, and increment with steps of 4. So we first calculate the proper position into a0.x. This register can be used for lookups in the constant registers (put "a0.x" "shader" in google, and you will get more information).

Note: You might notice I don't do anything with the blending weights. Since the blending weights are all 1.0f, I don't need to do any calculations. I'm keeping them in the vertex structure, just in case I do multiple weights someday.

Remember the vRequired constant we set, with 4.0 and 8.0? We are using that one for this calculation. First we move the bone index from the vertex into a temporary register (r0). Then we "multiply and add" with c4: multiply by c4.x (*4) and add c4.y (+8), which gives us the right index: the position of the right bone in our constants array. We move this to a0.x (.x is the only component supported in our shader version).

Then we multiply the vertex position with the bone matrix, one dp4 for each of x, y, z and w. We are doing the same here as m4x4 does (which is like a macro). Now we have our vertex transformed by the bone matrix. Then we transform it by the world*view*projection matrix and output it in oPos.

Finally we simply put the texture coordinates in oT0 without any changes.
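If you wonder why the matrices are transposed before they go into the constants: each dp4 is a dot product of the vertex with one register, so each register has to hold one column of the matrix. The sketch below mimics that in plain C++. It is only an emulation written for this explanation; the type aliases and function names are invented, and it does not run any shader.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>;  // four rows, like four constant registers

// dp4: four-component dot product, as the shader instruction computes it.
float dp4(const Vec4& a, const Vec4& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

Mat4 transpose(const Mat4& m) {
    Mat4 t{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            t[c][r] = m[r][c];
    return t;
}

// The four dp4 lines: one dot product per constant register.
Vec4 transformWithDp4(const Vec4& v, const Mat4& registers) {
    Vec4 out{};
    for (int i = 0; i < 4; ++i) out[i] = dp4(v, registers[i]);
    return out;
}

// Reference result: row vector times matrix, the Direct3D convention.
Vec4 rowVectorTimesMatrix(const Vec4& v, const Mat4& m) {
    Vec4 out{};
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r)
            out[c] += v[r] * m[r][c];
    return out;
}
```

Feeding the transposed matrix to `transformWithDp4` gives the same vector as `rowVectorTimesMatrix` with the original matrix. Skip the transpose and you get a different (wrong) result for anything that is not symmetric, such as a translation.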

We also need a normal shader, for the static objects:

```
vs.1.1

//Only transform the vertices:
m4x4 oPos, v0, c0
mov oT0.xy, v7
```

Which does the same as above, but does not use the bone matrices (because there are none).

## Conclusion

I hope this article will help people create animations that run in the vertex shader. This method should really speed up your game. The CPU is relieved of the vertex processing work, which is now done on the GPU. That processor is built for that kind of job.

I might change a few things in this article, depending on what I learn myself. (Game programming is quite a continuous learning process.)

Almar