I was talking to Michael Morton (Ziggyware owner) about the possibility of making my code run on ATI cards, and I told him that someone else had tried using R2VB in XNA and failed. In that moment, an idea came to me: "Let's simulate R2VB on the CPU". Yes, I know, this sounds crazy, but let's analyze it a little.
In my tutorials, because I wanted to use vertex textures, I had to store lots of data in textures and run the simulations in textures. Examples are the particle systems, the creation of the deformation map, applying the deformation map to the displacement map, etc. These were, in a way, side effects of using VTF, and they actually turned out to increase performance. So, rather than doing everything on the CPU, we will still do these texture manipulations on the GPU.
First, let me explain how R2VB works. For the most part, the operations I did in the tutorials stay exactly the same: use pixel shaders for simulations and deformations. The difference between VTF and R2VB comes into play when you finally have a texture and want to use its contents to alter the vertices. As we saw in the tutorials, VTF reads from this texture inside the vertex shader. R2VB, on the other hand, takes that information and draws it into a "texture". That "texture" sits at the same GPU memory address as a vertex buffer, so writing to the texture actually writes to the vertex buffer. Hence the name Render To Vertex Buffer.
We will emulate this behavior in software, on the CPU: the finished texture is read back and its values are copied into the vertices. All the performance gained by doing the simulations with pixel shaders and render-to-texture remains.
Let's take the second tutorial, the one with the morphing terrain, and see what needs to be done. Open Grid.cs and add the following function:
Single[] textureData;

public void DrawCPUSingle(Texture2D displacementTexture)
{
    // Read the displacement texture back from the GPU into a CPU-side array.
    displacementTexture.GetData<Single>(textureData);

    // Write each height into the Y coordinate of the matching vertex.
    for (int i = 0; i < displacementTexture.Height; i++)
    {
        for (int j = 0; j < displacementTexture.Width; j++)
            vertices[i * displacementTexture.Width + j].Position.Y = textureData[j * displacementTexture.Height + i];
    }

    IGraphicsDeviceService igs = (IGraphicsDeviceService)game.Services.GetService(typeof(IGraphicsDeviceService));
    device = igs.GraphicsDevice;
    device.VertexDeclaration = vertexDecl;
    device.Indices = ib;

    // Draw straight from the CPU-side arrays instead of a dynamic vertex buffer.
    device.DrawUserIndexedPrimitives<VertexPositionNormalTexture>(PrimitiveType.TriangleList, vertices, 0, vertices.Length, indices, 0, 2 * dimension * dimension);
}

public void LoadGraphicsContent()
{
    // One value per vertex: the grid has (dimension + 1) vertices per side.
    textureData = new Single[(dimension + 1) * (dimension + 1)];
    [...]
}
As you can see, we take as input the texture that would have been used as the displacement texture, call GetData() on it, and write that data into our vertex array. The index array is the same as before. After this, instead of using a dynamic vertex buffer, we draw with DrawUserIndexedPrimitives.
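To see what the double loop does to the data, here is a minimal sketch on plain arrays, with no XNA types involved. CopyTransposed is a hypothetical helper name, not something from the sample source, and I am assuming the array filled by GetData() is row-major (index = row * width + column). Under that assumption, the indexing used in DrawCPUSingle reads the texture transposed, which for a square heightmap simply swaps the two grid axes.

```csharp
using System;

static class HeightCopy
{
    // Hypothetical helper mirroring the indexing of the loop in DrawCPUSingle:
    // vertex (i, j) receives the texel stored at index j * height + i.
    // Assumes one vertex per texel and a row-major texel array.
    public static void CopyTransposed(float[] texels, int width, int height, float[] vertexY)
    {
        if (texels.Length != width * height || vertexY.Length != width * height)
            throw new ArgumentException("Both arrays must hold width * height values.");

        for (int i = 0; i < height; i++)
        {
            for (int j = 0; j < width; j++)
                vertexY[i * width + j] = texels[j * height + i];
        }
    }
}
```

For a 3x2 array holding 0, 1, 2, 3, 4, 5, the copy produces 0, 2, 4, 1, 3, 5. The size check also shows why the vertex count and the texel count have to match exactly.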
Now we need to modify the VTFDisplacement.fx effect file. The changes are in the vertex shader, which takes the height from the vertex position instead of sampling the texture:
VS_OUTPUT TransformCPU(VS_INPUT In)
{
    VS_OUTPUT Out = (VS_OUTPUT)0;
    float4x4 viewProj = mul(view, proj);
    float4x4 worldViewProj = mul(world, viewProj);

    // we don't read from the texture anymore
    float height = In.position.y;
    In.position.y = height * maxHeight;
    Out.worldPos = mul(In.position, world);
    Out.position = mul(In.position, worldViewProj);
    Out.uv = In.uv;

    float4 TexWeights = 0;
    TexWeights.x = saturate(1.0f - abs(height - 0) / 0.2f);
    TexWeights.y = saturate(1.0f - abs(height - 0.3) / 0.25f);
    TexWeights.z = saturate(1.0f - abs(height - 0.6) / 0.25f);
    TexWeights.w = saturate(1.0f - abs(height - 0.9) / 0.25f);
    float totalWeight = TexWeights.x + TexWeights.y + TexWeights.z + TexWeights.w;
    TexWeights /= totalWeight;
    Out.textureWeights = TexWeights;
    return Out;
}

[...] // pixel shader remains the same

technique GridDrawCPU
{
    pass P0
    {
        // we can use SM 2.0
        vertexShader = compile vs_2_0 TransformCPU();
        pixelShader = compile ps_2_0 PixelShader();
    }
}
The final modifications are done in the Draw function in Game1.cs:
foreach (EffectPass pass in gridEffect.CurrentTechnique.Passes)
{
    pass.Begin();
    grid.DrawCPUSingle(morphRenderTarget.GetTexture());
    pass.End();
}
That's all. The Steps in Snow sample is converted in exactly the same way.
For the particle system sample, we have the following modifications. In the vertex shader of the file Particle.fx, change the line
float4 realPosition = tex2Dlod ( positionSampler,
float4(In.vertexData.x, In.vertexData.y,0,0));
into
float4 realPosition = In.vertexData;
In Game1.cs, modify the following code:
Vector4[] textureData;

protected override void LoadGraphicsContent(bool loadAllContent)
{
    if (loadAllContent)
    {
        textureData = new Vector4[particleCount * particleCount];
        [...]
    }
    [...]
}

protected override void Draw(GameTime gameTime)
{
    graphics.GraphicsDevice.Clear(Color.Black);
    Render2TextureMorph((float)Math.Sin(gameTime.TotalGameTime.TotalSeconds) * 0.5f + 0.5f);
    Render2TextureNormalCompute();
    SimulateParticles(gameTime);
    graphics.GraphicsDevice.RenderState.CullMode = CullMode.None;
    [...]
    graphics.GraphicsDevice.RenderState.DestinationBlend = Blend.One;

    Texture2D positionTexture = positionRT.GetTexture();
    positionTexture.GetData<Vector4>(textureData);
    for (int i = 0; i < positionTexture.Height; i++)
    {
        for (int j = 0; j < positionTexture.Width; j++)
        {
            vertices[i * positionTexture.Width + j].Position.X = textureData[j * positionTexture.Height + i].X;
            vertices[i * positionTexture.Width + j].Position.Y = textureData[j * positionTexture.Height + i].Y;
            vertices[i * positionTexture.Width + j].Position.Z = textureData[j * positionTexture.Height + i].Z;
        }
    }

    using (VertexDeclaration decl = new VertexDeclaration(
        graphics.GraphicsDevice, VertexPositionColor.VertexElements))
    {
        graphics.GraphicsDevice.VertexDeclaration = decl;
        renderParticleEffect.Begin();
        renderParticleEffect.CurrentTechnique.Passes[0].Begin();
        graphics.GraphicsDevice.DrawUserPrimitives<VertexPositionColor>(PrimitiveType.PointList, vertices, 0, particleCount * particleCount);
        renderParticleEffect.CurrentTechnique.Passes[0].End();
        renderParticleEffect.End();
    }
    [...]
The same technique is used here: process the data on the GPU, in textures, then call GetData() on the texture and use the result to draw the particles.
As for performance, here are some measurements I made:
| Sample | Framerate with vertex textures | Framerate with GetData() |
|---|---|---|
| Morphing Terrain, with Bilinear Filtering in VTF | 100 FPS | 65 FPS |
| Morphing Terrain, without Bilinear Filtering | 155 FPS | 69 FPS |
| Particle System (65536 particles) Terrain Collision, no terrain rendered | 130 FPS | 65 FPS |
| Steps in Snow | 260 FPS | 80 FPS |
The system this was run on is an Intel Core2 6300 (1.87 GHz), 1 GB RAM, GeForce 7600 GS 256 MB.
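Framerates are hard to compare directly, so it can help to convert them to milliseconds per frame. The sketch below does that conversion; FrameCost, FrameMs and ExtraMs are hypothetical names of mine. It is only a rough estimate, because it assumes the whole difference between the two columns comes from the readback and the CPU copy, which I have not verified.

```csharp
using System;

static class FrameCost
{
    // Milliseconds spent on one frame at the given framerate.
    public static double FrameMs(double fps)
    {
        return 1000.0 / fps;
    }

    // Extra milliseconds per frame when the framerate drops
    // from fpsBefore to fpsAfter.
    public static double ExtraMs(double fpsBefore, double fpsAfter)
    {
        return FrameMs(fpsAfter) - FrameMs(fpsBefore);
    }
}
```

With the numbers from the table, ExtraMs(100, 65) comes to about 5.4 ms for the filtered morphing terrain, and ExtraMs(260, 80) to about 8.7 ms for Steps in Snow. So the added cost per frame is of the same order in all samples, even though the FPS drop looks much worse for the fast ones.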
Some care still has to be taken. Grid.Dimension has to be a power of two minus one (255, 511, etc.). This is just because of my code. By doing this on the CPU, we lose the bilinear filtering done in the vertex shader, so the grid must have exactly one vertex per texel of the displacement texture. Since a grid of dimension N has N + 1 vertices per side, the dimension has to be the texture size minus one: for a heightmap of 256*256, the grid's dimension has to be 255. The bilinear filtering can of course be done on the CPU as well, but you'll have to see if it is worth it. These are the only restrictions I can think of.
I hope this made some of you (ATI owners) happier about my tutorials :)
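If you want to try the filtering on the CPU, a minimal sketch could look like the one below. It is not part of the sample code: CpuBilinear and Sample are hypothetical names, the texel array is assumed to be row-major, and the mapping from (u, v) to texels is a simplification that ignores the half-texel offset a GPU sampler applies.

```csharp
using System;

static class CpuBilinear
{
    // Samples a row-major height array at normalized coordinates (u, v)
    // in [0, 1], blending the four nearest texels.
    public static float Sample(float[] texels, int width, int height, float u, float v)
    {
        // Map to texel space so that 0 hits the first texel and 1 the last.
        float x = Clamp01(u) * (width - 1);
        float y = Clamp01(v) * (height - 1);

        int x0 = (int)x;
        int y0 = (int)y;
        int x1 = Math.Min(x0 + 1, width - 1);
        int y1 = Math.Min(y0 + 1, height - 1);

        // Fractional position inside the cell formed by the four texels.
        float fx = x - x0;
        float fy = y - y0;

        float top = Lerp(texels[y0 * width + x0], texels[y0 * width + x1], fx);
        float bottom = Lerp(texels[y1 * width + x0], texels[y1 * width + x1], fx);
        return Lerp(top, bottom, fy);
    }

    static float Lerp(float a, float b, float t)
    {
        return a + (b - a) * t;
    }

    static float Clamp01(float t)
    {
        return t < 0f ? 0f : (t > 1f ? 1f : t);
    }
}
```

With something like this, a grid vertex at column j of a grid with n vertices per side would sample at u = j / (n - 1), so the grid no longer needs to match the texture size. The price is four reads and three blends per vertex on the CPU every frame.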
The source code for the modified chapters can be found here: Chapter2CPU.zip, Chapter3CPU.zip and Chapter4CPU.zip.