Chapter 15: Using GPU draw_indexed_primitives for instanced rendering

Chapter 15's approach to ICB (indirect command buffer) command creation generates a separate draw_indexed_primitives command for every instance of a given mesh.

Now I want to implement instanced rendering, as described in chapter 13. What would be the best approach for implementing this in a GPU-driven render loop?

I've looked at the relevant WWDC presentations and Apple's example projects, but I couldn't find a draw_indexed_primitives invocation with an instanceCount higher than 1.

The only approach that I can think of right now is to

  1. group the meshes on the CPU first
  2. run the ICB encoding kernel separately for each mesh type with a potential instance count higher than 1 ("potential" in the sense that an instance may still be occluded or frustum-culled)
  3. run the ICB encoding kernel once for all remaining meshes, which have an instance count of at most 1
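A minimal C++ sketch of step 1, assuming each scene object carries a hypothetical meshId and an index into the ModelParams buffer. Objects are grouped by mesh, and groups with more than one potential instance are separated from the rest. The first list would feed the per-mesh kernel runs in step 2 and the second list would feed the single run in step 3.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical scene entry: which mesh it uses, and where its ModelParams live.
struct SceneObject {
  uint32_t meshId;
  uint32_t modelParamsIndex;
};

// All potential (not yet culled) instances of one mesh.
struct MeshBatch {
  uint32_t meshId;
  std::vector<uint32_t> modelParamsIndices;
};

// Step 1: group objects by mesh on the CPU.
// Batches with more than one potential instance go to `instanced` (step 2);
// everything else goes to `singles` (step 3).
void groupByMesh(const std::vector<SceneObject>& objects,
                 std::vector<MeshBatch>& instanced,
                 std::vector<MeshBatch>& singles) {
  std::map<uint32_t, std::vector<uint32_t>> groups;  // ordered by meshId
  for (const SceneObject& obj : objects) {
    groups[obj.meshId].push_back(obj.modelParamsIndex);
  }
  for (auto& entry : groups) {
    MeshBatch batch{entry.first, std::move(entry.second)};
    if (batch.modelParamsIndices.size() > 1) {
      instanced.push_back(std::move(batch));
    } else {
      singles.push_back(std::move(batch));
    }
  }
}
```

The CPU dependency here is the point of the complaint that follows: the grouping is cheap, but it has to be redone on the CPU whenever the scene changes.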

This approach should work, but is less than satisfactory, given that I’m aiming for a fully GPU-driven render loop.

So I’m wondering if there may be other techniques for achieving instanced rendering that do not rely on the CPU. Any ideas?

Unfortunately, I haven't looked at encoding multiple instances on the GPU in a while, so I probably can't help much at the moment.

But isn’t it just a question of sending the instance information in a separate buffer?

For example, suppose I change the instance count to 2 in Renderer and change the vertex shader like this:

vertex VertexOut vertex_main(const VertexIn vertexIn [[stage_in]],
                             constant Uniforms &uniforms [[buffer(BufferIndexUniforms)]],
                             constant ModelParams *modelParamsArray
                                               [[buffer(BufferIndexModelParams)]],
                             uint baseInstance [[base_instance]],
                             uint instanceId [[instance_id]])
{
  // Every instance in this draw shares the model parameters at baseInstance.
  ModelParams modelParams = modelParamsArray[baseInstance];
  float4 position = vertexIn.position;
  // instance_id includes the base instance, so this is the 0-based index
  // within the draw; each copy is shifted 3 units further along x.
  position.x += (instanceId - baseInstance) * 3;
  VertexOut out {
    .position = uniforms.projectionMatrix * uniforms.viewMatrix
                       * modelParams.modelMatrix * position,
    .uv = vertexIn.uv,
    .modelIndex = baseInstance
  };
  return out;
}

This renders two grounds and two houses. Shouldn't it be fairly straightforward to put the per-instance position data in an extra buffer and read it from there?
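Here is a CPU-side C++ model of that suggestion. It assumes a hypothetical per-instance buffer (InstanceData) holding one offset per instance. Because Metal's [[instance_id]] already includes the base instance, the vertex function can index such a buffer with instanceId directly, provided each draw's instances are stored starting at its baseInstance. The float3 type and the function name are stand-ins for illustration, not Metal API.

```cpp
#include <cstdint>
#include <vector>

// Stand-in for Metal's float3 so the sketch compiles as plain C++.
struct float3 { float x, y, z; };

// Hypothetical per-instance data, written by the CPU or by a culling kernel.
struct InstanceData {
  float3 offset;  // world-space translation for this instance
};

// Models one invocation of the vertex function for one vertex of one instance.
// instanceId mirrors [[instance_id]], which already includes the base instance,
// so it indexes the per-instance buffer directly.
float3 transformVertex(float3 position,
                       const std::vector<InstanceData>& instanceData,
                       uint32_t instanceId) {
  const InstanceData& inst = instanceData[instanceId];
  return {position.x + inst.offset.x,
          position.y + inst.offset.y,
          position.z + inst.offset.z};
}
```

With baseInstance = 1 and instanceCount = 2, the two instances read entries 1 and 2 of the buffer. This removes the hard-coded `* 3` spacing from the shader above.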

Hi Caroline, thanks for replying. The book has been a great help, BTW.

The draw call itself isn’t the problem. Once you have determined which instances need to be rendered, you can set up the matching buffers and instance count.
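Determining "which instances need to be rendered" on the GPU usually means a cull-and-compact pass. The C++ model below loops over candidate instances the way one GPU thread per instance would. Each visible instance claims a slot through an atomic counter, which is the role atomic_fetch_add_explicit plays in a Metal kernel, and the final counter value becomes the instanceCount. The bounding-sphere and plane types are assumptions made for this sketch, and the plane normals are assumed to point into the frustum.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Hypothetical per-instance bounds and frustum planes.
struct Sphere { float x, y, z, radius; };
struct Plane  { float nx, ny, nz, d; };  // signed distance = dot(n, p) + d

bool isVisible(const Sphere& s, const std::vector<Plane>& frustum) {
  for (const Plane& p : frustum) {
    float dist = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
    if (dist < -s.radius) return false;  // sphere entirely outside this plane
  }
  return true;
}

// Models one thread per candidate instance. Visible instances append their
// index to visibleIndices via an atomic counter; the count is the instanceCount.
uint32_t compactVisible(const std::vector<Sphere>& bounds,
                        const std::vector<Plane>& frustum,
                        std::vector<uint32_t>& visibleIndices) {
  std::atomic<uint32_t> counter{0};
  visibleIndices.assign(bounds.size(), 0);
  const uint32_t n = static_cast<uint32_t>(bounds.size());
  for (uint32_t tid = 0; tid < n; ++tid) {  // tid ~ thread_position_in_grid
    if (isVisible(bounds[tid], frustum)) {
      uint32_t slot = counter.fetch_add(1, std::memory_order_relaxed);
      visibleIndices[slot] = tid;
    }
  }
  uint32_t count = counter.load();
  visibleIndices.resize(count);
  return count;
}
```

On the GPU the slot order is not deterministic because threads race on the counter. That does not matter for drawing, since only the set of visible instances and their count matter.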

Having thought about it some more, the solution seems to be to encode the compute dispatches on the GPU.

This is briefly touched upon in the WWDC19 Metal presentation, around the 41:30 mark.

So basically this part of the render loop would have to be moved into its own kernel on the GPU:

Admittedly, this is all well beyond the scope of chapter 15.
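One related mechanism for sizing compute work on the GPU is indirect dispatch, though it may not be exactly what the WWDC19 session describes. A small kernel writes an MTLDispatchThreadgroupsIndirectArguments struct into a buffer, and a later dispatchThreadgroups(indirectBuffer:indirectBufferOffset:threadsPerThreadgroup:) call reads the grid size from that buffer, so the CPU never needs the post-culling count. The C++ below mirrors the struct's layout and the round-up a setup kernel would perform. The function name is hypothetical.

```cpp
#include <cstdint>

// Mirrors the layout of Metal's MTLDispatchThreadgroupsIndirectArguments.
struct DispatchThreadgroupsArgs {
  uint32_t threadgroupsPerGrid[3];
};
static_assert(sizeof(DispatchThreadgroupsArgs) == 12,
              "layout must match Metal's struct");

// What a hypothetical setup kernel could write after culling: enough
// threadgroups to cover itemCount threads, rounding up.
DispatchThreadgroupsArgs makeDispatchArgs(uint32_t itemCount,
                                          uint32_t threadsPerThreadgroup) {
  DispatchThreadgroupsArgs args{};
  args.threadgroupsPerGrid[0] =
      (itemCount + threadsPerThreadgroup - 1) / threadsPerThreadgroup;
  args.threadgroupsPerGrid[1] = 1;
  args.threadgroupsPerGrid[2] = 1;
  return args;
}
```

Because of the rounding, the consuming kernel still has to bounds-check its thread index against the real item count.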

I'm glad the book has helped 😊

I checked the Modern Rendering sample code, which is my go-to example of complex Apple code, but it doesn't use multiple instancing either.

Having looked into it some more, the solution seems to be MTLDrawIndexedPrimitivesIndirectArguments.

This way, the draw call arguments, including the instance count, are read from a buffer, so a compute kernel can write them on the GPU.
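The C++ below is a minimal model of that idea. It mirrors the field layout of MTLDrawIndexedPrimitivesIndirectArguments and fills one struct per mesh, the way a kernel could after a culling pass. Each struct would then be consumed by drawIndexedPrimitives(type:indexType:indexBuffer:indexBufferOffset:indirectBuffer:indirectBufferOffset:). MeshInfo and the per-mesh visibleCounts input are assumptions for illustration. Each mesh's visible instances are assumed to occupy a contiguous range of the per-instance buffer, so baseInstance is the running (exclusive prefix) sum of the counts.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirrors Metal's MTLDrawIndexedPrimitivesIndirectArguments (5 x 4 bytes).
struct DrawIndexedArgs {
  uint32_t indexCount;
  uint32_t instanceCount;
  uint32_t indexStart;
  int32_t  baseVertex;
  uint32_t baseInstance;
};
static_assert(sizeof(DrawIndexedArgs) == 20, "layout must match Metal's struct");

// Hypothetical per-mesh description.
struct MeshInfo {
  uint32_t indexCount;
  uint32_t indexStart;
  int32_t  baseVertex;
};

// Models a kernel writing one argument struct per mesh after culling.
// visibleCounts[i] = number of visible instances of mesh i.
std::vector<DrawIndexedArgs> buildDrawArgs(const std::vector<MeshInfo>& meshes,
                                           const std::vector<uint32_t>& visibleCounts) {
  std::vector<DrawIndexedArgs> args(meshes.size());
  uint32_t firstInstance = 0;  // exclusive prefix sum of visibleCounts
  for (std::size_t i = 0; i < meshes.size(); ++i) {
    args[i] = {meshes[i].indexCount, visibleCounts[i], meshes[i].indexStart,
               meshes[i].baseVertex, firstInstance};
    firstInstance += visibleCounts[i];
  }
  return args;
}
```

On the GPU, the prefix sum would typically come from a scan or an atomic counter, not a serial loop. The resulting structs sit in the indirect buffer at a stride of 20 bytes.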

The Terrain sample uses indirect arguments in this way for rendering vegetation.
