Hi Caroline,
Actually, I tried something like your recommendation: I built my own wrapper which never owns the array input, just uses it to obtain buffers and pointers and then leaves it alone.
That works pretty well, and I now have a nice class which can send a pointer to the result array into the shader and overwrite the array in place, without having to recover it from the buffer if I wish. So I've binned the @propertyWrapper for now (but I'll probably revisit it later to see if I can make a better version).
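In case it's useful, the new class is roughly this shape (a simplified sketch; ShaderVector, the Float-only focus and the member names are just for illustration):

import Metal

// Non-owning sketch: copies the array's bytes into an MTLBuffer at init,
// and exposes a typed pointer for reading results back or overwriting
// an array in place. It never keeps a reference to the input array.
final class ShaderVector {
    let buffer: MTLBuffer
    let capacity: Int

    init?(_ values: [Float], device: MTLDevice) {
        capacity = values.count
        let length = capacity * MemoryLayout<Float>.stride
        // Copy the bytes in; the input array is not retained.
        guard let buffer = values.withUnsafeBytes({ bytes in
            device.makeBuffer(bytes: bytes.baseAddress!, length: length)
        }) else { return nil }
        self.buffer = buffer
    }

    // Typed view of the buffer contents.
    var contents: UnsafeMutableBufferPointer<Float> {
        UnsafeMutableBufferPointer(
            start: buffer.contents().bindMemory(to: Float.self, capacity: capacity),
            count: capacity)
    }
}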
In the meantime, I think I misrepresented / misunderstood the problem.
My simple app creates two [Float] arrays, which are randomly initialised, and one [Int32], which is zeroed. The shader multiplies the first two (in constant space) and casts to get the result (in device space). I wanted to compare GPU speed against a similar calculation done by looping on the CPU. The array sizes can be 1_000, 10_000, 100_000 or 1_000_000.
The interface has a slider to change the array size and two buttons, one to re-randomise the inputs and the other to run the shader. It reports the time taken serially and in parallel, and it shows part of the array contents so I know they're changing when they should.
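For context, the kernel itself is about as simple as it gets, something along these lines (reconstructed from the description above, so the names are approximate):

kernel void multiplyAndCast(constant float *multiplicand [[buffer(0)]],
                            constant float *multiplier   [[buffer(1)]],
                            device   int   *product      [[buffer(2)]],
                            uint index [[thread_position_in_grid]])
{
    // Multiply in constant space, cast, and write to device space.
    product[index] = int(multiplicand[index] * multiplier[index]);
}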
I wasn't surprised that serial was faster for a small array, but even with 100_000 elements it was still matching the GPU. The GPU finally gets ahead at a million elements (I seem to have inadvertently fixed something, because it was failing even here before).
Now I've noticed that if I rerun the shader without reinitialising the arrays, it gets faster. Reinitialise the inputs and the time goes back up. This happens with my revised, non-@propertyWrapper version too.
It’s really curious because this is all that happens to call the shader:
mutating func runShader() -> (TimeInterval, TimeInterval) {
    guard let commandBuffer = GPU.CommandQueue.makeCommandBuffer(),
          let computeEncoder = commandBuffer.makeComputeCommandEncoder()
    else { return (0, 0) }
    let startTime = Date.now
    computeEncoder.setComputePipelineState(computePipelineState)
    // *** Bind the buffers to the GPU buffer table ***
    computeEncoder.setBuffer(multiplicandBuffer, offset: 0, attributeStride: stride, index: 0)
    computeEncoder.setBuffer(multiplierBuffer, offset: 0, attributeStride: stride, index: 1)
    computeEncoder.setBuffer(productBuffer, offset: 0, attributeStride: stride, index: 2)
    // One thread per element, in groups no larger than the pipeline allows.
    let gridSize = MTLSize(width: capacity, height: 1, depth: 1)
    let threadGroupSize = min(computePipelineState.maxTotalThreadsPerThreadgroup, capacity)
    let threadsPerGroup = MTLSize(width: threadGroupSize, height: 1, depth: 1)
    computeEncoder.dispatchThreads(gridSize, threadsPerThreadgroup: threadsPerGroup)
    computeEncoder.endEncoding()
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()
    // Copy the result back out of the buffer.
    product = Array(UnsafeBufferPointer(start: productPointer, count: capacity))
    let parallel = Date().timeIntervalSince(startTime)
    let serial = serialRun()
    return (parallel, serial)
}
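And serialRun() is nothing clever, essentially just the same multiply-and-cast in a timed loop (paraphrased):

mutating func serialRun() -> TimeInterval {
    let startTime = Date.now
    // Same calculation as the kernel, one element at a time.
    for i in 0..<capacity {
        product[i] = Int32(multiplicand[i] * multiplier[i])
    }
    return Date.now.timeIntervalSince(startTime)
}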
It's either binding three updated buffers or three unchanged buffers; nothing else changes between runs. At this stage I'm fairly sure the @propertyWrapper was not the culprit. Is it possible that setBuffer(_:offset:attributeStride:index:) has less to do if the buffer is unchanged?
The time taken drops ten-fold if I don't re-randomise and rises again if I do, and I'm only measuring time inside this method.
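One thing I may try next is splitting the measurement, since MTLCommandBuffer can report its own GPU execution time via gpuStartTime/gpuEndTime. Something like this at the end of runShader(), after waitUntilCompleted(), would separate the kernel time from the encoding and the copy back:

// GPU-side execution time only, in seconds: excludes encoding,
// scheduling, and copying the result back into `product`.
let gpuTime = commandBuffer.gpuEndTime - commandBuffer.gpuStartTime
// Wall-clock time for the whole method, as currently measured.
let wallTime = Date.now.timeIntervalSince(startTime)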
P.S. None of this is important stuff, so don't lose time on it. I'm just fooling around, but I've learned a ton of interesting things about Array, ContiguousArray, and the various pointer types by exploring this.