Hi Caroline,
Actually, I tried something like your recommendation: I built my own wrapper which never owns the array input, just uses it to obtain buffers and pointers and then leaves it alone.
That works pretty well, and I now have a nice class which can send a pointer to the result array into the shader and overwrite the array in place, without having to recover it from the buffer if I wish. So I've binned the @propertyWrapper for now (but I'll probably revisit it later to see if I can make a better version).
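In case it's useful, the new class is roughly this shape (a simplified sketch; ShaderVector, the Float-only focus and the member names are just for illustration):

import Metal

// Non-owning sketch: copies the array's bytes into an MTLBuffer at init,
// and exposes a typed pointer for reading results back or overwriting
// an array in place. It never keeps a reference to the input array.
final class ShaderVector {
    let buffer: MTLBuffer
    let capacity: Int

    init?(_ values: [Float], device: MTLDevice) {
        capacity = values.count
        let length = capacity * MemoryLayout<Float>.stride
        // Copy the bytes in; the input array is not retained.
        guard let buffer = values.withUnsafeBytes({ bytes in
            device.makeBuffer(bytes: bytes.baseAddress!, length: length)
        }) else { return nil }
        self.buffer = buffer
    }

    // Typed view of the buffer contents.
    var contents: UnsafeMutableBufferPointer<Float> {
        UnsafeMutableBufferPointer(
            start: buffer.contents().bindMemory(to: Float.self, capacity: capacity),
            count: capacity)
    }
}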
In the meantime, I think I misrepresented / misunderstood the problem.
My simple app creates two [Float] arrays, which are randomly initialised, and one [Int32], which is zeroed. The shader multiplies the first two (in constant space) and casts to get the result (in device space). I wanted to compare GPU speed against a similar calculation done by looping on the CPU. The array sizes can be 1_000, 10_000, 100_000 or 1_000_000.
The interface has a slider to change the array size and two buttons, one to re-randomise the inputs and the other to run the shader. It reports the time taken serially and in parallel, and it shows part of the array contents so I know they're changing when they should.
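For context, the kernel itself is about as simple as it gets, something along these lines (reconstructed from the description above, so the names are approximate):

kernel void multiplyAndCast(constant float *multiplicand [[buffer(0)]],
                            constant float *multiplier   [[buffer(1)]],
                            device   int   *product      [[buffer(2)]],
                            uint index [[thread_position_in_grid]])
{
    // Multiply in constant space, cast, and write to device space.
    product[index] = int(multiplicand[index] * multiplier[index]);
}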
I wasn't surprised that serial was faster for a small array, but even with 100_000 elements it was still matching the GPU. The GPU finally gets ahead at a million elements (I seem to have inadvertently fixed something, because it was failing even here before).
Now I've noticed that if I rerun the shader without reinitialising the arrays, it gets faster. Reinitialise the inputs and the time goes back up. This happens with my revised, non-@propertyWrapper version too.
It’s really curious because this is all that happens to call the shader:
mutating func runShader() -> (TimeInterval, TimeInterval) {
    guard let commandBuffer = GPU.CommandQueue.makeCommandBuffer(),
          let computeEncoder = commandBuffer.makeComputeCommandEncoder()
    else { return (0, 0) }
    let startTime = Date.now
    computeEncoder.setComputePipelineState(computePipelineState)
    // *** Bind the buffers to the GPU buffer table ***
    computeEncoder.setBuffer(multiplicandBuffer, offset: 0, attributeStride: stride, index: 0)
    computeEncoder.setBuffer(multiplierBuffer, offset: 0, attributeStride: stride, index: 1)
    computeEncoder.setBuffer(productBuffer, offset: 0, attributeStride: stride, index: 2)
    // One thread per element, in groups no larger than the pipeline allows.
    let gridSize = MTLSize(width: capacity, height: 1, depth: 1)
    let threadGroupSize = min(computePipelineState.maxTotalThreadsPerThreadgroup, capacity)
    let threadsPerGroup = MTLSize(width: threadGroupSize, height: 1, depth: 1)
    computeEncoder.dispatchThreads(gridSize, threadsPerThreadgroup: threadsPerGroup)
    computeEncoder.endEncoding()
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()
    // Copy the result back out of the buffer.
    product = Array(UnsafeBufferPointer(start: productPointer, count: capacity))
    let parallel = Date().timeIntervalSince(startTime)
    let serial = serialRun()
    return (parallel, serial)
}
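And serialRun() is nothing clever, essentially just the same multiply-and-cast in a timed loop (paraphrased):

mutating func serialRun() -> TimeInterval {
    let startTime = Date.now
    // Same calculation as the kernel, one element at a time.
    for i in 0..<capacity {
        product[i] = Int32(multiplicand[i] * multiplier[i])
    }
    return Date.now.timeIntervalSince(startTime)
}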
It's either binding three updated buffers or three unchanged buffers; nothing else changes between runs. At this stage I'm fairly sure the @propertyWrapper was not the culprit. Is it possible that setBuffer(_:offset:attributeStride:index:) has less to do if the buffer is unchanged?
The time taken drops ten-fold if I don't re-randomise and rises again if I do, and I'm only measuring time inside this method.
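One thing I may try next is splitting the measurement, since MTLCommandBuffer can report its own GPU execution time via gpuStartTime/gpuEndTime. Something like this at the end of runShader(), after waitUntilCompleted(), would separate the kernel time from the encoding and the copy back:

// GPU-side execution time only, in seconds: excludes encoding,
// scheduling, and copying the result back into `product`.
let gpuTime = commandBuffer.gpuEndTime - commandBuffer.gpuStartTime
// Wall-clock time for the whole method, as currently measured.
let wallTime = Date.now.timeIntervalSince(startTime)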
P.S. None of this is important stuff, so don't lose time on it. I'm just fooling around, but I've learned a ton of interesting things about Array, ContiguousArray, and the various pointer types by exploring this.