Hi Caroline,
I’ve built a program which tries to use a kernel function and used chapter 16 to implement an atomic-int type. I followed your model carefully except that my version uses an array of atomic_ints. Here are the bones with most of the detail removed:
#include <metal_stdlib>
using namespace metal;
struct Thing {
constant int *values;
};
kernel void simplified(constant Thing *things [[buffer(0)]],
device atomic_int *scores [[buffer(1)]],
uint thingIdx [[thread_position_in_grid]]) {
int score = 0;
for (int i = 0; i < 200 ; i++) {
// things not used here but I believe this func will be called once for each Thing and loop 200 times
score += 1;
atomic_fetch_add_explicit(&scores[thingIdx], score, memory_order_relaxed);
}
}
In summary, 256 (much more elaborate than shown) Things are passed in with an array of scores (Ints) and I want to update the scores table based on the Things. I’ve stripped it down here to ignore the Things and simply accumulate scores in a loop.
The thing is, when I return to the CPU, the scores remain at 0.
Here’s some of what the CPU does:
init() {
...
thingBuffer = Model.Device.makeBuffer(length: MemoryLayout<Thing>.stride * 256)!
var thingPointer = thingBuffer.contents().bindMemory(to: Thing.self, capacity: 256)
for t in 0..<256 {
thingPointer.pointee = things[t]
thingPointer = thingPointer.advanced(by: 1)
}
scoreBuffer = Model.Device.makeBuffer(length: MemoryLayout<Int>.stride * 256)!
scorePointer = scoreBuffer.contents().bindMemory(to: Int.self, capacity: 256)
var pointer = scorePointer
for p in 0..<256 {
pointer.pointee = scores[p]
pointer = pointer.advanced(by: 1)
}
}
func compute() {
guard let commandBuffer = Model.CommandQueue.makeCommandBuffer(),
let computeEncoder = commandBuffer.makeComputeCommandEncoder()
else { return }
computeEncoder.setBuffer(thingBuffer, offset: 0, index: 0)
computeEncoder.setBuffer(scoreBuffer, offset: 0, index: 1)
computeEncoder.setComputePipelineState(computePipelineState)
let threadsPerGrid = MTLSize(width: 256, height: 1, depth: 1)
let threadsPerGroup = MTLSize(width: computePipelineState.maxTotalThreadsPerThreadgroup, height: 1, depth: 1)
computeEncoder.dispatchThreads(threadsPerGrid, threadsPerThreadgroup: threadsPerGroup)
computeEncoder.endEncoding()
commandBuffer.commit()
var pointer = scorePointer
// score pointer was saved and never advanced
// This loop returns all zeros
for p in 0..<256 {
print(pointer.pointee, terminator: ",")
scores[p] = pointer.pointee
pointer = pointer.advanced(by: 1)
}
}
Am I missing something very obvious?
regards
Don