-
Notifications
You must be signed in to change notification settings - Fork 1.1k
CUDA Texture Sampler #5423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
CUDA Texture Sampler #5423
Conversation
src/StorageFlattening.cpp
Outdated
args, Call::Intrinsic); | ||
return Evaluate::make(store); | ||
} else if (in_gpu && textures.count(op->name)) { | ||
} else if (in_gpu && textures.count(op->name) && false && in_device_api != DeviceAPI::CUDA) { // CUDA writes are still directly to memory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what to do here. In CUDA the texture sampling is read-only, so writes should be flattened as before. But when the StorageFlattening pass runs, DeviceAPI hasn't been assigned and so there isn't anything to key off to flatten or not flatten at this phase.
There should also be a check somewhere to ensure there are not reads and writes to a buffer-as-texture within any one kernel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved the DeviceAPI assignment to before StorageFlattening, I need it anyways so allocations can be properly strided if necessary.
builder->CreateStore(ConstantInt::get(i8_t, closure_args[i].is_buffer), | ||
int8_t buffer_type = 0; | ||
if (closure_args[i].is_buffer && closure_args[i].memory_type == MemoryType::GPUTexture) { | ||
buffer_type = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little clumsy but it's certainly effective for getting the texture flag to the runtime. Alternatives welcome. Haven't thought about yet what this would look like if any sampler-related info needs to be sent as well.
Also somewhere in codegen there should be a check that you have compute 3.0 if generating a GPUTexture buffer, but not sure where that should go |
Rather unbelievably this works now. LLVM master has way way better error messages than 11 or whatever I was using before. Ready for review, beat me up. |
Opened #5431 for testing |
Various test failures in the clone build, possibly related |
1-month check-in. (No rush on our side of course, just want to see if this is on the backburner for a while.) |
I assume this PR is on the backburner. Please ping when the test failures get addressed. |
Thanks @steven-johnson will do, hope to have some more side time come the holiday weeks. |
Post-holiday check-in: where does this PR stand? |
Is this PR still active? Should it be closed? |
This is the next step on my texture sampler journey. I have some architecture questions, as not much feels very clear to me.
Unlike OpenCL which has a separate device interface for textures, right now I'm passing a bit of information to halide_cuda_run so it can wrap existing buffers in the texture descriptor. Tradeoff for this is only 1D and 2D samplers can be used, 3D requires allocating a cuArray.
Any thoughts, feedback welcome, I have a few extra questions inline as well.