CUDNN improvements #404
Decided to do the more ambitious change of using auto-generated wrappers. We often struggle to keep up with updates to e.g. CUDNN, and this should lower the effort quite a bit. Furthermore, it has already discovered some misuses of the API, such as using 32-bit integers for workspace arguments. Here's hoping that this (or other issues it might uncover) helps to deal with some of the vague param errors we've been seeing on some platforms.
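One concrete instance: workspace sizes. cuDNN takes them as `size_t`, so a wrapper that declares the argument as `Cint` silently wraps anything at or above 2 GiB. A minimal, self-contained sketch of the failure mode (illustrative values only; the generated wrappers presumably now use `Csize_t`, as in the diff below):

```julia
# Why 32-bit workspace arguments are a bug: cuDNN declares the size as
# size_t, so a Cint in the ccall signature wraps around for sizes >= 2^31.
workspace_bytes = 3 * 2^30        # a 3 GiB workspace, held in an Int64
@show workspace_bytes % Cint      # wraparound conversion: negative garbage
@show workspace_bytes % Csize_t   # correct when declared as Csize_t
```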
Moved some additional functionality over from Flux, see https://github.com/FluxML/Flux.jl/compare/tb/cuarrays_dnn
Awesome. Looks like you've taken pretty much everything, which is ideal. It'd be nice to additionally move the bulk of the …
src/dnn/libcudnn.jl (outdated)

```diff
@@ -639,7 +639,7 @@ function cudnnGetRNNLinLayerBiasParams(handle, rnnDesc, pseudoLayer, xDesc, wDes
 end

 function cudnnRNNForwardInference(handle, rnnDesc, seqLength, xDesc, x, hxDesc, hx, cxDesc, cx, wDesc, w, yDesc, y, hyDesc, hy, cyDesc, cy, workspace, workSpaceSizeInBytes)
-    @check ccall((:cudnnRNNForwardInference, @libcudnn), cudnnStatus_t, (cudnnHandle_t, cudnnRNNDescriptor_t, Cint, Ptr{cudnnTensorDescriptor_t}, CuPtr{Cvoid}, cudnnTensorDescriptor_t, CuPtr{Cvoid}, cudnnTensorDescriptor_t, CuPtr{Cvoid}, cudnnFilterDescriptor_t, Ptr{Cvoid}, Ptr{cudnnTensorDescriptor_t}, CuPtr{Cvoid}, cudnnTensorDescriptor_t, CuPtr{Cvoid}, cudnnTensorDescriptor_t, CuPtr{Cvoid}, CuPtr{Cvoid}, Csize_t), handle, rnnDesc, seqLength, xDesc, x, hxDesc, hx, cxDesc, cx, wDesc, w, yDesc, y, hyDesc, hy, cyDesc, cy, workspace, workSpaceSizeInBytes)
+    @check ccall((:cudnnRNNForwardInference, @libcudnn), cudnnStatus_t, (cudnnHandle_t, cudnnRNNDescriptor_t, Cint, Ptr{cudnnTensorDescriptor_t}, CuPtr{Cvoid}, cudnnTensorDescriptor_t, CuPtr{Cvoid}, cudnnTensorDescriptor_t, CuPtr{Cvoid}, cudnnFilterDescriptor_t, CuPtr{Cvoid}, Ptr{cudnnTensorDescriptor_t}, CuPtr{Cvoid}, cudnnTensorDescriptor_t, CuPtr{Cvoid}, cudnnTensorDescriptor_t, CuPtr{Cvoid}, CuPtr{Cvoid}, Csize_t), handle, rnnDesc, seqLength, xDesc, x, hxDesc, hx, cxDesc, cx, wDesc, w, yDesc, y, hyDesc, hy, cyDesc, cy, workspace, workSpaceSizeInBytes)
```
`Ptr` -> `CuPtr` here; I changed this because, with the generated wrapper, RNNs failed to execute due to trying to convert a CuArray to a `Ptr`. I'm not sure why that wasn't revealed before. If this is generated code, we presumably need to check it carefully and figure out why that happened.
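For reference, this is exactly the failure `CuPtr` is meant to force; here's a toy sketch of the mechanism with made-up type names (the real definitions live in CUDAdrv/CuArrays and differ in detail):

```julia
# Toy model of the Ptr-vs-CuPtr safety net; DevPtr/DevArray are stand-ins
# for the real CUDAdrv.CuPtr / CuArrays.CuArray types.
struct DevPtr{T}
    addr::UInt
end
struct DevArray{T}
    ptr::DevPtr{T}
end

# Converting a device array to a *device* pointer is allowed...
Base.cconvert(::Type{DevPtr{T}}, a::DevArray{T}) where {T} = a.ptr

# ...but asking for a host Ptr throws, so a wrapper that declares Ptr{Cvoid}
# where it should declare a device pointer fails loudly:
Base.cconvert(::Type{Ptr{T}}, a::DevArray) where {T} =
    throw(ArgumentError("cannot take the CPU address of a DevArray"))

a = DevArray{Float32}(DevPtr{Float32}(UInt(0x1234)))
Base.cconvert(DevPtr{Float32}, a)   # ok: returns the device pointer
Base.cconvert(Ptr{Float32}, a)      # ERROR: cannot take the CPU address ...
```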
This should have been revealed; that was the whole purpose of `CuPtr` vs `Ptr`. I'll have a look. Either way, there are bound to be other issues like this one (please fix them appropriately in `pointers.json`), but they shouldn't hurt: you'd typically be passing a GPU array where the library expects one, with the wrapper trying to convert it to a `Ptr` and failing.
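(The schema here is sketched from memory and may well differ; conceptually, `pointers.json` tells the generator which arguments of each wrapped function are device pointers, along these hypothetical lines:)

```json
{
  "cudnnRNNForwardInference": ["x", "hx", "cx", "w", "y", "hy", "cy", "workspace"]
}
```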
I guess this functionality is never tested / unused, because triggering it manually does properly show an error:

```julia
julia> rnn = CUDNN.RNNDesc{Float32}(CUDNN.CUDNN_RNN_RELU, 1, 1)
CuArrays.CUDNN.RNNDesc{Float32}(CuArrays.CUDNN.CUDNN_RNN_RELU, 1, 1, Float32[0.0, 0.0, 0.0, 0.0], (Float32[0.0], Float32[0.0]), Float32[0.0], Ptr{Nothing} @0x00000000439d6b80)

julia> CUDNN.forward(rnn, CuArrays.rand(1), CuArrays.rand(1))
typeof(w) = CuArray{Float32,1}
ERROR: ArgumentError: cannot take the CPU address of a CuArray{Float32,1}
Stacktrace:
 [1] cconvert(::Type{Ptr{Nothing}}, ::CuArray{Float32,1}) at /home/tbesard/Julia/pkg/CuArrays/src/array.jl:152
 [2] cudnnRNNForwardInference(::Ptr{Nothing}, ::CuArrays.CUDNN.RNNDesc{Float32}, ::Int64, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1}, ::Ptr{Nothing}, ::CUDAdrv.CuPtr{Nothing}, ::CuArrays.CUDNN.FilterDesc, ::CuArray{Float32,1}, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1}, ::Ptr{Nothing}, ::CUDAdrv.CuPtr{Nothing}, ::CuArray{UInt8,1}, ::Int64) at /home/tbesard/Julia/pkg/CuArrays/src/dnn/libcudnn.jl:17
 [3] cudnnRNNForward(::CuArrays.CUDNN.RNNDesc{Float32}, ::Int64, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1}, ::Ptr{Nothing}, ::CUDAdrv.CuPtr{Nothing}, ::CuArrays.CUDNN.FilterDesc, ::CuArray{Float32,1}, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1}, ::Ptr{Nothing}, ::CUDAdrv.CuPtr{Nothing}, ::CuArray{UInt8,1}, ::Nothing) at /home/tbesard/Julia/pkg/CuArrays/src/dnn/rnn.jl:94
 [4] forward(::CuArrays.CUDNN.RNNDesc{Float32}, ::CuArray{Float32,1}, ::CuArray{Float32,1}, ::Nothing, ::Type) at /home/tbesard/Julia/pkg/CuArrays/src/dnn/rnn.jl:132
 [5] forward(::CuArrays.CUDNN.RNNDesc{Float32}, ::CuArray{Float32,1}, ::CuArray{Float32,1}) at /home/tbesard/Julia/pkg/CuArrays/src/dnn/rnn.jl:117
 [6] top-level scope at REPL[37]:1
```
I thought I'd seen this during a test, but it looks like we're actually not testing a plain forward pass. Will add that when moving the tests over.
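Something along these lines (a rough sketch reusing the API from the session above; the moved-over tests will be more thorough):

```julia
using CuArrays, Test
using CuArrays: CUDNN

# Plain forward-pass smoke test, so wrapper signature bugs like the
# Ptr-vs-CuPtr one above get caught here rather than in Flux.
rnn = CUDNN.RNNDesc{Float32}(CUDNN.CUDNN_RNN_RELU, 1, 1)
x, h = CuArrays.rand(1), CuArrays.rand(1)
y = CUDNN.forward(rnn, x, h)
@test y !== nothing   # a stronger test would compare against a CPU reference
```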
Force-pushed from b1aabca to 2c822b0.
I simplified this PR by …
Sounds good, though unfortunately when I run this I get:

```julia
julia> using CuArrays
[ Info: Precompiling CuArrays [3a865a2d-5b23-5a0f-bc46-62713ec82fae]
ERROR: LoadError: LoadError: LoadError: type Nothing has no field alloc
Stacktrace:
 [1] getproperty(::Any, ::Symbol) at ./Base.jl:20
 [2] macro expansion at /home/mike/.julia/packages/TimerOutputs/7zSea/src/TimerOutput.jl:216 [inlined]
 [3] macro expansion at /home/mike/projects/flux/Flux/dev/CuArrays/src/memory.jl:103 [inlined]
```
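(For what it's worth, the error itself is just field access on `nothing`; presumably some global in memory.jl isn't initialized at that point. Minimal reproduction, with a hypothetical name:)

```julia
julia> pool = nothing;   # stand-in for whatever global memory.jl dereferences

julia> pool.alloc
ERROR: type Nothing has no field alloc
```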
Huh, the …
Fixed.
This triggers the RNN error much more easily.
RNN test failure fixed 🎉
That's huge. I'm happy with how the wrappers look now; we could use some sanity checks here, but that's not urgent given that we have tests in Flux. So if you're on board with it, we could merge these branches.