
Conversation

maleadt (Member) commented Aug 27, 2019

No description provided.

@maleadt changed the title from "WIP: Add CUDNN wrappers from Flux." to "WIP: Use auto-generated wrappers" on Aug 28, 2019
maleadt (Member, Author) commented Aug 28, 2019

Decided to do the more ambitious change of using auto-generated wrappers. We often struggle to keep up with updates to e.g. CUDNN, and this should lower that effort quite a bit. Furthermore, it has already uncovered some misuses of the API, such as passing 32-bit integers for workspace-size arguments. Here's hoping that this (or other issues the generator uncovers) helps explain some of the vague parameter errors we've been seeing on some platforms.
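For illustration, a minimal standalone sketch (not the generated code) of that failure mode: cuDNN reports workspace sizes as size_t, and forcing such a value through a 32-bit Cint silently wraps around.

workspace_bytes = Csize_t(3) * 1024^3   # 3 GiB, as cuDNN might report it
as_cint = workspace_bytes % Cint        # what a Cint-typed wrapper would pass along
@show as_cint                           # as_cint = -1073741824: negative garbage
@assert as_cint != workspace_bytes      # a Csize_t-typed wrapper keeps the value intact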

@maleadt changed the title from "WIP: Use auto-generated wrappers" to "WIP: CUDNN improvements" on Aug 30, 2019
maleadt (Member, Author) commented Aug 30, 2019

Moved some additional functionality over from Flux; see https://github.com/FluxML/Flux.jl/compare/tb/cuarrays_dnn
@MikeInnes, is that a sane split? I moved everything CUDNN-related, including some higher-level wrappers, and left behind everything that depends on Flux or Tracker. From a CuArrays (implementing NNlib) point of view, should any of that functionality have remained in Flux, or should we move additional stuff over?

MikeInnes (Collaborator) commented Aug 30, 2019

Awesome. Looks like you've taken pretty much everything, which is ideal.

It'd be nice to also move the bulk of the @grad definitions here, so that LSTMs are approximately usable from CuArrays alone. I'm imagining something like CUDNN.forward(desc, x, h, Wi, Wh, b) which returns a pullback, and the @grads can wrap that with Tracker stuff as needed (see the sketch below). Then we can also move the bulk of the gradient sanity checks from Flux to CuArrays, which seems more robust.
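A hedged sketch of that shape; the names here (forward_with_pullback, rnn_forward) are hypothetical, and a plain matrix product stands in for the actual cudnnRNNForward*/cudnnRNNBackward* calls.

using Tracker: Tracker, @grad, data, track, TrackedArray

# CuArrays side: no Tracker dependency, just the CUDNN call plus a pullback.
function forward_with_pullback(Wi, x)
    y = Wi * x                        # stand-in for the CUDNN forward pass
    pullback(Δ) = (Δ * x', Wi' * Δ)   # stand-in for the CUDNN backward pass
    return y, pullback
end

# Flux side: a thin shim that makes the op differentiable under Tracker.
rnn_forward(Wi::TrackedArray, x::TrackedArray) = track(rnn_forward, Wi, x)
@grad function rnn_forward(Wi, x)
    y, back = forward_with_pullback(data(Wi), data(x))
    return y, back
end

# Usage: gradients flow through the pullback provided by the CuArrays side.
ΔWi, Δx = Tracker.gradient((Wi, x) -> sum(rnn_forward(Wi, x)), rand(2, 3), rand(3, 4))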

@@ -639,7 +639,7 @@ function cudnnGetRNNLinLayerBiasParams(handle, rnnDesc, pseudoLayer, xDesc, wDes
end

function cudnnRNNForwardInference(handle, rnnDesc, seqLength, xDesc, x, hxDesc, hx, cxDesc, cx, wDesc, w, yDesc, y, hyDesc, hy, cyDesc, cy, workspace, workSpaceSizeInBytes)
-    @check ccall((:cudnnRNNForwardInference, @libcudnn), cudnnStatus_t, (cudnnHandle_t, cudnnRNNDescriptor_t, Cint, Ptr{cudnnTensorDescriptor_t}, CuPtr{Cvoid}, cudnnTensorDescriptor_t, CuPtr{Cvoid}, cudnnTensorDescriptor_t, CuPtr{Cvoid}, cudnnFilterDescriptor_t, Ptr{Cvoid}, Ptr{cudnnTensorDescriptor_t}, CuPtr{Cvoid}, cudnnTensorDescriptor_t, CuPtr{Cvoid}, cudnnTensorDescriptor_t, CuPtr{Cvoid}, CuPtr{Cvoid}, Csize_t), handle, rnnDesc, seqLength, xDesc, x, hxDesc, hx, cxDesc, cx, wDesc, w, yDesc, y, hyDesc, hy, cyDesc, cy, workspace, workSpaceSizeInBytes)
+    @check ccall((:cudnnRNNForwardInference, @libcudnn), cudnnStatus_t, (cudnnHandle_t, cudnnRNNDescriptor_t, Cint, Ptr{cudnnTensorDescriptor_t}, CuPtr{Cvoid}, cudnnTensorDescriptor_t, CuPtr{Cvoid}, cudnnTensorDescriptor_t, CuPtr{Cvoid}, cudnnFilterDescriptor_t, CuPtr{Cvoid}, Ptr{cudnnTensorDescriptor_t}, CuPtr{Cvoid}, cudnnTensorDescriptor_t, CuPtr{Cvoid}, cudnnTensorDescriptor_t, CuPtr{Cvoid}, CuPtr{Cvoid}, Csize_t), handle, rnnDesc, seqLength, xDesc, x, hxDesc, hx, cxDesc, cx, wDesc, w, yDesc, y, hyDesc, hy, cyDesc, cy, workspace, workSpaceSizeInBytes)
MikeInnes (Collaborator) commented on this diff:
Ptr -> CuPtr here. I made this change because otherwise RNNs failed to execute, trying to convert a CuArray to a Ptr. I'm not sure why that wasn't revealed before; if this is generated code, we presumably need to check it carefully and figure out why that happened.

maleadt (Member, Author) replied:
This should have been revealed; that was the whole purpose of CuPtr vs Ptr. I'll have a look. Either way, there are bound to be other issues like this one (please fix them in pointers.json as appropriate), but they shouldn't hurt: you'd typically pass a GPU array where you expect the library to take one, and a wrapper that wrongly declares Ptr will then fail to convert it (see the sketch below) rather than silently passing a bad pointer.
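For context, a toy sketch of that safety net, with simplified stand-in types rather than the real CUDAdrv/CuArrays definitions:

struct MyCuPtr{T}     # stands in for CUDAdrv.CuPtr
    addr::UInt64
end

struct MyCuArray{T}   # stands in for CuArray
    ptr::MyCuPtr{T}
end

# A GPU array converts happily to a device pointer in a ccall...
Base.unsafe_convert(::Type{MyCuPtr{T}}, a::MyCuArray{T}) where {T} = a.ptr

# ...but refuses to convert to a host pointer, so a wrapper that wrongly
# declares Ptr fails loudly instead of handing a device address to host code.
Base.cconvert(::Type{Ptr{T}}, a::MyCuArray) where {T} =
    throw(ArgumentError("cannot take the CPU address of a $(typeof(a))"))

a = MyCuArray{Float32}(MyCuPtr{Float32}(0x00007f0000000000))
Base.unsafe_convert(MyCuPtr{Float32}, a)   # ok: returns the device pointer
Base.cconvert(Ptr{Float32}, a)             # throws, like the trace in the next comment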

maleadt (Member, Author) replied:
I guess this functionality is never tested (or simply unused), because triggering it manually does properly show an error:

julia> rnn = CUDNN.RNNDesc{Float32}(CUDNN.CUDNN_RNN_RELU, 1, 1)
CuArrays.CUDNN.RNNDesc{Float32}(CuArrays.CUDNN.CUDNN_RNN_RELU, 1, 1, Float32[0.0, 0.0, 0.0, 0.0], (Float32[0.0], Float32[0.0]), Float32[0.0], Ptr{Nothing} @0x00000000439d6b80)

julia> CUDNN.forward(rnn, CuArrays.rand(1), CuArrays.rand(1))
typeof(w) = CuArray{Float32,1}
ERROR: ArgumentError: cannot take the CPU address of a CuArray{Float32,1}
Stacktrace:
 [1] cconvert(::Type{Ptr{Nothing}}, ::CuArray{Float32,1}) at /home/tbesard/Julia/pkg/CuArrays/src/array.jl:152
 [2] cudnnRNNForwardInference(::Ptr{Nothing}, ::CuArrays.CUDNN.RNNDesc{Float32}, ::Int64, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1}, ::Ptr{Nothing}, ::CUDAdrv.CuPtr{Nothing}, ::CuArrays.CUDNN.FilterDesc, ::CuArray{Float32,1}, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1}, ::Ptr{Nothing}, ::CUDAdrv.CuPtr{Nothing}, ::CuArray{UInt8,1}, ::Int64) at /home/tbesard/Julia/pkg/CuArrays/src/dnn/libcudnn.jl:17
 [3] cudnnRNNForward(::CuArrays.CUDNN.RNNDesc{Float32}, ::Int64, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1}, ::Ptr{Nothing}, ::CUDAdrv.CuPtr{Nothing}, ::CuArrays.CUDNN.FilterDesc, ::CuArray{Float32,1}, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1}, ::Ptr{Nothing}, ::CUDAdrv.CuPtr{Nothing}, ::CuArray{UInt8,1}, ::Nothing) at /home/tbesard/Julia/pkg/CuArrays/src/dnn/rnn.jl:94
 [4] forward(::CuArrays.CUDNN.RNNDesc{Float32}, ::CuArray{Float32,1}, ::CuArray{Float32,1}, ::Nothing, ::Type) at /home/tbesard/Julia/pkg/CuArrays/src/dnn/rnn.jl:132
 [5] forward(::CuArrays.CUDNN.RNNDesc{Float32}, ::CuArray{Float32,1}, ::CuArray{Float32,1}) at /home/tbesard/Julia/pkg/CuArrays/src/dnn/rnn.jl:117
 [6] top-level scope at REPL[37]:1

MikeInnes (Collaborator) replied:
I thought I'd seen this during a test, but it looks like we're actually not testing a plain forward pass. Will add that when moving the tests over.
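Something like the following, reusing the calls from the REPL session above (the exact return value of CUDNN.forward is an assumption here, so the sketch only checks that the call completes):

using Test, CuArrays
using CuArrays: CUDNN

@testset "plain RNN forward pass" begin
    rnn = CUDNN.RNNDesc{Float32}(CUDNN.CUDNN_RNN_RELU, 1, 1)
    # Smoke test: this call should complete instead of raising the
    # "cannot take the CPU address of a CuArray" error shown above.
    @test CUDNN.forward(rnn, CuArrays.rand(1), CuArrays.rand(1)) !== nothing
end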

@maleadt force-pushed the tb/flux branch 3 times, most recently from b1aabca to 2c822b0 on September 18, 2019
maleadt (Member, Author) commented Sep 18, 2019

I simplified this PR by filter-branching out the wrapper generator; it now only contains CUDNN changes.

MikeInnes (Collaborator) commented:

Sounds good, though unfortunately when I run this I get:

julia> using CuArrays
[ Info: Precompiling CuArrays [3a865a2d-5b23-5a0f-bc46-62713ec82fae]
ERROR: LoadError: LoadError: LoadError: type Nothing has no field alloc
Stacktrace:
 [1] getproperty(::Any, ::Symbol) at ./Base.jl:20
 [2] macro expansion at /home/mike/.julia/packages/TimerOutputs/7zSea/src/TimerOutput.jl:216 [inlined]
 [3] macro expansion at /home/mike/projects/flux/Flux/dev/CuArrays/src/memory.jl:103 [inlined]

maleadt (Member, Author) commented Sep 18, 2019

Huh, the pool[] ref should be initialized by memory_pool!, which is called from __init__, so that's weird.
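For reference, a simplified sketch of that pattern (the PoolDemo module and the PoolAPI type and its fields are placeholders, not the actual memory.jl code): a module-level Ref starts out as nothing and only __init__ fills it in, so dereferencing it too early reproduces exactly this error.

module PoolDemo

struct PoolAPI                # placeholder for the real pool interface
    alloc::Function
    free::Function
end

const pool = Ref{Union{Nothing,PoolAPI}}(nothing)

memory_pool!(p::PoolAPI) = (pool[] = p)

# Runs when the module is loaded; if code touches pool[] before this
# (e.g. at the top level during precompilation), pool[] is still nothing.
__init__() = memory_pool!(PoolAPI(sz -> Vector{UInt8}(undef, sz), _ -> nothing))

# Errors with "type Nothing has no field alloc" if __init__ hasn't run yet.
alloc(sz) = pool[].alloc(sz)

end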

maleadt (Member, Author) commented Sep 20, 2019

Fixed.

maleadt (Member, Author) commented Sep 27, 2019

RNN test failure fixed 🎉

MikeInnes (Collaborator) commented:

That's huge. I'm happy with how the wrappers look now; we could use some sanity checks here, but that's not urgent given that we have tests in Flux. So if you're on board, we could merge these branches.

@MikeInnes merged commit b8b1c4e into master on Sep 27, 2019
@bors (bot) deleted the tb/flux branch on September 27, 2019
@maleadt changed the title from "WIP: CUDNN improvements" to "CUDNN improvements" on Sep 27, 2019