CUDNN improvements #404
Decided to do the more ambitious change of using auto-generated wrappers. We often struggle to keep up with updates to e.g. CUDNN, and this should lower the effort quite a bit. Furthermore, it has already discovered some misuses of the API, such as using 32-bit integers for workspace arguments. Here's hoping that this (or other issues it might uncover) helps to deal with some of the vague param errors we've been seeing on some platforms.
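One concrete instance: workspace sizes. cuDNN takes them as `size_t`, so a wrapper that declares the argument as `Cint` silently wraps anything at or above 2 GiB. A minimal, self-contained sketch of the failure mode (illustrative values only; the generated wrappers presumably now use `Csize_t`, as in the diff below):

```julia
# Why 32-bit workspace arguments are a bug: cuDNN declares the size as
# size_t, so a Cint in the ccall signature wraps around for sizes >= 2^31.
workspace_bytes = 3 * 2^30        # a 3 GiB workspace, held in an Int64
@show workspace_bytes % Cint      # wraparound conversion: negative garbage
@show workspace_bytes % Csize_t   # correct when declared as Csize_t
```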
Moved some additional functionality over from Flux, see https://github.com/FluxML/Flux.jl/compare/tb/cuarrays_dnn
Awesome. Looks like you've taken pretty much everything, which is ideal. It'd be nice to additionally move the bulk of the …
src/dnn/libcudnn.jl (outdated)

```diff
@@ -639,7 +639,7 @@ function cudnnGetRNNLinLayerBiasParams(handle, rnnDesc, pseudoLayer, xDesc, wDes
 end

 function cudnnRNNForwardInference(handle, rnnDesc, seqLength, xDesc, x, hxDesc, hx, cxDesc, cx, wDesc, w, yDesc, y, hyDesc, hy, cyDesc, cy, workspace, workSpaceSizeInBytes)
-    @check ccall((:cudnnRNNForwardInference, @libcudnn), cudnnStatus_t, (cudnnHandle_t, cudnnRNNDescriptor_t, Cint, Ptr{cudnnTensorDescriptor_t}, CuPtr{Cvoid}, cudnnTensorDescriptor_t, CuPtr{Cvoid}, cudnnTensorDescriptor_t, CuPtr{Cvoid}, cudnnFilterDescriptor_t, Ptr{Cvoid}, Ptr{cudnnTensorDescriptor_t}, CuPtr{Cvoid}, cudnnTensorDescriptor_t, CuPtr{Cvoid}, cudnnTensorDescriptor_t, CuPtr{Cvoid}, CuPtr{Cvoid}, Csize_t), handle, rnnDesc, seqLength, xDesc, x, hxDesc, hx, cxDesc, cx, wDesc, w, yDesc, y, hyDesc, hy, cyDesc, cy, workspace, workSpaceSizeInBytes)
+    @check ccall((:cudnnRNNForwardInference, @libcudnn), cudnnStatus_t, (cudnnHandle_t, cudnnRNNDescriptor_t, Cint, Ptr{cudnnTensorDescriptor_t}, CuPtr{Cvoid}, cudnnTensorDescriptor_t, CuPtr{Cvoid}, cudnnTensorDescriptor_t, CuPtr{Cvoid}, cudnnFilterDescriptor_t, CuPtr{Cvoid}, Ptr{cudnnTensorDescriptor_t}, CuPtr{Cvoid}, cudnnTensorDescriptor_t, CuPtr{Cvoid}, cudnnTensorDescriptor_t, CuPtr{Cvoid}, CuPtr{Cvoid}, Csize_t), handle, rnnDesc, seqLength, xDesc, x, hxDesc, hx, cxDesc, cx, wDesc, w, yDesc, y, hyDesc, hy, cyDesc, cy, workspace, workSpaceSizeInBytes)
```
`Ptr` -> `CuPtr` here; I changed this because, with the generated wrapper, RNNs failed to execute due to trying to convert a CuArray to a `Ptr`. I'm not sure why that wasn't revealed before. If this is generated code, we presumably need to check it carefully and figure out why that happened.
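For reference, this is exactly the failure `CuPtr` is meant to force; here's a toy sketch of the mechanism with made-up type names (the real definitions live in CUDAdrv/CuArrays and differ in detail):

```julia
# Toy model of the Ptr-vs-CuPtr safety net; DevPtr/DevArray are stand-ins
# for the real CUDAdrv.CuPtr / CuArrays.CuArray types.
struct DevPtr{T}
    addr::UInt
end
struct DevArray{T}
    ptr::DevPtr{T}
end

# Converting a device array to a *device* pointer is allowed...
Base.cconvert(::Type{DevPtr{T}}, a::DevArray{T}) where {T} = a.ptr

# ...but asking for a host Ptr throws, so a wrapper that declares Ptr{Cvoid}
# where it should declare a device pointer fails loudly:
Base.cconvert(::Type{Ptr{T}}, a::DevArray) where {T} =
    throw(ArgumentError("cannot take the CPU address of a DevArray"))

a = DevArray{Float32}(DevPtr{Float32}(UInt(0x1234)))
Base.cconvert(DevPtr{Float32}, a)   # ok: returns the device pointer
Base.cconvert(Ptr{Float32}, a)      # ERROR: cannot take the CPU address ...
```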
This should have been revealed; that was the whole purpose of `CuPtr` vs `Ptr`. I'll have a look. Either way, there are bound to be other issues like this one (please fix them appropriately in `pointers.json`), but they shouldn't hurt: you'd typically be passing a GPU array where the library expects one, with the wrapper trying to convert it to a `Ptr` and failing.
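(The schema here is sketched from memory and may well differ; conceptually, `pointers.json` tells the generator which arguments of each wrapped function are device pointers, along these hypothetical lines:)

```json
{
  "cudnnRNNForwardInference": ["x", "hx", "cx", "w", "y", "hy", "cy", "workspace"]
}
```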
I guess this functionality is never tested / unused, because triggering it manually does properly show an error:

```julia
julia> rnn = CUDNN.RNNDesc{Float32}(CUDNN.CUDNN_RNN_RELU, 1, 1)
CuArrays.CUDNN.RNNDesc{Float32}(CuArrays.CUDNN.CUDNN_RNN_RELU, 1, 1, Float32[0.0, 0.0, 0.0, 0.0], (Float32[0.0], Float32[0.0]), Float32[0.0], Ptr{Nothing} @0x00000000439d6b80)

julia> CUDNN.forward(rnn, CuArrays.rand(1), CuArrays.rand(1))
typeof(w) = CuArray{Float32,1}
ERROR: ArgumentError: cannot take the CPU address of a CuArray{Float32,1}
Stacktrace:
 [1] cconvert(::Type{Ptr{Nothing}}, ::CuArray{Float32,1}) at /home/tbesard/Julia/pkg/CuArrays/src/array.jl:152
 [2] cudnnRNNForwardInference(::Ptr{Nothing}, ::CuArrays.CUDNN.RNNDesc{Float32}, ::Int64, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1}, ::Ptr{Nothing}, ::CUDAdrv.CuPtr{Nothing}, ::CuArrays.CUDNN.FilterDesc, ::CuArray{Float32,1}, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1}, ::Ptr{Nothing}, ::CUDAdrv.CuPtr{Nothing}, ::CuArray{UInt8,1}, ::Int64) at /home/tbesard/Julia/pkg/CuArrays/src/dnn/libcudnn.jl:17
 [3] cudnnRNNForward(::CuArrays.CUDNN.RNNDesc{Float32}, ::Int64, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1}, ::Ptr{Nothing}, ::CUDAdrv.CuPtr{Nothing}, ::CuArrays.CUDNN.FilterDesc, ::CuArray{Float32,1}, ::Array{CuArrays.CUDNN.TensorDesc,1}, ::CuArray{Float32,1}, ::CuArrays.CUDNN.TensorDesc, ::CuArray{Float32,1}, ::Ptr{Nothing}, ::CUDAdrv.CuPtr{Nothing}, ::CuArray{UInt8,1}, ::Nothing) at /home/tbesard/Julia/pkg/CuArrays/src/dnn/rnn.jl:94
 [4] forward(::CuArrays.CUDNN.RNNDesc{Float32}, ::CuArray{Float32,1}, ::CuArray{Float32,1}, ::Nothing, ::Type) at /home/tbesard/Julia/pkg/CuArrays/src/dnn/rnn.jl:132
 [5] forward(::CuArrays.CUDNN.RNNDesc{Float32}, ::CuArray{Float32,1}, ::CuArray{Float32,1}) at /home/tbesard/Julia/pkg/CuArrays/src/dnn/rnn.jl:117
 [6] top-level scope at REPL[37]:1
```
I thought I'd seen this during a test, but it looks like we're actually not testing a plain forward pass. Will add that when moving the tests over.
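Something along these lines (a rough sketch reusing the API from the session above; the moved-over tests will be more thorough):

```julia
using CuArrays, Test
using CuArrays: CUDNN

# Plain forward-pass smoke test, so wrapper signature bugs like the
# Ptr-vs-CuPtr one above get caught here rather than in Flux.
rnn = CUDNN.RNNDesc{Float32}(CUDNN.CUDNN_RNN_RELU, 1, 1)
x, h = CuArrays.rand(1), CuArrays.rand(1)
y = CUDNN.forward(rnn, x, h)
@test y !== nothing   # a stronger test would compare against a CPU reference
```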
Force-pushed from b1aabca to 2c822b0.
I simplified this PR by …
Sounds good, though unfortunately when I run this I get:

```julia
julia> using CuArrays
[ Info: Precompiling CuArrays [3a865a2d-5b23-5a0f-bc46-62713ec82fae]
ERROR: LoadError: LoadError: LoadError: type Nothing has no field alloc
Stacktrace:
 [1] getproperty(::Any, ::Symbol) at ./Base.jl:20
 [2] macro expansion at /home/mike/.julia/packages/TimerOutputs/7zSea/src/TimerOutput.jl:216 [inlined]
 [3] macro expansion at /home/mike/projects/flux/Flux/dev/CuArrays/src/memory.jl:103 [inlined]
```
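(For what it's worth, the error itself is just field access on `nothing`; presumably some global in memory.jl isn't initialized at that point. Minimal reproduction, with a hypothetical name:)

```julia
julia> pool = nothing;   # stand-in for whatever global memory.jl dereferences

julia> pool.alloc
ERROR: type Nothing has no field alloc
```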
Huh, the …
Fixed.
This triggers the RNN error much more easily.
RNN test failure fixed 🎉
That's huge. I'm happy with how the wrappers look now; we could use some sanity checks here, but that's not urgent given that we have tests in Flux. So if you're on board with it, we could merge these branches.