This repository was archived by the owner on Mar 12, 2021. It is now read-only.
Very slow 4D broadcast in 2.0.1 #677
Closed
JuliaGPU/CUDAnative.jl#624
Description
Describe the bug
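Broadcasting operations on arrays with more than 3 dimensions are much slower for me in CuArrays 2.0.1 than in 1.7.3. In the MWE below, the median time goes from about 0.58 ms to about 25 ms, roughly 40x slower.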
To Reproduce
The Minimal Working Example (MWE) for this bug:
## 1.7.3
julia> using CuArrays
julia> using BenchmarkTools
julia> CuArrays.allowscalar(false)
julia> const xxxx = cu(randn(64,64,64,64));
julia> const yyyy = cu(randn(1,1,1,64));
julia> @benchmark CuArrays.@sync xxxx .+ yyyy
BenchmarkTools.Trial:
memory estimate: 4.20 KiB
allocs estimate: 78
--------------
minimum time: 504.800 μs (0.00% GC)
median time: 584.401 μs (0.00% GC)
mean time: 718.303 μs (1.35% GC)
maximum time: 4.462 ms (37.51% GC)
--------------
samples: 6871
evals/sample: 1
(CuArraysFp16) pkg> up CuArrays
Updating registry at `E:/Programs/julia/.julia/registries/General`
Updating git-repo `https://github.com/JuliaRegistries/General.git`
Resolving package versions...
Updating `E:\Programs\julia\.julia\dev\CuArraysFp16\Project.toml`
[3a865a2d] ↑ CuArrays v1.7.3 ⇒ v2.0.1
Updating `E:\Programs\julia\.julia\dev\CuArraysFp16\Manifest.toml`
[3895d2a7] ↑ CUDAapi v3.1.0 ⇒ v4.0.0
[c5f51814] ↑ CUDAdrv v6.0.0 ⇒ v6.2.2
[be33ccc6] ↑ CUDAnative v2.10.2 ⇒ v3.0.3
[da1fd8a2] + CodeTracking v0.5.8
[f68482b8] + Cthulhu v1.0.1
[3a865a2d] ↑ CuArrays v1.7.3 ⇒ v2.0.1
[e2ba6199] + ExprTools v0.1.0
[0c68f7d7] ↑ GPUArrays v2.0.1 ⇒ v3.1.0
[189a3867] + Reexport v0.2.0
[76f85450] + LibGit2
[44cfe95a] + Pkg
[3fa0cd96] + REPL
julia> using CuArrays
julia> using BenchmarkTools
julia> CuArrays.allowscalar(false)
julia> const xxxx = cu(randn(64,64,64,64));
julia> const yyyy = cu(randn(1,1,1,64));
julia> @benchmark CuArrays.@sync xxxx .+ yyyy
BenchmarkTools.Trial:
memory estimate: 4.30 KiB
allocs estimate: 87
--------------
minimum time: 20.733 ms (0.00% GC)
median time: 25.089 ms (0.00% GC)
mean time: 24.637 ms (0.02% GC)
maximum time: 29.447 ms (0.00% GC)
--------------
samples: 203
evals/sample: 1
Expected behavior
I guess I was hoping for performance to at least be on par with 1.7.3 :)
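Because the slowdown only appears above 3 dimensions, one possible stopgap (untested on the GPU here and purely an assumption, not a fix from the CuArrays maintainers) is to collapse the leading dimensions so the broadcast itself is 2D. The helper `add_lastdim` below is hypothetical and not part of CuArrays. It uses only `reshape` and `.+`, which work on both `Array` and `CuArray`, and it assumes `y` is singleton in every dimension except the last, as `yyyy` is in the MWE.

```julia
# Hypothetical workaround helper (not part of CuArrays): computes x .+ y for a
# y that varies only along the last dimension, by collapsing the leading
# dimensions of x so that the broadcast itself is only 2D.
function add_lastdim(x::AbstractArray{<:Any,N}, y::AbstractArray{<:Any,N}) where {N}
    # Assumption: y is singleton in every dimension except the last one.
    @assert all(size(y, d) == 1 for d in 1:N-1)
    @assert size(y, N) == size(x, N)
    lead = prod(size(x)[1:N-1])          # e.g. 64*64*64 for the MWE arrays
    x2 = reshape(x, lead, size(x, N))    # x reshaped as a (lead, last) matrix
    y2 = reshape(y, 1, size(y, N))       # y reshaped as a (1, last) row
    return reshape(x2 .+ y2, size(x))    # 2D broadcast, then restore the shape
end

# CPU sanity check: same result as the plain 4D broadcast.
x = randn(Float32, 8, 8, 8, 8)
y = randn(Float32, 1, 1, 1, 8)
add_lastdim(x, y) ≈ x .+ y   # true
```

On the GPU it would be called as `add_lastdim(xxxx, yyyy)` and timed with the same `@benchmark CuArrays.@sync ...` line as above. Whether this actually recovers the 1.7.3 timings is not established here.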
Build log
julia> Pkg.build()
Building NNlib → `E:\Programs\julia\.julia\packages\NNlib\FAI3o\deps\build.log`
false
julia> using CuArrays, BenchmarkTools
[ Info: Precompiling CuArrays [3a865a2d-5b23-5a0f-bc46-62713ec82fae]
julia> const xxxx = cu(randn(64,64,64,64));
julia> const yyyy = cu(randn(1,1,1,64));
julia> @benchmark CuArrays.@sync xxxx .+ yyyy
BenchmarkTools.Trial:
memory estimate: 4.30 KiB
allocs estimate: 87
--------------
minimum time: 19.186 ms (0.00% GC)
median time: 25.103 ms (0.00% GC)
mean time: 24.327 ms (0.03% GC)
maximum time: 28.657 ms (5.64% GC)
--------------
samples: 206
evals/sample: 1
Environment details
Details on Julia:
julia> versioninfo()
Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, haswell)
Julia packages:
See MWE above
CUDA toolkit and driver version: driver 442.74. Toolkit: presumably the one bundled with CuArrays 2.0.1 (not verified); CUDA 10.2 is installed locally and was used with 1.7.3.