This repository was archived by the owner on Mar 12, 2021. It is now read-only.

Very slow 4D broadcast in 2.0.1 #677

@DrChainsaw

Describe the bug
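Broadcasting operations on arrays with more than 3 dimensions are much slower in CuArrays 2.0.1 than in 1.7.3. In the MWE below, a 4D broadcast goes from about 0.5 ms to about 20 ms (minimum time), roughly 40x slower.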

To Reproduce
The Minimal Working Example (MWE) for this bug:

## 1.7.3
julia> using CuArrays

julia> using BenchmarkTools

julia> CuArrays.allowscalar(false)

julia> const xxxx = cu(randn(64,64,64,64));

julia> const yyyy = cu(randn(1,1,1,64));

julia> @benchmark CuArrays.@sync xxxx .+ yyyy
BenchmarkTools.Trial: 
  memory estimate:  4.20 KiB
  allocs estimate:  78
  --------------
  minimum time:     504.800 μs (0.00% GC)
  median time:      584.401 μs (0.00% GC)
  mean time:        718.303 μs (1.35% GC)
  maximum time:     4.462 ms (37.51% GC)
  --------------
  samples:          6871
  evals/sample:     1

(CuArraysFp16) pkg> up CuArrays
  Updating registry at `E:/Programs/julia/.julia/registries/General`
  Updating git-repo `https://github.com/JuliaRegistries/General.git`
 Resolving package versions...
  Updating `E:\Programs\julia\.julia\dev\CuArraysFp16\Project.toml`
  [3a865a2d] ↑ CuArrays v1.7.3 ⇒ v2.0.1
  Updating `E:\Programs\julia\.julia\dev\CuArraysFp16\Manifest.toml`
  [3895d2a7] ↑ CUDAapi v3.1.0 ⇒ v4.0.0
  [c5f51814] ↑ CUDAdrv v6.0.0 ⇒ v6.2.2
  [be33ccc6] ↑ CUDAnative v2.10.2 ⇒ v3.0.3
  [da1fd8a2] + CodeTracking v0.5.8
  [f68482b8] + Cthulhu v1.0.1
  [3a865a2d] ↑ CuArrays v1.7.3 ⇒ v2.0.1
  [e2ba6199] + ExprTools v0.1.0
  [0c68f7d7] ↑ GPUArrays v2.0.1 ⇒ v3.1.0
  [189a3867] + Reexport v0.2.0
  [76f85450] + LibGit2
  [44cfe95a] + Pkg
  [3fa0cd96] + REPL

julia> using CuArrays

julia> using BenchmarkTools

julia> CuArrays.allowscalar(false)

julia> const xxxx = cu(randn(64,64,64,64));

julia> const yyyy = cu(randn(1,1,1,64));

julia> @benchmark CuArrays.@sync xxxx .+ yyyy
BenchmarkTools.Trial: 
  memory estimate:  4.30 KiB
  allocs estimate:  87
  --------------
  minimum time:     20.733 ms (0.00% GC)
  median time:      25.089 ms (0.00% GC)
  mean time:        24.637 ms (0.02% GC)
  maximum time:     29.447 ms (0.00% GC)
  --------------
  samples:          203
  evals/sample:     1
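One possible way to narrow this down is to check whether the slowdown is tied to the 4D indexing path specifically. Because `yyyy` only varies along the last dimension, `xxxx .+ yyyy` can be rewritten as an equivalent 2D broadcast by reshaping both operands and reshaping the result back. The sketch below is a hypothetical diagnostic helper (`add_trailing_2d` is not part of CuArrays) and is checked on small CPU arrays only; it assumes both inputs share an element type, as they do in the MWE where `cu` produces `Float32` arrays.

```julia
# Hypothetical diagnostic helper (not part of CuArrays): computes x .+ y for a
# 4D `x` and a `y` of size (1, 1, 1, n) through an equivalent 2D broadcast.
function add_trailing_2d(x::AbstractArray{T,4}, y::AbstractArray{T,4}) where {T}
    n = size(x, 4)
    # y may only vary along the last dimension and must match x there
    @assert size(y) == (1, 1, 1, n)
    x2 = reshape(x, :, n)   # (size(x,1)*size(x,2)*size(x,3), n), shares x's data
    y2 = reshape(y, 1, n)   # (1, n)
    return reshape(x2 .+ y2, size(x))
end

# CPU check with small Float32 arrays, same layout as the MWE but scaled down
x = randn(Float32, 8, 8, 8, 8)
y = randn(Float32, 1, 1, 1, 8)
z = add_trailing_2d(x, y)
```

On the GPU the same function could, in principle, be timed against the MWE arrays with `@benchmark CuArrays.@sync add_trailing_2d(xxxx, yyyy)`. This assumes `reshape` on a `CuArray` returns a `CuArray` sharing the same buffer; whether the 2D path actually avoids the slowdown in 2.0.1 has not been measured here.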

Expected behavior
I guess I was hoping for performance to at least be on par with 1.7.3 :)

Build log

julia> Pkg.build()
  Building NNlib → `E:\Programs\julia\.julia\packages\NNlib\FAI3o\deps\build.log`
false

julia> using CuArrays, BenchmarkTools
[ Info: Precompiling CuArrays [3a865a2d-5b23-5a0f-bc46-62713ec82fae]

julia> const xxxx = cu(randn(64,64,64,64));

julia> const yyyy = cu(randn(1,1,1,64));

julia> @benchmark CuArrays.@sync xxxx .+ yyyy
BenchmarkTools.Trial: 
  memory estimate:  4.30 KiB
  allocs estimate:  87
  --------------
  minimum time:     19.186 ms (0.00% GC)
  median time:      25.103 ms (0.00% GC)
  mean time:        24.327 ms (0.03% GC)
  maximum time:     28.657 ms (5.64% GC)
  --------------
  samples:          206
  evals/sample:     1

Environment details
Details on Julia:

julia> versioninfo()
Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)       
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
  WORD_SIZE: 64    
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, haswell)

Julia packages:
See MWE above

CUDA toolkit and driver version: driver version 442.74. With 2.0.1 I assume the CUDA toolkit bundled with the package is used; for 1.7.3 I have CUDA 10.2 installed locally.
