This repository was archived by the owner on Mar 12, 2021. It is now read-only.
Overhead of memory copies #528
Closed
From https://discourse.julialang.org/t/cpu-gpu-data-transfer-speed/31857
Copy using CuArray constructor, including time to allocate:
julia> using CuArrays, BenchmarkTools
julia> data = rand(Float32, 134217728);
julia> time = @belapsed CuArray($data);
julia> Base.format_bytes(sizeof(data) / time) * "/s"
"1.103 GiB/s"
Copying into a preallocated CuArray, leaving out the allocation:
julia> gpu_data = CuArray(data);
julia> time = @belapsed copyto!($gpu_data, $data)
0.238991723
julia> Base.format_bytes(sizeof(data) / time) * "/s"
"2.092 GiB/s"
Using the underlying CUDAdrv APIs directly, from an ordinary (pageable) host array:
julia> using CUDAdrv
julia> gpu = Mem.alloc(Mem.Device, sizeof(data))
CUDAdrv.Mem.DeviceBuffer(CuPtr{Nothing}(0x00007f65a6000000), 536870912, CuContext(Ptr{Nothing} @0x000000000232c140, true, true))
julia> gpu_ptr = convert(CuPtr{Float32}, gpu)
CuPtr{Float32}(0x00007f65a6000000)
julia> time = @belapsed unsafe_copyto!($gpu_ptr, $(pointer(data)), 134217728)
0.050821662
julia> Base.format_bytes(sizeof(data) / time) * "/s"
"9.838 GiB/s"
And using pinned host memory (this one can't be the default, since ordinary Julia arrays live in pageable memory):
julia> cpu = Mem.alloc(Mem.Host, 134217728*sizeof(Float32))
CUDAdrv.Mem.HostBuffer(Ptr{Nothing} @0x00007f65c6000000, 536870912, CuContext(Ptr{Nothing} @0x000000000232c140, true, true), false)
julia> cpu_ptr = convert(Ptr{Float32}, cpu)
Ptr{Float32} @0x00007f65c6000000
julia> time = @belapsed unsafe_copyto!($gpu_ptr, $cpu_ptr, 134217728)
0.040853038
julia> Base.format_bytes(sizeof(data) / time) * "/s"
"12.239 GiB/s"