This repository was archived by the owner on Mar 12, 2021. It is now read-only.

Overhead of memory copies #528

@maleadt

From https://discourse.julialang.org/t/cpu-gpu-data-transfer-speed/31857

Copying with the `CuArray` constructor, including the time to allocate:

```julia
julia> using CuArrays, BenchmarkTools

julia> data = rand(Float32, 134217728);

julia> time = @belapsed CuArray($data);

julia> Base.format_bytes(sizeof(data) / time) * "/s"
"1.103 GiB/s"
```
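For scale, the payload here is 2^27 `Float32` values, which is 512 MiB. The rates are reported in GiB/s, meaning bytes per second divided by 2^30, which matches the units `Base.format_bytes` prints. The following Base-only sketch works through that arithmetic. It also inverts the 1.103 GiB/s figure to recover roughly how long `CuArray(data)` took. The variable names are illustrative, and because the reported rate is rounded, the recovered time is only approximate:

```julia
# Benchmark payload: 134217728 (= 2^27) Float32 values at 4 bytes each.
n = 134217728
nbytes = n * sizeof(Float32)

# Rates in the transcript are bytes per second expressed in GiB (2^30 bytes).
# Inverting the reported 1.103 GiB/s gives the approximate time that
# @belapsed measured for CuArray(data), allocation included.
t_ctor = nbytes / (1.103 * 2^30)

println(nbytes)                      # 536870912 bytes (512 MiB)
println(round(t_ctor; digits = 3))   # roughly 0.453 seconds
```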

Excluding the allocation, by copying into a preallocated `CuArray`:

```julia
julia> gpu_data = CuArray(data);

julia> time = @belapsed copyto!($gpu_data, $data)
0.238991723

julia> Base.format_bytes(sizeof(data) / time) * "/s"
"2.092 GiB/s"
```
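Comparing the two measurements gives a rough idea of how much of the constructor's time goes to something other than the copy itself, presumably mostly allocation. The following Base-only sketch uses the 0.238991723 s reported for `copyto!` and the time implied by the earlier 1.103 GiB/s figure. The result is an estimate derived from rounded numbers, not a new measurement:

```julia
nbytes = 134217728 * sizeof(Float32)  # 512 MiB payload

t_copy = 0.238991723                  # copyto!(gpu_data, data), from the transcript
t_ctor = nbytes / (1.103 * 2^30)      # CuArray(data), implied by 1.103 GiB/s

# Throughput of the copy into preallocated memory, in GiB/s.
rate_copy = nbytes / t_copy / 2^30

# Time the constructor spends beyond the copy itself, and its share of the total.
t_extra = t_ctor - t_copy
share_extra = t_extra / t_ctor

println(round(rate_copy; digits = 3))  # 2.092, matching the transcript
println(round(t_extra; digits = 2))    # about 0.21 seconds
println(round(100 * share_extra))      # about 47 (percent)
```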

Using the underlying CUDAdrv APIs directly, copying from a raw host pointer to a raw device pointer:

```julia
julia> using CUDAdrv

julia> gpu = Mem.alloc(Mem.Device, sizeof(data))
CUDAdrv.Mem.DeviceBuffer(CuPtr{Nothing}(0x00007f65a6000000), 536870912, CuContext(Ptr{Nothing} @0x000000000232c140, true, true))

julia> gpu_ptr = convert(CuPtr{Float32}, gpu)
CuPtr{Float32}(0x00007f65a6000000)

julia> time = @belapsed unsafe_copyto!($gpu_ptr, $(pointer(data)), 134217728)
0.050821662

julia> Base.format_bytes(sizeof(data) / time) * "/s"
"9.838 GiB/s"
```
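The call above passes 134217728 to `unsafe_copyto!` for a 512 MiB buffer. That number is the count of `Float32` elements, not bytes, which is the same convention Base's `unsafe_copyto!(dest::Ptr{T}, src::Ptr{T}, N)` uses for host pointers. The following host-only sketch illustrates that convention without a GPU. `GC.@preserve` keeps both arrays alive while their raw pointers are in use. In the transcript, `data` is a global, so it stays alive anyway:

```julia
# Host-only illustration: Base's unsafe_copyto!(dest::Ptr{T}, src::Ptr{T}, N)
# copies N *elements* of type T, not N bytes.
src = rand(Float32, 1024)
dst = zeros(Float32, 1024)

GC.@preserve src dst begin
    # Copy the first 256 elements (256 * 4 = 1024 bytes).
    unsafe_copyto!(pointer(dst), pointer(src), 256)
end

println(dst[1:256] == src[1:256])   # true: the first 256 elements were copied
println(all(iszero, dst[257:end]))  # true: the rest were left untouched
```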

And using pinned (page-locked) host memory allocated through CUDAdrv, which can't be the default for ordinary Julia arrays:

```julia
julia> cpu = Mem.alloc(Mem.Host, 134217728*sizeof(Float32))
CUDAdrv.Mem.HostBuffer(Ptr{Nothing} @0x00007f65c6000000, 536870912, CuContext(Ptr{Nothing} @0x000000000232c140, true, true), false)

julia> cpu_ptr = convert(Ptr{Float32}, cpu)
Ptr{Float32} @0x00007f65c6000000

julia> time = @belapsed unsafe_copyto!($gpu_ptr, $cpu_ptr, 134217728)
0.040853038

julia> Base.format_bytes(sizeof(data) / time) * "/s"
"12.239 GiB/s"
```
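Taken together, the four measurements span roughly an order of magnitude. The following Base-only sketch tabulates the reported rates relative to the `CuArray(data)` baseline. It also computes two ratios that separate the remaining effects: the raw pointer copy versus `copyto!`, and pinned versus pageable host memory. The numbers are copied from the transcript above; nothing is re-measured:

```julia
# Rates reported above, in GiB/s (copied from the transcript, not re-measured).
rates = [
    ("CuArray(data), incl. allocation", 1.103),
    ("copyto!(gpu_data, data)",         2.092),
    ("unsafe_copyto!, pageable host",   9.838),
    ("unsafe_copyto!, pinned host",     12.239),
]

# Speedup of each variant relative to the CuArray constructor.
baseline = rates[1][2]
for (label, rate) in rates
    println(rpad(label, 34), round(rate / baseline; digits = 1), "x")
end

# Raw pointer copy vs. copyto! from the same pageable source array.
raw_vs_copyto = rates[3][2] / rates[2][2]       # ≈ 4.7
# Pinned vs. pageable source for the same raw pointer copy.
pinned_vs_pageable = rates[4][2] / rates[3][2]  # ≈ 1.24

println(round(raw_vs_copyto; digits = 2))
println(round(pinned_vs_pageable; digits = 2))
```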
