WeirdFloats.jl

This package is for experimenting with interesting numerical formats available on some GPUs.

Installation

To install this package:

import Pkg
Pkg.add(; url="https://github.com/JuliaGPU/WeirdFloats.jl")

This requires an AMD MI300-series GPU (or a later generation) and Julia v1.12 or later, which ships with LLVM 18 or later and can therefore target that generation of GPU:

julia> versioninfo()
Julia Version 1.12.0-beta4
Commit 600ac61d3d2 (2025-06-05 07:03 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 384 × AMD EPYC 9654 96-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, znver4)
  GC: Built with stock GC
Threads: 1 default, 1 interactive, 1 GC (on 384 virtual cores)

julia> AMDGPU.versioninfo()
[ Info: AMDGPU versioninfo
┌───────────┬──────────────────┬───────────┬────────────────────────────────────────────────────────────────────────────────────────┐
│ Available │ Name             │ Version   │ Path                                                                                   │
├───────────┼──────────────────┼───────────┼────────────────────────────────────────────────────────────────────────────────────────┤
│     +     │ LLD              │ -         │ /opt/rocm/llvm/bin/ld.lld                                                              │
│     +     │ Device Libraries │ -         │ /home/cceamgi/.julia/artifacts/b46ab46ef568406312e5f500efb677511199c2f9/amdgcn/bitcode │
│     +     │ HIP              │ 6.4.43482 │ /opt/rocm/lib/libamdhip64.so                                                           │
│     +     │ rocBLAS          │ 4.4.0     │ /opt/rocm/lib/librocblas.so                                                            │
│     +     │ rocSOLVER        │ 3.28.0    │ /opt/rocm/lib/librocsolver.so                                                          │
│     +     │ rocSPARSE        │ 3.4.0     │ /opt/rocm/lib/librocsparse.so                                                          │
│     +     │ rocRAND          │ 2.10.5    │ /opt/rocm/lib/librocrand.so                                                            │
│     +     │ rocFFT           │ 1.0.32    │ /opt/rocm/lib/librocfft.so                                                             │
│     +     │ MIOpen           │ 3.4.0     │ /opt/rocm/lib/libMIOpen.so                                                             │
└───────────┴──────────────────┴───────────┴────────────────────────────────────────────────────────────────────────────────────────┘

[ Info: AMDGPU devices
┌────┬─────────────────────┬────────────────────────┬───────────┬─────────────┬───────────────┐
│ Id │                Name │               GCN arch │ Wavefront │      Memory │ Shared Memory │
├────┼─────────────────────┼────────────────────────┼───────────┼─────────────┼───────────────┤
│  1 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │        64 │ 191.984 GiB │    64.000 KiB │
│  2 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │        64 │ 191.984 GiB │    64.000 KiB │
│  3 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │        64 │ 191.984 GiB │    64.000 KiB │
│  4 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │        64 │ 191.984 GiB │    64.000 KiB │
│  5 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │        64 │ 191.984 GiB │    64.000 KiB │
│  6 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │        64 │ 191.984 GiB │    64.000 KiB │
│  7 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │        64 │ 191.984 GiB │    64.000 KiB │
│  8 │ AMD Instinct MI300X │ gfx942:sramecc+:xnack- │        64 │ 191.984 GiB │    64.000 KiB │
└────┴─────────────────────┴────────────────────────┴───────────┴─────────────┴───────────────┘

Examples

Downcasting to FP8 with stochastic rounding

We want to convert four Float32 numbers to the 8-bit floating-point format (FP8) used on AMD GPUs, which corresponds to DLFP8Types.Float8_E4M3FNUZ from the DLFP8Types.jl package.

julia> using AMDGPU, WeirdFloats, DLFP8Types

julia> A = AMDGPU.rand(Float32, 4)
4-element ROCArray{Float32, 1, AMDGPU.Runtime.Mem.HIPBuffer}:
 0.3193292
 0.007127027
 0.77079546
 0.036067273

julia> B = AMDGPU.zeros(Int32, 1); C = AMDGPU.zeros(Int32, 1); D = AMDGPU.zeros(Int32, 1)
1-element ROCArray{Int32, 1, AMDGPU.Runtime.Mem.HIPBuffer}:
 0

With deterministic (round-to-nearest) rounding we would expect

julia> Float8_E4M3FNUZ.(Array(A))
4-element Vector{Float8_E4M3FNUZ}:
 0.3125
 0.0068359375
 0.75
 0.03515625
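
For instance, A[1] ≈ 0.3193292 falls between the two neighbouring E4M3FNUZ values 0.3125 (the deterministic result above) and 0.34375 (which shows up in some of the stochastic results below). With stochastic rounding the conversion should round up with probability proportional to the distance from the lower neighbour; a rough sketch of that arithmetic (this is the usual definition of stochastic rounding, shown here only for illustration):

x  = 0.3193292f0            # A[1] on the host
lo = 0.3125f0               # nearest E4M3FNUZ value below x
hi = 0.34375f0              # nearest E4M3FNUZ value above x
p_up = (x - lo) / (hi - lo) # ≈ 0.22: probability of rounding up to 0.34375
p_dn = 1 - p_up             # ≈ 0.78: probability of rounding down to 0.3125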

Let's define a kernel which converts the elements of A to FP8 with stochastic rounding by using WeirdFloats.convert_sr (the rand(Int32) argument supplies the random bits used for rounding, and the final Int32 argument selects which byte of the packed 32-bit result each value is written to), and run the kernel multiple times, storing the results in the arrays B, C, and D:

julia> @inbounds function kernel_sr(A, B)
           v = reinterpret(WeirdFloats.VFP8, Int32(0))
           v = WeirdFloats.convert_sr(v, A[1], rand(Int32), Int32(0))
           v = WeirdFloats.convert_sr(v, A[2], rand(Int32), Int32(1))
           v = WeirdFloats.convert_sr(v, A[3], rand(Int32), Int32(2))
           v = WeirdFloats.convert_sr(v, A[4], rand(Int32), Int32(3))
           B[1] = reinterpret(Int32, v)

           return nothing
       end
kernel_sr (generic function with 1 method)

julia> @device_code dir="fp8" @roc kernel_sr(A, B); B
1-element ROCArray{Int32, 1, AMDGPU.Runtime.Mem.HIPBuffer}:
 440141618

julia> @device_code dir="fp8" @roc kernel_sr(A, C); C
1-element ROCArray{Int32, 1, AMDGPU.Runtime.Mem.HIPBuffer}:
 423364658

julia> @device_code dir="fp8" @roc kernel_sr(A, D); D
1-element ROCArray{Int32, 1, AMDGPU.Runtime.Mem.HIPBuffer}:
 423364403

The four output FP8 numbers are packed into the one-element Int32 arrays B, C, and D; we can unpack them to see their values:

julia> for shift in (0, 8, 16, 24); println(reinterpret(Float8_E4M3FNUZ, (Array(B)[1] >> shift & 0xff) % UInt8)); end
0.3125
0.0068359375
0.75
0.0390625

julia> for shift in (0, 8, 16, 24); println(reinterpret(Float8_E4M3FNUZ, (Array(C)[1] >> shift & 0xff) % UInt8)); end
0.3125
0.0078125
0.75
0.03515625

julia> for shift in (0, 8, 16, 24); println(reinterpret(Float8_E4M3FNUZ, (Array(D)[1] >> shift & 0xff) % UInt8)); end
0.34375
0.0068359375
0.75
0.03515625

We can see that the elements of A were not always downcast to the nearest FP8 value shown in the deterministic result above: in some cases they were rounded to the adjacent representable FP8 number instead.
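
If stochastic rounding is unbiased, repeating the kernel many times and averaging the unpacked values should approximately recover the original Float32 inputs. Here is a minimal sketch of such a check; the mean_sr helper below is hypothetical (not part of WeirdFloats) and assumes DLFP8Types can convert Float8_E4M3FNUZ back to Float32:

# Hypothetical helper: average the unpacked FP8 results of n stochastic-rounding launches.
function mean_sr(A, B; n = 10_000)
    acc = zeros(Float64, 4)
    for _ in 1:n
        @roc kernel_sr(A, B)
        AMDGPU.synchronize()
        packed = Array(B)[1]
        for (i, shift) in enumerate((0, 8, 16, 24))
            # Same unpacking as above; assumes Float32(::Float8_E4M3FNUZ) is defined.
            acc[i] += Float32(reinterpret(Float8_E4M3FNUZ, (packed >> shift & 0xff) % UInt8))
        end
    end
    return acc ./ n
end

mean_sr(A, B)  # for large n this should be close to Array(A)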
