By: Blog on JuliaGPU
Re-posted from: https://juliagpu.org/2020-07-18-cuda_1.3/
Today we’re releasing CUDA.jl 1.3, with several new features. The most prominent
change is support for multiple GPUs within a single process.
Multi-GPU programming
With CUDA.jl 1.3, you can finally use multiple CUDA GPUs within a single process. To switch
devices you can call device!
, query the current device with device()
, or reset it using
device_reset!()
:
julia> collect(devices())
9-element Array{CuDevice,1}:
CuDevice(0): Tesla V100-PCIE-32GB
CuDevice(1): Tesla V100-PCIE-32GB
CuDevice(2): Tesla V100-PCIE-32GB
CuDevice(3): Tesla V100-PCIE-32GB
CuDevice(4): Tesla V100-PCIE-16GB
CuDevice(5): Tesla P100-PCIE-16GB
CuDevice(6): Tesla P100-PCIE-16GB
CuDevice(7): GeForce GTX 1080 Ti
CuDevice(8): GeForce GTX 1080 Ti
julia> device!(5)
julia> device()
CuDevice(5): Tesla P100-PCIE-16GB
Let’s define a kernel to show this really works:
julia> function kernel()
dev = Ref{Cint}()
CUDA.cudaGetDevice(dev)
@cuprintln("Running on device $(dev[])")
return
end
julia> @cuda kernel()
Running on device 5
julia> device!(0)
julia> device()
CuDevice(0): Tesla V100-PCIE-32GB
julia> @cuda kernel()
Running on device 0
Memory allocations, like CuArray
s, are implicitly bound to the device they
were allocated on. That means you should take care to only use an array when the
owning device is active, or you will run into errors:
julia> device()
CuDevice(0): Tesla V100-PCIE-32GB
julia> a = CUDA.rand(1)
1-element CuArray{Float32,1}:
0.6322775
julia> device!(1)
julia> a
ERROR: CUDA error: an illegal memory access was encountered
Future improvements might make the array type device-aware.
Multitasking and multithreading
Dovetailing with the support for multiple GPUs, is the ability to use these GPUs
on separate Julia tasks and threads:
julia> device!(0)
julia> @sync begin
@async begin
device!(1)
println("Working with $(device()) on $(current_task())")
yield()
println("Back to device $(device()) on $(current_task())")
end
@async begin
device!(2)
println("Working with $(device()) on $(current_task())")
end
end
Working with CuDevice(1) on Task @0x00007fc9e6a48010
Working with CuDevice(2) on Task @0x00007fc9e6a484f0
Back to device CuDevice(1) on Task @0x00007fc9e6a48010
julia> device()
CuDevice(0): Tesla V100-PCIE-32GB
Each task has its own local GPU state, such as the device it was bound to,
handles to libraries like CUBLAS or CUDNN (which means that each task can
configure libraries independently), etc.
Minor features
CUDA.jl 1.3 also features some minor changes:
- Reinstated compatibility with Julia 1.3
- Support for CUDA 11.0 Update 1
- Support for CUDNN 8.0.2
Known issues
Several operations on sparse arrays have been broken since CUDA.jl 1.2, due to
the deprecations that were part of CUDA 11. The next version of CUDA.jl will
drop support for CUDA 10.0 or older, which will make it possible to use new
cuSPARSE APIs and add back missing functionality.