By: Blog on JuliaGPU
Re-posted from: https://juliagpu.org/2020-11-05-oneapi_0.1/
We’re proud to announce the first version of oneAPI.jl, a Julia package for programming
accelerators with the oneAPI programming model. It is currently
available for select Intel GPUs, including common integrated ones, and offers a similar
experience to CUDA.jl.
The initial version of this package, v0.1, consists of three key components:
- wrappers for the oneAPI Level Zero interfaces;
- a compiler for Julia source code to SPIR-V IR;
- and an array interface for convenient data-parallel programming.
In this post, I’ll briefly describe each of these. But first, some essentials.
Installation
oneAPI.jl is currently only supported on 64-bit Linux, using a sufficiently recent kernel,
and requires Julia 1.5. Furthermore, it currently only supports a limited set of Intel GPUs:
Gen9 (Skylake, Kaby Lake, Coffee Lake), Gen11 (Ice Lake), and Gen12 (Tiger Lake).
If your Intel CPU has an integrated GPU supported by oneAPI, you can just go ahead and
install the oneAPI.jl package:
That’s right, no additional drivers required! oneAPI.jl ships its own copy of the Intel
Compute Runtime, which works out of the box on
any (sufficiently recent) Linux kernel. The initial download, powered by Julia’s artifact
subsystem, might take a while to complete. After that, you can import the package and start
using its functionality:
julia> using oneAPI
julia> oneAPI.versioninfo()
Binary dependencies:
- NEO_jll: 20.42.18209+0
- libigc_jll: 1.0.5186+0
- gmmlib_jll: 20.3.2+0
- SPIRV_LLVM_Translator_jll: 9.0.0+1
- SPIRV_Tools_jll: 2020.2.0+1
Toolchain:
- Julia: 1.5.2
- LLVM: 9.0.1
1 driver:
- 00007fee-06cb-0a10-1642-ca9f01000000 (v1.0.0, API v1.0.0)
1 device:
- Intel(R) Graphics Gen9
The oneArray
type
Similar to CUDA.jl’s CuArray
type, oneAPI.jl provides an array abstraction that you can
use to easily perform data parallel operations on your GPU:
julia> a = oneArray(zeros(2,3))
2×3 oneArray{Float64,2}:
0.0 0.0 0.0
0.0 0.0 0.0
julia> a .+ 1
2×3 oneArray{Float64,2}:
1.0 1.0 1.0
1.0 1.0 1.0
julia> sum(ans; dims=2)
2×1 oneArray{Float64,2}:
3.0
3.0
This functionality builds on the GPUArrays.jl
package, which means that a lot of operations are supported out of the box. Some are still
missing, of course, and we haven’t carefully optimized for performance either.
Kernel programming
The above array operations are made possible by a compiler that transforms Julia source code
into SPIR-V IR for use with oneAPI. Most of this work is part of
GPUCompiler.jl. In oneAPI.jl, we use this
compiler to provide a kernel programming model:
julia> function vadd(a, b, c)
i = get_global_id()
@inbounds c[i] = a[i] + b[i]
return
end
julia> a = oneArray(rand(10));
julia> b = oneArray(rand(10));
julia> c = similar(a);
julia> @oneapi items=10 vadd(a, b, c)
julia> @test Array(a) .+ Array(b) == Array(c)
Test Passed
Again, the @oneapi
macro resembles @cuda
from CUDA.jl. One of the differences with the
CUDA stack is that we use OpenCL-style built-ins, like get_global_id
instead of
threadIdx
and barrier
instead of sync_threads
. Other familiar functionality, e.g. to
reflect on the compiler, is available as well:
julia> @device_code_spirv @oneapi vadd(a, b, c)
; CompilerJob of kernel vadd(oneDeviceArray{Float64,1,1}, oneDeviceArray{Float64,1,1},
; oneDeviceArray{Float64,1,1}) for GPUCompiler.SPIRVCompilerTarget
; SPIR-V
; Version: 1.0
; Generator: Khronos LLVM/SPIR-V Translator; 14
; Bound: 46
; Schema: 0
OpCapability Addresses
OpCapability Linkage
OpCapability Kernel
OpCapability Float64
OpCapability Int64
OpCapability Int8
%1 = OpExtInstImport "OpenCL.std"
OpMemoryModel Physical64 OpenCL
OpEntryPoint Kernel
...
OpReturn
OpFunctionEnd
Level Zero wrappers
To interface with the oneAPI driver, we use the Level Zero
API. Wrappers for this API is available under the
oneL0
submodule of oneAPI.jl:
julia> using oneAPI.oneL0
julia> drv = first(drivers())
ZeDriver(00000000-0000-0000-1642-ca9f01000000, version 1.0.0)
julia> dev = first(devices(drv))
ZeDevice(GPU, vendor 0x8086, device 0x1912): Intel(R) Graphics Gen9
This is a low-level interface, and importing this submodule should not be required for the
vast majority of users. It is only useful when you want to perform very specific operations,
like submitting an certain operations to the command queue, working with events, etc. In
that case, you should refer to the upstream
specification; The wrappers in the
oneL0
module closely mimic the C APIs.
Status
Version 0.1 of oneAPI.jl forms a solid base for future oneAPI developments in Julia. Thanks
to the continued effort of generalizing the Julia GPU support in packages like GPUArrays.jl
and GPUCompiler.jl, this initial version is already much more usable than early versions of
CUDA.jl or AMDGPU.jl ever were.
That said, there are crucial parts missing. For one, oneAPI.jl does not integrate with any
of the vendor libraries like oneMKL or oneDNN. That means several important operations, e.g.
matrix-matrix multiplication, will be slow. Hardware support is also limited, and the
package currently only works on Linux.
If you want to contribute to oneAPI.jl, or run into problems, check out the GitHub
repository at JuliaGPU/oneAPI.jl. For questions,
please use the Julia Discourse forum under
the GPU domain and/or in the #gpu channel of the Julia
Slack.