oneAPI.jl 1.5: Ponte Vecchio support and oneMKL improvements

By: Tim Besard

Re-posted from: https://juliagpu.org/post/2024-05-24-oneapi_1.5/index.html

oneAPI.jl v1.5 is a significant release that brings many new features, from extended hardware support to greatly improved wrappers of the oneMLK math library.

Intel Ponte Vecchio

In oneAPI.jl v1.5 we introduce support for the Intel Ponte Vecchio (PVC) architecture, which empowers the Xe HPC GPUs as found in the Aurora supercomputer:

julia> oneAPI.versioninfo()
Binary dependencies:
- NEO: 24.13.29138+0
- libigc: 1.0.16510+0
- gmmlib: 22.3.18+0
- SPIRV_LLVM_Translator_unified: 0.4.0+0
- SPIRV_Tools: 2023.2.0+0Toolchain:
- Julia: 1.10.3
- LLVM: 15.0.71 driver:
- 00000000-0000-0000-17d2-6b1e010371d2 (v1.3.29138, API v1.3.0)16 devices:
- Intel(R) Data Center GPU Max 1550
- Intel(R) Data Center GPU Max 1550
- Intel(R) Data Center GPU Max 1550
- Intel(R) Data Center GPU Max 1550
- Intel(R) Data Center GPU Max 1550
- Intel(R) Data Center GPU Max 1550
- Intel(R) Data Center GPU Max 1550
- Intel(R) Data Center GPU Max 1550
- Intel(R) Data Center GPU Max 1550
- Intel(R) Data Center GPU Max 1550
- Intel(R) Data Center GPU Max 1550
- Intel(R) Data Center GPU Max 1550
- Intel(R) Data Center GPU Max 1550
- Intel(R) Data Center GPU Max 1550
- Intel(R) Data Center GPU Max 1550
- Intel(R) Data Center GPU Max 1550

Apart from a handful of MKL-related issues, oneAPI.jl is fully functional on PVC, and passes all tests.

oneMLK wrappers

Thanks to the work of @amontoison, oneAPI.jl now provides greatly improved wrappers of the oneMKL library. This includes support for:

  • LAPACK: geqrf(_batched), orgqr(_batched), ormqr, potrf(_batched), potrs(_batched), getrf(_batched), getri(_batched), gebrd, gesvd, syevd, heevd, sygvd, hegvd

  • Sparse arrays: sparse_gemm, sparse_gemv, sparse_symv, sparse_trmv, sparse_trsv, sparse_optimize_gemv, sparse_optimize_trsv

Where possible, these functions are integrated with standard library interfaces, e.g., making it possible to simply call eigen, or to multiply two oneSparseMatrixCSRs.

Minor changes

There have of course been many other changes and improvements in oneAPI.jl v1.5. For a full list, please refer to the release notes, but some highlights include:

  • a new launch configuration heuristic that should generally improve performance;

  • broadcast now preserves the buffer type (host, device, or shared);

  • support for very large arrays that exceed the default device memory limit;

  • several toolchain bumps, with v1.5 using oneAPI 2024.1.0 with driver 24.13.29138.7;

  • minimal support for native Windows (next to WSL, which is fully supported).