By: Tim Besard
Re-posted from: https://blog.maleadt.net/2017/02/24/julia-asan/
Address sanitizer is a useful tool for debugging various memory problems, from invalid
accesses to mismanagement or leaks. It is similar to Valgrind’s memcheck, but uses
compile-time instrumentation to lower the cost.
In this post I’ll explain how to use Clang’s address sanitizer (or ASAN) with Julia. This is
somewhat tricky, as the Julia compiler uses LLVM for code generation purposes. Long story
short, this implies that all instances of LLVM (i.e. the one Julia is compiled with, and the
one used for code generation) have to match up exactly for the instrumentation to work as
expected.
LLVM toolchain
We’ll start by building a toolchain to compile Julia with. As mentioned before, all LLVM
instances in play have to match up exactly for instrumentation to work, so we’ll use Julia’s
build infrastructure to generate us an LLVM toolchain.
Start by checking out Julia and creating an out-of-tree build directory:
$ git clone https://github.com/JuliaLang/julia
$ cd julia
$ make O=sanitize_toolchain configure
This build will need to provide clang, so create a Make.user in the new build directory
containing BUILD_LLVM_CLANG=1. In addition, LLVM does not build its sanitizers with
autotools, so add override LLVM_USE_CMAKE=1 to that file as well. And because that triggers
LLVM bug #23649, also add USE_LLVM_SHLIB=0.
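As a minimal sketch, those three settings can be written to Make.user from the shell. The BUILD_DIR variable is a hypothetical convenience, and its default assumes the out-of-tree build directory is named sanitize_toolchain, as in the TOOLCHAIN path used further down.

```shell
# Sketch: write the toolchain build's Make.user.
# BUILD_DIR is a hypothetical variable; the default assumes the out-of-tree
# build directory is called sanitize_toolchain.
build_dir="${BUILD_DIR:-sanitize_toolchain}"
mkdir -p "$build_dir"

# The three settings named in the text, one per line.
cat > "$build_dir/Make.user" <<'EOF'
BUILD_LLVM_CLANG=1
override LLVM_USE_CMAKE=1
USE_LLVM_SHLIB=0
EOF
```

Run it from the root of the Julia checkout; it overwrites any Make.user already present in that build directory.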
Now execute make install-llvm from the deps subfolder. When it finishes, check if binaries
have been written to usr/bin (due to what’s probably a bug in LLVM’s build scripts), and
move them to usr/tools if they have.
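The check-and-move step could look roughly like this. The helper name relocate_llvm_tools is made up for illustration, and it moves everything found in usr/bin, on the assumption that a toolchain-only build puts nothing else there.

```shell
# Hypothetical helper: move stray binaries from <prefix>/bin to <prefix>/tools.
# $1 is the install prefix, e.g. "usr" inside the toolchain build directory.
relocate_llvm_tools() {
    # Nothing to do if the build did not create <prefix>/bin.
    [ -d "$1/bin" ] || return 0
    mkdir -p "$1/tools"
    for f in "$1"/bin/*; do
        # Skip the unexpanded glob when the directory is empty.
        [ -e "$f" ] || continue
        mv "$f" "$1/tools/"
    done
}

# Usage, from the toolchain build directory:
#   make -C deps install-llvm
#   relocate_llvm_tools usr
```

If usr/bin does not exist the function is a no-op, so it is safe to run regardless of whether the build misplaced anything.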
Sanitized Julia
Now that we have a working toolchain, we’ll use it to compile a sanitized version of the
Julia compiler and libraries. Start by creating a new out-of-tree build directory using
make O=sanitize configure. But this time, our Make.user will be significantly more complex:
TOOLCHAIN=$(BUILDROOT)/../sanitize_toolchain/usr/tools
# use our new toolchain
USECLANG=1
override CC=$(TOOLCHAIN)/clang
override CXX=$(TOOLCHAIN)/clang++
export ASAN_SYMBOLIZER_PATH=$(TOOLCHAIN)/llvm-symbolizer
# enable ASAN
override SANITIZE=1
override LLVM_SANITIZE=1
# autotools doesn't have a self-sanitize mode
override LLVM_USE_CMAKE=1
# make the GC use regular malloc/frees, which are intercepted by ASAN
override WITH_GC_DEBUG_ENV=1
# default to a debug build for better line number reporting
override JULIA_BUILD_MODE=debug
Now kick off the build using make from the sanitize build directory. Barring any memory
issues triggered during system image generation, this should yield a sanitized julia binary
and system image.
Running the test-suite
The test-suite is a beast, and because ASAN keeps track of a lot of information it easily
takes over 128GiB of memory to run it to completion. Instead, we’ll tune ASAN to consume
less memory at the expense of accuracy and report detail.
Julia, however, already configures default ASAN options, which we need to copy when
specifying a different set. Do so by defining the ASAN_OPTIONS environment variable and
assigning it the value of
detect_leaks=0:allow_user_segv_handler=1:fast_unwind_on_malloc=0:malloc_context_size=2.
This copies the aforementioned default values, and caps backtrace collection.
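In a shell session that amounts to something like the following. The option string is the one given above; the commented-out make testall line is only an assumption about which target you would run, so substitute your usual test invocation.

```shell
# Option string as given in the text.
export ASAN_OPTIONS="detect_leaks=0:allow_user_segv_handler=1:fast_unwind_on_malloc=0:malloc_context_size=2"

# Print one option per line to double-check what ASAN will see.
printf '%s\n' "$ASAN_OPTIONS" | tr ':' '\n'

# Then run the tests from the sanitize build directory. The target below is
# an assumption; use whatever you normally run.
# make testall
```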
Using CUDA packages
If you thought all that was convoluted, prepare for some more. ASAN uses so-called shadow
memory to store information about memory allocations. There is a correspondence between
regular memory addresses and their shadow counterpart, and this mapping is fixed in order to
keep the instrumentation overhead low. Sadly, the default shadow memory location overlaps
with fixed memory allocated by CUDA (presumably for its unified virtual address space).
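To make the fixed mapping concrete: on 64-bit targets ASAN computes the shadow address as (address >> 3) + offset. The sketch below evaluates that for the two offsets that matter here, LLVM’s kSmallX86_64ShadowOffset (0x7FFF8000) and kDefaultShadowOffset64 (1 << 44). The sample address is made up, and the helper name shadow_addr is hypothetical.

```shell
# Sketch of ASAN's address-to-shadow mapping on 64-bit targets:
#   shadow = (address >> 3) + offset
# Needs a shell with 64-bit arithmetic (e.g. bash).
shadow_addr() {
    # $1: application address, $2: shadow offset
    printf '0x%x\n' "$(( ($1 >> 3) + $2 ))"
}

SMALL_X86_64_OFFSET=0x7FFF8000     # kSmallX86_64ShadowOffset
DEFAULT_OFFSET64=$(( 1 << 44 ))    # kDefaultShadowOffset64

addr=0x7f0000000000                # hypothetical application address
shadow_addr "$addr" "$SMALL_X86_64_OFFSET"
shadow_addr "$addr" "$DEFAULT_OFFSET64"
```

Changing the offset relocates the entire shadow region, which is why the instrumentation pass and the runtime have to agree on it.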
Because the shadow mapping is fixed, we need to patch both instances of LLVM (easiest is to
add a patch to llvm.mk) and have them pick a different shadow offset:
--- lib/Transforms/Instrumentation/AddressSanitizer.cpp
+++ lib/Transforms/Instrumentation/AddressSanitizer.cpp
@@ -359,7 +359,7 @@
       if (IsKasan)
         Mapping.Offset = kLinuxKasan_ShadowOffset64;
       else
-        Mapping.Offset = kSmallX86_64ShadowOffset;
+        Mapping.Offset = kDefaultShadowOffset64;
     } else if (IsMIPS64)
       Mapping.Offset = kMIPS64_ShadowOffset64;
     else if (IsAArch64)
--- projects/compiler-rt/lib/asan/asan_mapping.h
+++ projects/compiler-rt/lib/asan/asan_mapping.h
@@ -146,7 +146,7 @@
 # elif SANITIZER_IOS
 # define SHADOW_OFFSET kIosShadowOffset64
 # else
-# define SHADOW_OFFSET kDefaultShort64bitShadowOffset
+# define SHADOW_OFFSET kDefaultShadowOffset64
 # endif
 # endif
 #endif
Note that you might need to redefine a different macro for your platform.
Sanitizing older versions of Julia
If you want to sanitize older versions of Julia, from before the switch to LLVM 3.9, there
are yet other issues: only LLVM 3.9 is compatible with recent versions of glibc, while the
CMake build system of LLVM 3.7 doesn’t export all necessary public symbols. You can work
around these issues by using a sufficiently old system, and overriding the LLVM version to
3.8 (by specifying override LLVM_VER=3.8.1 in the Make.user of both build directories) or
preventing it from generating a shared library (by specifying USE_LLVM_SHLIB=0 in the
Make.user of the final Julia build).