[Llvm-bgq-discuss] address sanitizer

Sat Jul 6 16:49:11 CDT 2013

Hello everyone,

One of my motivations for working on LLVM/Clang for the BG/Q was to enable use of the sanitizer debugging projects on the BG/Q. Over the last couple of weeks, I've taken initial steps toward that goal. For those of you who don't know, address sanitizer is a tool for detecting memory allocation and use errors: use-after-free, double-free, stack and heap overruns, etc. Because this works using compile-time instrumentation, address sanitizer, in general, has much lower overhead than tools like valgrind (which rely on processor emulation). This should make it feasible to use address sanitizer to debug memory misuse errors on the Q, including those which only show up at scale.

To use this feature, pass -fsanitize=address to the compiler (when compiling and linking). I highly recommend using at least -O1 (if not -O3) and -g as well.
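As a quick way to check that the instrumentation is active, here is a minimal sketch of the kind of bug address sanitizer is meant to catch. The function name sum_buffer and its read_after_free switch are made up for this illustration; they are not part of the toolchain.

```c
#include <stdlib.h>

/* Hypothetical test function: fills a heap buffer with 1..n and sums it.
   When read_after_free is nonzero, the buffer is freed BEFORE it is read,
   which is a deliberate use-after-free. */
int sum_buffer(int n, int read_after_free) {
    int *buf = malloc((size_t)n * sizeof *buf);
    if (buf == NULL)
        return -1;

    for (int i = 0; i < n; ++i)
        buf[i] = i + 1;

    if (read_after_free)
        free(buf);              /* bug path: buf is now dangling */

    int sum = 0;
    for (int i = 0; i < n; ++i)
        sum += buf[i];          /* invalid read on the bug path */

    if (!read_after_free)
        free(buf);              /* correct path: free after last use */
    return sum;
}
```

Calling sum_buffer(3, 0) is well defined and returns 6. Calling sum_buffer(3, 1) from a binary compiled and linked with -fsanitize=address should make the runtime stop with a heap-use-after-free report rather than silently reading stale memory.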

Address sanitizer requires dynamic linking. When you provide -fsanitize=address, the wrapper script will automatically switch into non-static-linking mode. To better support dynamic linking, I've made a number of improvements to the wrapper scripts (both bgclang and the mpi scripts), and a small fix to Clang that applies to dynamic linking in C++11 mode. In short, hopefully this will now all *just work*.

Because of a limitation of the LLVM PowerPC backend (it cannot do dynamic stack realignment yet), the ability of the current build to detect stack overruns is limited. I'll make the necessary improvements in the PowerPC backend soon, and then the ability to detect stack overruns will be the same as on other platforms.

Some details on overheads: address sanitizer introduces runtime and memory overheads in several different ways. First, the runtime allocates a 'shadow' memory region which it uses to record state information on allocated memory regions. As I have it configured, this uses 1 byte of 'shadow' memory for every 32 bytes of application memory. The 'normal' upstream address sanitizer uses a simple mapping between addresses and 'shadow' bytes. Unfortunately, due to limitations imposed by CNK on virtual memory use and mapping, I've had to divide this shadow region into three distinct pieces (one for segments from the executable image, one for the heap/stack, and one for things in /dev/shm). Selecting between these regions adds an extra penalty to every instrumented memory access. Nevertheless, the additional overhead does not seem too bad. Also, because of CNK restrictions, this shadow area needs to be allocated somewhere in the heap/stack region. I'm currently placing it in the middle, so if your application currently uses more than 8GB of stack in c1 mode, for example, then this configuration won't work for you. Second, 'red zones' are allocated around every heap allocation and stack variable, further increasing the memory overhead. I've tried this on HACC, and the runtime slowdown on various stages was between 3x and 50x. If your code spends 95% of its time in dgemm, however, you'll probably not notice anything ;)
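To make the shadow mapping a bit more concrete, here is a rough sketch of how an address could be translated to its shadow byte when the shadow is split into three pieces. This is not the code in the patchset: the region_t type, the function names, and every address in the table are made up for illustration. Only the 1:32 ratio and the three-way split come from the description above.

```c
#include <stddef.h>
#include <stdint.h>

#define SHADOW_SCALE 5  /* 2^5 = 32 application bytes per shadow byte */

typedef struct {
    uintptr_t app_base;     /* first application address in the region */
    uintptr_t app_size;     /* region length in bytes */
    uintptr_t shadow_base;  /* where this region's shadow bytes start */
} region_t;

/* Hypothetical layout; the real CNK addresses are different. */
static const region_t regions[3] = {
    { 0x01000000u, 0x01000000u, 0x70000000u },  /* executable image */
    { 0x10000000u, 0x40000000u, 0x71000000u },  /* heap/stack */
    { 0x60000000u, 0x08000000u, 0x74000000u },  /* /dev/shm */
};

/* Shadow bytes needed to cover app_bytes of memory (rounded up). */
uintptr_t shadow_bytes_for(uintptr_t app_bytes) {
    return (app_bytes + ((uintptr_t)1 << SHADOW_SCALE) - 1) >> SHADOW_SCALE;
}

/* Returns the shadow address for addr, or 0 if addr is in no region.
   The search over regions is the extra per-access cost of the split. */
uintptr_t shadow_address(uintptr_t addr) {
    for (size_t i = 0; i < 3; ++i) {
        const region_t *r = &regions[i];
        if (addr >= r->app_base && addr - r->app_base < r->app_size)
            return r->shadow_base + ((addr - r->app_base) >> SHADOW_SCALE);
    }
    return 0;
}
```

With a single contiguous shadow, the lookup collapses to one shift and one add, which is why the upstream mapping is cheaper. At this ratio the shadow itself costs about 3% of the memory it covers (1/32), before counting red zones.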

For those maintaining their own installs:
As of yet, I've not bumped the version number of the install (I'll do that the next time I rebase). Nevertheless, there are obviously new parts of the patchset, build scripts, etc. You'll find a new archive (-v2) on the trac page https://trac.alcf.anl.gov/projects/llvm-bgq -- please note: do not check out compiler-rt into the llvm/projects subdirectory as you would for a normal build (and as specified on the clang web page). The compiler-rt library needs to be cross-compiled using the bgclang-wrapped compiler. Just check out compiler-rt into its own top-level directory, create an empty build directory for it, and use the build script in the archive (after adjusting paths as appropriate).

Happy bug hunting! (and please let me know if you encounter any problems).

 -Hal

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory