[Llvm-bgq-discuss] thread local crash

Geoffrey Irving irving at naml.us
Sat Feb 2 22:56:36 CST 2013


I've confirmed that switching to manual pthread calls instead of
__thread fixes the problem.  I'm good to go, but let me know if you
want me to investigate the crashing version further.
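
For reference, the manual pthread approach can be sketched roughly as
below. This is a minimal sketch, not the actual pentago code: the
names buffer_bytes, buffer_key, buffer_once and make_buffer_key are
hypothetical, the buffer size is a placeholder, and the buffer is
returned as a plain void* rather than a RawArray.

```cpp
#include <pthread.h>
#include <cstdio>
#include <cstdlib>

// Hypothetical placeholder size; the real code computes it from
// raw_max_fast_compressed_size.
static const size_t buffer_bytes = 1 << 16;

// One process-wide key; each thread stores its own pointer under it.
static pthread_key_t buffer_key;
static pthread_once_t buffer_once = PTHREAD_ONCE_INIT;

static void make_buffer_key() {
  // free() is called on a thread's non-null value when that thread exits.
  if (pthread_key_create(&buffer_key, free) != 0) {
    fprintf(stderr, "pthread_key_create failed\n");
    abort();
  }
}

// Returns the calling thread's buffer, allocating it on first use.
static void* local_buffer() {
  pthread_once(&buffer_once, make_buffer_key);
  void* buffer = pthread_getspecific(buffer_key);
  if (!buffer) {
    buffer = malloc(buffer_bytes);
    if (!buffer || pthread_setspecific(buffer_key, buffer) != 0) {
      fprintf(stderr, "failed to allocate thread local buffer of size %zu\n",
              buffer_bytes);
      abort();
    }
  }
  return buffer;
}
```

Unlike the static __thread pointer, this goes through
pthread_getspecific on every call, so it avoids compiler-generated
TLS relocations entirely, at the cost of a function call per lookup.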

Geoffrey

On Sat, Feb 2, 2013 at 7:58 PM, Geoffrey Irving <irving at naml.us> wrote:
> I'm getting a crash on the login nodes which seems to be related to
> thread local variables.  First, am I correct that
>
>     /home/projects/llvm/current/bin/clang -O2
>
> should work fine on the login nodes?  If it shouldn't work on the
> login nodes, you can stop reading here.
>
> The following details are fairly incomplete, since I haven't made any
> attempt to isolate or minimize the problem yet.  I can work through
> that if what I'm doing (__thread on the login nodes) is indeed
> supposed to work.
>
> On the compute nodes, using mpiclang -O3, everything seems to be working fine.
>
> This is not critical, since (1) it only breaks the fast post-compute
> verification, which can be run elsewhere or in debug mode and (2) I
> can presumably work around it with manual pthread calls.
>
> Thanks,
> Geoffrey
>
> ------------------------------------------------
>
> The function in question looks like
>
> // Thread local temporary buffer for local compression and decompression.
> static inline RawArray<Vector<super_t,2>> local_buffer() {
>   const int count = ceil_div(raw_max_fast_compressed_size,sizeof(Vector<super_t,2>));
>   static __thread Vector<super_t,2>* buffer = 0;
>   if (!buffer) {
>     buffer = (Vector<super_t,2>*)malloc(sizeof(Vector<super_t,2>)*count);
>     if (!buffer)
>       die("local_fast_compress/uncompress: failed to allocate thread local buffer of size %zu",sizeof(Vector<super_t,2>)*count);
>   }
>   return RawArray<Vector<super_t,2>>(count,buffer);
> }
>
> It works fine in unoptimized debug mode (-g, no -O), though I do get a
> warning during link:
>
> scons: building associated VariantDir targets: build/powerpc/debug
> /home/projects/llvm/current/bin/clang -o
> build/powerpc/debug/old/endgame.os -c -U__GXX_EXPERIMENTAL_CXX0X__
> -mcmodel=small -g -Wall -Winit-self -Woverloaded-virtual
> -Wsign-compare -fno-strict-aliasing -std=c++11 -Werror
> -Wno-array-bounds -Wno-unknown-pragmas -Wno-invalid-offsetof -fPIC
> -DOTHER_PYTHON -DOTHER_THREAD_SAFE=1 -DBOOST_EXCEPTION_DISABLE
> -DBUILDING_pentago_core -I/usr/include/python2.6
> -I/home/irving/.local/lib/python2.6/site-packages/numpy/core/include
> -Ibuild/include -Ibuild/powerpc/debug -I.
> -I/home/irving/download/boost_1_52_0 old/endgame.cpp
> /home/projects/llvm/current/bin/clang -o
> build/powerpc/debug/libpentago_core.so -shared -g -shared
> -Wl,-rpath=/home/irving/lib
> -Wl,-rpath=/gpfs/vesta_home/irving/otherlab/other/install/debug/lib
> build/powerpc/debug/bin/analyze.os
> build/powerpc/debug/data/block_cache.os
> build/powerpc/debug/data/compress.os
> build/powerpc/debug/data/filter.os
> build/powerpc/debug/data/supertensor.os build/powerpc/debug/module.os
> build/powerpc/debug/search/engine.os
> build/powerpc/debug/search/stat.os
> build/powerpc/debug/search/superengine.os
> build/powerpc/debug/search/supertable.os
> build/powerpc/debug/search/table.os
> build/powerpc/debug/search/trace.os
> build/powerpc/debug/end/block_store.os
> build/powerpc/debug/end/check.os build/powerpc/debug/end/compute.os
> build/powerpc/debug/end/fast_compress.os
> build/powerpc/debug/end/history.os build/powerpc/debug/end/line.os
> build/powerpc/debug/end/load_balance.os
> build/powerpc/debug/end/partition.os
> build/powerpc/debug/end/predict.os
> build/powerpc/debug/end/random_partition.os
> build/powerpc/debug/end/sections.os
> build/powerpc/debug/end/simple_partition.os
> build/powerpc/debug/end/sparse_store.os
> build/powerpc/debug/end/store_block_cache.os
> build/powerpc/debug/base/all_boards.os
> build/powerpc/debug/base/board.os build/powerpc/debug/base/count.os
> build/powerpc/debug/base/hash.os build/powerpc/debug/base/moves.os
> build/powerpc/debug/base/score.os build/powerpc/debug/base/section.os
> build/powerpc/debug/base/superscore.os
> build/powerpc/debug/base/symmetry.os
> build/powerpc/debug/utility/aligned.os
> build/powerpc/debug/utility/debug.os
> build/powerpc/debug/utility/large.os
> build/powerpc/debug/utility/memory.os
> build/powerpc/debug/utility/mmap.os
> build/powerpc/debug/utility/convert.os
> build/powerpc/debug/utility/thread.os
> build/powerpc/debug/old/endgame.os build/powerpc/debug/gen/tables.os
> -L/usr/lib64 -L/home/irving/lib
> -L/gpfs/vesta_home/irving/otherlab/other/install/debug/lib -llzma
> -lpython2.6 -lz -lsnappy -lother_core
> /usr/bin/ld: build/powerpc/debug/end/fast_compress.os(.debug_info+0x2510):
> R_PPC64_ADDR64 used with TLS symbol
> _ZZN7pentago3endL12local_bufferEvE6buffer
>
> In release mode (-O2, no -g), I get no linker warning, but the
> program seems to crash in that function:
>
> /home/projects/llvm/current/bin/clang -o
> build/powerpc/release/old/endgame.os -c -U__GXX_EXPERIMENTAL_CXX0X__
> -mcmodel=small -O2 -Wall -Winit-self -Woverloaded-virtual
> -Wsign-compare -fno-strict-aliasing -std=c++11 -Werror
> -Wno-array-bounds -Wno-unknown-pragmas -Wno-invalid-offsetof -fPIC
> -DOTHER_PYTHON -DNDEBUG -DOTHER_THREAD_SAFE=1
> -DBOOST_EXCEPTION_DISABLE -DBUILDING_pentago_core
> -I/usr/include/python2.6
> -I/home/irving/.local/lib/python2.6/site-packages/numpy/core/include
> -Ibuild/include -Ibuild/powerpc/release -I.
> -I/home/irving/download/boost_1_52_0 old/endgame.cpp
> /home/projects/llvm/current/bin/clang -o
> build/powerpc/release/libpentago_core.so -shared -shared
> -Wl,-rpath=/home/irving/lib
> -Wl,-rpath=/gpfs/vesta_home/irving/otherlab/other/install/release/lib
> build/powerpc/release/bin/analyze.os
> build/powerpc/release/data/block_cache.os
> build/powerpc/release/data/compress.os
> build/powerpc/release/data/filter.os
> build/powerpc/release/data/supertensor.os
> build/powerpc/release/module.os build/powerpc/release/search/engine.os
> build/powerpc/release/search/stat.os
> build/powerpc/release/search/superengine.os
> build/powerpc/release/search/supertable.os
> build/powerpc/release/search/table.os
> build/powerpc/release/search/trace.os
> build/powerpc/release/end/block_store.os
> build/powerpc/release/end/check.os
> build/powerpc/release/end/compute.os
> build/powerpc/release/end/fast_compress.os
> build/powerpc/release/end/history.os build/powerpc/release/end/line.os
> build/powerpc/release/end/load_balance.os
> build/powerpc/release/end/partition.os
> build/powerpc/release/end/predict.os
> build/powerpc/release/end/random_partition.os
> build/powerpc/release/end/sections.os
> build/powerpc/release/end/simple_partition.os
> build/powerpc/release/end/sparse_store.os
> build/powerpc/release/end/store_block_cache.os
> build/powerpc/release/base/all_boards.os
> build/powerpc/release/base/board.os
> build/powerpc/release/base/count.os build/powerpc/release/base/hash.os
> build/powerpc/release/base/moves.os
> build/powerpc/release/base/score.os
> build/powerpc/release/base/section.os
> build/powerpc/release/base/superscore.os
> build/powerpc/release/base/symmetry.os
> build/powerpc/release/utility/aligned.os
> build/powerpc/release/utility/debug.os
> build/powerpc/release/utility/large.os
> build/powerpc/release/utility/memory.os
> build/powerpc/release/utility/mmap.os
> build/powerpc/release/utility/convert.os
> build/powerpc/release/utility/thread.os
> build/powerpc/release/old/endgame.os
> build/powerpc/release/gen/tables.os -L/usr/lib64 -L/home/irving/lib
> -L/gpfs/vesta_home/irving/otherlab/other/install/release/lib -llzma
> -lpython2.6 -lz -lsnappy -lother_core
>
> I'm not completely sure this is the location, since gdb bails out with
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x00000fffb0e782f4 in
> ._ZN7pentago3end21local_fast_uncompressEN5other8RawArrayIKhLi1EEEm ()
> from /home/irving/otherlab/other/install/flavor/lib/pentago_core.so
> During symbol reading, incomplete CFI data; unspecified registers
> (e.g., r2) at 0xfffb0e78264.
> [Thread 0xffeed7ff200 (LWP 9468) exited]
> ...
> [Thread 0xfffb074f200 (LWP 9419) exited]
>
> Program terminated with signal SIGSEGV, Segmentation fault.
> The program no longer exists.
> Missing separate debuginfos, use: debuginfo-install python-2.6.6-29.el6.ppc64

