[Llvm-bgq-discuss] trouble with latest clang install

Thomas Heller thom.heller at gmail.com
Fri Feb 14 14:02:27 CST 2014


Hi all,

Ok, I think I tracked it down.
If my suspicions are correct, the segfault isn't caused by bgclang or HPX
directly. It looks like parts of Boost can't deal with locales correctly on
John's system. Here is how it happens:
On a regular BGQ compute node, you don't have interactive access and, I think,
no locale information available. However, John's scenario is slightly
different:
1) He uses SLURM to get on the nodes (interactively or through batch jobs)
2) He uses the BGAS nodes directly

Now, 1) matters because of a SLURM feature: the bash it spawns once the job
has enough resources inherits all the environment variables that were set at
job submission (this includes LANG and LC_*). It looks like some flavors of
Linux (especially in the embedded world) have a problem with this. I ran into
a similar problem when porting HPX to the Xeon Phi. Everything was working
nicely on our local machine (no job control, direct access through ssh etc.).
I then moved on to Stampede; when logging into one of the Phis directly,
everything still worked great, but only until I stopped using an interactive
mode and started to submit jobs through the batch system, which led to
problems similar to the ones John is running into right now ...
About 2) ... I am not exactly sure how this is related to the problem at hand
...
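
To check whether point 1) applies on a given system, it can help to print the
locale variables the job shell actually inherited. The following is only a
minimal sketch; the function name list_locale_vars is made up for this
example, and it assumes a POSIX shell with env and grep available where the
script runs.

```shell
#!/bin/sh
# Hypothetical helper (not part of SLURM, HPX or Boost): print every
# locale-related variable (LANG, LANGUAGE, LC_*) found in the current
# environment, one NAME=value pair per line.
list_locale_vars() {
    # "|| true" keeps the exit status at 0 when no such variable is set.
    env | grep -E '^(LANG|LANGUAGE|LC_[A-Z_]+)=' || true
}

list_locale_vars
```

Running it once in the login shell and once at the top of the batch script
should show whether the submission environment was copied into the job.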

Anyway, I was able to reproduce the problem on one of the CNK based compute 
nodes on JUQUEEN by using this jobscript:
# @ job_name = HPX_Hello_World
# @ comment = "HPX Hello World testrun"
# @ error = $(job_name).$(jobid).err
# @ output = $(job_name).$(jobid).out
# @ environment = COPY_ALL
# @ wall_clock_limit = 00:30:00
# @ notification = error
# @ notify_user = thom.heller at gmail.com
# @ job_type = bluegene
# @ bg_size = 32
# @ queue

APP="$HOME/build/hpx/debug/bin/hello_world"

ENVS="LANG=en_US LC_CTYPE=\"en_US\" LC_NUMERIC=\"en_US\" LC_TIME=en_GB 
LC_COLLATE=\"en_US\" LC_MONETARY=\"en_US\" LC_MESSAGES=\"en_US\" 
LC_PAPER=\"en_US\" LC_NAME=\"en_US\" LC_ADDRESS=\"en_US\" 
LC_TELEPHONE=\"en_US\" LC_MEASUREMENT=\"en_US\" LC_IDENTIFICATION=\"en_US\" 
LC_ALL=\"en_US\""

runjob --ranks-per-node 1 --exe $APP --args "-t1" --envs $ENVS

This led to the exact same error. What I am unsure about, though, is whose
fault it is. The stack trace John posted earlier comes out of the static
initialization of the binary, which sets up some globals from the Boost
filesystem library. So we have three candidates: 1) Boost.Filesystem 2) libc++
3) the libc/posix on the BGAS node.

The solution to this problem is, btw, to unset all those environment variables.
I committed a workaround to HPX, so it should no longer be necessary to unset
those environment variables manually
(https://github.com/STEllAR-GROUP/hpx/commit/65ce125466ae43e68e19e89b3e50ece0721786de).
Thanks for the patience.
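
For builds without that commit, the manual route could look roughly like the
sketch below. It is an illustration, not what the HPX commit does: the
function name clear_locale_vars is invented here, and the variable list is
just the one from the ENVS line above plus LANGUAGE.

```shell
#!/bin/sh
# Hypothetical helper: remove all locale settings from the current shell so
# that programs started afterwards do not inherit them. Unsetting a variable
# that is not set is harmless.
clear_locale_vars() {
    unset LANG LANGUAGE LC_ALL LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE \
          LC_MONETARY LC_MESSAGES LC_PAPER LC_NAME LC_ADDRESS \
          LC_TELEPHONE LC_MEASUREMENT LC_IDENTIFICATION
}

clear_locale_vars
# ... start the application after this point ...
```

In John's SLURM case the call would go into the batch script before the line
that starts the application, assuming the launcher forwards the shell's
environment to the nodes.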

Regards,
Thomas

On Friday, February 14, 2014 12:52:24 Biddiscombe, John A. wrote:
> Hal
> 
> Apologies, I didn’t realize I was using the wrong wrapper.
> 
> I recompiled using the bgclang++11 wrapper and things work much better.
> I first compiled boost ok, but had trouble linking to it - I ran into the
> cxxABI link error with boost program_options:: __1 etc etc
> 
> A bit of googling around explained the std c++ lib issues to me, so
> I had another go using the following settings …
> 
> export CC=/gpfs/bbp.cscs.ch/home/biddisco/bgas/apps/clang/bin/bgclang
> export CXX=/gpfs/bbp.cscs.ch/home/biddisco/bgas/apps/clang/bin/bgclang++11
> export PATH=/gpfs/bbp.cscs.ch/home/biddisco/bgas/apps/clang/bin:$PATH
> 
> I found some info about building boost with clang and followed
> instructions here
> http://stackoverflow.com/questions/11081818/linking-troubles-with-boostprogram-options-on-osx-using-llvm?lq=1
> I modified tools/build/v2/user-config.jam to include the clang-11 option
> using clang : 11
>     : "/gpfs/bbp.cscs.ch/home/biddisco/bgas/apps/clang/bin/bgclang++11"
>     : <cxxflags>"-std=c++11 -stdlib=libc++ -ftemplate-depth=512"
>       <linkflags>"-stdlib=libc++"
>     ;
> 
> 
> And then proceeded to building boost using the following commands
> ./bootstrap.sh --with-toolset=clang-11
> ./b2 -j 16 toolset=clang-11 cxxflags="-fPIC" --threading=multi \
>     --without-mpi --without-python \
>     --prefix=/gpfs/bbp.cscs.ch/home/biddisco/apps/clang/boost_1_54_0
> 
> And boost compiles fine.
> "The Boost C++ Libraries were successfully built!"
> 
> To test, I compiled the boost serialisation demo from this page
> http://www.boost.org/doc/libs/1_42_0/libs/serialization/example/demo.cpp
> And also a simple boost::program_options demo and boost::filesystem demo
> they all run fine
> 
> Thank you very much for the help and all the work you’ve put in getting
> the clang stuff running..
> 
> But…
> 
> when I run simple demos from the HPX library
> 
> bbpbg2:~/bgas/build/hpx$ bin/hello_world
> terminate called after throwing an instance of 'std::__1::runtime_error'
>   what():  collate_byname<char>::collate_byname failed to construct for
> Aborted (core dumped)
> 
> 
> gdb shows me a trace …
> (gdb) where
> #0  0x00000fffb3458c5c in raise (sig=6)
>     at ../nptl/sysdeps/unix/sysv/linux/raise.c:67
> #1  0x00000fffb345abd4 in abort () at abort.c:92
> #2  0x00000fffb3aa7b00 in __gnu_cxx::__verbose_terminate_handler ()
>     at /bgsys/drivers/V1R2M1/ppc64/toolchain/gnu/gcc-4.4.6/libstdc++-v3/libsupc++/vterminate.cc:93
> #3  0x00000fffb3aa4d74 in __cxxabiv1::__terminate (handler=<value optimized out>)
>     at /bgsys/drivers/V1R2M1/ppc64/toolchain/gnu/gcc-4.4.6/libstdc++-v3/libsupc++/eh_terminate.cc:38
> #4  0x00000fffb3aa4db8 in std::terminate ()
>     at /bgsys/drivers/V1R2M1/ppc64/toolchain/gnu/gcc-4.4.6/libstdc++-v3/libsupc++/eh_terminate.cc:48
> #5  0x00000fffb47b1c14 in .__clang_call_terminate ()
>     from /gpfs/bbp.cscs.ch/home/biddisco/apps/clang/boost_1_54_0/lib/libboost_filesystem.so.1.54.0
> #6  0x00000fffb47b48a0 in ._ZNK5boost10filesystem4path7compareERKS1_ ()
>     from /gpfs/bbp.cscs.ch/home/biddisco/apps/clang/boost_1_54_0/lib/libboost_filesystem.so.1.54.0
> Backtrace stopped: frame did not save the PC
> 
> 
> It looks very suspicious as there are some libstdc++ appearances in there.
> 
> Does anything here give you any idea of what might have gone wrong? I’ve
> tried a number of rebuilds and the error persists, whilst simple demos run
> ok. I’m not sure where to look to diagnose what’s up (I’ve contacted the
> HPX people as well). One question is why the shared clang libc++ links to
> the libstdc++ one. If I do an
> 
> bbpbg2:~/bgas/build/c++test$ ldd /gpfs/bbp.cscs.ch/home/biddisco/bgas/apps/clang/libc++/lib/libc++.so.1.0
> 
> 	linux-vdso64.so.1 =>  (0x00000fff9ad40000)
> 	libpthread.so.0 => /bgsys/drivers/V1R2M1/ppc64/gnu-linux/powerpc64-bgq-linux/lib/libpthread.so.0 (0x00000fff9ab00000)
> 	librt.so.1 => /bgsys/drivers/V1R2M1/ppc64/gnu-linux/powerpc64-bgq-linux/lib/librt.so.1 (0x00000fff9a9d0000)
> 	libc.so.6 => /bgsys/drivers/V1R2M1/ppc64/gnu-linux/powerpc64-bgq-linux/lib/libc.so.6 (0x00000fff9a790000)
> 	libstdc++.so.6 => /bgsys/drivers/V1R2M1/ppc64/gnu-linux/powerpc64-bgq-linux/lib/libstdc++.so.6 (0x00000fff9a550000)
> 	/lib64/ld64.so.1 (0x0000000032420000)
> 	libm.so.6 => /bgsys/drivers/V1R2M1/ppc64/gnu-linux/powerpc64-bgq-linux/lib/libm.so.6 (0x00000fff9a430000)
> 	libgcc_s.so.1 => /bgsys/drivers/V1R2M1/ppc64/gnu-linux/powerpc64-bgq-linux/lib/libgcc_s.so.1 (0x00000fff9a320000)
> 
> 
> It seems odd. Could this be causing the trouble? (the demos run fine
> though, so I guess not).
> 
> Anyway, I’ll keep poking around, if anything comes to mind, I’m grateful
> for help.
> 
> Thanks
> 
> JB
>   
> 


