[Llvm-bgq-discuss] trouble with latest clang install

Thu Feb 20 13:38:54 CST 2014

----- Original Message -----
> From: "Thomas Heller" <thom.heller at gmail.com>
> To: "John A. Biddiscombe" <biddisco at cscs.ch>
> Cc: llvm-bgq-discuss at lists.alcf.anl.gov, "Hal Finkel" <hfinkel at anl.gov>
> Sent: Friday, February 14, 2014 2:02:27 PM
> Subject: Re: [Llvm-bgq-discuss] trouble with latest clang install
> 
> Hi all,
> 
> OK, I think I tracked it down.
> If my suspicions are correct, the segfault isn't caused by bgclang or HPX
> directly. It looks like parts of Boost can't deal with locales correctly on
> John's system. Here is how it happens:
> On a regular BGQ compute node, you don't have interactive access and, I
> think, no locale information is available. However, John's scenario is
> slightly different:
> 1) He uses SLURM to get on the nodes (interactively or through batch jobs)
> 2) He uses the BGAS nodes directly
> 
> Now, 1) matters because of a SLURM feature: the bash that SLURM spawns once
> the job has enough resources inherits all the environment variables that
> were set at job submission (this includes LANG and LC_*). It looks like
> some flavors of Linux (especially in the embedded world) have a problem
> with this. I ran into a similar problem when porting HPX to the Xeon Phi.
> Everything was working nicely on our local machine (no job control, direct
> access through ssh, etc.). I then moved on to Stampede; when logging into
> one of the Phis directly, everything still worked great, but only until I
> stopped using interactive mode and started submitting jobs through the
> batch system. That led to problems similar to the ones John is running
> into right now ...
> About 2) ... I am not exactly sure how this is related to the problem at
> hand ...
> 
> Anyway, I was able to reproduce the problem on one of the CNK-based compute
> nodes on JUQUEEN by using this jobscript:
> # @ job_name = HPX_Hello_World
> # @ comment = "HPX Hello World testrun"
> # @ error = $(job_name).$(jobid).err
> # @ output = $(job_name).$(jobid).out
> # @ environment = COPY_ALL
> # @ wall_clock_limit = 00:30:00
> # @ notification = error
> # @ notify_user = thom.heller at gmail.com
> # @ job_type = bluegene
> # @ bg_size = 32
> # @ queue
> 
> APP="$HOME/build/hpx/debug/bin/hello_world"
> 
> ENVS="LANG=en_US LC_CTYPE=\"en_US\" LC_NUMERIC=\"en_US\"
> LC_TIME=en_GB
> LC_COLLATE=\"en_US\" LC_MONETARY=\"en_US\" LC_MESSAGES=\"en_US\"
> LC_PAPER=\"en_US\" LC_NAME=\"en_US\" LC_ADDRESS=\"en_US\"
> LC_TELEPHONE=\"en_US\" LC_MEASUREMENT=\"en_US\"
> LC_IDENTIFICATION=\"en_US\"
> LC_ALL=\"en_US\""
> 
> runjob --ranks-per-node 1 --exe $APP --args "-t1" --envs $ENVS
> 
> This led to the exact same error. What I am unsure about, though, is whose
> fault it is. The stack trace John posted earlier comes out of the static
> initialization part of the binary, which initializes some globals from the
> Boost.Filesystem library. So we have three candidates:
> 1) Boost.Filesystem
> 2) libc++
> 3) the libc/POSIX layer on the BGAS node.

This sounds right. CNK (and, specifically, its associated build of glibc) doesn't have locale support enabled. As a result, as I recall, only the default ('C') locale is supported.
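
To illustrate the kind of defensive handling I have in mind, here is a minimal sketch (not what Boost or libc++ actually do): it tries to construct a named locale and falls back to the classic 'C' locale when the C library cannot provide it. The helper name make_locale_or_classic is hypothetical, and the sketch assumes the standard library reports an unsupported locale by throwing std::runtime_error, as in the collate_byname failure quoted below.

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Boost, libc++, or HPX): construct the
// locale called `name`, or return the classic "C" locale if the platform
// cannot provide it.
inline std::locale make_locale_or_classic(const std::string& name)
{
    try {
        // Throws std::runtime_error when the name is not supported. An empty
        // name selects the user's preferred locale (LANG, LC_*).
        return std::locale(name.c_str());
    } catch (const std::runtime_error&) {
        // Assumption: on a system without locale support, only "C" is usable.
        return std::locale::classic();
    }
}
```

Code that currently constructs std::locale("") directly could call make_locale_or_classic("") instead, so a job environment carrying LANG=en_US would degrade to 'C' rather than abort.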

If it turns out that this is a bug in libc++, then we should fix it there. Maybe it is worthwhile to have a Boost build that is patched to avoid this problem as well?

In any case, thanks for investigating this and sharing your findings!

 -Hal

> 
> By the way, the solution to this problem is to unset all those environment
> variables.
> I committed a workaround to HPX which should make it unnecessary to unset
> those environment variables manually
> (https://github.com/STEllAR-GROUP/hpx/commit/65ce125466ae43e68e19e89b3e50ece0721786de).
> Thanks for your patience.
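
For anyone who wants the same effect inside their own program, a rough sketch of the "unset those variables" idea could look like the following. This is not taken from the HPX commit above; the function name clear_locale_environment is hypothetical, the variable list simply mirrors the ones in the jobscript, and it assumes a POSIX unsetenv is available. Since the failure was seen during static initialization, a call from main() may come too late, so treat this as an illustration rather than a guaranteed fix.

```cpp
#include <cstdlib>   // std::getenv
#include <stdlib.h>  // unsetenv (POSIX, assumed to be available)

// Hypothetical helper: remove the locale-selecting variables that a batch
// system may have copied into the job environment.
inline void clear_locale_environment()
{
    static const char* const names[] = {
        "LANG", "LC_ALL", "LC_CTYPE", "LC_NUMERIC", "LC_TIME", "LC_COLLATE",
        "LC_MONETARY", "LC_MESSAGES", "LC_PAPER", "LC_NAME", "LC_ADDRESS",
        "LC_TELEPHONE", "LC_MEASUREMENT", "LC_IDENTIFICATION"
    };
    for (const char* name : names) {
        // unsetenv succeeds even when the variable is not set.
        unsetenv(name);
    }
}
```

After the call, std::getenv("LANG") and the LC_* lookups return a null pointer, so anything that later asks for the user's preferred locale ends up with the default 'C' locale.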
> 
> Regards,
> Thomas
> 
> On Friday, February 14, 2014 12:52:24 Biddiscombe, John A. wrote:
> > Hal
> > 
> > Apologies, I didn’t realize I was using the wrong wrapper.
> > 
> > I recompiled using the bgclang++11 wrapper and things work much better.
> > I first compiled boost OK, but had trouble linking to it - I ran into the
> > cxxABI link error with boost program_options:: __1 etc etc
> > 
> > A bit of googling around explained the std C++ lib issues to me, so
> > I had another go using the following settings …
> > 
> > export CC=/gpfs/bbp.cscs.ch/home/biddisco/bgas/apps/clang/bin/bgclang
> > export CXX=/gpfs/bbp.cscs.ch/home/biddisco/bgas/apps/clang/bin/bgclang++11
> > export PATH=/gpfs/bbp.cscs.ch/home/biddisco/bgas/apps/clang/bin:$PATH
> > 
> > I found some info about building boost with clang and followed the
> > instructions here
> > http://stackoverflow.com/questions/11081818/linking-troubles-with-boostprogram-options-on-osx-using-llvm?lq=1
> > I modified tools/build/v2/user-config.jam to include the clang-11 option
> > 
> > using clang : 11
> >     : "/gpfs/bbp.cscs.ch/home/biddisco/bgas/apps/clang/bin/bgclang++11"
> >     : <cxxflags>"-std=c++11 -stdlib=libc++ -ftemplate-depth=512"
> >       <linkflags>"-stdlib=libc++"
> >     ;
> > 
> > 
> > And then proceeded to building boost using the following commands
> > ./bootstrap.sh --with-toolset=clang-11
> > ./b2 -j 16 toolset=clang-11 cxxflags="-fPIC" --threading=multi
> > --without-mpi --without-python
> > --prefix=/gpfs/bbp.cscs.ch/home/biddisco/apps/clang/boost_1_54_0
> > 
> > And boost compiles fine.
> > "The Boost C++ Libraries were successfully built!"
> > 
> > To test, I compiled the boost serialisation demo from this page
> > http://www.boost.org/doc/libs/1_42_0/libs/serialization/example/demo.cpp
> > and also a simple boost::program_options demo and a boost::filesystem
> > demo; they all run fine.
> > 
> > Thank you very much for the help and all the work you’ve put in
> > getting
> > the clang stuff running..
> > 
> > But…
> > 
> > when I run simple demos from the HPX library
> > 
> > bbpbg2:~/bgas/build/hpx$ bin/hello_world
> > terminate called after throwing an instance of
> > 'std::__1::runtime_error'
> >   what():  collate_byname<char>::collate_byname failed to construct
> >   for
> > Aborted (core dumped)
> > 
> > 
> > gdb shows me a trace …
> > (gdb) where
> > #0  0x00000fffb3458c5c in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:67
> > #1  0x00000fffb345abd4 in abort () at abort.c:92
> > #2  0x00000fffb3aa7b00 in __gnu_cxx::__verbose_terminate_handler ()
> >     at /bgsys/drivers/V1R2M1/ppc64/toolchain/gnu/gcc-4.4.6/libstdc++-v3/libsupc++/vterminate.cc:93
> > #3  0x00000fffb3aa4d74 in __cxxabiv1::__terminate (handler=<value optimized out>)
> >     at /bgsys/drivers/V1R2M1/ppc64/toolchain/gnu/gcc-4.4.6/libstdc++-v3/libsupc++/eh_terminate.cc:38
> > #4  0x00000fffb3aa4db8 in std::terminate ()
> >     at /bgsys/drivers/V1R2M1/ppc64/toolchain/gnu/gcc-4.4.6/libstdc++-v3/libsupc++/eh_terminate.cc:48
> > #5  0x00000fffb47b1c14 in .__clang_call_terminate ()
> >    from /gpfs/bbp.cscs.ch/home/biddisco/apps/clang/boost_1_54_0/lib/libboost_filesystem.so.1.54.0
> > #6  0x00000fffb47b48a0 in ._ZNK5boost10filesystem4path7compareERKS1_ ()
> >    from /gpfs/bbp.cscs.ch/home/biddisco/apps/clang/boost_1_54_0/lib/libboost_filesystem.so.1.54.0
> > Backtrace stopped: frame did not save the PC
> > 
> > 
> > It looks very suspicious, as there are some libstdc++ frames in there.
> > 
> > Does anything here give you any idea of what might have gone wrong? I’ve
> > tried a number of rebuilds and the error persists, whilst simple demos
> > run OK. I’m not sure where to look to diagnose what’s up (I’ve contacted
> > the HPX people as well). One question is why the shared clang libc++
> > links to libstdc++. Here is what ldd shows:
> > 
> > bbpbg2:~/bgas/build/c++test$ ldd /gpfs/bbp.cscs.ch/home/biddisco/bgas/apps/clang/libc++/lib/libc++.so.1.0
> > 
> > 	linux-vdso64.so.1 =>  (0x00000fff9ad40000)
> > 	libpthread.so.0 => /bgsys/drivers/V1R2M1/ppc64/gnu-linux/powerpc64-bgq-linux/lib/libpthread.so.0 (0x00000fff9ab00000)
> > 	librt.so.1 => /bgsys/drivers/V1R2M1/ppc64/gnu-linux/powerpc64-bgq-linux/lib/librt.so.1 (0x00000fff9a9d0000)
> > 	libc.so.6 => /bgsys/drivers/V1R2M1/ppc64/gnu-linux/powerpc64-bgq-linux/lib/libc.so.6 (0x00000fff9a790000)
> > 	libstdc++.so.6 => /bgsys/drivers/V1R2M1/ppc64/gnu-linux/powerpc64-bgq-linux/lib/libstdc++.so.6 (0x00000fff9a550000)
> > 	/lib64/ld64.so.1 (0x0000000032420000)
> > 	libm.so.6 => /bgsys/drivers/V1R2M1/ppc64/gnu-linux/powerpc64-bgq-linux/lib/libm.so.6 (0x00000fff9a430000)
> > 	libgcc_s.so.1 => /bgsys/drivers/V1R2M1/ppc64/gnu-linux/powerpc64-bgq-linux/lib/libgcc_s.so.1 (0x00000fff9a320000)
> > 
> > 
> > It seems odd. Could this be causing the trouble? (The demos run fine
> > though, so I guess not.)
> > 
> > Anyway, I’ll keep poking around; if anything comes to mind, I’d be
> > grateful for the help.
> > 
> > Thanks
> > 
> > JB
> >   
> > 
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory