[Llvm-bgq-discuss] trouble with latest clang install

Hal Finkel hfinkel at anl.gov
Thu Feb 20 15:40:05 CST 2014


----- Original Message -----
> From: "Thomas Gooding" <tgooding at us.ibm.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: llvm-bgq-discuss at lists.alcf.anl.gov, "thom heller" <thom.heller at gmail.com>
> Sent: Thursday, February 20, 2014 3:04:54 PM
> Subject: Re: [Llvm-bgq-discuss] trouble with latest clang install
> 
> 
> 
> Hi Hal,
> 
> CNK has support for .tbss/.tdata segments (thread-specific data), which
> is what glibc uses to track thread-specific locale information. The
> rest of the support is entirely within glibc; I don't recall that it
> was disabled. As I recall, if you don't have support for tbss/tdata,
> programs will crash when printing floating-point values (glibc needs
> to know whether to print a comma or a decimal point per the locale).

I'll double-check the patchset again. As I recall, they only compile in the data tables for the 'C' locale, and nothing else is available (dynamically or otherwise). From a space-saving standpoint this probably makes sense.
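To make Tom's point concrete: numeric output consults the current locale for its decimal separator. printf does this through LC_NUMERIC and localeconv(); C++ streams do it through the std::numpunct facet of the imbued locale. Under the classic "C" locale, the only one Hal expects this glibc build to carry, the separator is always '.'. The sketch uses the C++ stream side. format_double and decimal_point_of are hypothetical helper names, not part of glibc, libc++, or CNK.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: format a double using the numeric punctuation
// (decimal separator, grouping) of the given locale.
std::string format_double(double value, const std::locale& loc)
{
    std::ostringstream out;
    out.imbue(loc);        // numeric output now consults loc's std::numpunct<char>
    out << value;
    assert(!out.fail());   // the write into a string stream should never fail
    return out.str();
}

// Hypothetical helper: the decimal separator a locale would use,
// read directly from its std::numpunct<char> facet.
char decimal_point_of(const std::locale& loc)
{
    return std::use_facet<std::numpunct<char>>(loc).decimal_point();
}
```

For example, format_double(3.5, std::locale::classic()) returns "3.5".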

 -Hal

> 
> Tom
> 
> Tom Gooding
> Senior Engineer / Blue Gene SW Lead / C2
> tgooding at us.ibm.com 507-253-0747
> 
> 
> From: Hal Finkel <hfinkel at anl.gov>
> To: thom heller <thom.heller at gmail.com>
> Cc: llvm-bgq-discuss at lists.alcf.anl.gov
> Date: 02/20/2014 01:39 PM
> Subject: Re: [Llvm-bgq-discuss] trouble with latest clang install
> Sent by: llvm-bgq-discuss-bounces at lists.alcf.anl.gov
>
> ----- Original Message -----
> > From: "Thomas Heller" <thom.heller at gmail.com>
> > To: "John A. Biddiscombe" <biddisco at cscs.ch>
> > Cc: llvm-bgq-discuss at lists.alcf.anl.gov, "Hal Finkel"
> > <hfinkel at anl.gov>
> > Sent: Friday, February 14, 2014 2:02:27 PM
> > Subject: Re: [Llvm-bgq-discuss] trouble with latest clang install
> > 
> > Hi all,
> >
> > OK, I think I tracked it down. If my suspicions are correct, the crash
> > isn't caused by bgclang or HPX directly. It looks like parts of Boost
> > can't deal with locales correctly on John's system. Here is how it
> > happens: on a regular BGQ compute node, you don't have interactive
> > access and, I think, no locale information is available. However,
> > John's scenario is slightly different:
> > 1) He uses SLURM to get on the nodes (interactively or through batch
> > jobs).
> > 2) He uses the BGAS nodes directly.
> > 
> > Using 1) brings in a feature of SLURM: the bash it spawns once the job
> > has enough resources inherits all the environment variables that were
> > set at job submission (this includes LANG and LC_*). It looks like
> > some flavors of Linux (especially in the embedded world) have a
> > problem with this. I ran into a similar problem when porting HPX to
> > the Xeon Phi. Everything was working nicely on our local machine (no
> > job control, direct access through ssh, etc.). I then moved on to
> > Stampede; when logging into one of the Phis directly, everything still
> > worked great, but only until I stopped using interactive mode and
> > started to submit jobs through the batch system. That led to problems
> > similar to the ones John is running into right now.
> > About 2), I am not exactly sure how this is related to the problem at
> > hand.
> > 
> > Anyway, I was able to reproduce the problem on one of the CNK-based
> > compute nodes on JUQUEEN by using this job script:
> > # @ job_name = HPX_Hello_World
> > # @ comment = "HPX Hello World testrun"
> > # @ error = $(job_name).$(jobid).err
> > # @ output = $(job_name).$(jobid).out
> > # @ environment = COPY_ALL
> > # @ wall_clock_limit = 00:30:00
> > # @ notification = error
> > # @ notify_user = thom.heller at gmail.com
> > # @ job_type = bluegene
> > # @ bg_size = 32
> > # @ queue
> > 
> > APP="$HOME/build/hpx/debug/bin/hello_world"
> > 
> > ENVS="LANG=en_US LC_CTYPE=\"en_US\" LC_NUMERIC=\"en_US\"
> > LC_TIME=en_GB
> > LC_COLLATE=\"en_US\" LC_MONETARY=\"en_US\" LC_MESSAGES=\"en_US\"
> > LC_PAPER=\"en_US\" LC_NAME=\"en_US\" LC_ADDRESS=\"en_US\"
> > LC_TELEPHONE=\"en_US\" LC_MEASUREMENT=\"en_US\"
> > LC_IDENTIFICATION=\"en_US\"
> > LC_ALL=\"en_US\""
> > 
> > runjob --ranks-per-node 1 --exe $APP --args "-t1" --envs $ENVS
> > 
> > This led to the exact same error. What I am unsure about, though, is
> > whose fault it is. The stack trace John posted earlier comes out of
> > the static initialization section of the binary, which initializes
> > some globals from the Boost.Filesystem library. So we have three
> > candidates:
> > 1) Boost.Filesystem
> > 2) libc++
> > 3) the libc/POSIX layer on the BGAS node.
> 
> This sounds right. CNK (and, specifically, its associated build of
> glibc) doesn't have locale support enabled. As a result, as I recall,
> only the default ('C') locale is supported.
> 
> If it turns out that this is a bug in libc++, then we should fix it
> there. Maybe it is worthwhile to have a Boost build that is patched
> to avoid this problem as well?
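A minimal sketch of what such a patch might look like, assuming the failing call is a named-locale construction such as std::locale(""), which reads LANG and LC_*. The C++ standard says std::locale's constructor throws std::runtime_error when the name is not a valid locale, which is consistent with the collate_byname runtime_error in John's output. Catching it and falling back to the classic "C" locale keeps a process alive on a system that only provides "C". locale_or_classic is a hypothetical name; this is not the actual Boost.Filesystem code or the HPX fix.

```cpp
#include <cassert>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: try to build the named locale. If the C library
// cannot provide it (std::locale's constructor throws std::runtime_error),
// fall back to the classic "C" locale instead of letting the exception
// escape, e.g. out of a static initializer.
std::locale locale_or_classic(const char* name)
{
    assert(name != nullptr);  // passing a null name is itself an error
    try {
        return std::locale(name);
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}
```

locale_or_classic("") then honors the user's environment where the requested locale exists and degrades to "C" where it does not.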
> 
> In any case, thanks for investigating this and sharing your findings!
> 
> -Hal
> 
> > 
> > The solution to this problem, by the way, is to unset all those
> > environment variables. I committed a fix to HPX that works around
> > this problem and should not require manually unsetting those
> > environment variables
> > (https://github.com/STEllAR-GROUP/hpx/commit/65ce125466ae43e68e19e89b3e50ece0721786de).
> > Thanks for the patience.
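A minimal sketch of the environment-variable workaround Thomas describes, assuming a POSIX system with setenv/unsetenv. The variable list is the standard glibc set of locale categories plus LANG; inherited_locale_variables and clear_locale_environment are hypothetical names, and this is not the HPX commit linked above. One caveat follows from Thomas's analysis: the failing lookup happens during static initialization, before main runs, so clearing the variables from inside the program only helps lookups made afterwards. For that case the variables have to be kept out of the job environment itself, e.g. by not forwarding them with runjob --envs.

```cpp
#include <cassert>
#include <cstdlib>    // std::getenv
#include <stdlib.h>   // POSIX setenv/unsetenv
#include <string>
#include <vector>

// Locale-related environment variables consulted by glibc:
// LC_ALL overrides everything, then the individual categories, then LANG.
const std::vector<std::string>& locale_variables()
{
    static const std::vector<std::string> names = {
        "LC_ALL", "LC_CTYPE", "LC_NUMERIC", "LC_TIME", "LC_COLLATE",
        "LC_MONETARY", "LC_MESSAGES", "LC_PAPER", "LC_NAME", "LC_ADDRESS",
        "LC_TELEPHONE", "LC_MEASUREMENT", "LC_IDENTIFICATION", "LANG"};
    return names;
}

// Hypothetical helper: list the locale variables the process inherited,
// e.g. to see what a batch system such as SLURM passed along.
std::vector<std::string> inherited_locale_variables()
{
    std::vector<std::string> found;
    for (const auto& name : locale_variables()) {
        if (const char* value = std::getenv(name.c_str()))
            found.push_back(name + "=" + value);
    }
    return found;
}

// Hypothetical helper: remove every locale variable so that a later
// std::locale("") or setlocale(LC_ALL, "") falls back to "C" on glibc.
void clear_locale_environment()
{
    for (const auto& name : locale_variables()) {
        int rc = unsetenv(name.c_str());
        assert(rc == 0);  // only fails for empty names or names containing '='
        (void)rc;         // avoid an unused-variable warning under NDEBUG
    }
}
```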
> > 
> > Regards,
> > Thomas
> > 
> > On Friday, February 14, 2014 12:52:24 Biddiscombe, John A. wrote:
> > > Hal
> > > 
> > > Apologies, I didn’t realize I was using the wrong wrapper.
> > >
> > > I recompiled using the bgclang++11 wrapper and things work much
> > > better. I first compiled Boost OK, but had trouble linking to it: I
> > > ran into the cxxABI link error with boost program_options (the
> > > symbols with __1 in them, etc.).
> > >
> > > A bit of googling explained the C++ standard library issues to me,
> > > so I had another go using the following settings:
> > > 
> > > export CC=/gpfs/bbp.cscs.ch/home/biddisco/bgas/apps/clang/bin/bgclang
> > > export CXX=/gpfs/bbp.cscs.ch/home/biddisco/bgas/apps/clang/bin/bgclang++11
> > > export PATH=/gpfs/bbp.cscs.ch/home/biddisco/bgas/apps/clang/bin:$PATH
> > > 
> > > I found some info about building Boost with clang and followed the
> > > instructions here:
> > > http://stackoverflow.com/questions/11081818/linking-troubles-with-boostprogram-options-on-osx-using-llvm?lq=1
> > > I modified tools/build/v2/user-config.jam to include the clang-11
> > > option:
> > >
> > > using clang : 11
> > >     : "/gpfs/bbp.cscs.ch/home/biddisco/bgas/apps/clang/bin/bgclang++11"
> > >     : <cxxflags>"-std=c++11 -stdlib=libc++ -ftemplate-depth=512"
> > >       <linkflags>"-stdlib=libc++"
> > >     ;
> > > 
> > > 
> > > And then I proceeded to build Boost using the following commands:
> > > ./bootstrap.sh --with-toolset=clang-11
> > > ./b2 -j 16 toolset=clang-11 cxxflags="-fPIC" --threading=multi \
> > >     --without-mpi --without-python \
> > >     --prefix=/gpfs/bbp.cscs.ch/home/biddisco/apps/clang/boost_1_54_0
> > > 
> > > And boost compiles fine.
> > > "The Boost C++ Libraries were successfully built!"
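The __1 in John's earlier link error is the inline namespace libc++ uses for its symbols (std::__1, mangled as "St3__1"), which usually means objects built against libc++ were being linked with code built against libstdc++, or vice versa. A quick sanity check is to compile a tiny translation unit with the same flags and see which library's headers it picked up, and to scan undefined symbols (e.g. from `nm -u`) for the libc++ namespace. The sketch relies on the _LIBCPP_VERSION macro defined by libc++ headers and __GLIBCXX__ defined by libstdc++ headers; standard_library_name and refers_to_libcxx are hypothetical helpers, not part of either library.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: report which C++ standard library headers this
// translation unit was compiled against. libc++ headers define
// _LIBCPP_VERSION; libstdc++ headers define __GLIBCXX__.
std::string standard_library_name()
{
#if defined(_LIBCPP_VERSION)
    return "libc++";
#elif defined(__GLIBCXX__)
    return "libstdc++";
#else
    return "unknown";
#endif
}

// Hypothetical helper: does an Itanium-mangled symbol name something in
// libc++'s inline namespace std::__1? That namespace mangles as "St3__1",
// so such a symbol can only be satisfied by a library built against libc++.
bool refers_to_libcxx(const char* mangled)
{
    assert(mangled != nullptr);
    return std::strstr(mangled, "St3__1") != nullptr;
}
```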
> > > 
> > > To test, I compiled the Boost serialisation demo from this page:
> > > http://www.boost.org/doc/libs/1_42_0/libs/serialization/example/demo.cpp
> > > and also simple boost::program_options and boost::filesystem demos.
> > > They all run fine.
> > > 
> > > Thank you very much for the help and all the work you’ve put into
> > > getting the clang stuff running.
> > > 
> > > But…
> > > 
> > > when I run simple demos from the HPX library
> > > 
> > > bbpbg2:~/bgas/build/hpx$ bin/hello_world
> > > terminate called after throwing an instance of 'std::__1::runtime_error'
> > >   what(): collate_byname<char>::collate_byname failed to construct for
> > > Aborted (core dumped)
> > > 
> > > 
> > > gdb shows me a trace:
> > > (gdb) where
> > > #0 0x00000fffb3458c5c in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:67
> > > #1 0x00000fffb345abd4 in abort () at abort.c:92
> > > #2 0x00000fffb3aa7b00 in __gnu_cxx::__verbose_terminate_handler () at /bgsys/drivers/V1R2M1/ppc64/toolchain/gnu/gcc-4.4.6/libstdc++-v3/libsupc++/vterminate.cc:93
> > > #3 0x00000fffb3aa4d74 in __cxxabiv1::__terminate (handler=<value optimized out>) at /bgsys/drivers/V1R2M1/ppc64/toolchain/gnu/gcc-4.4.6/libstdc++-v3/libsupc++/eh_terminate.cc:38
> > > #4 0x00000fffb3aa4db8 in std::terminate () at /bgsys/drivers/V1R2M1/ppc64/toolchain/gnu/gcc-4.4.6/libstdc++-v3/libsupc++/eh_terminate.cc:48
> > > #5 0x00000fffb47b1c14 in .__clang_call_terminate () from /gpfs/bbp.cscs.ch/home/biddisco/apps/clang/boost_1_54_0/lib/libboost_filesystem.so.1.54.0
> > > #6 0x00000fffb47b48a0 in ._ZNK5boost10filesystem4path7compareERKS1_ () from /gpfs/bbp.cscs.ch/home/biddisco/apps/clang/boost_1_54_0/lib/libboost_filesystem.so.1.54.0
> > > Backtrace stopped: frame did not save the PC
> > > 
> > > 
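The informative frame in this trace is #6: ._ZNK5boost10filesystem4path7compareERKS1_ is an Itanium C++ ABI mangled name carrying the leading dot that 64-bit PowerPC ELFv1 uses for function entry points. A small helper built on abi::__cxa_demangle from <cxxabi.h> (shipped with both GCC and clang toolchains) turns it back into a readable signature; running the name without the leading dot through c++filt gives the same result. demangle_frame_symbol is a hypothetical name used only for this sketch.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: demangle a symbol name as it appears in a backtrace.
// On 64-bit PowerPC with the ELFv1 ABI, function entry-point symbols carry a
// leading '.', which the demangler does not accept, so it is stripped first.
// Names that are not mangled C++ names are returned unchanged.
std::string demangle_frame_symbol(const std::string& symbol)
{
    std::string name = symbol;
    if (!name.empty() && name[0] == '.')
        name.erase(0, 1);

    int status = 0;
    char* demangled = abi::__cxa_demangle(name.c_str(), nullptr, nullptr, &status);
    if (status != 0 || demangled == nullptr)
        return symbol;            // not a mangled C++ name; leave it as-is
    std::string result(demangled);
    std::free(demangled);         // __cxa_demangle allocates with malloc
    assert(!result.empty());
    return result;
}
```

Applied to frame #6, it yields boost::filesystem::path::compare(boost::filesystem::path const&) const, so std::terminate is reached from inside Boost.Filesystem's path comparison.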
> > > It looks very suspicious, as there are some libstdc++ appearances in
> > > there.
> > >
> > > Does anything here give you any idea of what might have gone wrong?
> > > I’ve tried a number of rebuilds and the error persists, whilst simple
> > > demos run OK. I’m not sure where to look to diagnose what’s up (I’ve
> > > contacted the HPX people as well). One question is why the shared
> > > clang libc++ links to the libstdc++ one. Here is what ldd shows:
> > > 
> > > bbpbg2:~/bgas/build/c++test$ ldd /gpfs/bbp.cscs.ch/home/biddisco/bgas/apps/clang/libc++/lib/libc++.so.1.0
> > >
> > > linux-vdso64.so.1 => (0x00000fff9ad40000)
> > > libpthread.so.0 => /bgsys/drivers/V1R2M1/ppc64/gnu-linux/powerpc64-bgq-linux/lib/libpthread.so.0 (0x00000fff9ab00000)
> > > librt.so.1 => /bgsys/drivers/V1R2M1/ppc64/gnu-linux/powerpc64-bgq-linux/lib/librt.so.1 (0x00000fff9a9d0000)
> > > libc.so.6 => /bgsys/drivers/V1R2M1/ppc64/gnu-linux/powerpc64-bgq-linux/lib/libc.so.6 (0x00000fff9a790000)
> > > libstdc++.so.6 => /bgsys/drivers/V1R2M1/ppc64/gnu-linux/powerpc64-bgq-linux/lib/libstdc++.so.6 (0x00000fff9a550000)
> > > /lib64/ld64.so.1 (0x0000000032420000)
> > > libm.so.6 => /bgsys/drivers/V1R2M1/ppc64/gnu-linux/powerpc64-bgq-linux/lib/libm.so.6 (0x00000fff9a430000)
> > > libgcc_s.so.1 => /bgsys/drivers/V1R2M1/ppc64/gnu-linux/powerpc64-bgq-linux/lib/libgcc_s.so.1 (0x00000fff9a320000)
> > > 
> > > 
> > > It seems odd. Could this be causing the trouble? (The demos run fine,
> > > though, so I guess not.)
> > >
> > > Anyway, I’ll keep poking around. If anything comes to mind, I’d be
> > > grateful for help.
> > > 
> > > Thanks
> > > 
> > > JB
> > > 
> > > 
> > > _______________________________________________
> > > llvm-bgq-discuss mailing list
> > > llvm-bgq-discuss at lists.alcf.anl.gov
> > > https://lists.alcf.anl.gov/mailman/listinfo/llvm-bgq-discuss
> > 
> > 
> 
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

