[Llvm-bgq-discuss] trouble with latest clang install

Biddiscombe, John A. biddisco at cscs.ch
Fri Feb 21 01:13:30 CST 2014


Hal,

Just as an update: we tracked down the cause of the second crash I was getting. It turned out to be a stack overflow during HPX initialization, which was fixed by setting some HPX flags.
Hello-world programs are running, but I have yet to test my newly rewritten main program; that should happen soon.

One thing I ought to mention is that although I'm compiling code to run on CNK, I'm also compiling code to run on the IO nodes, which run Linux. If any flags set CNK-specific options that would break a Linux (Red Hat Enterprise Linux 6.4) build, please warn me so that I can set things appropriately. In fact, the majority of my code is running on the IO nodes at the moment, so it is more important for me to get things right there.

Everything is running as expected at the moment. (NB: I'm compiling my code using CMake and the bgclang++11 wrappers, and setting all the MPI stuff 'by hand', so NOT using the mpiclang++11 wrappers [MPI = MVAPICH on the IONs].)

yours

JB

> -----Original Message-----
> From: llvm-bgq-discuss-bounces at lists.alcf.anl.gov [mailto:llvm-bgq-discuss-
> bounces at lists.alcf.anl.gov] On Behalf Of Hal Finkel
> Sent: 20 February 2014 22:40
> To: Thomas Gooding
> Cc: llvm-bgq-discuss at lists.alcf.anl.gov
> Subject: Re: [Llvm-bgq-discuss] trouble with latest clang install
> 
> ----- Original Message -----
> > From: "Thomas Gooding" <tgooding at us.ibm.com>
> > To: "Hal Finkel" <hfinkel at anl.gov>
> > Cc: llvm-bgq-discuss at lists.alcf.anl.gov, "thom heller"
> > <thom.heller at gmail.com>
> > Sent: Thursday, February 20, 2014 3:04:54 PM
> > Subject: Re: [Llvm-bgq-discuss] trouble with latest clang install
> >
> >
> >
> > Hi Hal,
> >
> > CNK has support for .tbss/.tdata segments (thread specific), which is
> > what glibc uses to track thread-specific locale information. The rest
> > of the support is entirely within glibc, I don't recall that it was
> > disabled. As I recall, if you don't have support for tbss/tdata,
> > programs will crash when printing floating-point values (glibc needs
> > to know whether to print a comma or decimal per the locale).
> 
> I'll double-check the patchset again. As I recall, they only compile in the data
> tables for the 'C' locale, and nothing else is available (dynamically or
> otherwise). From a space-saving standpoint this probably makes sense.
> 
>  -Hal
> 
> >
> > Tom
> >
> > Tom Gooding
> > Senior Engineer / Blue Gene SW Lead / C2 tgooding at us.ibm.com
> > 507-253-0747
> >
> >
> > From: Hal Finkel <hfinkel at anl.gov>
> > To: thom heller <thom.heller at gmail.com>
> > Cc: llvm-bgq-discuss at lists.alcf.anl.gov
> > Date: 02/20/2014 01:39 PM
> > Subject: Re: [Llvm-bgq-discuss] trouble with latest clang install
> > Sent by: llvm-bgq-discuss-bounces at lists.alcf.anl.gov
> >
> > ----- Original Message -----
> > > From: "Thomas Heller" <thom.heller at gmail.com>
> > > To: "John A. Biddiscombe" <biddisco at cscs.ch>
> > > Cc: llvm-bgq-discuss at lists.alcf.anl.gov, "Hal Finkel"
> > > <hfinkel at anl.gov>
> > > Sent: Friday, February 14, 2014 2:02:27 PM
> > > Subject: Re: [Llvm-bgq-discuss] trouble with latest clang install
> > >
> > > Hi all,
> > >
> > > Ok, I think I tracked it down.
> > > If my suspicions are correct, the segfault isn't caused by bgclang
> > > or HPX directly. It looks like parts of Boost can't deal with
> > > locales correctly on John's system. Here is how it happens:
> > > On a regular BGQ compute node, you don't have interactive access and,
> > > I think, no locale information is available. However, John's scenario
> > > is slightly different:
> > > 1) He uses SLURM to get on the nodes (interactively or through batch
> > > jobs)
> > > 2) He uses the BGAS nodes directly
> > >
> > > Now, 1) has an implication: a feature of SLURM makes the bash it
> > > spawns, once the job has enough resources, inherit all the
> > > environment variables the job submission had set (this includes
> > > LANG and LC_*). It looks like some flavors of Linux (especially in the
> > > embedded world) have a problem with this. I ran into a similar
> > > problem when porting HPX to the Xeon Phi.
> > > Everything was working nicely on our local machine (no job control,
> > > direct access through ssh, etc.). I then moved on to Stampede; when
> > > logging into one of the Phis directly, everything still worked
> > > great, but only until I stopped using interactive mode and
> > > started to submit jobs through the batch system,
> > > which led to problems similar to the ones John is running into now ...
> > > About 2) ... I am not exactly sure how this is related to the
> > > problem at hand ...
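
To check what a batch system has actually handed down, it can help to print the locale-related variables from inside the job before the application starts. A minimal shell sketch, assuming the variables of interest are LANG, LANGUAGE and the LC_* family; the function name show_locale_env is made up for this note and is not part of SLURM, HPX or bgclang:

```shell
# Hypothetical diagnostic helper: print every locale-related variable that
# is exported in the current environment, one NAME=value per line.
# The exit status is that of grep, so it is non-zero when nothing matches.
show_locale_env() {
    env | grep -E '^(LANG|LANGUAGE|LC_[A-Za-z_]+)='
}
```

Calling show_locale_env at the top of a job script and again in an interactive login makes it easy to compare the two environments.
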
> > >
> > > Anyway, I was able to reproduce the problem on one of the CNK based
> > > compute nodes on JUQUEEN by using this jobscript:
> > > # @ job_name = HPX_Hello_World
> > > # @ comment = "HPX Hello World testrun"
> > > # @ error = $(job_name).$(jobid).err
> > > # @ output = $(job_name).$(jobid).out
> > > # @ environment = COPY_ALL
> > > # @ wall_clock_limit = 00:30:00
> > > # @ notification = error
> > > # @ notify_user = thom.heller at gmail.com
> > > # @ job_type = bluegene
> > > # @ bg_size = 32
> > > # @ queue
> > >
> > > APP="$HOME/build/hpx/debug/bin/hello_world"
> > >
> > > ENVS="LANG=en_US LC_CTYPE=\"en_US\" LC_NUMERIC=\"en_US\"
> > > LC_TIME=en_GB
> > > LC_COLLATE=\"en_US\" LC_MONETARY=\"en_US\" LC_MESSAGES=\"en_US\"
> > > LC_PAPER=\"en_US\" LC_NAME=\"en_US\" LC_ADDRESS=\"en_US\"
> > > LC_TELEPHONE=\"en_US\" LC_MEASUREMENT=\"en_US\"
> > > LC_IDENTIFICATION=\"en_US\"
> > > LC_ALL=\"en_US\""
> > >
> > > runjob --ranks-per-node 1 --exe $APP --args "-t1" --envs $ENVS
> > >
> > > Which led to the exact same error. What I am unsure about, though, is
> > > whose fault it is. The stack trace John posted earlier comes out of
> > > the static section of the binary, which initializes some globals out
> > > of the Boost filesystem library. So we have three candidates:
> > > 1) Boost.Filesystem
> > > 2) libc++
> > > 3) the libc/posix on the BGAS node.
> >
> > This sounds right. CNK (and, specifically, its associated build of
> > glibc) doesn't have locale support enabled. As a result, as I recall,
> > only the default ('C') is supported.
> >
> > If it turns out that this is a bug in libc++, then we should fix it
> > there. Maybe it is worthwhile to have a Boost build that is patched to
> > avoid this problem as well?
> >
> > In any case, thanks for investigating this and sharing your findings!
> >
> > -Hal
> >
> > >
> > > By the way, the solution to this problem is to unset all those
> > > environment variables.
> > > I committed a fix to HPX that works around this problem, so it
> > > should not be necessary to unset those environment variables manually:
> > > https://github.com/STEllAR-GROUP/hpx/commit/65ce125466ae43e68e19e89b3e50ece0721786de
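
Where unsetting the variables by hand is still needed, the manual workaround could be wrapped roughly as follows. This is only a sketch: the helper name run_without_locale is hypothetical, and it assumes the variables that matter are LANG, LANGUAGE and anything whose name starts with LC_.

```shell
# Hypothetical helper (not part of HPX, bgclang or SLURM): run a command
# with all locale-related variables removed from its environment, so the
# C library falls back to the default "C" locale.
run_without_locale() {
    (
        # Work in a subshell so the caller's environment stays untouched.
        unset LANG LANGUAGE
        # Collect the names of all exported LC_* variables and unset them.
        for var in $(env | sed -n 's/^\(LC_[A-Za-z_]*\)=.*/\1/p'); do
            unset "$var"
        done
        # Run the requested command; its exit status becomes ours.
        "$@"
    )
}
```

For example, `run_without_locale bin/hello_world` from a shell spawned by the batch system would start the program without the inherited LANG and LC_* settings.
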
> > > Thanks for the patience.
> > >
> > > Regards,
> > > Thomas
> > >
> > > On Friday, February 14, 2014 12:52:24 Biddiscombe, John A. wrote:
> > > > Hal
> > > >
> > > > Apologies, I didn’t realize I was using the wrong wrapper.
> > > >
> > > > I recompiled using the bgclang++11 wrapper and things work much
> > > > better.
> > > > I first compiled boost ok, but had trouble linking to it - I ran
> > > > into the cxxABI link error with boost program_options:: __1 etc
> > > > etc
> > > >
> > > > A bit of googling around explained the std C++ lib
> > > > issues to me, so I had another go using the following settings …
> > > >
> > > > export CC=/gpfs/bbp.cscs.ch/home/biddisco/bgas/apps/clang/bin/bgclang
> > > > export CXX=/gpfs/bbp.cscs.ch/home/biddisco/bgas/apps/clang/bin/bgclang++11
> > > > export PATH=/gpfs/bbp.cscs.ch/home/biddisco/bgas/apps/clang/bin:$PATH
> > > >
> > > > I found some info about building Boost with clang and followed the
> > > > instructions here:
> > > > http://stackoverflow.com/questions/11081818/linking-troubles-with-boostprogram-options-on-osx-using-llvm?lq=1
> > > > I modified tools/build/v2/user-config.jam to include the clang-11
> > > > option:
> > > >
> > > > using clang : 11
> > > >   : "/gpfs/bbp.cscs.ch/home/biddisco/bgas/apps/clang/bin/bgclang++11"
> > > >   : <cxxflags>"-std=c++11 -stdlib=libc++ -ftemplate-depth=512"
> > > >     <linkflags>"-stdlib=libc++"
> > > >   ;
> > > >
> > > >
> > > > I then proceeded to build Boost using the following commands:
> > > > ./bootstrap.sh --with-toolset=clang-11
> > > > ./b2 -j 16 toolset=clang-11 cxxflags="-fPIC" --threading=multi
> > > > --without-mpi --without-python
> > > > --prefix=/gpfs/bbp.cscs.ch/home/biddisco/apps/clang/boost_1_54_0
> > > >
> > > > And boost compiles fine.
> > > > "The Boost C++ Libraries were successfully built!"
> > > >
> > > > To test, I compiled the Boost serialisation demo from this page:
> > > > http://www.boost.org/doc/libs/1_42_0/libs/serialization/example/demo.cpp
> > > > and also a simple boost::program_options demo and a
> > > > boost::filesystem demo. They all run fine.
> > > >
> > > > Thank you very much for the help and all the work you’ve put in
> > > > getting the clang stuff running..
> > > >
> > > > But…
> > > >
> > > > when I run simple demos from the HPX library
> > > >
> > > > bbpbg2:~/bgas/build/hpx$ bin/hello_world
> > > > terminate called after throwing an instance of 'std::__1::runtime_error'
> > > > what(): collate_byname<char>::collate_byname failed to construct for
> > > > Aborted (core dumped)
> > > >
> > > >
> > > > gdb shows me a trace …
> > > > (gdb) where
> > > > #0 0x00000fffb3458c5c in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:67
> > > > #1 0x00000fffb345abd4 in abort () at abort.c:92
> > > > #2 0x00000fffb3aa7b00 in __gnu_cxx::__verbose_terminate_handler () at /bgsys/drivers/V1R2M1/ppc64/toolchain/gnu/gcc-4.4.6/libstdc++-v3/libsupc++/vterminate.cc:93
> > > > #3 0x00000fffb3aa4d74 in __cxxabiv1::__terminate (handler=<value optimized out>) at /bgsys/drivers/V1R2M1/ppc64/toolchain/gnu/gcc-4.4.6/libstdc++-v3/libsupc++/eh_terminate.cc:38
> > > > #4 0x00000fffb3aa4db8 in std::terminate () at /bgsys/drivers/V1R2M1/ppc64/toolchain/gnu/gcc-4.4.6/libstdc++-v3/libsupc++/eh_terminate.cc:48
> > > > #5 0x00000fffb47b1c14 in .__clang_call_terminate () from /gpfs/bbp.cscs.ch/home/biddisco/apps/clang/boost_1_54_0/lib/libboost_filesystem.so.1.54.0
> > > > #6 0x00000fffb47b48a0 in ._ZNK5boost10filesystem4path7compareERKS1_ () from /gpfs/bbp.cscs.ch/home/biddisco/apps/clang/boost_1_54_0/lib/libboost_filesystem.so.1.54.0
> > > > Backtrace stopped: frame did not save the PC
> > > >
> > > >
> > > > It looks very suspicious, as there are some libstdc++ appearances in
> > > > there.
> > > >
> > > > Does anything here give you any idea of what might have gone
> > > > wrong? I’ve tried a number of rebuilds and the error persists, whilst
> > > > simple demos run OK. I’m not sure where to look to diagnose what’s up
> > > > (I’ve contacted the HPX people as well). One question is why the
> > > > shared clang libc++ links to the libstdc++ one. Running ldd on it gives:
> > > >
> > > > bbpbg2:~/bgas/build/c++test$ ldd /gpfs/bbp.cscs.ch/home/biddisco/bgas/apps/clang/libc++/lib/libc++.so.1.0
> > > >
> > > > linux-vdso64.so.1 => (0x00000fff9ad40000)
> > > > libpthread.so.0 => /bgsys/drivers/V1R2M1/ppc64/gnu-linux/powerpc64-bgq-linux/lib/libpthread.so.0 (0x00000fff9ab00000)
> > > > librt.so.1 => /bgsys/drivers/V1R2M1/ppc64/gnu-linux/powerpc64-bgq-linux/lib/librt.so.1 (0x00000fff9a9d0000)
> > > > libc.so.6 => /bgsys/drivers/V1R2M1/ppc64/gnu-linux/powerpc64-bgq-linux/lib/libc.so.6 (0x00000fff9a790000)
> > > > libstdc++.so.6 => /bgsys/drivers/V1R2M1/ppc64/gnu-linux/powerpc64-bgq-linux/lib/libstdc++.so.6 (0x00000fff9a550000)
> > > > /lib64/ld64.so.1 (0x0000000032420000)
> > > > libm.so.6 => /bgsys/drivers/V1R2M1/ppc64/gnu-linux/powerpc64-bgq-linux/lib/libm.so.6 (0x00000fff9a430000)
> > > > libgcc_s.so.1 => /bgsys/drivers/V1R2M1/ppc64/gnu-linux/powerpc64-bgq-linux/lib/libgcc_s.so.1 (0x00000fff9a320000)
> > > >
> > > >
> > > > It seems odd. Could this be causing the trouble? (the demos run
> > > > fine though, so I guess not).
> > > >
> > > > Anyway, I’ll keep poking around; if anything comes to mind, I’d be
> > > > grateful for help.
> > > >
> > > > Thanks
> > > >
> > > > JB
> > > >
> > > >
> > > > _______________________________________________
> > > > llvm-bgq-discuss mailing list
> > > > llvm-bgq-discuss at lists.alcf.anl.gov
> > > > https://lists.alcf.anl.gov/mailman/listinfo/llvm-bgq-discuss
> > >
> > >
> >
> > --
> > Hal Finkel
> > Assistant Computational Scientist
> > Leadership Computing Facility
> > Argonne National Laboratory
> 

