[Llvm-bgq-discuss] trouble with latest clang install

Thomas Gooding tgooding at us.ibm.com
Thu Feb 20 15:04:54 CST 2014


Hi Hal,

CNK has support for .tbss/.tdata segments (thread-specific), which is what
glibc uses to track thread-specific locale information.  The rest of the
support is entirely within glibc; I don't recall that it was disabled.  As
I recall, if you don't have support for .tbss/.tdata, programs will crash
when printing floating-point values (glibc needs to know whether to print a
comma or a decimal point per the locale).

Tom

Tom Gooding
Senior Engineer / Blue Gene SW Lead / C2
tgooding at us.ibm.com   507-253-0747



From:    Hal Finkel <hfinkel at anl.gov>
To:      thom heller <thom.heller at gmail.com>
Cc:      llvm-bgq-discuss at lists.alcf.anl.gov
Date:    02/20/2014 01:39 PM
Subject: Re: [Llvm-bgq-discuss] trouble with latest clang install
Sent by: llvm-bgq-discuss-bounces at lists.alcf.anl.gov





----- Original Message -----
> From: "Thomas Heller" <thom.heller at gmail.com>
> To: "John A. Biddiscombe" <biddisco at cscs.ch>
> Cc: llvm-bgq-discuss at lists.alcf.anl.gov, "Hal Finkel" <hfinkel at anl.gov>
> Sent: Friday, February 14, 2014 2:02:27 PM
> Subject: Re: [Llvm-bgq-discuss] trouble with latest clang install
>
> Hi all,
>
> OK, I think I tracked it down.
> If my suspicions are correct, the segfault isn't caused by bgclang or hpx
> directly. It looks like parts of boost can't deal with locales correctly on
> John's system. Here is how it happens:
> On a regular BGQ compute node, you don't have interactive access and, I
> think, no locale information available. However, John's scenario is slightly
> different:
> 1) He uses SLURM to get on the nodes (interactively or through batch jobs)
> 2) He uses the BGAS nodes directly
>
> Now, 1) matters because of a SLURM feature: the bash it spawns once the job
> has enough resources inherits all the environment variables that were set
> at job submission (this includes LANG and LC_*). It looks like some flavors
> of Linux (especially in the embedded world) have a problem with this. I ran
> into a similar problem when porting HPX to the Xeon Phi. Everything was
> working nicely on our local machine (no job control, direct access through
> ssh etc.). I then moved on to Stampede; when logging into one of the Phis
> directly, everything still worked great, but only until I stopped using
> interactive mode and started to submit jobs through the batch system. That
> led to problems similar to the ones John is running into right now ...
> About 2) ... I am not exactly sure how this is related to the problem at
> hand ...
>
> Anyway, I was able to reproduce the problem on one of the CNK-based compute
> nodes on JUQUEEN by using this jobscript:
> # @ job_name = HPX_Hello_World
> # @ comment = "HPX Hello World testrun"
> # @ error = $(job_name).$(jobid).err
> # @ output = $(job_name).$(jobid).out
> # @ environment = COPY_ALL
> # @ wall_clock_limit = 00:30:00
> # @ notification = error
> # @ notify_user = thom.heller at gmail.com
> # @ job_type = bluegene
> # @ bg_size = 32
> # @ queue
>
> APP="$HOME/build/hpx/debug/bin/hello_world"
>
> ENVS="LANG=en_US LC_CTYPE=\"en_US\" LC_NUMERIC=\"en_US\"
> LC_TIME=en_GB
> LC_COLLATE=\"en_US\" LC_MONETARY=\"en_US\" LC_MESSAGES=\"en_US\"
> LC_PAPER=\"en_US\" LC_NAME=\"en_US\" LC_ADDRESS=\"en_US\"
> LC_TELEPHONE=\"en_US\" LC_MEASUREMENT=\"en_US\"
> LC_IDENTIFICATION=\"en_US\"
> LC_ALL=\"en_US\""
>
> runjob --ranks-per-node 1 --exe $APP --args "-t1" --envs $ENVS
>
> This led to the exact same error. What I am unsure about, though, is whose
> fault it is. The stack trace John posted earlier comes out of the static
> section of the binary, which initializes some globals from the Boost
> filesystem library. So we have three candidates: 1) Boost.Filesystem,
> 2) libc++, 3) the libc/posix on the BGAS node.

This sounds right. CNK (and, specifically, its associated build of glibc)
doesn't have locale support enabled. As a result, as I recall, only the
default ('C') locale is supported.
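
A quick way to see whether a job environment asks for anything other than
the default locale is sketched below. This is only an illustration, assuming
a POSIX shell with env and grep available; the function name
non_c_locale_vars is made up for this example and is not part of bgclang,
HPX, or any batch system.

```shell
# Print every locale-related variable whose value is not C or POSIX.
# Prints nothing when the environment only asks for the default locale.
# Note: values such as C.UTF-8 are reported too, since they are not plain "C".
non_c_locale_vars() {
    env | grep -E '^(LANG|LANGUAGE|LC_[A-Z_]+)=' | grep -Ev '=(C|POSIX)?$'
    # grep exits non-zero when nothing matches; that is not an error here.
    return 0
}
```

Running non_c_locale_vars on the submission host before submitting shows
which settings a batch system that copies the environment would pass along.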

If it turns out that this is a bug in libc++, then we should fix it there.
Maybe it is worthwhile to have a Boost build that is patched to avoid this
problem as well?

In any case, thanks for investigating this and sharing your findings!

 -Hal

>
> The solution to this problem is, btw, to unset all those environment
> variables. I committed a fix to HPX that works around this problem, so it
> should no longer be necessary to unset those environment variables manually
> (https://github.com/STEllAR-GROUP/hpx/commit/65ce125466ae43e68e19e89b3e50ece0721786de).
> Thanks for the patience.
>
> Regards,
> Thomas
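
For setups without the HPX fix, the manual route Thomas mentions (unsetting
the locale variables) could look roughly like the sketch below. It assumes a
POSIX shell with env and sed; the helper name clear_locale_env is
hypothetical, and whether the launcher forwards the cleaned environment to
the compute nodes depends on the batch system and is not verified here.

```shell
# Remove every locale-related variable from the current shell environment,
# so that child processes fall back to the default locale.
clear_locale_env() {
    # The well-known names first; unset is harmless if they are not set.
    unset LANG LANGUAGE LC_ALL
    # Then any remaining LC_* category that happens to be exported.
    for var in $(env | sed -n 's/^\(LC_[A-Za-z_]*\)=.*/\1/p'); do
        unset "$var"
    done
}
```

Calling clear_locale_env in the job script before the application is
launched keeps LANG and LC_* from being inherited from the submission shell.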
>
> On Friday, February 14, 2014 12:52:24 Biddiscombe, John A. wrote:
> > Hal
> >
> > Apologies, I didn’t realize I was using the wrong wrapper.
> >
> > I recompiled using the bgclang++11 wrapper and things work much better.
> > I first compiled boost OK, but had trouble linking to it - I ran into the
> > cxxABI link error with boost program_options:: __1 etc etc
> >
> > A bit of googling around explained the std c++ lib issues to me, so
> > I had another go using the following settings …
> >
> > export CC=/gpfs/bbp.cscs.ch/home/biddisco/bgas/apps/clang/bin/bgclang
> > export CXX=/gpfs/bbp.cscs.ch/home/biddisco/bgas/apps/clang/bin/bgclang++11
> > export PATH=/gpfs/bbp.cscs.ch/home/biddisco/bgas/apps/clang/bin:$PATH
> >
> > I found some info about building boost with clang and followed the
> > instructions here
> > http://stackoverflow.com/questions/11081818/linking-troubles-with-boostprogram-options-on-osx-using-llvm?lq=1
> > I modified tools/build/v2/user-config.jam to include the clang-11 option
> >
> > using clang : 11
> >     : "/gpfs/bbp.cscs.ch/home/biddisco/bgas/apps/clang/bin/bgclang++11"
> >     : <cxxflags>"-std=c++11 -stdlib=libc++ -ftemplate-depth=512"
> >       <linkflags>"-stdlib=libc++"
> >     ;
> >
> >
> > And then proceeded to build boost using the following commands
> > ./bootstrap.sh --with-toolset=clang-11
> > ./b2 -j 16 toolset=clang-11 cxxflags="-fPIC" --threading=multi \
> >     --without-mpi --without-python \
> >     --prefix=/gpfs/bbp.cscs.ch/home/biddisco/apps/clang/boost_1_54_0
> >
> > And boost compiles fine.
> > "The Boost C++ Libraries were successfully built!"
> >
> > To test, I compiled the boost serialisation demo from this page
> > http://www.boost.org/doc/libs/1_42_0/libs/serialization/example/demo.cpp
> > and also a simple boost::program_options demo and a boost::filesystem
> > demo. They all run fine.
> >
> > Thank you very much for the help and all the work you’ve put in getting
> > the clang stuff running.
> >
> > But…
> >
> > when I run simple demos from the HPX library
> >
> > bbpbg2:~/bgas/build/hpx$ bin/hello_world
> > terminate called after throwing an instance of 'std::__1::runtime_error'
> >   what():  collate_byname<char>::collate_byname failed to construct for
> > Aborted (core dumped)
> >
> >
> > gdb shows me a trace …
> > (gdb) where
> > #0  0x00000fffb3458c5c in raise (sig=6)
> >     at ../nptl/sysdeps/unix/sysv/linux/raise.c:67
> > #1  0x00000fffb345abd4 in abort () at abort.c:92
> > #2  0x00000fffb3aa7b00 in __gnu_cxx::__verbose_terminate_handler ()
> >     at /bgsys/drivers/V1R2M1/ppc64/toolchain/gnu/gcc-4.4.6/libstdc++-v3/libsupc++/vterminate.cc:93
> > #3  0x00000fffb3aa4d74 in __cxxabiv1::__terminate (handler=<value optimized out>)
> >     at /bgsys/drivers/V1R2M1/ppc64/toolchain/gnu/gcc-4.4.6/libstdc++-v3/libsupc++/eh_terminate.cc:38
> > #4  0x00000fffb3aa4db8 in std::terminate ()
> >     at /bgsys/drivers/V1R2M1/ppc64/toolchain/gnu/gcc-4.4.6/libstdc++-v3/libsupc++/eh_terminate.cc:48
> > #5  0x00000fffb47b1c14 in .__clang_call_terminate ()
> >    from /gpfs/bbp.cscs.ch/home/biddisco/apps/clang/boost_1_54_0/lib/libboost_filesystem.so.1.54.0
> > #6  0x00000fffb47b48a0 in ._ZNK5boost10filesystem4path7compareERKS1_ ()
> >    from /gpfs/bbp.cscs.ch/home/biddisco/apps/clang/boost_1_54_0/lib/libboost_filesystem.so.1.54.0
> > Backtrace stopped: frame did not save the PC
> >
> >
> > It looks very suspicious as there are some libstdc++ appearances in
> > there.
> >
> > Does anything here give you any idea of what might have gone wrong? I’ve
> > tried a number of rebuilds and the error persists, whilst simple demos run
> > OK. I’m not sure where to look to diagnose what’s up (I’ve contacted the
> > HPX people as well). One question is why the shared clang libc++ links to
> > the libstdc++ one. If I do an
> >
> > bbpbg2:~/bgas/build/c++test$ ldd /gpfs/bbp.cscs.ch/home/biddisco/bgas/apps/clang/libc++/lib/libc++.so.1.0
> >
> >     linux-vdso64.so.1 =>  (0x00000fff9ad40000)
> >     libpthread.so.0 => /bgsys/drivers/V1R2M1/ppc64/gnu-linux/powerpc64-bgq-linux/lib/libpthread.so.0 (0x00000fff9ab00000)
> >     librt.so.1 => /bgsys/drivers/V1R2M1/ppc64/gnu-linux/powerpc64-bgq-linux/lib/librt.so.1 (0x00000fff9a9d0000)
> >     libc.so.6 => /bgsys/drivers/V1R2M1/ppc64/gnu-linux/powerpc64-bgq-linux/lib/libc.so.6 (0x00000fff9a790000)
> >     libstdc++.so.6 => /bgsys/drivers/V1R2M1/ppc64/gnu-linux/powerpc64-bgq-linux/lib/libstdc++.so.6 (0x00000fff9a550000)
> >     /lib64/ld64.so.1 (0x0000000032420000)
> >     libm.so.6 => /bgsys/drivers/V1R2M1/ppc64/gnu-linux/powerpc64-bgq-linux/lib/libm.so.6 (0x00000fff9a430000)
> >     libgcc_s.so.1 => /bgsys/drivers/V1R2M1/ppc64/gnu-linux/powerpc64-bgq-linux/lib/libgcc_s.so.1 (0x00000fff9a320000)
> >
> >
> > It seems odd. Could this be causing the trouble? (The demos run fine
> > though, so I guess not.)
> >
> > Anyway, I’ll keep poking around. If anything comes to mind, I’m grateful
> > for help.
> >
> > Thanks
> >
> > JB
> >
> >
> > _______________________________________________
> > llvm-bgq-discuss mailing list
> > llvm-bgq-discuss at lists.alcf.anl.gov
> > https://lists.alcf.anl.gov/mailman/listinfo/llvm-bgq-discuss
>
>

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
