[Llvm-bgq-discuss] [alcf-support #325179] Opening application executable failed, errno 2 No such file or directory

Jozsef Bakosi jbakosi at lanl.gov
Fri Feb 3 14:52:23 CST 2017


Hi Hal,

I'm not sure how useful this will be, but this is the backtrace I get from
coreprocessor:

0 : (IAR=Node)Node (2)
1 : (IAR=0x0000000000000000)    0000000000000000 (1)
2 : (IAR=0x0000000001fa5994)        .__libc_start_main (1)
3 : (IAR=0x0000000001fa5468)            .generic_start_main (1)
4 : (IAR=0x0000000001fa5f10)                .__libc_csu_init (1)
5 : (IAR=0x000000000100fc08)                    ._GLOBAL__sub_I_Parser.C (1)
6 : (IAR=0x000000000100fb80)                        .__cxx_global_var_init.53 (1)
7 : (IAR=0x00000000011a8348) .tk::Print::Print(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, std::__1::basic_ostream<char, std::__1::char_traits<char> >&) (1)
1 :    <traceback not fetched> (1)

tk::Print is my code. Its constructor calls the default constructor of
std::stringstream, which I believe is where the segfault comes from:
coreprocessor's "Location" points to
/soft/compilers/bgclang/r284961-stable/libc++/include/c++/v1/sstream:246.

Jozsef

On 02.03.2017 14:42, Hal Finkel wrote:
> Hi Jozsef,
> 
> [-support; cc'ing support and this mailing list is going to be confusing
> because not all of the messages will appear on the mailing list]
> 
> Can you provide the backtrace? I don't recall running into a problem in this
> specific place, but I have seen problems with streams in the past for
> various reasons (e.g., features such as basic locale support that BG/Q does
> not support).
> 
>  -Hal
> 
> 
> On 02/01/2017 12:29 PM, Jozsef Bakosi wrote:
> > Hi Ramesh and Tim,
> > 
> > Thanks for your help. I recompiled with debug info, ran using a single core, and
> > used the coreprocessor to find that I get the segfault from the standard
> > library, libc++:
> > 
> > Location: /soft/compilers/bgclang/r284961-stable/libc++/include/c++/v1/sstream:246:
> > 
> > 241 template <class _CharT, class _Traits, class _Allocator>
> > 242 basic_stringbuf<_CharT, _Traits, _Allocator>::basic_stringbuf(ios_base::openmode __wch)
> > 243     : __hm_(0),
> > 244       __mode_(__wch)
> > 245 {
> > 246     str(string_type());
> > 247 }
> > 
> > I'm CCing the bgclang list. Has anyone ever seen this basic_stringbuf
> > constructor segfaulting at this location? Is there another libc++ version I can
> > try?
> > 
> > In the meantime, I will probably try using GNU libstdc++ instead of libc++.
> > 
> > Thanks for all your help,
> > Jozsef
> > 
> > On 01.31.2017 22:16, Balakrishnan, Ramesh wrote:
> > >     We have a Perl-based tool called [1]coreprocessor.pl. Make sure you
> > >     compile your code with the -g flag (in addition to the others that you
> > >     use) and use this tool to look at the core files (assuming that you are
> > >     getting core files). If you are not getting core files, you may want to
> > >     force the job to produce core files by using [2]--env
> > >     BG_COREDUMPONEXIT=1 in your qsub invocation.
> > > 
> > >     Hope this helps.
> > > 
> > >     Ramesh
> > > 
> > >     On Jan 31, 2017, at 3:56 PM, Jozsef Bakosi <jbakosi at lanl.gov> wrote:
> > > 
> > >     Hi Ramesh,
> > > 
> > >     I have built the executable using mpic++11. Is there a way to get more
> > >     information than the following?
> > > 
> > >     2017-01-31 21:41:37.936 (WARN ) [0x4000122bde0] CET-02400-13731-128:1911876:ibm.runjob.client.Job: terminated by signal 11
> > >     2017-01-31 21:41:37.936 (WARN ) [0x4000122bde0] CET-02400-13731-128:1911876:ibm.runjob.client.Job: abnormal termination by signal 11 from rank 16
> > > 
> > >     Thanks,
> > >     Jozsef
> > > 
> > >     On 01.31.2017 21:32, Balakrishnan, Ramesh wrote:
> > > 
> > >         Jozsef,
> > > 
> > >         I am not sure how you are building your code, but I noticed in your
> > >         earlier email that you are using bgclang++11. bgclang++11 is fine for
> > >         non-MPI builds, but you will need to pull in a long list of libraries
> > >         if you want to use bgclang++11 for building MPI code, and this route
> > >         can lead to runtime errors. Instead, can you try building your MPI code
> > >         with mpiclang++11 as opposed to bgclang++11? The mpiclang++11 wrapper
> > >         around the bgclang++11 compiler will pull in all of the libraries
> > >         necessary for your MPI code.
> > > 
> > >         Ramesh
> > > 
> > >         On Jan 31, 2017, at 2:00 PM, Jozsef Bakosi <jbakosi at lanl.gov> wrote:
> > > 
> > >         Hi Ramesh,
> > > 
> > >         Based on your qsub line I tried this:
> > > 
> > >         $ qsub -t 10 -n 1 --mode c16 /home/jbakosi/code/quinoa/build/clang/Main/unittest -v
> > > 
> > >         and besides 16 core files, I get, in the job error file:
> > > 
> > >         2017-01-31 19:51:26.031 (INFO ) [0x4000122bde0] CET-40000-51331-128:1911641:ibm.runjob.client.Job: job 1911641 started
> > >         2017-01-31 19:51:31.066 (INFO ) [0x40000c334e0] 15824:tatu.runjob.monitor: tracklib completed
> > >         2017-01-31 19:51:43.674 (WARN ) [0x4000122bde0] CET-40000-51331-128:1911641:ibm.runjob.client.Job: terminated by signal 11
> > >         2017-01-31 19:51:43.675 (WARN ) [0x4000122bde0] CET-40000-51331-128:1911641:ibm.runjob.client.Job: abnormal termination by signal 11 from rank 4
> > >         2017-01-31 19:51:43.675 (INFO ) [0x4000122bde0] tatu.runjob.client: task terminated by signal 11
> > > 
> > >         I guess it started fine, but it segfaults right away?
> > >         How can I get a more detailed output from my application? My job output
> > >         file is zero length.
> > > 
> > >         Jozsef
> > > 
> > > References
> > > 
> > >     1. http://www.alcf.anl.gov/user-guides/coreprocessor
> > >     2. https://www.alcf.anl.gov/user-guides/core-file-settings
> 
> -- 
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory

