[Llvm-bgq-discuss] [alcf-support #325179] Opening application executable failed, errno 2 No such file or directory
Jozsef Bakosi
jbakosi at lanl.gov
Fri Feb 3 14:52:23 CST 2017
Hi Hal,
I'm not sure how useful this will be but this is the backtrace I get from
coreprocessor:
0 : (IAR=Node)Node (2)
1 : (IAR=0x0000000000000000) 0000000000000000 (1)
2 : (IAR=0x0000000001fa5994) .__libc_start_main (1)
3 : (IAR=0x0000000001fa5468) .generic_start_main (1)
4 : (IAR=0x0000000001fa5f10) .__libc_csu_init (1)
5 : (IAR=0x000000000100fc08) ._GLOBAL__sub_I_Parser.C (1)
6 : (IAR=0x000000000100fb80) .__cxx_global_var_init.53 (1)
7 : (IAR=0x00000000011a8348) .tk::Print::Print(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, std::__1::basic_ostream<char, std::__1::char_traits<char> >&) (1)
1 : <traceback not fetched> (1)
tk::Print is my code. Its constructor calls the default constructor of
std::stringstream, which I believe is where the segfault comes from, at
/soft/compilers/bgclang/r284961-stable/libc++/include/c++/v1/sstream:246
(where coreprocessor's "Location" points).
Jozsef
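
For readers following the thread, the pattern in frames 4-7 of the backtrace can be sketched roughly as below. This is only an illustration, not the actual Quinoa source: the class body, the member names, the global `g_print`, and the `lazyPrint()` helper are all hypothetical. The only details taken from the thread are that a constructor taking two `std::ostream&` arguments default-constructs a `std::stringstream`, and that it runs from a global variable initializer before `main()`. The construct-on-first-use variant is shown only as a way to move the construction out of static initialization for debugging; nothing in this thread establishes that it avoids the crash on BG/Q.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for tk::Print: keeps two stream references and
// default-constructs a std::stringstream member in its constructor.
class Print {
public:
  Print(std::ostream& out, std::ostream& verbose)
    : m_out(out), m_verbose(verbose), m_buf() {}  // m_buf(): stringstream default ctor

  // Format a message into the internal buffer and return the result.
  std::string line(const std::string& msg) {
    m_buf.str("");            // reset the buffer contents
    m_buf << "> " << msg;
    return m_buf.str();
  }

  // Write the formatted message to both streams.
  void emit(const std::string& msg) {
    const std::string s = line(msg);
    m_out << s << '\n';
    m_verbose << s << '\n';
  }

private:
  std::ostream& m_out;
  std::ostream& m_verbose;
  std::stringstream m_buf;
};

// Namespace-scope object: its constructor runs during dynamic
// initialization, i.e. before main(), as in the backtrace above.
static Print g_print(std::cout, std::clog);

// Construct-on-first-use alternative: the object is built on the first
// call instead of during static initialization.
Print& lazyPrint() {
  static Print p(std::cout, std::clog);
  return p;
}
```

If the global form crashes before `main()` while `lazyPrint()` works when called from `main()`, that would point at initialization order or at runtime support that is not yet set up, rather than at `std::stringstream` itself.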
On 02.03.2017 14:42, Hal Finkel wrote:
> Hi Jozsef,
>
> [-support; cc'ing support and this mailing list is going to be confusing
> because not all of the messages will appear on the mailing list]
>
> Can you provide the backtrace? I don't recall running into a problem in this
> specific place, but I have seen problems with streams in the past for
> various reasons (e.g., things like basic locale support that BG/Q does not
> support).
>
> -Hal
>
>
> On 02/01/2017 12:29 PM, Jozsef Bakosi wrote:
> > Hi Ramesh and Tim,
> >
> > Thanks for your help. I recompiled with debug info, ran using a single core, and
> > used the coreprocessor to find that I get the segfault from the standard
> > library, libc++:
> >
> > Location: /soft/compilers/bgclang/r284961-stable/libc++/include/c++/v1/sstream:246:
> >
> > 241 template <class _CharT, class _Traits, class _Allocator>
> > 242 basic_stringbuf<_CharT, _Traits, _Allocator>::basic_stringbuf(ios_base::openmode __wch)
> > 243 : __hm_(0),
> > 244 __mode_(__wch)
> > 245 {
> > 246 str(string_type());
> > 247 }
> >
> > I'm CCing the bgclang list. Has anyone ever seen this basic_stringbuf
> > constructor segfaulting at this location? Is there another libc++ version I can
> > try?
> >
> > In the meantime, I will probably try using GNU libstdc++ instead of libc++.
> >
> > Thanks for all your help,
> > Jozsef
> >
> > On 01.31.2017 22:16, Balakrishnan, Ramesh wrote:
> > > We have a Perl-based tool called [1]coreprocessor.pl. Make sure you
> > > compile your code with the -g flag (in addition to the others that you
> > > use) and use this tool to look at the core files (assuming that you are
> > > getting core files). If you are not getting core files, you may want to
> > > force the job to produce core files by using [2]--env
> > > BG_COREDUMPONEXIT=1 in your qsub invocation.
> > >
> > > Hope this helps.
> > >
> > > Ramesh
> > >
> > > On Jan 31, 2017, at 3:56 PM, Jozsef Bakosi <jbakosi at lanl.gov> wrote:
> > >
> > > Hi Ramesh,
> > > I have built the executable using mpic++11. Is there a way to get more
> > > information than the following?
> > > 2017-01-31 21:41:37.936 (WARN ) [0x4000122bde0] CET-02400-13731-128:1911876:ibm.runjob.client.Job: terminated by signal 11
> > > 2017-01-31 21:41:37.936 (WARN ) [0x4000122bde0] CET-02400-13731-128:1911876:ibm.runjob.client.Job: abnormal termination by signal 11 from rank 16
> > > Thanks,
> > > Jozsef
> > > On 01.31.2017 21:32, Balakrishnan, Ramesh wrote:
> > >
> > > Jozsef,
> > > I am not sure how you are building your code, but I noticed in your
> > > earlier email that you are using bgclang++11. bgclang++11 is fine for
> > > non-MPI builds, but you will need to pull in a long list of libraries
> > > if you want to use bgclang++11 for building MPI code, and this route
> > > can lead to runtime errors. Instead, can you try building your MPI code
> > > with mpiclang++11 as opposed to bgclang++11? The mpiclang++11 wrapper
> > > around the bgclang++11 compiler will pull in all of the libraries
> > > necessary for your MPI code.
> > > Ramesh
> > > On Jan 31, 2017, at 2:00 PM, Jozsef Bakosi <jbakosi at lanl.gov> wrote:
> > > Hi Ramesh,
> > > Based on your qsub line I tried this:
> > > $ qsub -t 10 -n 1 --mode c16 /home/jbakosi/code/quinoa/build/clang/Main/unittest -v
> > > and besides 16 core files, I get, in the job error file:
> > > 2017-01-31 19:51:26.031 (INFO ) [0x4000122bde0] CET-40000-51331-128:1911641:ibm.runjob.client.Job: job 1911641 started
> > > 2017-01-31 19:51:31.066 (INFO ) [0x40000c334e0] 15824:tatu.runjob.monitor: tracklib completed
> > > 2017-01-31 19:51:43.674 (WARN ) [0x4000122bde0] CET-40000-51331-128:1911641:ibm.runjob.client.Job: terminated by signal 11
> > > 2017-01-31 19:51:43.675 (WARN ) [0x4000122bde0] CET-40000-51331-128:1911641:ibm.runjob.client.Job: abnormal termination by signal 11 from rank 4
> > > 2017-01-31 19:51:43.675 (INFO ) [0x4000122bde0] tatu.runjob.client: task terminated by signal 11
> > > I guess it started fine, but it segfaults right away?
> > > How can I get more detailed output from my application? My job output
> > > file is zero length.
> > > Jozsef
> > >
> > > References
> > >
> > > 1. http://www.alcf.anl.gov/user-guides/coreprocessor
> > > 2. https://www.alcf.anl.gov/user-guides/core-file-settings
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory