[Llvm-bgq-discuss] [alcf-support #325179] Opening application executable failed, errno 2 No such file or directory

Jozsef Bakosi jbakosi at lanl.gov
Wed Feb 1 12:29:23 CST 2017


Hi Ramesh and Tim,

Thanks for your help. I recompiled with debug info, ran using a single core, and
used the coreprocessor to find that I get the segfault from the standard
library, libc++:

Location: /soft/compilers/bgclang/r284961-stable/libc++/include/c++/v1/sstream:246:

241 template <class _CharT, class _Traits, class _Allocator>
242 basic_stringbuf<_CharT, _Traits, _Allocator>::basic_stringbuf(ios_base::openmode __wch)
243     : __hm_(0),
244       __mode_(__wch)
245 {
246     str(string_type());
247 }

I'm CCing the bgclang list. Has anyone ever seen this basic_stringbuf
constructor segfaulting at this location? Is there another libc++ version I can
try?
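
For anyone who wants to check this outside of my application, here is a minimal
sketch of a standalone test that exercises the same constructor. It assumes
nothing beyond the standard library; the function names stringbuf_roundtrip and
ostringstream_roundtrip are hypothetical and do not come from my code.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not from the original application): constructs a
// std::stringbuf through the openmode constructor, which in the libc++
// source quoted above calls str(string_type()) in its body, then writes
// a string into the buffer and reads it back.
std::string stringbuf_roundtrip(const std::string& text) {
    std::stringbuf buf(std::ios_base::in | std::ios_base::out);
    buf.sputn(text.data(), static_cast<std::streamsize>(text.size()));
    return buf.str();
}

// Same idea via std::ostringstream, whose default constructor builds an
// internal basic_stringbuf.
std::string ostringstream_roundtrip(int value) {
    std::ostringstream out;
    out << "value=" << value;
    return out.str();
}
```

Calling these two functions from a trivial main, built with the same compiler
wrapper and flags as the application, might help show whether the crash comes
from the toolchain or link line rather than from the application itself.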

In the meantime, I will probably try using GNU libstdc++ instead of libc++.

Thanks for all your help,
Jozsef

On 01.31.2017 22:16, Balakrishnan, Ramesh wrote:
>    We have a Perl-based tool called [1]coreprocessor.pl. Make sure you
>    compile your code with the -g flag (in addition to the others that you
>    use) and use this tool to look at the core files (assuming that you are
>    getting core files). If you are not getting core files, you may want to
>    force the job to produce core files by using [2]--env
>    BG_COREDUMPONEXIT=1 in your qsub invocation.
> 
>    Hope this helps.
> 
>    Ramesh
> 
>    On Jan 31, 2017, at 3:56 PM, Jozsef Bakosi <jbakosi at lanl.gov> wrote:
> 
>    Hi Ramesh,
>    I have built the executable using mpic++11. Is there a way to get more
>    information than the following?
>    2017-01-31 21:41:37.936 (WARN ) [0x4000122bde0] CET-02400-13731-128:1911876:ibm.runjob.client.Job: terminated by signal 11
>    2017-01-31 21:41:37.936 (WARN ) [0x4000122bde0] CET-02400-13731-128:1911876:ibm.runjob.client.Job: abnormal termination by signal 11 from rank 16
>    Thanks,
>    Jozsef
>    On 01.31.2017 21:32, Balakrishnan, Ramesh wrote:
> 
>        Jozsef,
>        I am not sure how you are building your code, but I noticed in your
>        earlier email that you are using bgclang++11. bgclang++11 is fine for
>        non-MPI builds, but you will need to pull in a long list of libraries
>        if you want to use bgclang++11 for building MPI code, and this route
>        can lead to runtime errors. Instead, can you try building your MPI code
>        with mpiclang++11 as opposed to bgclang++11? The mpiclang++11 wrapper,
>        around the bgclang++11 compiler, will pull in all of the libraries
>        necessary for your MPI code.
>        Ramesh
>        On Jan 31, 2017, at 2:00 PM, Jozsef Bakosi <jbakosi at lanl.gov> wrote:
>        Hi Ramesh,
>        Based on your qsub line I tried this:
>        $ qsub -t 10 -n 1 --mode c16 /home/jbakosi/code/quinoa/build/clang/Main/unittest -v
>        and besides 16 core files, I get, in the job error file:
>        2017-01-31 19:51:26.031 (INFO ) [0x4000122bde0] CET-40000-51331-128:1911641:ibm.runjob.client.Job: job 1911641 started
>        2017-01-31 19:51:31.066 (INFO ) [0x40000c334e0] 15824:tatu.runjob.monitor: tracklib completed
>        2017-01-31 19:51:43.674 (WARN ) [0x4000122bde0] CET-40000-51331-128:1911641:ibm.runjob.client.Job: terminated by signal 11
>        2017-01-31 19:51:43.675 (WARN ) [0x4000122bde0] CET-40000-51331-128:1911641:ibm.runjob.client.Job: abnormal termination by signal 11 from rank 4
>        2017-01-31 19:51:43.675 (INFO ) [0x4000122bde0] tatu.runjob.client: task terminated by signal 11
>        I guess it started fine, but it segfaults right away?
>        How can I get a more detailed output from my application? My job output
>        file is zero length.
>        Jozsef
> 
> References
> 
>    1. http://www.alcf.anl.gov/user-guides/coreprocessor
>    2. https://www.alcf.anl.gov/user-guides/core-file-settings

