[Llvm-bgq-discuss] [alcf-support #325179] Opening application executable failed, errno 2 No such file or directory

Hal Finkel hfinkel at anl.gov
Fri Feb 3 14:42:04 CST 2017


Hi Jozsef,

[Dropping -support: cc'ing both support and this mailing list is going to be
confusing because not all of the messages will appear on the mailing list]

Can you provide the backtrace? I don't recall running into a problem in 
this specific place, but I have seen problems with streams in the past 
for various reasons (e.g., features such as basic locale support that 
BG/Q does not provide).

  -Hal


On 02/01/2017 12:29 PM, Jozsef Bakosi wrote:
> Hi Ramesh and Tim,
>
> Thanks for your help. I recompiled with debug info, ran using a single core, and
> used the coreprocessor to find that I get the segfault from the standard
> library, libc++:
>
> Location: /soft/compilers/bgclang/r284961-stable/libc++/include/c++/v1/sstream:246:
>
> 241 template <class _CharT, class _Traits, class _Allocator>
> 242 basic_stringbuf<_CharT, _Traits, _Allocator>::basic_stringbuf(ios_base::openmode __wch)
> 243     : __hm_(0),
> 244       __mode_(__wch)
> 245 {
> 246     str(string_type());
> 247 }
>
> I'm CCing the bgclang list. Has anyone ever seen this basic_stringbuf
> constructor segfaulting at this location? Is there another libc++ version I can
> try?
>
> In the meantime, I will probably try using GNU libstdc++ instead of libc++.
>
> Thanks for all your help,
> Jozsef
>
> On 01.31.2017 22:16, Balakrishnan, Ramesh wrote:
>>     We have a Perl-based tool called [1]coreprocessor.pl. Make sure you
>>     compile your code with the -g flag (in addition to the others that you
>>     use) and use this tool to look at the core files (assuming that you are
>>     getting core files). If you are not getting core files, you may want to
>>     force the job to produce core files by using [2]--env
>>     BG_COREDUMPONEXIT=1 in your qsub invocation.
>>
>>     Hope this helps.
>>
>>     Ramesh
>>
>>     On Jan 31, 2017, at 3:56 PM, Jozsef Bakosi <jbakosi at lanl.gov> wrote:
>>
>>     Hi Ramesh,
>>     I have built the executable using mpic++11. Is there a way to get more
>>     information than the following?
>>     2017-01-31 21:41:37.936 (WARN ) [0x4000122bde0] CET-02400-13731-128:1911876:ibm.runjob.client.Job: terminated by signal 11
>>     2017-01-31 21:41:37.936 (WARN ) [0x4000122bde0] CET-02400-13731-128:1911876:ibm.runjob.client.Job: abnormal termination by signal 11 from rank 16
>>     Thanks,
>>     Jozsef
>>     On 01.31.2017 21:32, Balakrishnan, Ramesh wrote:
>>
>>         Jozsef,
>>         I am not sure how you are building your code, but I noticed in your
>>         earlier email that you are using bgclang++11. bgclang++11 is fine for
>>         non-MPI builds, but you will need to pull in a long list of libraries
>>         if you want to use bgclang++11 for building MPI code, and this route
>>         can lead to runtime errors. Instead, can you try building your MPI code
>>         with mpiclang++11 as opposed to bgclang++11? The mpiclang++11 wrapper
>>         around the bgclang++11 compiler will pull in all of the libraries
>>         necessary for your MPI code.
>>         Ramesh
>>         On Jan 31, 2017, at 2:00 PM, Jozsef Bakosi <jbakosi at lanl.gov> wrote:
>>         Hi Ramesh,
>>         Based on your qsub line I tried this:
>>         $ qsub -t 10 -n 1 --mode c16 /home/jbakosi/code/quinoa/build/clang/Main/unittest -v
>>         and besides 16 core files, I get, in the job error file:
>>         2017-01-31 19:51:26.031 (INFO ) [0x4000122bde0] CET-40000-51331-128:1911641:ibm.runjob.client.Job: job 1911641 started
>>         2017-01-31 19:51:31.066 (INFO ) [0x40000c334e0] 15824:tatu.runjob.monitor: tracklib completed
>>         2017-01-31 19:51:43.674 (WARN ) [0x4000122bde0] CET-40000-51331-128:1911641:ibm.runjob.client.Job: terminated by signal 11
>>         2017-01-31 19:51:43.675 (WARN ) [0x4000122bde0] CET-40000-51331-128:1911641:ibm.runjob.client.Job: abnormal termination by signal 11 from rank 4
>>         2017-01-31 19:51:43.675 (INFO ) [0x4000122bde0] tatu.runjob.client: task terminated by signal 11
>>         I guess it started fine, but it segfaults right away?
>>         How can I get a more detailed output from my application? My job output
>>         file is zero length.
>>         Jozsef
>>
>> References
>>
>>     1. http://www.alcf.anl.gov/user-guides/coreprocessor
>>     2. https://www.alcf.anl.gov/user-guides/core-file-settings

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory


