[Llvm-bgq-discuss] [alcf-support #325179] Opening application executable failed, errno 2 No such file or directory
Jozsef Bakosi
jbakosi at lanl.gov
Wed Feb 1 12:29:23 CST 2017
Hi Ramesh and Tim,
Thanks for your help. I recompiled with debug info, ran using a single core, and
used the coreprocessor to find that I get the segfault from the standard
library, libc++:
Location: /soft/compilers/bgclang/r284961-stable/libc++/include/c++/v1/sstream:246:
241 template <class _CharT, class _Traits, class _Allocator>
242 basic_stringbuf<_CharT, _Traits, _Allocator>::basic_stringbuf(ios_base::openmode __wch)
243 : __hm_(0),
244 __mode_(__wch)
245 {
246 str(string_type());
247 }
I'm CCing the bgclang list. Has anyone ever seen this basic_stringbuf
constructor segfaulting at this location? Is there another libc++ version I can
try?
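One way to narrow this down, assuming the crash comes from libc++ itself and not from application code, is a standalone check that does nothing except construct and use a stringbuf. The sketch below is only illustrative: the function names are hypothetical and not part of quinoa or libc++. The second helper relies on the assumption that std::ostringstream constructs its internal basic_stringbuf through the same constructor.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: constructs a stringbuf directly with an explicit open
// mode, i.e. the basic_stringbuf(ios_base::openmode) constructor quoted above.
std::string stringbuf_direct() {
    std::stringbuf buf(std::ios_base::in | std::ios_base::out);
    buf.sputn("ok", 2);  // write two characters into the buffer
    return buf.str();    // copy the buffer contents back out
}

// Hypothetical helper: goes through std::ostringstream, which owns a
// stringbuf internally (assumed to use the same constructor).
std::string stringbuf_via_stream(int value) {
    std::ostringstream out;
    out << "value=" << value;
    return out.str();
}

// Runs both checks; aborts via assert if either returns unexpected contents.
void run_stringbuf_checks() {
    assert(stringbuf_direct() == "ok");
    assert(stringbuf_via_stream(42) == "value=42");
}
```

Calling run_stringbuf_checks() from main() in a file built with the same wrapper and flags as the application, and running it on a single rank, should show whether the segfault reproduces outside the application. If it does not, the problem is more likely in how the application is linked or initialized than in the constructor itself.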
In the meantime, I will probably try using GNU libstdc++ instead of libc++.
Thanks for all your help,
Jozsef
On 01.31.2017 22:16, Balakrishnan, Ramesh wrote:
> We have a Perl-based tool called [1]coreprocessor.pl. Make sure you
> compile your code with the -g flag (in addition to the others that you
> use) and use this tool to look at the core files (assuming that you are
> getting core files). If you are not getting core files, you may want to
> force the job to produce core files by using [2]--env
> BG_COREDUMPONEXIT=1 in your qsub invocation.
>
> Hope this helps.
>
> Ramesh
>
> On Jan 31, 2017, at 3:56 PM, Jozsef Bakosi <jbakosi at lanl.gov> wrote:
>
> Hi Ramesh,
> I have built the executable using mpic++11. Is there a way to get more
> information than the following?
> 2017-01-31 21:41:37.936 (WARN ) [0x4000122bde0] CET-02400-13731-128:1911876:ibm.runjob.client.Job: terminated by signal 11
> 2017-01-31 21:41:37.936 (WARN ) [0x4000122bde0] CET-02400-13731-128:1911876:ibm.runjob.client.Job: abnormal termination by signal 11 from rank 16
> Thanks,
> Jozsef
> On 01.31.2017 21:32, Balakrishnan, Ramesh wrote:
>
> Jozsef,
> I am not sure how you are building your code, but I noticed in your
> earlier email that you are using bgclang++11. bgclang++11 is fine for
> non-MPI builds, but you will need to pull in a long list of libraries
> if you want to use bgclang++11 for building MPI code, and this route
> can lead to runtime errors. Instead, can you try building your MPI code
> with mpiclang++11 as opposed to bgclang++11? The mpiclang++11 wrapper
> around the bgclang++11 compiler will pull in all of the libraries
> necessary for your MPI code.
> Ramesh
> On Jan 31, 2017, at 2:00 PM, Jozsef Bakosi <jbakosi at lanl.gov> wrote:
> Hi Ramesh,
> Based on your qsub line I tried this:
> $ qsub -t 10 -n 1 --mode c16 /home/jbakosi/code/quinoa/build/clang/Main/unittest -v
> and besides 16 core files, I get, in the job error file:
> 2017-01-31 19:51:26.031 (INFO ) [0x4000122bde0] CET-40000-51331-128:1911641:ibm.runjob.client.Job: job 1911641 started
> 2017-01-31 19:51:31.066 (INFO ) [0x40000c334e0] 15824:tatu.runjob.monitor: tracklib completed
> 2017-01-31 19:51:43.674 (WARN ) [0x4000122bde0] CET-40000-51331-128:1911641:ibm.runjob.client.Job: terminated by signal 11
> 2017-01-31 19:51:43.675 (WARN ) [0x4000122bde0] CET-40000-51331-128:1911641:ibm.runjob.client.Job: abnormal termination by signal 11 from rank 4
> 2017-01-31 19:51:43.675 (INFO ) [0x4000122bde0] tatu.runjob.client: task terminated by signal 11
> I guess it started fine, but it segfaults right away?
> How can I get more detailed output from my application? My job output
> file is zero length.
> Jozsef
>
> References
>
> 1. http://www.alcf.anl.gov/user-guides/coreprocessor
> 2. https://www.alcf.anl.gov/user-guides/core-file-settings