[Llvm-bgq-discuss] Details behind MPI wrapper for bgclang++

Jack Poulson jack.poulson at gmail.com
Fri Mar 1 14:22:10 CST 2013


On Fri, Mar 1, 2013 at 12:04 PM, Hal Finkel <hfinkel at anl.gov> wrote:

> ----- Original Message -----
> > From: "Jack Poulson" <jack.poulson at gmail.com>
> > To: "Hal Finkel" <hfinkel at anl.gov>
> > Cc: "Jeff Hammond" <jhammond at alcf.anl.gov>, llvm-bgq-discuss at lists.alcf.anl.gov
> > Sent: Friday, March 1, 2013 10:16:24 AM
> > Subject: Re: [Llvm-bgq-discuss] Details behind MPI wrapper for bgclang++
> >
> > On Thu, Feb 28, 2013 at 10:15 PM, Hal Finkel < hfinkel at anl.gov >
> > wrote:
> >
> > Not a problem! Thanks for being a beta tester :) I've updated the
> > installed libc++ libraries to use CLOCK_REALTIME instead of
> > CLOCK_MONOTONIC. Please try again.
> >
> > -Hal
> >
> > One more problem taken care of it seems. Unfortunately my program now
> > segfaults in an MPI_Gather call (and the trace still seems a bit
> > corrupted, see core.13). There is really only one instance in my
> > program where MPI_Gather is called, and it looks like this:
> >
> >
> > vector<int> myCoords(d), coords(1);
> > // <fill myCoords here>
> > if( commRank == 0 )
> >     coords.resize( d*commSize );
> > MPI_Gather( &myCoords[0], d, MPI_INT, &coords[0], d, MPI_INT, 0, comm );
> >
> >
> > In the above snippet, 'd' is the dimension of the domain, which is
> > two for the executable in question, and space for storing every
> > process's coordinates is only allocated on the root process. This is
> > pretty straightforward MPI in my opinion, so I am skeptical that I
> > have a bug here.
>
> Unfortunately, the debug info seems completely useless here. Some of our
> IBM contributors have been working on fixing problems with debug info, so
> hopefully this will improve soon.
>
> In any case, the actual crash is in:
>   dbf::bfly::PotentialField<float, 2ul, 8ul>::Evaluate(std::__1::array<float, 2ul> const&) const
>
> just after a call to:
>   dbf::bfly::Context<float, 2ul, 8ul>::Lagrange(unsigned long, std::__1::array<float, 2ul> const&) const
>
> Does that give enough context to guess at the source location? Also, can
> you try linking the executable statically? I wonder if this is some kind of
> PIC problem.
>
>
That is infinitely more information than I had before. What did you do to
find this out?

The latter routine heavily used restrict, but after removing all usages of
restrict from my entire program and recompiling I received an essentially
identical coredump file (though I suppose that it is possible that the
crash occurred somewhere else).
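
For reference, a minimal sketch of how restrict can be switched off program-wide
without editing every use, assuming all uses go through a single macro. The
names DBF_RESTRICT, DBF_NO_RESTRICT, and Axpy are hypothetical and are not taken
from the actual code; __restrict__ is the GCC/Clang spelling of the qualifier in
C++.

```cpp
#include <cstddef>

// Hypothetical switch: compile with -DDBF_NO_RESTRICT to make every
// DBF_RESTRICT expand to nothing, so aliasing assumptions are dropped.
#ifdef DBF_NO_RESTRICT
# define DBF_RESTRICT
#else
# define DBF_RESTRICT __restrict__   // GCC/Clang extension in C++
#endif

// Hypothetical stand-in for a restrict-heavy kernel: y[i] += a * x[i].
// With restrict, the compiler may assume x and y never overlap.
inline void Axpy(std::size_t n, float a,
                 const float* DBF_RESTRICT x, float* DBF_RESTRICT y)
{
    for (std::size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```

Building once with and once without -DDBF_NO_RESTRICT gives two binaries that
differ only in the aliasing assumptions, which makes it easier to tell whether
restrict is involved in a crash.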

Jack