[Llvm-bgq-discuss] Details behind MPI wrapper for bgclang++

Hal Finkel hfinkel at anl.gov
Fri Mar 1 14:04:17 CST 2013


----- Original Message -----
> From: "Jack Poulson" <jack.poulson at gmail.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Jeff Hammond" <jhammond at alcf.anl.gov>, llvm-bgq-discuss at lists.alcf.anl.gov
> Sent: Friday, March 1, 2013 10:16:24 AM
> Subject: Re: [Llvm-bgq-discuss] Details behind MPI wrapper for bgclang++
> 
> On Thu, Feb 28, 2013 at 10:15 PM, Hal Finkel < hfinkel at anl.gov >
> wrote:
> 
> Not a problem! Thanks for being a beta tester :) I've updated the
> installed libc++ libraries to use CLOCK_REALTIME instead of
> CLOCK_MONOTONIC. Please try again.
> 
> -Hal
> 
> One more problem taken care of it seems. Unfortunately my program now
> segfaults in an MPI_Gather call (and the trace still seems a bit
> corrupted, see core.13). There is really only one instance in my
> program where MPI_Gather is called, and it looks like this:
> 
> 
> vector<int> myCoords(d), coords(1);
> // <fill myCoords here>
> if( commRank == 0 )
>     coords.resize( d*commSize );
> MPI_Gather( &myCoords[0], d, MPI_INT, &coords[0], d, MPI_INT, 0, comm );
> 
> 
> In the above snippet, 'd' is the dimension of the domain, which is
> two for the executable in question, and space for storing every
> process's coordinates is only allocated on the root process. This is
> pretty straightforward MPI in my opinion, so I am skeptical that I
> have a bug here.

Unfortunately, the debug info seems completely useless here. Some of our IBM contributors have been working on fixing problems with debug info, so hopefully this will improve soon.

In any case, the actual crash is in:
dbf::bfly::PotentialField<float, 2ul, 8ul>::Evaluate(std::__1::array<float, 2ul> const&) const

just after a call to:
dbf::bfly::Context<float, 2ul, 8ul>::Lagrange(unsigned long, std::__1::array<float, 2ul> const&) const

Does that give enough context to guess at the source location? Also, can you try linking the executable statically? I wonder if this is some kind of PIC problem.

Thanks again,
Hal

> 
> 
> Jack

