[Llvm-bgq-discuss] Details behind MPI wrapper for bgclang++

Jack Poulson jack.poulson at gmail.com
Fri Mar 1 13:12:08 CST 2013


Hi Jeff,

I think I could have been clearer in my last message. I am skeptical that
Darshan is the issue but would like to spend a minute looking through the
code for MPI_Gather on BGQ. Is this accessible?

Jack
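For reference, here is a serial model of what the MPI_Gather call quoted below should leave in the root's receive buffer. It is a sketch only. The function name `gather_layout` is hypothetical, no MPI library is involved, and it does not reproduce BGQ's MPICH code. It assumes sendcount == recvcount == d on every rank, as in the quoted call. Rank r's d ints land at offset r*d, so the root needs room for d*commSize ints.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial model of MPI_Gather's effect on the root's recvbuf.
// perRankCoords[r] stands in for the send buffer of rank r.
std::vector<int> gather_layout(const std::vector<std::vector<int>>& perRankCoords,
                               std::size_t d)
{
    const std::size_t commSize = perRankCoords.size();
    // On the root, the receive buffer must hold d*commSize ints.
    std::vector<int> coords(d * commSize);
    for (std::size_t rank = 0; rank < commSize; ++rank) {
        // Each rank contributes exactly d ints.
        assert(perRankCoords[rank].size() == d);
        for (std::size_t i = 0; i < d; ++i)
            coords[rank * d + i] = perRankCoords[rank][i];
    }
    return coords;
}
```

On non-root ranks the receive buffer is not significant to MPI_Gather. Sizing `coords` to 1 there only keeps `&coords[0]` well-defined, so the quoted snippet's buffer handling looks correct as written.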

On Fri, Mar 1, 2013 at 11:01 AM, Jeff Hammond <jhammond at alcf.anl.gov> wrote:

> https://www.alcf.anl.gov/resource-guides/darshan doesn't touch the MPI
> source.  It's an IO profiling library that is interposed by the MPI
> wrappers.  If you don't see , then it might not be included.
>
> As you know, I'm at SIAM, but I'll try to look at this next week
> during MiraCon (which I know you are not attending).
>
> Jeff
>
> On Fri, Mar 1, 2013 at 1:52 PM, Jack Poulson <jack.poulson at gmail.com> wrote:
> > Hi Jeff,
> >
> > Yes, this is on Vesta. There doesn't seem to be anything in your .soft file
> > different from mine, other than you specifying the Nov 2012 IBM compilers.
> >
> > Is it possible for me to browse through the current source for the BGQ MPICH
> > modifications?
> >
> > Jack
> >
> >
> > On Fri, Mar 1, 2013 at 10:30 AM, Jeff Hammond <jhammond at alcf.anl.gov> wrote:
> >>
> >> This is BGQ @ ALCF, right?  Might be MPI calls inside of Darshan.  I
> >> disable it because of issues like this.
> >>
> >> I think my ~/.soft is world-readable.  Use the @mpi-wrappers script
> >> and try to verify that you aren't getting Darshan in your build.
> >>
> >> Jeff
> >>
> >> On Fri, Mar 1, 2013 at 11:16 AM, Jack Poulson <jack.poulson at gmail.com> wrote:
> >> > On Thu, Feb 28, 2013 at 10:15 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> >> >>
> >> >>
> >> >> Not a problem! Thanks for being a beta tester :) I've updated the
> >> >> installed libc++ libraries to use CLOCK_REALTIME instead of
> >> >> CLOCK_MONOTONIC. Please try again.
> >> >>
> >> >>  -Hal
> >> >>
> >> >
> >> > One more problem taken care of, it seems. Unfortunately, my program now
> >> > segfaults in an MPI_Gather call (and the trace still seems a bit
> >> > corrupted; see core.13). There is really only one place in my program
> >> > where MPI_Gather is called, and it looks like this:
> >> >
> >> > vector<int> myCoords(d), coords(1);
> >> > // <fill myCoords here>
> >> > // recvbuf only matters on the root; coords(1) keeps &coords[0] valid elsewhere
> >> > if( commRank == 0 )
> >> >     coords.resize( d*commSize );
> >> > MPI_Gather( &myCoords[0], d, MPI_INT, &coords[0], d, MPI_INT, 0, comm );
> >> >
> >> > In the above snippet, 'd' is the dimension of the domain, which is two for
> >> > the executable in question, and space for storing every process's
> >> > coordinates is only allocated on the root process. This is pretty
> >> > straightforward MPI in my opinion, so I am skeptical that I have a bug
> >> > here.
> >> >
> >> > Jack
> >>
> >>
> >>
> >> --
> >> Jeff Hammond
> >> Argonne Leadership Computing Facility
> >> University of Chicago Computation Institute
> >> jhammond at alcf.anl.gov / (630) 252-5381
> >> http://www.linkedin.com/in/jeffhammond
> >> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
> >
> >
>
>
>
> --
> Jeff Hammond
> Argonne Leadership Computing Facility
> University of Chicago Computation Institute
> jhammond at alcf.anl.gov / (630) 252-5381
> http://www.linkedin.com/in/jeffhammond
> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
>
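Hal's fix above hard-wires libc++ to CLOCK_REALTIME. The thread does not say why CLOCK_MONOTONIC had to go. The assumption here is that it was unsupported or unreliable on the BG/Q compute nodes. The sketch below shows a runtime alternative that tries CLOCK_MONOTONIC and falls back to CLOCK_REALTIME only if `clock_gettime` rejects it. The helper name `read_clock_with_fallback` is hypothetical and is not libc++'s actual code.

```cpp
#include <time.h>

// Hypothetical helper (not libc++'s implementation): prefer a steady clock,
// fall back to the wall clock if the kernel rejects CLOCK_MONOTONIC.
// Returns true on success and reports which clock was used in *used.
bool read_clock_with_fallback(timespec* ts, clockid_t* used)
{
    if (clock_gettime(CLOCK_MONOTONIC, ts) == 0) {
        *used = CLOCK_MONOTONIC;
        return true;
    }
    // CLOCK_REALTIME is always provided by POSIX systems, so it is the safe
    // fallback, but it can jump when the system time is adjusted.
    if (clock_gettime(CLOCK_REALTIME, ts) == 0) {
        *used = CLOCK_REALTIME;
        return true;
    }
    return false;
}
```

The trade-off is that a steady_clock built on CLOCK_REALTIME is no longer guaranteed to be monotonic, so elapsed-time measurements can go wrong if the clock is reset during a run.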