[Llvm-bgq-discuss] Details behind MPI wrapper for bgclang++

Jeff Hammond jhammond at alcf.anl.gov
Fri Mar 1 13:18:48 CST 2013


Look at https://wiki.alcf.anl.gov/parts/index.php/MPICH_on_Blue_Gene/Q#Building_BGQ-MPI_from_source
and swap V1R2M0.

Note that the MPICH2 1.5 stock release should be identical to BGQ for
your purposes.  Mike told me there were some patches that weren't
accepted by the MPICH guys but I suspect those are in the build system
and glue code.

Anyways, getting the driver source is just slightly harder than
mpich.org but I'm sure you can figure them both out.

Jeff

On Fri, Mar 1, 2013 at 2:12 PM, Jack Poulson <jack.poulson at gmail.com> wrote:
> Hi Jeff,
>
> I think I could have been clearer in my last message. I am skeptical that
> Darshan is the issue but would like to spend a minute looking through the
> code for MPI_Gather on BGQ. Is this accessible?
>
> Jack
>
>
> On Fri, Mar 1, 2013 at 11:01 AM, Jeff Hammond <jhammond at alcf.anl.gov> wrote:
>>
>> https://www.alcf.anl.gov/resource-guides/darshan doesn't touch the MPI
>> source.  It's an IO profiling library that is interposed by the MPI
>> wrappers.  If you don't see , then it might not be included.
>>
>> As you know, I'm at SIAM, but I'll try to look at this next week
>> during MiraCon (which I know you are not attending).
>>
>> Jeff
>>
>> On Fri, Mar 1, 2013 at 1:52 PM, Jack Poulson <jack.poulson at gmail.com> wrote:
>> > Hi Jeff,
>> >
>> > Yes, this is on Vesta. There doesn't seem to be anything in your .soft file
>> > different from mine, other than you specifying the Nov 2012 IBM compilers.
>> >
>> > Is it possible for me to browse through the current source for the BGQ MPICH
>> > modifications?
>> >
>> > Jack
>> >
>> >
>> > On Fri, Mar 1, 2013 at 10:30 AM, Jeff Hammond <jhammond at alcf.anl.gov> wrote:
>> >>
>> >> This is BGQ @ ALCF, right?  Might be MPI calls inside of Darshan.  I
>> >> disable it because of issues like this.
>> >>
>> >> I think my ~/.soft is world-readable.  Use the @mpi-wrappers script
>> >> and try to verify that you aren't getting Darshan in your build.
>> >>
>> >> Jeff
>> >>
>> >> On Fri, Mar 1, 2013 at 11:16 AM, Jack Poulson <jack.poulson at gmail.com> wrote:
>> >> > On Thu, Feb 28, 2013 at 10:15 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>> >> >>
>> >> >>
>> >> >> Not a problem! Thanks for being a beta tester :) I've updated the
>> >> >> installed libc++ libraries to use CLOCK_REALTIME instead of
>> >> >> CLOCK_MONOTONIC.
>> >> >> Please try again.
>> >> >>
>> >> >>  -Hal
>> >> >>
>> >> >
>> >> > One more problem taken care of it seems. Unfortunately my program now
>> >> > segfaults in an MPI_Gather call (and the trace still seems a bit corrupted,
>> >> > see core.13). There is really only one instance in my program where
>> >> > MPI_Gather is called, and it looks like this:
>> >> >
>> >> > vector<int> myCoords(d), coords(1);
>> >> > // <fill myCoords here>
>> >> > if( commRank == 0 )
>> >> >     coords.resize( d*commSize );
>> >> > MPI_Gather( &myCoords[0], d, MPI_INT, &coords[0], d, MPI_INT, 0, comm );
>> >> >
>> >> > In the above snippet, 'd' is the dimension of the domain, which is two for
>> >> > the executable in question, and space for storing every process's
>> >> > coordinates is only allocated on the root process. This is pretty
>> >> > straightforward MPI in my opinion, so I am skeptical that I have a bug here.
>> >> >
>> >> > Jack
>> >>
>> >>
>> >>
>> >> --
>> >> Jeff Hammond
>> >> Argonne Leadership Computing Facility
>> >> University of Chicago Computation Institute
>> >> jhammond at alcf.anl.gov / (630) 252-5381
>> >> http://www.linkedin.com/in/jeffhammond
>> >> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
>> >
>> >
>>
>>
>>
>> --
>> Jeff Hammond
>> Argonne Leadership Computing Facility
>> University of Chicago Computation Institute
>> jhammond at alcf.anl.gov / (630) 252-5381
>> http://www.linkedin.com/in/jeffhammond
>> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
>
>



-- 
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
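
For readers following the quoted MPI_Gather snippet, here is a rough serial sketch of the buffer layout the root rank should end up with: rank r's d coordinates land in coords[r*d] through coords[r*d + d - 1]. This is not MPI code and not anything from the BGQ MPICH source. The function name gatherLayout and the perRank argument are hypothetical, introduced only to stand in for each rank's myCoords.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial stand-in (no MPI involved) for what the root's receive buffer
// holds after
//   MPI_Gather( &myCoords[0], d, MPI_INT, &coords[0], d, MPI_INT, 0, comm );
// perRank[r] plays the role of myCoords on rank r (hypothetical name).
std::vector<int> gatherLayout(const std::vector<std::vector<int>>& perRank, int d)
{
    assert(d > 0);
    const std::size_t dim = static_cast<std::size_t>(d);
    const std::size_t commSize = perRank.size();

    // In the quoted code only the root resizes coords to d*commSize;
    // this buffer models that root-side allocation.
    std::vector<int> coords(dim * commSize);

    for (std::size_t r = 0; r < commSize; ++r) {
        // Every rank must contribute exactly d entries.
        assert(perRank[r].size() == dim);
        for (std::size_t j = 0; j < dim; ++j)
            coords[r * dim + j] = perRank[r][j];
    }
    return coords;
}
```

With d = 2 and three ranks holding (0,0), (0,1) and (1,0), the sketch returns a six-entry buffer in rank order, which is the size and ordering the root's coords.resize( d*commSize ) in the quoted code assumes.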

