[dcmf] romio updates

Robert Latham robl at mcs.anl.gov
Thu Feb 14 10:08:03 CST 2008


On Thu, Feb 14, 2008 at 09:40:48AM -0600, Bob Cernohous wrote:
> I was taking a quick look at updating ad_bgl to 1.0.6p1 (or later). There 
> have been some aio changes - perhaps someone can brief me and suggest a 
> direction for ad_bgl?

Yeah, we've done a big reworking of the asynchronous I/O in ROMIO in
1.0.6.

> From what I see, you've given up on nfs aio:
> 
> -    ADIOI_NFS_IreadContig, /* IreadContig */
> -    ADIOI_NFS_IwriteContig, /* IwriteContig */
> +    /* Even with lockd running and NFS mounted 'noac', we have been unable to
> +     * gaurantee correct behavior over NFS with asyncronous I/O operations */
> +    ADIOI_FAKE_IreadContig, /* IreadContig */
> +    ADIOI_FAKE_IwriteContig, /* IwriteContig */
> 
> As has been discussed on this list, ad_bgl was apparently based on ad_nfs. 
>   I'm trying to decide if we need anything more for our GPFS support?

ROMIO has had a couple of different ways of implementing nonblocking I/O:

Old way: ROMIO implemented its own (incompatible) MPIO_Request
objects.  Instead of MPI_TEST/MPI_WAIT/etc, callers had to use
MPIO_TEST and MPIO_WAIT.  ROMIO did its own bookkeeping.

MPICH2 way:  I converted all the nonblocking operations to use MPI-2
generalized requests.  ROMIO hands back real, compatible MPI request
objects, and callers can use MPI_TEST/MPI_WAIT and friends.  One
problem: generalized requests require a caller to spawn a thread to
make progress, and since we wanted to work on BGL (where threads were
a no-go) and BGP (where the thread model is restrictive and only
available in some modes), we had our "nonblocking" I/O operations just
do all the I/O up front.  It's correct behavior but means there's zero
overlap between I/O and computation.

New way:  We came up with a better generalized request extension and
added that to MPICH2.  This extension provides a means for making
progress on a generalized request at test and wait time.  Best of both
worlds:  ROMIO doesn't have to track state, ROMIO hands back
standard-compliant request objects, and, if the underlying OS supports
it, we get overlap between computation and I/O. It's not
standard (yet).  If you're really fired up to read more about this,
check out
http://www.mcs.anl.gov/~robl/papers/latham_grequest-enhance.pdf
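
To make the "progress at test and wait time" idea concrete, here is a
toy sketch in plain C.  It is not the MPICH2 extension API: toy_request,
toy_poll_fn, toy_copy_state and the toy_* functions are made-up names,
and the chunked memcpy is only a stand-in for whatever asynchronous
backend the OS provides.  The point is just that the request carries a
poll callback, and test/wait call it instead of relying on a thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model; none of these names exist in MPICH2 or ROMIO. */
typedef int (*toy_poll_fn)(void *state);  /* returns 1 once the operation is done */

typedef struct {
    toy_poll_fn poll;   /* called from toy_test/toy_wait to make progress */
    void *state;        /* operation-specific state owned by the I/O layer */
    int complete;       /* set once poll reports completion */
} toy_request;

/* Stand-in for an asynchronous backend: each poll moves at most one chunk. */
typedef struct {
    const char *src;
    char *dst;
    size_t total;       /* bytes to transfer */
    size_t done;        /* bytes transferred so far */
    size_t chunk;       /* bytes moved per poll; must be nonzero */
} toy_copy_state;

static int toy_copy_poll(void *p)
{
    toy_copy_state *s = p;
    assert(s->chunk > 0);
    size_t n = s->total - s->done;
    if (n > s->chunk)
        n = s->chunk;
    memcpy(s->dst + s->done, s->src + s->done, n);
    s->done += n;
    return s->done == s->total;
}

static void toy_start(toy_request *r, toy_poll_fn poll, void *state)
{
    assert(poll != NULL);
    r->poll = poll;
    r->state = state;
    r->complete = 0;
}

/* Analogue of a test call: poke the operation once, report completion. */
static int toy_test(toy_request *r)
{
    if (!r->complete)
        r->complete = r->poll(r->state);
    return r->complete;
}

/* Analogue of a wait call: keep polling until the operation finishes. */
static void toy_wait(toy_request *r)
{
    while (!toy_test(r))
        ;
}
```

A caller would toy_start() the operation, do some computation, call
toy_test() now and then, and finish with toy_wait().  No helper thread
is involved, and the I/O layer keeps no request list of its own.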

> We don't have aio support, so I'm thinking we do something similar.  But I
> see some status processing in the old code that has gone away.  Just 
> delete this?  Go with ADIOI_GEN_IODone or ADIOI_NFS_ReadDone?

Do you mean aio_read/aio_write/aio_suspend/aio_error/aio_return fail
with ENOSYS on BlueGene, or are they just wrappers around blocking I/O?

The easiest thing to use is just ADIOI_FAKE_IreadContig and
ADIOI_FAKE_IwriteContig, which will carry out ADIO_ReadContig and
return an already-completed MPI request object.  
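
Roughly, the "fake" approach looks like the sketch below.  This is an
illustration in plain POSIX C, not ROMIO's actual code: fake_request,
fake_iread_contig, fake_test and read_prefix are hypothetical names, and
pread() stands in for the blocking contiguous read.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for a request object; not ROMIO's or MPI's type. */
typedef struct {
    int complete;     /* always 1 by the time the caller sees the request */
    ssize_t nbytes;   /* bytes transferred, or -1 on error */
    int err;          /* errno from a failed read, 0 on success */
} fake_request;

/* "Nonblocking" read that does all the I/O up front with a blocking
 * pread() and hands back a request that is already complete. */
static void fake_iread_contig(int fd, void *buf, size_t len, off_t offset,
                              fake_request *req)
{
    assert(req != NULL);
    ssize_t n = pread(fd, buf, len, offset);
    req->nbytes = n;
    req->err = (n < 0) ? errno : 0;
    req->complete = 1;
}

/* Test never has any work left to do; it only reports completion. */
static int fake_test(const fake_request *req)
{
    return req->complete;
}

/* Usage sketch: read the first len bytes of path through the fake call. */
static ssize_t read_prefix(const char *path, void *buf, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    fake_request req;
    fake_iread_contig(fd, buf, len, 0, &req);
    /* Computation could go here, but the read has already happened. */
    ssize_t n = fake_test(&req) ? req.nbytes : -1;
    close(fd);
    return n;
}
```

The caller-visible semantics are still correct; the cost is that there
is no overlap between the read and any computation that follows it.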

> in ADIOI_BGL_ReadDone()
> 
> #ifndef ROMIO_HAVE_WORKING_AIO
> #ifdef HAVE_STATUS_SET_BYTES
>     MPIR_Status_set_bytes(status, (*request)->datatype, (*request)->nbytes);
> #endif
>     (*request)->fd->async_count--;
>     ADIOI_Free_request((ADIOI_Req_node *) (*request));
>     *request = ADIO_REQUEST_NULL;
>     *error_code = MPI_SUCCESS;
>     return 1;
> #endif 

Yeah, this is the old way.  The ADIOI_XXX_IODone and IOComplete
functions are stubs in 1.0.6 and newer.  I should just remove them
altogether now.  What used to be done in IODone and IOComplete is now
done in the generalized request callbacks.

==rob

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                 B29D F333 664A 4280 315B


