[dcmf] romio updates
Robert Latham
robl at mcs.anl.gov
Thu Feb 14 10:08:03 CST 2008
On Thu, Feb 14, 2008 at 09:40:48AM -0600, Bob Cernohous wrote:
> I was taking a quick look at updating ad_bgl to 1.0.6p1 (or later). There
> have been some aio changes - perhaps someone can brief me and suggest a
> direction for ad_bgl?
Yeah, we've done a big reworking of the asynchronous I/O in ROMIO in
1.0.6.
> From what I see, you've given up on nfs aio:
>
> - ADIOI_NFS_IreadContig, /* IreadContig */
> - ADIOI_NFS_IwriteContig, /* IwriteContig */
> + /* Even with lockd running and NFS mounted 'noac', we have been unable to
> + * gaurantee correct behavior over NFS with asyncronous I/O operations */
> + ADIOI_FAKE_IreadContig, /* IreadContig */
> + ADIOI_FAKE_IwriteContig, /* IwriteContig */
>
> As has been discussed on this list, ad_bgl was apparently based on ad_nfs.
> I'm trying to decide if we need anything more for our GPFS support?
ROMIO has had a couple of different ways of implementing nonblocking I/O:
Old way: ROMIO implemented its own (incompatible) MPIO_Request
objects. Instead of MPI_TEST/MPI_WAIT/etc., callers had to use
MPIO_TEST and MPIO_WAIT. ROMIO did its own bookkeeping.
MPICH2 way: I converted all the nonblocking operations to use MPI-2
generalized requests. ROMIO hands back real, compatible MPI request
objects, and callers can use MPI_TEST/MPI_WAIT and friends. One
problem: generalized requests require a caller to spawn a thread to
make progress, and since we wanted to work on BGL (where threads were
a no-go) and BGP (where the thread model is restrictive and only
available in some modes), we had our "nonblocking" I/O operations just
do all the I/O up front. That's correct behavior, but it means there's zero
overlap between I/O and computation.
New way: We came up with a better generalized request extension and
added that to MPICH2. This extension provides a means for making
progress on a generalized request at test and wait time. Best of both
worlds: ROMIO doesn't have to track state, ROMIO hands back
standard-compliant request objects, and, if the underlying OS supports
it, we get overlap between computation and I/O. It's not
standard (yet). If you're really fired up to read more about this,
check out
http://www.mcs.anl.gov/~robl/papers/latham_grequest-enhance.pdf
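To make the test/wait-time progress idea concrete, here is a minimal
sketch in plain C. It does not use the actual MPICH2 extension; every
name in it (sketch_req, sketch_test, sketch_wait, copy_poll) is
hypothetical, and the "I/O" is just a chunked memcpy standing in for an
operation the OS completes in the background. The point is only that
the library stores a poll callback with the request and invokes it from
test and wait, so no progress thread is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request type; NOT the real MPICH2 extension API. */
typedef struct sketch_req sketch_req;
typedef int (*sketch_poll_fn)(sketch_req *req);

struct sketch_req {
    sketch_poll_fn poll; /* called from test/wait to advance the operation */
    void *state;         /* operation-specific state owned by the caller */
    int done;            /* set by poll once the operation has finished */
    int polls;           /* how many times poll ran (illustration only) */
};

/* Toy "asynchronous" copy: each poll moves at most `chunk` bytes. */
typedef struct {
    const char *src;
    char *dst;
    size_t total, moved, chunk;
} copy_state;

static int copy_poll(sketch_req *req)
{
    copy_state *cs = req->state;
    size_t left = cs->total - cs->moved;
    size_t n = left < cs->chunk ? left : cs->chunk;

    memcpy(cs->dst + cs->moved, cs->src + cs->moved, n);
    cs->moved += n;
    if (cs->moved == cs->total)
        req->done = 1;
    return 0;
}

/* Analogue of MPI_Test: take one progress step, then report completion. */
static int sketch_test(sketch_req *req, int *flag)
{
    int rc = 0;

    if (!req->done) {
        rc = req->poll(req);
        req->polls++;
    }
    *flag = req->done;
    return rc;
}

/* Analogue of MPI_Wait: keep polling until the operation completes. */
static int sketch_wait(sketch_req *req)
{
    int flag = 0, rc = 0;

    while (!flag && rc == 0)
        rc = sketch_test(req, &flag);
    return rc;
}
```

With a 10-byte copy and a 4-byte chunk, one sketch_test call leaves the
request incomplete, and a following sketch_wait finishes it after two
more polls. The caller is free to compute between those calls.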
> We don't have aio support, so I'm thinking we do something similar. But I
> see some status processing in the old code that has gone away. Just
> delete this? Go with ADIOI_GEN_IODone or ADIOI_NFS_ReadDone?
Do you mean aio_read/aio_write/aio_suspend/aio_error/aio_return fail with
ENOSYS on BlueGene, or are they just wrappers around blocking I/O?
The easiest thing to use is just ADIOI_FAKE_IreadContig and
ADIOI_FAKE_IwriteContig, which will carry out the blocking ADIO_ReadContig
(or ADIO_WriteContig) and return an already-completed MPI request object.
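Roughly, the "fake" approach looks like the sketch below. This is not
the ROMIO source: fake_req, read_contig, and fake_iread_contig are
hypothetical names, and a POSIX pread loop stands in for
ADIO_ReadContig. It only illustrates that all of the I/O happens before
the call returns, so the request handed back is already complete.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for an MPI request; NOT a ROMIO type. */
typedef struct {
    int done;       /* 1 once the operation has completed */
    ssize_t nbytes; /* bytes transferred, or -1 on error */
    int err;        /* errno value if the read failed, else 0 */
} fake_req;

/* Blocking positional read that retries short reads. In this sketch it
 * plays the role that ADIO_ReadContig plays in ROMIO. */
static ssize_t read_contig(int fd, void *buf, size_t len, off_t off)
{
    size_t got = 0;

    while (got < len) {
        ssize_t n = pread(fd, (char *)buf + got, len - got,
                          off + (off_t)got);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted: retry the same range */
            return -1;
        }
        if (n == 0)
            break; /* end of file */
        got += (size_t)n;
    }
    return (ssize_t)got;
}

/* "Nonblocking" in name only: all I/O is done before returning. */
static fake_req fake_iread_contig(int fd, void *buf, size_t len, off_t off)
{
    fake_req req;

    req.nbytes = read_contig(fd, buf, len, off);
    req.err = req.nbytes < 0 ? errno : 0;
    req.done = 1; /* the request is born complete */
    return req;
}

/* Test has no work to do; it just reports the stored completion flag. */
static int fake_test(const fake_req *req)
{
    return req->done;
}
```

The trade-off is the one described above: the result is correct, but
the caller gets no overlap between I/O and computation.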
> in ADIOI_BGL_ReadDone()
>
> #ifndef ROMIO_HAVE_WORKING_AIO
> #ifdef HAVE_STATUS_SET_BYTES
> MPIR_Status_set_bytes(status, (*request)->datatype,
> (*request)->nbytes);
> #endif
> (*request)->fd->async_count--;
> ADIOI_Free_request((ADIOI_Req_node *) (*request));
> *request = ADIO_REQUEST_NULL;
> *error_code = MPI_SUCCESS;
> return 1;
> #endif
Yeah, this is the old way. The ADIOI_XXX_IODone and IOComplete
functions are stubs in 1.0.6 and newer. I should just remove them
altogether now. What used to be done in IODone and IOComplete is now
done in the generalized request callbacks.
==rob
--
Rob Latham
Mathematics and Computer Science Division A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA B29D F333 664A 4280 315B