[dcmf] Lockless ROMIO driver
Rob Ross
rross at mcs.anl.gov
Wed Feb 13 11:24:42 CST 2008
Hi Bob,
ROMIO has two "layers" to it, an upper layer that implements the MPI-2
I/O functions (a.k.a. MPI-IO functions), and a lower layer (called the
ADIO layer) that implements a common set of basic I/O operations in
terms of a specific underlying file system or storage system. The ADIO
implementation that IBM developed, called "ad_bgl", is a derivative of
the "ad_nfs" driver from the ANL ROMIO implementation. This particular
driver aggressively uses locking operations to force cache flushes in
NFS file systems, for example forcing write locks even for certain
read operations.
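As a rough illustration of the pattern described above (not code taken
from "ad_nfs" or "ad_bgl"), a read can be bracketed by a POSIX
byte-range lock so that the client revalidates its cache. The names
range_lock and locked_pread and the use_write_lock flag are
hypothetical, chosen only for this sketch.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: take or release a byte-range lock.
 * type is F_RDLCK, F_WRLCK or F_UNLCK. Blocks until the lock is granted. */
static int range_lock(int fd, short type, off_t off, off_t len)
{
    struct flock fl = {
        .l_type = type,
        .l_whence = SEEK_SET,
        .l_start = off,
        .l_len = len,
    };
    return fcntl(fd, F_SETLKW, &fl);
}

/* Read len bytes at offset off while holding a byte-range lock.
 * use_write_lock != 0 requests an exclusive lock even though the
 * operation is a read; that needs an fd opened for writing. */
ssize_t locked_pread(int fd, void *buf, size_t len, off_t off,
                     int use_write_lock)
{
    assert(len > 0); /* l_len == 0 would mean "to end of file" */
    if (range_lock(fd, use_write_lock ? F_WRLCK : F_RDLCK,
                   off, (off_t)len) < 0)
        return -1;
    ssize_t n = pread(fd, buf, len, off);
    range_lock(fd, F_UNLCK, off, (off_t)len);
    return n;
}
```

On a file system where the lock and unlock round trips are expensive,
every read pays for both, which is the overhead at issue here.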
For GPFS, the result is that more locking is performed than is
strictly necessary. In fact, the "ad_ufs" driver (for which RobL just
provided a patch to add back to the IBM ROMIO) is a perfect match for
GPFS, using locks only when necessary to ensure correctness.
Specifically, locks are used when the "data sieving" optimizations are
performed and in the MPI-IO atomic mode. However, MPI-IO operations to
GPFS file systems will operate correctly with the existing "ad_bgl"
driver, so unless the performance degradation of the additional locks
is a serious problem, there's no reason to fix this for GPFS. It would
be good to assess the overhead here, but I think we can do that at our
leisure.
A problem arises for PVFS, however, which is specific to BG/[LP]. On
other systems, we use a specific "ad_pvfs2" driver that understands
the semantics of PVFS, uses special PVFS library calls, and does not
use locks (because PVFS doesn't implement them). On BG, all I/O is
forwarded through the ciod, so we do not have the option of using the
PVFS library calls (without massive hacking or replacement of the
ciod). Further, the IBM ROMIO isn't detecting the underlying file
system -- it's always using "ad_bgl". Worse, the "ad_bgl" driver is
attempting to apply an optimization (specifically the data sieving
write optimization) that produces incorrect results for PVFS, because
PVFS does not implement the locks necessary to ensure that the data
sieving write is performed atomically.
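To make the failure mode concrete, here is a minimal sketch of a data
sieving write as a locked read-modify-write over the whole extent. It
is not the ROMIO implementation; struct piece, lock_range, sieve_write
and the use_lock flag are hypothetical names, and the pieces are
assumed to be sorted by offset and non-overlapping.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* One piece of a noncontiguous write (hypothetical type). */
struct piece {
    off_t off;        /* absolute file offset */
    size_t len;       /* number of bytes */
    const char *data; /* bytes to write */
};

static int lock_range(int fd, short type, off_t off, off_t len)
{
    struct flock fl = {
        .l_type = type,
        .l_whence = SEEK_SET,
        .l_start = off,
        .l_len = len,
    };
    return fcntl(fd, F_SETLKW, &fl);
}

/* Write n pieces with a single read-modify-write of the extent that
 * covers them. Returns 0 on success, -1 on failure. */
int sieve_write(int fd, const struct piece *p, size_t n, int use_lock)
{
    assert(n > 0);
    off_t start = p[0].off;
    off_t end = p[n - 1].off + (off_t)p[n - 1].len;
    size_t span = (size_t)(end - start);
    int rc = -1;

    /* Zero-filled so bytes beyond the current end of file are defined. */
    char *buf = calloc(1, span);
    if (!buf)
        return -1;

    if (use_lock && lock_range(fd, F_WRLCK, start, (off_t)span) < 0)
        goto out;

    /* Read: fetch the extent, including the gaps between pieces. */
    if (pread(fd, buf, span, start) < 0)
        goto unlock;

    /* Modify: overlay the pieces on the buffer. */
    for (size_t i = 0; i < n; i++)
        memcpy(buf + (p[i].off - start), p[i].data, p[i].len);

    /* Write: the gap bytes are written back too. Without the lock, an
     * update made by another process to a gap between the pread and
     * this pwrite is overwritten with the stale copy. */
    if (pwrite(fd, buf, span, start) == (ssize_t)span)
        rc = 0;

unlock:
    if (use_lock)
        lock_range(fd, F_UNLCK, start, (off_t)span);
out:
    free(buf);
    return rc;
}
```

With use_lock set to 0 the function still works for a single writer,
but two processes writing interleaved pieces of the same extent can
silently lose each other's data, which is the situation on a file
system that does not implement the locks.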
RobL's patch is designed to work around this problem by detecting an
underlying PVFS file system and switching to the use of a slightly
modified "ad_ufs" driver that does not attempt to apply the data
sieving write optimization. For other file systems, the existing (and
tested) "ad_bgl" driver is used.
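The selection logic can be pictured roughly as follows. This is only
a sketch of the idea, not the patch itself: how the patch detects PVFS
is not shown here, the enum and function names are hypothetical, and
the magic number is the PVFS2 value as recalled from the ROMIO sources
and should be verified before use. The statfs(2) wrapper is
Linux-specific and is only one possible way to obtain the file system
type.

```c
#include <assert.h>
#ifdef __linux__
#include <sys/vfs.h>
#endif

/* Assumed PVFS2 superblock magic; verify against the PVFS/ROMIO sources. */
#define SKETCH_PVFS2_MAGIC 0x20030528L

/* Hypothetical identifiers for the two code paths described above. */
enum driver {
    DRV_BGL,             /* existing, tested "ad_bgl" path */
    DRV_UFS_NO_DS_WRITE  /* modified "ad_ufs" without data sieving writes */
};

/* Choose a driver from a file system magic number. */
enum driver pick_driver(long fs_magic)
{
    return fs_magic == SKETCH_PVFS2_MAGIC ? DRV_UFS_NO_DS_WRITE : DRV_BGL;
}

#ifdef __linux__
/* Look up the file system holding path and choose a driver for it.
 * Returns 0 on success, -1 if statfs fails. */
int pick_driver_for_path(const char *path, enum driver *out)
{
    struct statfs sb;
    assert(path && out);
    if (statfs(path, &sb) != 0)
        return -1;
    *out = pick_driver((long)sb.f_type);
    return 0;
}
#endif
```

Everything that is not recognized as PVFS falls through to the
"ad_bgl" path, so behavior on GPFS and NFS is unchanged.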
This is an important patch, because until it is applied PVFS will not
produce consistently correct results for acceptance tests using MPI-IO
that perform noncontiguous I/O. We at Argonne are continuing to do our
best to address all the PVFS-related acceptance test issues, but in
this case we need IBM's help to adopt the solution into their code. We
will be testing this patch locally, but my expectation would be that
we would not consider acceptance tests as passed if we can't pass them
with IBM-supplied versions of BG/P software.
Regards,
Rob
On Feb 13, 2008, at 11:02 AM, Bob Cernohous wrote:
>
> Could someone give me a brief synopsis or pointer to the "locking
> issues" with NFS? I'd like to understand it better and maybe
> discuss this with our GPFS experts. I just wonder if it's better,
> long term, to go lockless in ad_bgl and build bgl+nfs as a fallback
> for NFS? It's certainly something we could explore on this list,
> but I don't know if/when it would make it into an IBM release.
> There's certainly some "process" we're still inventing/exploring
> here on this list.
>
> Rob, I didn't have time yet to dig into your patch. But I suspect
> you have not applied my recent patches from this list to your
> source. I think there might be a conflict. I'll look into it
> later. I'm moving my office today, plus I've been swamped with some
> side issues.
>
> Bob Cernohous: (T/L 553) 507-253-6093
>
> BobC at us.ibm.com
> IBM Rochester, Building 030-2(C335), Department 61L
> 3605 Hwy 52 North, Rochester, MN 55901-7829
>
> > Chaos reigns within.
> > Reflect, repent, and reboot.
> > Order shall return.
>
>
>
> > These patches re-introduce ad_ufs to ROMIO, but do so in a lock-free
> > way. This
> > is my first git experience, so sorry for the messy subject lines.
> >
> _______________________________________________
> dcmf mailing list
> dcmf at lists.anl-external.org
> http://lists.anl-external.org/cgi-bin/mailman/listinfo/dcmf
> http://dcmf.anl-external.org/wiki