[dcmf] Lockless ROMIO driver

Rob Ross rross at mcs.anl.gov
Wed Feb 13 11:24:42 CST 2008


Hi Bob,

ROMIO has two "layers": an upper layer that implements the MPI-2
I/O functions (a.k.a. the MPI-IO functions), and a lower layer (called the
ADIO layer) that implements a common set of basic I/O operations in
terms of a specific underlying file system or storage system. The ADIO
implementation that IBM developed, called "ad_bgl", is a derivative of
the "ad_nfs" driver from the ANL ROMIO implementation. This particular
driver uses locking operations aggressively to force cache flushes in
NFS file systems, for example taking write locks even for certain
read operations.

For GPFS, the result is that more locking is performed than is  
strictly necessary. In fact, the "ad_ufs" driver (for which RobL just  
provided a patch to add back to the IBM ROMIO) is a perfect match for  
GPFS, using locks only when necessary to ensure correctness.  
Specifically, locks are used when the "data sieving" optimizations are  
performed and in the MPI-IO atomic mode. However, MPI-IO operations to  
GPFS file systems will operate correctly with the existing "ad_bgl"  
driver, so unless the performance degradation of the additional locks  
is a serious problem, there's no reason to fix this for GPFS. It would  
be good to assess the overhead here, but I think we can do that at our  
leisure.

A problem arises for PVFS, however, which is specific to BG/L and BG/P. On
other systems, we use a specific "ad_pvfs2" driver that understands  
the semantics of PVFS, uses special PVFS library calls, and does not  
use locks (because PVFS doesn't implement them). On BG, all I/O is  
forwarded through the ciod, so we do not have the option of using the  
PVFS library calls (without massive hacking or replacement of the  
ciod). Further, the IBM ROMIO isn't detecting the underlying file  
system -- it's always using "ad_bgl". Worse, the "ad_bgl" driver is  
attempting to apply an optimization (specifically the data sieving  
write optimization) that produces incorrect results for PVFS, because  
PVFS does not implement the locks necessary to ensure that the data  
sieving write is performed atomically.
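
To make the failure mode concrete, here is a minimal sketch of a data
sieving write, under stated assumptions. It is not the ROMIO code: the
function name, the "between" hook, and the demo are hypothetical and exist
only to show the read-modify-write pattern. The write reads the whole
extent covering the noncontiguous pieces, patches the pieces into the
buffer, and writes the extent back. Without a lock over the extent, an
update that another writer makes to a "hole" between the read and the
write is silently overwritten with stale data.

```python
import fcntl
import os
import tempfile


def data_sieving_write(fd, pieces, use_lock=True, between=None):
    """Write noncontiguous (offset, data) pieces with one read and one write.

    `between` is a hypothetical hook that runs after the read and before the
    write-back; it stands in for another process touching the file.
    """
    start = min(off for off, _ in pieces)
    end = max(off + len(data) for off, data in pieces)
    length = end - start
    if use_lock:
        # Byte-range write lock over the whole extent (POSIX fcntl lock).
        fcntl.lockf(fd, fcntl.LOCK_EX, length, start, os.SEEK_SET)
    try:
        buf = bytearray(os.pread(fd, length, start))
        buf.extend(b"\0" * (length - len(buf)))  # pad if extent passes EOF
        if between is not None:
            between()
        for off, data in pieces:
            buf[off - start:off - start + len(data)] = data
        # The holes between pieces are written back with the data read above.
        os.pwrite(fd, bytes(buf), start)
    finally:
        if use_lock:
            fcntl.lockf(fd, fcntl.LOCK_UN, length, start, os.SEEK_SET)


def demo_lost_update():
    """Show the lost update when no lock protects the read-modify-write."""
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        os.pwrite(fd, b"........", 0)

        def other_writer():
            # Simulated concurrent writer updating a hole in the extent.
            os.pwrite(fd, b"ZZ", 3)

        data_sieving_write(fd, [(0, b"AA"), (6, b"BB")],
                           use_lock=False, between=other_writer)
        return os.pread(fd, 8, 0)
```

In the demo the other writer's "ZZ" is gone after the write-back. With a
working byte-range lock, a writer in another process would block until the
extent is released; on a file system that does not implement locks, there
is nothing to make the read-modify-write atomic.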

RobL's patch is designed to work around this problem by detecting an  
underlying PVFS file system and switching to the use of a slightly  
modified "ad_ufs" driver that does not attempt to apply the data  
sieving write optimization. For other file systems, the existing (and  
tested) "ad_bgl" driver is used.
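
The selection logic described above can be sketched roughly as follows.
This is only an illustration: how the patch actually detects PVFS is not
shown here, so the file system type is passed in as a plain string, and
the DriverChoice type and select_driver function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriverChoice:
    name: str                  # ADIO driver to use
    data_sieving_writes: bool  # whether the write optimization is attempted


def select_driver(fs_type: str) -> DriverChoice:
    """Pick a driver from an already-detected file system type (assumed input)."""
    if fs_type.lower() in ("pvfs", "pvfs2"):
        # PVFS has no locks, so avoid the read-modify-write optimization.
        return DriverChoice("ad_ufs", data_sieving_writes=False)
    # Everything else keeps the existing, tested driver.
    return DriverChoice("ad_bgl", data_sieving_writes=True)
```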

This is an important patch because, until it is applied, PVFS will not
produce consistently correct results for acceptance tests using MPI-IO  
that perform noncontiguous I/O. We at Argonne are continuing to do our  
best to address all the PVFS-related acceptance test issues, but in  
this case we need IBM's help to adopt the solution into their code. We  
will be testing this patch locally, but my expectation would be that  
we would not consider acceptance tests as passed if we can't pass them  
with IBM-supplied versions of BG/P software.

Regards,

Rob

On Feb 13, 2008, at 11:02 AM, Bob Cernohous wrote:

>
> Could someone give me a brief synopsis or pointer to the "locking  
> issues" with NFS?   I'd like to understand it better and maybe  
> discuss this with our GPFS experts.   I just wonder if it's better,  
> long term, to go lockless in ad_bgl and build bgl+nfs as a fallback  
> for NFS?   It's certainly something we could explore on this list,  
> but I don't know if/when it would make it into an IBM release.    
> There's certainly some "process" we're still inventing/exploring  
> here on this list.
>
> Rob, I didn't have time yet to dig into your patch.  But I suspect  
> you have not applied my recent patches from this list to your  
> source.  I think there might be a conflict.  I'll look into it  
> later.  I'm moving my office today, plus I've been swamped with some  
> side issues.
>
> Bob Cernohous:  (T/L 553) 507-253-6093
>
> BobC at us.ibm.com
> IBM Rochester, Building 030-2(C335), Department 61L
> 3605 Hwy 52 North, Rochester,  MN 55901-7829
>
> > Chaos reigns within.
> > Reflect, repent, and reboot.
> > Order shall return.
>
>
>
> > These patches re-introduce ad_ufs to ROMIO, but do so in a lock-free
> > way.  This is my first git experience, so sorry for the messy subject
> > lines.
> >



