[alcf-discuss] [Fwd: Re: [Fwd: Re: Parallel hdf5 I/O of strided data]]

Jason Kraftcheck kraftche at cae.wisc.edu
Wed May 6 16:27:50 CDT 2009


Tim Tautges wrote:
> ???
> 

a) No, the MOAB parallel writer does no collective IO right now.

b) The discussion below seems to include the erroneous assumption that the
   independent/collective property is specified at file open.

> - tim
> 
> -------- Original Message --------
> Subject: Re: [Fwd: Re: [alcf-discuss] Parallel hdf5 I/O of strided data]
> Date: Wed, 06 May 2009 15:52:53 -0500
> From: Jason Kraftcheck <kraftche at cae.wisc.edu>
> To: Tim Tautges <tautges at mcs.anl.gov>
> References: <4A01F6CB.3040107 at mcs.anl.gov>
> 
> Tim Tautges wrote:
>> Are you using the collective property in the parallel hdf5 writer?  (See
>> below...)
>>
> 
> The comment below seems to be inconsistent with the documentation.  If you
> follow the link below to the description of H5Pset_dxpl_mpio, note the part
> where it says "The property list can then be used to control the I/O
> transfer mode during data I/O operations."  That seems to imply that the
> property is to be passed to things like H5Dread and H5Dwrite (data I/O)
> rather than H5Fcreate.
> 
> The Parallel HDF5 documentation (what there is of it) also says that this
> property gets passed to H5Dread and H5Dwrite
> (http://www.hdfgroup.org/HDF5/Tutor/pcrtaccd.html).  It never mentions the
> property in connection with H5Fcreate or H5Fopen, nor can I find such a
> usage in any of the examples.
> 
> 
> - jason
> 
> 
>> - tim
>>
>> -------- Original Message --------
>> Subject: Re: [alcf-discuss] Parallel hdf5 I/O of strided data
>> Date: Wed, 6 May 2009 14:48:48 -0500
>> From: Rob Latham <robl at mcs.anl.gov>
>> To: John R. Cary <cary at txcorp.com>
>> CC: discuss at lists.alcf.anl.gov
>> References: <4A01BF23.3030500 at txcorp.com>
>> <20090506181859.GD9315 at mcs.anl.gov>    <4A01E33E.9070702 at txcorp.com>
>>
>> On Wed, May 06, 2009 at 12:21:34PM -0700, John R. Cary wrote:
>>> First, Thanks for setting up this list.  I wonder whether it
>>> makes sense to have Reply-To list turned
>>> on for this?  I usually just hit reply, and then the list does not get
>>> this.
>>
>> There are a lot of Robs at Argonne.  I didn't set up this list:
>> another Rob did :>   I don't like reply-to-list myself, but I like
>> arguing about which is better even less! :>
>>
>>> /gpfs1/cary
>>
>> Do you see a similar result if you write to /pvfs-surveyor ?
>>
>>>> - what, if any, HDF5 property list tunables are you setting
>>>>   
>>> On file open:
>>>
>>>  MPI_Comm comm = MPI_COMM_WORLD;
>>>  MPI_Info info = MPI_INFO_NULL;
>>>  hid_t plistId = H5Pcreate(H5P_FILE_ACCESS);
>>>  H5Pset_fapl_mpio(plistId, comm, info);
>>>  hid_t res = H5Fcreate(filename.c_str(), H5F_ACC_TRUNC, H5P_DEFAULT,
>>>        plistId);
>>
>> OK, so you have turned on parallel I/O, but not collective I/O.  This
>> is orthogonal to the correctness issue, but if you enable collective
>> I/O, you'll get much, much better performance.
>>
>> http://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetDxplMpio
>>
>> Set the H5FD_MPIO_COLLECTIVE flag and you'll exercise a different code
>> path.  Maybe it will make your problem go away, too?
>>
>> As to your correctness issue, independent I/O is a pretty simple
>> workload.  I'm surprised you're seeing errors with that.
>>
>> Do you have a small testcase for this, or do you have to run your
>> entire application to see the corruption?
>>
>> ==rob
>>
> 
> 


-- 
"A foolish consistency is the hobgoblin of little minds" - Ralph Waldo Emerson
