[alcf-discuss] [Fwd: Re: [Fwd: Re: Parallel hdf5 I/O of strided data]]
Tim Tautges
tautges at mcs.anl.gov
Wed May 6 16:16:02 CDT 2009
- tim
-------- Original Message --------
Subject: Re: [Fwd: Re: [alcf-discuss] Parallel hdf5 I/O of strided data]
Date: Wed, 06 May 2009 15:52:53 -0500
From: Jason Kraftcheck <kraftche at cae.wisc.edu>
To: Tim Tautges <tautges at mcs.anl.gov>
References: <4A01F6CB.3040107 at mcs.anl.gov>
Tim Tautges wrote:
> Are you using the collective property in the parallel hdf5 writer? (See
> below...)
>
The comment below seems to be inconsistent with the documentation. If you
follow the link below to the description of H5Pset_dxpl_mpio, note the part
where it says "The property list can then be used to control the I/O
transfer mode during data I/O operations." That seems to imply that the
property is to be passed to things like H5Dread and H5Dwrite (data I/O)
rather than H5Fcreate.
The Parallel HDF5 documentation (what there is of it) also says that this
property gets passed to H5Dread and H5Dwrite
(http://www.hdfgroup.org/HDF5/Tutor/pcrtaccd.html). It never mentions the
property in connection with H5Fcreate or H5Fopen, nor can I find such a
usage in any of the examples.
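For reference, here is a minimal sketch of the usage those docs describe: the collective flag goes on a dataset *transfer* property list that is passed to H5Dwrite/H5Dread, not to H5Fcreate. This is untested; the identifiers dset, memspace, filespace, and buf are assumed to already exist in the caller.

```c
#include <hdf5.h>

/* Sketch only: assumes dset, memspace, filespace, and buf are set up
 * elsewhere, and that the file was opened with an MPI-IO file access
 * property list (as in John's snippet below). */
hid_t xferPlist = H5Pcreate(H5P_DATASET_XFER);        /* transfer plist, not file-access */
H5Pset_dxpl_mpio(xferPlist, H5FD_MPIO_COLLECTIVE);    /* request collective I/O */
H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace,
         xferPlist, buf);                             /* plist goes to the data I/O call */
H5Pclose(xferPlist);
```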
- jason
> - tim
>
> -------- Original Message --------
> Subject: Re: [alcf-discuss] Parallel hdf5 I/O of strided data
> Date: Wed, 6 May 2009 14:48:48 -0500
> From: Rob Latham <robl at mcs.anl.gov>
> To: John R. Cary <cary at txcorp.com>
> CC: discuss at lists.alcf.anl.gov
> References: <4A01BF23.3030500 at txcorp.com>
> <20090506181859.GD9315 at mcs.anl.gov> <4A01E33E.9070702 at txcorp.com>
>
> On Wed, May 06, 2009 at 12:21:34PM -0700, John R. Cary wrote:
>> First, Thanks for setting up this list. I wonder whether it
>> makes sense to have Reply-To set to the list. I usually just hit
>> reply, and then the list does not get this.
>
> There are a lot of Robs at Argonne. I didn't set up this list:
> another Rob did :> I don't like reply-to-list myself, but like
> arguing about which is better even less! :>
>
>> /gpfs1/cary
>
> Do you see a similar result if you write to /pvfs-surveyor ?
>
>>> - what, if any, HDF5 property list tunables are you setting
>>>
>> On file open:
>>
>> MPI_Comm comm = MPI_COMM_WORLD;
>> MPI_Info info = MPI_INFO_NULL;
>> hid_t plistId = H5Pcreate(H5P_FILE_ACCESS);
>> H5Pset_fapl_mpio(plistId, comm, info);
>> hid_t res = H5Fcreate(filename.c_str(), H5F_ACC_TRUNC, H5P_DEFAULT,
>> plistId);
>
> OK, so you have turned on parallel I/O, but not collective I/O. This
> is orthogonal to the correctness issue, but if you enable collective
> I/O, you'll get much much better performance.
>
> http://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetDxplMpio
>
> set the H5FD_MPIO_COLLECTIVE flag and you'll exercise a different code
> path. Maybe it will make your problem go away, too?
>
> As to your correctness issue, independent I/O is a pretty simple
> workload. I'm surprised you're seeing errors with that.
>
> Do you have a small testcase for this, or do you have to run your
> entire application to see the corruption?
>
> ==rob
>
--
"A foolish consistency is the hobgoblin of little minds" - Ralph Waldo Emerson
--
================================================================
"You will keep in perfect peace him whose mind is
steadfast, because he trusts in you." Isaiah 26:3
Tim Tautges Argonne National Laboratory
(tautges at mcs.anl.gov) (telecommuting from UW-Madison)
phone: (608) 263-8485 1500 Engineering Dr.
fax: (608) 263-4499 Madison, WI 53706