[alcf-discuss] Parallel hdf5 I/O of strided data
Rob Latham
robl at mcs.anl.gov
Wed May 6 14:48:48 CDT 2009
On Wed, May 06, 2009 at 12:21:34PM -0700, John R. Cary wrote:
> First, Thanks for setting up this list. I wonder whether it
> makes sense to have Reply-To list turned
> on for this? I usually just hit reply, and then the list does not get
> this.
There are a lot of Robs at Argonne. I didn't set up this list:
another Rob did :> I don't like reply-to-list myself, but like
arguing about which is better even less! :>
> /gpfs1/cary
Do you see a similar result if you write to /pvfs-surveyor?
>> - what, if any, HDF5 property list tunables are you setting
>>
> On file open:
>
> MPI_Comm comm = MPI_COMM_WORLD;
> MPI_Info info = MPI_INFO_NULL;
> hid_t plistId = H5Pcreate(H5P_FILE_ACCESS);
> H5Pset_fapl_mpio(plistId, comm, info);
> hid_t res = H5Fcreate(filename.c_str(), H5F_ACC_TRUNC, H5P_DEFAULT,
> plistId);
OK, so you have turned on parallel I/O, but not collective I/O. This
is orthogonal to the correctness issue, but if you enable collective
I/O, you'll get much better performance:
http://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetDxplMpio
Set the H5FD_MPIO_COLLECTIVE flag on a dataset transfer property list
with H5Pset_dxpl_mpio and you'll exercise a different code path. Maybe
it will make your problem go away, too?
As to your correctness issue, independent I/O is a pretty simple
workload. I'm surprised you're seeing errors with that.
Do you have a small test case for this, or do you have to run your
entire application to see the corruption?
==rob
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA