[dcmf] [PATCH 2/3] Use naive strided routines for af_ufs.

Robert Latham robl at mcs.anl.gov
Fri Feb 15 12:48:50 CST 2008


On Fri, Feb 15, 2008 at 12:30:04PM -0600, Bob Cernohous wrote:
> diff --git a/lib/mpi/mpich2/src/mpi/romio/adio/ad_bgl/ad_bgl_hints.c b/lib/mpi/mpich2/src/mpi/romio/adio/ad_bgl/ad_bgl_hints.c
> index 144e722..aa933a3 100644
> --- a/lib/mpi/mpich2/src/mpi/romio/adio/ad_bgl/ad_bgl_hints.c
> +++ b/lib/mpi/mpich2/src/mpi/romio/adio/ad_bgl/ad_bgl_hints.c
> @@ -101,13 +101,24 @@ void ADIOI_BGL_SetInfo(ADIO_File fd, MPI_Info users_info, int *error_code)
>  	MPI_Info_set(info, "ind_wr_buffer_size", ADIOI_BGL_IND_WR_BUFFER_SIZE_DFLT);
>  	fd->hints->ind_wr_buffer_size = atoi(ADIOI_BGL_IND_WR_BUFFER_SIZE_DFLT);
>  
> -	/* default is to let romio automatically decide when to use data
> -	 * sieving
> -	 */
> -	MPI_Info_set(info, "romio_ds_read", "automatic"); 
> -	fd->hints->ds_read = ADIOI_HINT_AUTO;
> -	MPI_Info_set(info, "romio_ds_write", "automatic"); 
> -	fd->hints->ds_write = ADIOI_HINT_AUTO;
> +  if(fd->file_system == ADIO_UFS)
> +  {
> +    /* default for ufs/pvfs is to disable data sieving  */
> +    MPI_Info_set(info, "romio_ds_read", "disable"); 
> +    fd->hints->ds_read = ADIOI_HINT_DISABLE;
> +    MPI_Info_set(info, "romio_ds_write", "disable"); 
> +    fd->hints->ds_write = ADIOI_HINT_DISABLE;
> +  }
> +  else
> +  {
> +    /* default is to let romio automatically decide when to use data
> +     * sieving
> +     */
> +    MPI_Info_set(info, "romio_ds_read", "automatic"); 
> +    fd->hints->ds_read = ADIOI_HINT_AUTO;
> +    MPI_Info_set(info, "romio_ds_write", "automatic"); 
> +    fd->hints->ds_write = ADIOI_HINT_AUTO;
> +  }
>  
>  	fd->hints->initialized = 1;
>      }

I see what you're doing here.  I like that the hints are being set in
case a caller wants to examine the state of the MPI_INFO objects (it
just kills me that "bgl_nodes_pset" is invisible to end-users...)

"automatic" means that romio will do independent I/O if the file views
are contiguous and non-overlapping.  This is a great heuristic for
Linux clusters, but I'm not so sure about Blue Gene.  My gut says that
we should use collective I/O all the time so we can concentrate the
I/O on a few aggregators.  Maybe that makes less sense now on BGP
with the I/O proxies.
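Here is a minimal C sketch of that heuristic, assuming each process's file view has already been reduced to a single contiguous [offset, offset + len) extent. The names extent_t and use_independent_io are hypothetical; this is an illustration of the rule as stated, not ROMIO's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical: the one contiguous byte range a process touches in the file. */
typedef struct {
    long long offset;
    long long len;
} extent_t;

static int cmp_extent(const void *a, const void *b)
{
    const extent_t *x = a, *y = b;
    if (x->offset < y->offset) return -1;
    if (x->offset > y->offset) return 1;
    return 0;
}

/* Returns true (do independent I/O) when no two extents overlap;
 * returns false (fall back to collective I/O) otherwise.
 * Sorts the array in place by offset. */
static bool use_independent_io(extent_t *ext, size_t n)
{
    assert(ext != NULL || n == 0);
    if (n < 2)
        return true; /* zero or one extent can never overlap */
    qsort(ext, n, sizeof *ext, cmp_extent);
    for (size_t i = 1; i < n; i++) {
        /* previous extent runs past the start of this one: overlap */
        if (ext[i - 1].offset + ext[i - 1].len > ext[i].offset)
            return false;
    }
    return true;
}
```

On Blue Gene the argument above is that you might skip this check and always return false, so the work lands on the aggregators.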

> diff --git a/lib/mpi/mpich2/src/mpi/romio/adio/ad_ufs/ad_ufs.c b/lib/mpi/mpich2/src/mpi/romio/adio/ad_ufs/ad_ufs.c
> old mode 100755
> new mode 100644
> index ce0f6a5..a13ef78
> --- a/lib/mpi/mpich2/src/mpi/romio/adio/ad_ufs/ad_ufs.c
> +++ b/lib/mpi/mpich2/src/mpi/romio/adio/ad_ufs/ad_ufs.c
> @@ -20,8 +20,8 @@ struct ADIOI_Fns_struct ADIO_UFS_operations = {
>      ADIOI_GEN_SeekIndividual, /* SeekIndividual */
>      ADIOI_GEN_Fcntl, /* Fcntl */
>      ADIOI_BGL_SetInfo, /* SetInfo */
> -    ADIOI_GEN_ReadStrided, /* ReadStrided */
> -    ADIOI_NOLOCK_WriteStrided, /* WriteStrided */
> +    ADIOI_GEN_ReadStrided_naive, /*ADIOI_GEN_ReadStrided, * ReadStrided */
> +    ADIOI_GEN_WriteStrided_naive, /*ADIOI_NOLOCK_WriteStrided, * WriteStrided */
>      ADIOI_BGL_Close, /* Close */
>  #ifdef ROMIO_HAVE_WORKING_AIO
>      ADIOI_GEN_IreadContig, /* IreadContig */

NOLOCK is a little smarter than naive in the "noncontig in memory,
contig in file" case.  In that situation, NOLOCK will perform write
combining and do fewer writes.
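To make the difference concrete, here is a C sketch of the two strategies against a fake in-memory "file" that counts write calls. fake_file_t, fake_write, write_naive and write_combined are hypothetical names; the sketch also ignores the bounded staging buffer a real implementation would need (presumably sized by something like the ind_wr_buffer_size hint in the diff), flushing when it fills.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: one contiguous piece of a noncontiguous memory buffer. */
typedef struct {
    const char *buf;
    size_t len;
} mem_piece_t;

/* Hypothetical "file": a byte array plus a count of write calls issued. */
typedef struct {
    char data[256];
    int nwrites;
} fake_file_t;

static void fake_write(fake_file_t *f, size_t off, const char *src, size_t len)
{
    assert(off + len <= sizeof f->data);
    memcpy(f->data + off, src, len);
    f->nwrites++;
}

/* Naive: one write per memory piece, at consecutive file offsets. */
static void write_naive(fake_file_t *f, size_t off,
                        const mem_piece_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        fake_write(f, off, p[i].buf, p[i].len);
        off += p[i].len;
    }
}

/* Write combining: pack all pieces into one staging buffer, then issue a
 * single write. `stage` must hold the sum of all piece lengths. */
static void write_combined(fake_file_t *f, size_t off,
                           const mem_piece_t *p, size_t n, char *stage)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        memcpy(stage + total, p[i].buf, p[i].len);
        total += p[i].len;
    }
    fake_write(f, off, stage, total);
}
```

Both leave the same bytes in the file; with three memory pieces the naive path makes three write calls and the combined path makes one.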

GEN_ReadStrided is safe for PVFS.  We're just reading additional data,
not trying to do an atomic read-modify-write.
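A C sketch of why a data-sieving read is harmless: it does one large contiguous read covering every requested piece (holes included), then copies the pieces out, and never writes anything back. The in-memory "file" and the names read_req_t and sieve_read are hypothetical; this illustrates the technique, not ADIOI_GEN_ReadStrided itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request: copy len bytes from file offset off into dst. */
typedef struct {
    size_t off;
    size_t len;
    char *dst;
} read_req_t;

/* Requests must be sorted by offset; `stage` must hold the whole extent.
 * Returns the number of bytes pulled from the file by the single read. */
static size_t sieve_read(const char *file, size_t file_len,
                         const read_req_t *req, size_t n, char *stage)
{
    if (n == 0)
        return 0;
    size_t start = req[0].off;
    size_t end = start;
    for (size_t i = 0; i < n; i++)
        if (req[i].off + req[i].len > end)
            end = req[i].off + req[i].len;
    assert(end <= file_len);

    /* One contiguous read; it also fetches bytes nobody asked for. */
    memcpy(stage, file + start, end - start);

    /* Scatter the requested pieces out of the staging buffer. */
    for (size_t i = 0; i < n; i++)
        memcpy(req[i].dst, stage + (req[i].off - start), req[i].len);

    /* Nothing is written back, so no lock is needed; a sieving write
     * would have to read, modify, and write the whole extent. */
    return end - start;
}
```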

==rob

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                 B29D F333 664A 4280 315B