PnetCDF Design
MPI_File_set_view() is a collective call, and we don't like having to call it as often as we do. However, that is the way MPI is designed, so we end up needing nearly every MPI_File_read_all() and MPI_File_write_all() call to be paired with a call to MPI_File_set_view().
Prior to version 1.6.0, we also reset the file view right after each MPI read/write call, because the root process may need to write to the file header, e.g. to update the number of records, and that write is an independent I/O operation (at that point we cannot reset the fileview, because MPI_File_set_view() is collective). Starting from 1.6.0, the call to MPI_File_set_view() right after each MPI read/write call is dropped, since the root process's fileview has been changed to always include the file header.
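For concreteness, the independent header update mentioned above might look like the following minimal sketch (not PnetCDF's actual code): rank 0 alone rewrites the record count. The byte offset, integer width, and byte order are hypothetical placeholders; the point is only that the header bytes must be reachable through rank 0's current fileview for this write to succeed.

```c
#include <mpi.h>

/* Hypothetical byte offset of the record count within the file header. */
#define NUMRECS_OFFSET ((MPI_Offset)4)

/* Rank 0 updates the number of records with an independent write.  The
 * offset given to MPI_File_write_at is interpreted relative to the current
 * fileview, so the header must be visible in that view: either the default
 * whole-file byte view (the pre-1.6.0 reset) or a root fileview that always
 * includes the header (1.6.0 onward).  Byte order is glossed over here. */
static int update_numrecs(MPI_File fh, int rank, int numrecs)
{
    if (rank != 0) return MPI_SUCCESS;   /* only the root touches the header */
    return MPI_File_write_at(fh, NUMRECS_OFFSET, &numrecs, 1, MPI_INT,
                             MPI_STATUS_IGNORE);
}
```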
We do try to limit the number of MPI_File_set_view() calls: when a request covers a single contiguous file region, we can skip setting the fileview entirely and instead use the MPI-IO functions that take an explicit offset.
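Under assumed variable offsets and a 2-D subarray request, the two cases can be sketched as follows; this illustrates the pattern, not PnetCDF's internal code.

```c
#include <mpi.h>

/* Case 1: the request maps to one contiguous file region starting at byte
 * offset 'disp'.  No fileview is needed; an explicit-offset collective call
 * addresses the region directly (offsets are byte offsets under the default
 * MPI_BYTE fileview). */
void write_contig(MPI_File fh, MPI_Offset disp, const double *buf, int count)
{
    MPI_File_write_at_all(fh, disp, buf, count, MPI_DOUBLE, MPI_STATUS_IGNORE);
}

/* Case 2: a non-contiguous 2-D subarray request.  Describe the file layout
 * with a derived datatype, install it as the fileview, and let one
 * collective call carry out all the scattered accesses. */
void write_subarray(MPI_File fh, MPI_Offset disp, const double *buf,
                    int gsizes[2], int subsizes[2], int starts[2])
{
    MPI_Datatype ftype;
    MPI_Type_create_subarray(2, gsizes, subsizes, starts, MPI_ORDER_C,
                             MPI_DOUBLE, &ftype);
    MPI_Type_commit(&ftype);

    MPI_File_set_view(fh, disp, MPI_DOUBLE, ftype, "native", MPI_INFO_NULL);
    MPI_File_write_all(fh, buf, subsizes[0] * subsizes[1], MPI_DOUBLE,
                       MPI_STATUS_IGNORE);
    MPI_Type_free(&ftype);
}
```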
Some clarifications:
- PnetCDF keeps two MPI file handles internally:
- One is for use in collective data mode. This file handle is created using the MPI communicator passed into ncmpi_create() or ncmpi_open().
- The other is for use in independent data mode. This file handle is created using MPI_COMM_SELF.
When switching between collective and independent data modes, PnetCDF selects the appropriate handle for the MPI-IO calls (a minimal sketch of this two-handle scheme appears after this list).
- Why must PnetCDF call MPI_File_set_view in the collective data mode?
- Setting an MPI fileview lets PnetCDF hand non-contiguous requests to MPI-IO.
- PnetCDF calls MPI_File_set_view before and after each collective MPI-IO call. The call afterwards clears the view so that the entire file is visible again.
- The call afterwards is necessary because some PnetCDF APIs may write to the file header, for example when the root process updates the number of records in the file. Without resetting the fileview, the root process may not be able to see the file header (at least in the current implementation of PnetCDF, i.e. 1.5.0).
- MPI_File_set_view is collective, i.e. all processes must participate in the call.
- A possible optimization to avoid the second MPI_File_set_view is to construct the root's fileview datatype so that the entire header is always visible to the root (this is the approach adopted in 1.6.0, as noted above; see also the sketch after this list). However, it changes all MPI collective I/O calls used in PnetCDF from the variants without an explicit offset to the ones with an explicit offset, e.g. from MPI_File_write_all to MPI_File_write_at_all.
- Why must PnetCDF call MPI_File_set_view in the independent data mode?
- Setting an MPI fileview lets PnetCDF hand non-contiguous requests to MPI-IO.
- PnetCDF calls MPI_File_set_view before and after each independent MPI-IO call. The call afterwards clears the view so that the entire file is visible again.
- The MPI_File_set_view after the MPI-IO call may not be necessary, but it is what 1.5.0 implements. Fortunately, since the independent file handle is created using MPI_COMM_SELF, no communication occurs. Removing this second call is on the to-do list.
- Why can't we just reset the file view in ncmpi_begin_indep_data?
- Because we want to keep using the MPI-IO fileview feature to carry out non-contiguous requests.
- Each independent PnetCDF API call may access different subarray locations and/or different variables, so the fileview can be different for every call.
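To tie the clarifications above together, here is a minimal sketch, with hypothetical names and no error handling, of the two-file-handle scheme and the set-view/reset-view pattern (the pre-1.6.0 behavior). It is not the actual PnetCDF implementation.

```c
#include <mpi.h>

typedef struct {
    MPI_File coll_fh;   /* opened with the user's communicator: collective mode */
    MPI_File indep_fh;  /* opened with MPI_COMM_SELF: independent mode */
    int      indep_mode;
} nc_file;              /* hypothetical struct, for illustration only */

int nc_file_open(MPI_Comm comm, const char *path, nc_file *f)
{
    MPI_File_open(comm, path, MPI_MODE_RDWR, MPI_INFO_NULL, &f->coll_fh);
    MPI_File_open(MPI_COMM_SELF, path, MPI_MODE_RDWR, MPI_INFO_NULL,
                  &f->indep_fh);
    f->indep_mode = 0;
    return 0;
}

/* Write one non-contiguous request described by 'ftype' (assumed to be built
 * from MPI_BYTE blocks) starting at byte displacement 'disp'. */
int nc_write(nc_file *f, MPI_Offset disp, MPI_Datatype ftype,
             const void *buf, int count, MPI_Datatype btype)
{
    /* Pick the handle that matches the current data mode. */
    MPI_File fh = f->indep_mode ? f->indep_fh : f->coll_fh;

    /* "Before": install the fileview describing this request. */
    MPI_File_set_view(fh, disp, MPI_BYTE, ftype, "native", MPI_INFO_NULL);

    if (f->indep_mode)
        MPI_File_write(fh, buf, count, btype, MPI_STATUS_IGNORE);
    else
        MPI_File_write_all(fh, buf, count, btype, MPI_STATUS_IGNORE);

    /* "After": reset to the default whole-file byte view so that later header
     * I/O (e.g. the root updating the record count) can see the header. */
    MPI_File_set_view(fh, 0, MPI_BYTE, MPI_BYTE, "native", MPI_INFO_NULL);
    return 0;
}
```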
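The header-inclusive root fileview mentioned above (the optimization adopted in 1.6.0) could be built along the following lines; the helper name and the simple two-block layout are illustrative assumptions, not the real code.

```c
#include <mpi.h>

/* Build rank 0's filetype so its view covers the header bytes
 * [0, header_len) as well as the root's own data region
 * [data_off, data_off + data_len).  Non-root ranks would use a filetype
 * describing only their data region. */
static MPI_Datatype root_header_filetype(int header_len,
                                         MPI_Aint data_off, int data_len)
{
    int          blocklens[2] = { header_len, data_len };
    MPI_Aint     disps[2]     = { 0, data_off };
    MPI_Datatype ftype;

    MPI_Type_create_hindexed(2, blocklens, disps, MPI_BYTE, &ftype);
    MPI_Type_commit(&ftype);
    return ftype;
}
```

As noted in the list above, keeping the header in the root's view goes together with switching the collective data calls to their explicit-offset variants, e.g. MPI_File_write_at_all instead of MPI_File_write_all.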