ARCDI 2.2 - PRISM coupler and I/O library

06/05/2002 - Version 1.0
S. Valcke, CERFACS


Summary

The coupler drives the whole coupled model, ensuring the synchronisation of the different component models and the exchange of the coupling fields directly between the components or via additional coupling processes. When needed, the coupler performs transformations on the coupling fields. Another important part of the coupler is the model interface library linked to each component model which interfaces it to the rest of the coupled model. As I/O and coupling data share many characteristics, it was decided to develop one common model library for both purposes.

The different constituents of the PRISM coupler and I/O library are therefore the Driver, the Transformer, and the PRISM System Model Interface Library (PSMILe). The PSMILe includes the Data Exchange Library, which performs the exchanges of coupling data, the I/O library, and some coherence check and local transformation routines.

In this paragraph, the PRISM coupled model high level architecture is first presented. The functionalities of each constituent and their priority of development are detailed in the second section.


Outline

1. Coupled model high level architecture
2. Detailed functionalities for the PRISM coupler and I/O library
  2.1. General requirements
  2.2. Driver functionalities
  2.3. Transformer functionalities and parallelisation
  2.4. PSMILe functionalities
        2.4.0 PSMILe general characteristics
        2.4.1 Data Exchange Library (DEL) functionalities
        2.4.2 I/O library
        2.4.3 Coherence check routines
        2.4.4 Local transformation routines


1. Coupled model high level architecture
 

An overview of a coupled model is presented here. The following graphical view details the different parts of the system:

The elements of a coupled model are the following:

As detailed in the REDOC document paragraph II.2 section 4.3, I/O data, i.e. data coming from or going to disk, and coupling data, i.e. data coming from or going to another model, share many characteristics, and it was therefore decided to develop one common model library for both purposes. Both types of data are concerned by the present high level architecture.

The different elements of a coupled model are detailed hereafter by describing the three basic phases of its construction and execution:

A - Definition phase

In the definition phase, the different elements of the coupled system are prepared:

All input files containing data required for the run have to be generated.

The Driver, which monitors the whole coupled simulation, and the Transformer separate entity, which performs the required transformations on the data, have to be available.

B - Composition phase

In the composition phase, a particular user assembles a particular coupled model.

  • Selection of component models:

  • The user first chooses the component models he wants to couple for one particular experiment.
     
  • Input file selection:

  • The user selects the input files containing information that will be used during the simulation, such as forcing fields.
     
  • Driver and Transformer separate entity selection:

  • The user selects the PRISM Driver and Transformer separate entity.
     
  • Constitution of each model Specific Model Input and Output Configuration (SMIOC):

  • Based on each model's PMIOD, the user generates for each model a Specific Model Input and Output Configuration (SMIOC). The SMIOC describes the relations the model will effectively have with its external environment through inputs and outputs for a specific experiment.

    For transient input and output variables, the user may decide that a particular data item will (1) have no role in the simulation, (2) be read from a file or written to a file (I/O data), or (3) be exchanged between two component models (coupling data). For I/O data, the user indicates in the SMIOC the name(s) of the respective file(s), the input or output frequency, and possibly the local and non-local transformations required on the data (see 2.3). For coupling data, the user just refers to the Specific Coupling Configuration (SCC).

    For restart variables, the user is only allowed to indicate the name of the restart file and, possibly, the restart saving frequency. However, this last parameter should have the same value for all component models and should therefore be treated as a universal parameter in the SCC.

    The values of persistent input parameters are read at run-time in the SMIOC. The user may be allowed to change the default value therein. For persistent input parameters which are also universal parameters, the value taken into consideration is the one indicated in the SCC.
     

  • Constitution of the Specific Coupling Configuration (SCC):
  • The user constitutes only one Specific Coupling Configuration (SCC) for each particular coupled model simulation. The SCC centralises the description of all activated coupling fields and all related coupling parameters chosen by the user (source and target models, coupling frequencies, local and non-local transformations, etc.) for one particular experiment. The SCC also contains the universal parameters prescribed by the user.

    It is proposed that the PMIOD, SMIOC, and the SCC containers be implemented as XML files.
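As an illustration of how an XML-based SMIOC could be consulted, the following Python sketch parses a small configuration and classifies each transient variable as unused, I/O data, or coupling data. The element and attribute names (`smioc`, `transient`, `role`, `file`, `frequency`) are hypothetical; the actual PRISM XML layout is not defined in this document.

```python
import xml.etree.ElementTree as ET

# Hypothetical SMIOC content; element and attribute names are assumptions.
SMIOC_XML = """
<smioc model="ocean">
  <transient name="sst" role="coupling"/>
  <transient name="runoff" role="io" file="runoff.nc" frequency="86400"/>
  <transient name="ice_cover" role="none"/>
</smioc>
"""

def read_smioc(text):
    """Return {variable name: settings} for every transient variable."""
    root = ET.fromstring(text)
    config = {}
    for var in root.findall("transient"):
        role = var.get("role", "none")
        entry = {"role": role}
        if role == "io":
            # I/O data: file name and frequency come from the SMIOC itself.
            entry["file"] = var.get("file")
            entry["frequency"] = int(var.get("frequency"))
        elif role == "coupling":
            # Coupling data: the SMIOC only refers to the SCC for details.
            entry["see"] = "SCC"
        config[var.get("name")] = entry
    return config

config = read_smioc(SMIOC_XML)
```

Here `config["runoff"]` carries the file name and frequency, while `config["sst"]` only points to the SCC, mirroring the split of responsibilities between the two files described above.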
     

    C - Deployment phase

    At run-time, the different parts of the system will play different roles. A more detailed description of the functionalities of each constituent is presented in part 2.

  • The Driver: launches the component models, monitors their execution and termination.

  • The Transformer separate entity T: performs required transformations on the I/O and coupling data.

  • The PRISM System Model Interface Library (PSMILe):

  • The PSMILe includes the Data Exchange Library, which performs the exchanges of coupling data directly between the component models or between the component models and the separate Transformer entity, the I/O library, and some coherence check and local transformation routines. At run-time, specific PSMILe instructions will perform the following actions:

    Initialisation:

  • Declaration of PSMILe internal data structure.
  • Message passing initialisation.
  • I/O initialisation.
  • Initialisation of persistent input parameters, read directly in the SMIOC.
  • Initialisation of universal parameters, either received from the Driver or read directly in the SCC.

    Metadata declaration and initialisation:
     
  • Definition of the metadata describing input or output data (for example the grid coordinates, mesh areas, mask, partitioning), and definition of associated identifiers.

    Declaration of transient and restart variables:
     
  • Association to the relevant metadata identifiers (see below).
  • Access to user-defined data information: for each data declaration, the PSMILe consults the SMIOC and identifies the user's choice for that particular experiment (coupling or I/O data, input or output frequency, source and target models, source or target file, transformations, etc.)

    Sending and receiving of data
     
    The actions performed by the PSMILe below each sending or receiving instruction depend on the user's choices read in the declaration phase in the SMIOC and in the SCC: the library may simply return, or perform local transformations, and/or perform the exchanges between the models, and/or perform the reading or writing into files, etc.

    Coupling termination
     
    All actions related to finalizing the run.
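The run-time sequence above (initialisation, metadata declaration, variable declaration, sending and receiving, termination) can be sketched with a toy Python class. This is only an illustration of the call order and of how a send may resolve to an exchange, a file write, or a simple return depending on the user's choices. The class and method names (`MockPSMILe`, `define_grid`, `declare_variable`, `put`, `terminate`) are hypothetical and are not the PSMILe interface.

```python
class MockPSMILe:
    """Toy stand-in for the PSMILe; all method names are hypothetical."""

    def __init__(self, smioc):
        # Initialisation: internal structures and user choices (from the SMIOC).
        self.smioc = smioc
        self.grids = {}
        self.variables = {}
        self.log = []

    def define_grid(self, name, metadata):
        # Metadata declaration: returns an identifier for later association.
        grid_id = len(self.grids)
        self.grids[grid_id] = (name, metadata)
        return grid_id

    def declare_variable(self, name, grid_id):
        # Variable declaration: associate metadata and look up the user's choice.
        role = self.smioc.get(name, "none")
        self.variables[name] = (grid_id, role)

    def put(self, name, data):
        # Sending: the action depends on the role chosen by the user.
        _, role = self.variables[name]
        if role == "coupling":
            self.log.append(("exchange", name))
        elif role == "io":
            self.log.append(("write", name))
        # role "none": the library simply returns
        return role

    def terminate(self):
        # Coupling termination: all actions related to finalizing the run.
        self.log.append(("terminate", None))


psmile = MockPSMILe({"sst": "coupling", "runoff": "io"})
grid = psmile.define_grid("ocean_grid", {"nlon": 4, "nlat": 2})
for var in ("sst", "runoff", "ice_cover"):
    psmile.declare_variable(var, grid)
    psmile.put(var, [0.0] * 8)
psmile.terminate()
```

The component model code issues the same `put` call for every variable; only the configuration decides what happens behind it.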



    2. Detailed functionalities for the PRISM coupler and I/O library

    As detailed above, the different constituents of the PRISM coupler are: the Driver; the Transformer; and the PRISM System Model Interface Library (PSMILe), linked to the component models and which interfaces the component model with the rest of the coupled model. The PSMILe includes the Data Exchange Library, the I/O library, and some coherence check and local transformation routines.

    For each of these constituents, the list of possible requirements established in the REDOC II.2 paragraph was revised and choices of functionalities that should be implemented in the different versions of the PRISM coupler were made, considering the answers to the REDOC I.4 template. These choices are detailed below.

    For each functionality, a priority of implementation is given: "1" means that the functionality should be provided for the PRISM coupler first version (D3a1, month 12),  "2" for PRISM coupler second version to be used in the demonstration runs (D3a2, month 24), and "3" means that the functionality may be provided for the PRISM coupler final version (D3a3, month 36).


    2.1. General requirements

  • The overhead associated with the global system modularity and flexibility is acceptable. (2, 3)
  • The whole system is portable and efficient on the different hardware architectures used for climate modelling, on dedicated or shared hardware resources. Standard and portable solutions should be preferred. However, for critical issues for which a portable solution would not exist or would lead to very low efficiency, machine dependent options could be offered. (3)
  • The design and implementation lead to code that is easy to maintain and can be easily modified to support future model or coupling functionalities. (2, 3)
  • Design reflects a clear separation of responsibilities for the different parts of the coupler. (2, 3)
  • The PRISM System infrastructure can be used to technically assemble a coupled system based on any component models, even if these models do not conform to the PRISM physical interfaces, provided that they include the PRISM System Model Interface Library. (1, 2, 3)
  • The PRISM System infrastructure can be used to couple an arbitrary number of component models; any component can be one-way or two-way coupled with any other component. (1, 2, 3)

    2.2. Driver functionalities
     

    The Driver manages the whole coupled application. It launches the component models, monitors their execution and termination, centralises and distributes the universal parameters that require a consistent definition among all component models, and centralises and distributes information on the component model status during the simulation.

    The driver could keep a central role during the whole simulation and manage also the exchanges of coupling data. The preferred design option here is to decentralize the coupling functionalities as much as possible in the Data Exchange Library and in the Transformer, and therefore to reduce as much as possible the role of the Driver. This option is probably applicable only for static coupled simulations and allows an easier evolution toward heterogeneous coupling (different component models running on different machines).

    As detailed below, the choice of a static Driver was also made. The workload of a static Driver is likely to be small, even more so if the decentralizing option is followed. The Driver could be a separate process dedicated to it, but could also sit in a separate coupling process also used for the separate Transformer entity, or could even be part of the PSMILe master process of a master model started initially by the user. The first two options are still open regarding the Driver implementation.

    Model execution and control:

    Information management:

    Coupling exchange management:

    Termination and restart:

    2.3. Transformer functionalities and parallelisation

    This paragraph first gives some definitions. In the second section, the preferred design options for the PRISM coupler Transformer location and parallelisation are presented. In the third section, an exhaustive list of transformations and grids on which these transformations should be performed is presented, together with other specific requirements, and associated priority and calendar.

    2.3.1 - Definitions

    2.3.2 - Preferred design options for Transformer location and parallelisation
     
    Location for the different types of transformations

    The preferred design option is the one in which non-local transformations are performed in the separate Transformer entity (T), as they require information coming from different models. Point-wise and local transformations will be workable in the PRISM Model Interface Library (PSMILe) linked to the model before sending or after receiving the data. However, point-wise and local transformations will also be available in the separate Transformer entity T, for example, to combine coupling fields coming from different source models after their interpolation on the target grid.

    The same rules apply for two component models assembled into one executable: all point-wise and local transformations will be performed directly in the PSMILe, while the data will have to be treated by the separate Transformer entity T if non-local transformations are required. This last case however is not likely to happen, as two components assembled into one executable will in most cases share the same grid and same partitioning.

    Ideally, the choice of whether the transformation is performed by the PSMILe or in the separate Transformer entity T should be decided automatically by the coupler and this should be transparent for the user (3).
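The location rule described above can be written as a small decision function. This is a minimal sketch of the stated rule only, not of the automatic selection the coupler may eventually provide. The function name and the `needs_other_sources` flag are assumptions introduced for illustration.

```python
def transformation_location(kind, needs_other_sources=False):
    """Return 'PSMILe' or 'Transformer' for a transformation.

    kind is 'point-wise', 'local' or 'non-local'. needs_other_sources
    marks point-wise or local operations that combine fields coming from
    several source models and therefore have to run in the Transformer.
    """
    if kind not in ("point-wise", "local", "non-local"):
        raise ValueError(f"unknown transformation kind: {kind}")
    if kind == "non-local" or needs_other_sources:
        return "Transformer"
    return "PSMILe"
```

For example, an interpolation between two grids (non-local) is routed to the separate Transformer entity T, while a masking operation on the model's own grid stays in the PSMILe.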

    Transformer parallelisation

    As detailed above, the transformation routines included in the PSMILe will perform local transformations, and not only point-wise transformations; their full parallelisation is therefore required when the PSMILe is linked to a fully parallel component model (3).

    Non-local transformations will be performed in the separate Transformer entity T. Different options of parallelisation are possible. The "one-executable full parallelisation" option presented in REDOC II.2, Section 3.3 is the preferred one (3). A fall back solution would be a simpler parallelisation of the separate Transformer entity T as one executable with OpenMP.


    2.3.3 - List of transformations, grids, and associated priority and calendar

    List of transformations

    A list of relevant transformations is given hereafter. For each transformation, it is specified whether the transformation is "point-wise", "local" or "non-local".

    List of grids

    The following grids should be supported for the above scheme. These grids have the following common characteristics:

    Other specific requirements

    Priority and calendar

    The following paragraph gives the priority of development (1, 2, or 3; the meaning of each number is given in the introductory paragraph of section 2) for the different transformations on the different grids listed above. When two numbers are given, it means that parts of the functionality will be provided for the respective coupler versions.

    Transformations on 2D scalar coupling fields

                          H1 - lat-lon   H2 - log. rect.   H3 - reduced   H4 - unstruc.
    S1 - near.neigh       1              1                 1              1
    S2 - Gaussian         1              1                 1              1
    S3 - 1st O interp.    1              1                 3              -
    S4 - 2nd O interp.    1              1                 3              -
    S5 - 1st O cons rem   1              1                 1              1
    S6 - 2nd O cons rem   3              3                 3              3
    S7 - user remapping   1              1                 1              1
    S8 - conservation     1              1                 1              1
    S9 - combination      1              2                 2-3            2-3
    S10 - masking         1              1                 1              1
    S11 - scattering      2              2                 2              2
    S12 - gathering       2              2                 2              2
    S13 - collapse        2              2                 2-3            2-3
    S14 - subspace        2              2                 3              3
    S15 - algebra         1-2            1-2               1-2            1-2
    S16a - 1st O extrap.  1              1                 1              -
    S16b - 2nd O extrap.  2              2                 3              -
    T1 - time operation   2-3            2-3               2-3            2-3
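To show how the priority entries can be read, the following Python sketch encodes a few rows of the table above and lists the transformation and grid pairs with work scheduled for a given coupler version. The data structure and function names are illustrative only, and the reading of "-" as "not planned on this grid" is an assumption.

```python
# A few rows copied from the table above; columns are grids H1..H4.
GRIDS = ("H1", "H2", "H3", "H4")
PRIORITIES = {
    "S1": ("1", "1", "1", "1"),
    "S3": ("1", "1", "3", "-"),
    "S9": ("1", "2", "2-3", "2-3"),
    "T1": ("2-3", "2-3", "2-3", "2-3"),
}

def versions(entry):
    """Expand a table entry into the set of coupler versions it names."""
    if entry == "-":
        return set()  # assumed to mean: not planned on this grid
    first, _, last = entry.partition("-")
    return set(range(int(first), int(last or first) + 1))

def available_in(version):
    """List (transformation, grid) pairs with work scheduled for `version`."""
    return [(t, g) for t, row in PRIORITIES.items()
            for g, entry in zip(GRIDS, row) if version in versions(entry)]
```

An entry such as "2-3" expands to versions 2 and 3, matching the statement that parts of the functionality are provided for the respective coupler versions.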

    2.4. PSMILe functionalities

    The PRISM System Model Interface Library (PSMILe) is the set of routines implemented in a component model code to interface it with the rest of the coupled model. The classes of PSMILe instructions that will be invoked in the component model code at run-time are described in Section 1, C - Deployment phase. Here, the functionalities of the different PSMILe constituents (i.e. the Data Exchange Library, the I/O library, and the coherence check and local transformation routines) are presented in more detail.

    2.4.0 PSMILe general characteristics


    2.4.1 Data Exchange Library (DEL) functionalities

    The Data Exchange Library (DEL) performs the exchanges of coupling data between the component models, or between the component models and the separate transformation entity. The DEL must therefore be included as the most external layer in the PSMILe.

    Data transfer between separate processes will be implemented using the message passing interface MPI, which is a widely used and portable standard. MPI implementations completely supporting the MPI standard are available for every architecture used by the climate modelling community, either as open source public domain code or as proprietary software optimised and installed on high performance computer systems. Furthermore, MPI is best suited for the close coupling between separate processes, as in climate system modelling, since individual MPI implementations are designed to use the most efficient network on a specific architecture.

    Since all parallel climate model codes support communication via MPI, the introduction of alternative approaches like CORBA requires additional software like Fortran ORBs. Another possibility is wrapping the Fortran codes using a C++ ORB, which can require major changes to the involved Fortran codes as well. (For experiences gained with wrapping Fortran code see http://accl.grc.nasa.gov/IPG/CORBA/wrap_fortran.html).

    In addition, alternative approaches such as CORBA handle data transfer via TCP/IP, which is not well suited for a fast and efficient parallel data transfer. MPI processes in contrast may communicate simultaneously without interfering with the communication of other processes, while the same kind of communication will cause conflicts on a TCP/IP connection. Transfer rates between two processes can differ by a factor of 10^5 to 10^6 when comparing CORBA with MPI. Furthermore, a complete CORBA standard is not available for every architecture.

    The DEL detailed functionalities are:

    2.4.2 I/O library

    The I/O library performs the exchanges with files stored on disk. Activated variables, regional selection, temporal and geographical transformation, and file names are chosen by the user in the SMIOC (see Section 1, B - Composition phase). Metadata and run time information are provided at run-time by the component model through the PSMILe. The data and the associated metadata will be read or written to the disk files.

    For data access, calls to the NetCDF (http://www.unidata.ucar.edu/packages/netcdf/) library will be implemented. Support of formats other than NetCDF will not be implemented, but entry points for reading and writing other file formats will be provided.

    Execution on parallel machines will have to work efficiently. MPI-IO is the standard solution and will be evaluated. As a first step, we will avoid parallel I/O by doing regional selection for input data and by performing postprocessing operations after a simulation to combine the multiple output files produced by a parallel execution.
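The postprocessing step mentioned above, combining the outputs of a parallel run into one global field, can be sketched as follows. This is a minimal illustration using plain Python lists in place of files. The tile layout (latitude offset, longitude offset, block of rows) and the function name are assumptions; an actual implementation would read the per-process NetCDF files and their metadata.

```python
def combine_tiles(tiles, nlat, nlon, fill=None):
    """Assemble per-process output tiles into one global 2D field.

    Each tile is (lat_offset, lon_offset, block), where block is a list
    of rows written by one process for its own sub-domain.
    """
    field = [[fill] * nlon for _ in range(nlat)]
    for lat0, lon0, block in tiles:
        for i, row in enumerate(block):
            # Copy one row of the tile into its place in the global field.
            field[lat0 + i][lon0:lon0 + len(row)] = row
    return field

tiles = [
    (0, 0, [[1, 2], [5, 6]]),   # written by process 0
    (0, 2, [[3, 4], [7, 8]]),   # written by process 1
]
global_field = combine_tiles(tiles, nlat=2, nlon=4)
```

Regional selection for input data is the reverse operation: each process extracts only the block matching its own offsets from the global field.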

    2.4.3 Coherence check routines

    The PSMILe will perform some checks of coherence on coupling and I/O data, according to a coherence check level defined by the user in the Specific Coupling Configuration (SCC) file. The coherence check instance will:


    2.4.4 Local transformation routines

    The PSMILe will perform the following transformations locally. The priority for including these local transformations in the PSMILe is given here:

  • Combination of different data produced by one model (S9) (2).
  • Masking (S10) (2).
  • Scattering (S11) (3).
  • Gathering (S12) (3).
  • Collapsing (S13) (3).
  • Subspace (S14) (3).
  • Algebraic operations (S15) (2).
  • 1st and 2nd order extrapolation (S16a, S16b) (3).
  • Time integration, average, variance, extrema, linear interpolation (T1) (2).
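As an illustration of the time operations listed under T1, the sketch below accumulates the average, variance, and extrema of one value over the model time steps of a coupling period. The class name and its methods are hypothetical, and the example handles a single scalar where a real routine would work on whole fields.

```python
class TimeAccumulator:
    """Running time statistics for one scalar value over a coupling period."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.total_sq = 0.0
        self.minimum = float("inf")
        self.maximum = float("-inf")

    def add(self, value):
        # Called once per model time step with the current value.
        self.count += 1
        self.total += value
        self.total_sq += value * value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def average(self):
        return self.total / self.count

    def variance(self):
        # Population variance: mean of squares minus square of the mean.
        mean = self.average()
        return self.total_sq / self.count - mean * mean

acc = TimeAccumulator()
for sample in (1.0, 2.0, 3.0, 4.0):
    acc.add(sample)
```

Because only running sums are kept, the component model does not need to store the individual time steps between two coupling exchanges.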
