REDOC II.2 - PRISM coupler

06/05/2002 - Version 1.0
S. Valcke, CERFACS


Summary

After introducing some general coupling concepts, this document gives a list of possible requirements and design options for the PRISM coupler. Clearly, not all of these requirements will be fulfilled, nor all of these options implemented, in the future PRISM coupler. This exhaustive list of possible requirements and options will help the coupler developers identify the relevant functionalities for the future PRISM coupler and establish a list of priorities for the next 3-year developments. A review of existing couplers and coupling applications is then presented in the last section.


Outline

Part I. Introduction - Definitions
  1. Static, dynamic, or interactive coupled simulation
  2. Possible coupling relations between two components

Part II. Possible requirements and design options for the PRISM coupler
  1. General requirements
  2. Driver requirements and design options
  3. Transformer requirements and design options
      3.1 List of possible transformations and other requirements
      3.2 Design options for the Transformer location
      3.3 Design options for the Transformer parallelisation
  4. PRISM System Model Interface Library & Data Exchange library requirements and design options
      4.1 PSMILe general requirements
      4.2 Data Exchange Library requirements
      4.3 An important design option: a common Model Interface Library for I/O and coupling data

Part III. Coupler review
  1. OASIS
  2. Palm
  3. MpCCI
  4. Calcium
  5. CCSM Coupler 6
  6. Distributed Data Broker
  7. Flexible Modeling System
  8. Coumehy

Annexe I - Technical advantages and disadvantages of a dynamic Driver


Part I. Introduction - Definitions

This section introduces some general coupling concepts. The nature of a coupled simulation is first discussed; it can be static, dynamic, or interactive. The possible coupling relations between two component models are then analysed.

1. Static, dynamic, or interactive coupled simulation

An important concept relates to whether or not the coupling parameters are allowed to evolve during the coupled simulation. The coupling parameters include:

Different options can be defined: a coupled simulation can be static, dynamic, or interactive (with respect to the process management, to the coupling exchange characteristics, or to the coupling field characteristics). In a static coupled simulation, all coupling parameters are fixed initially and do not change during the whole simulation: all information given by the models (coupling field units, grid, partitioning, etc.) or prescribed by the user (components, coupling fields, coupling frequencies, etc.) is defined only once initially, and the component model processes and their corresponding rank and location (processor, node) are fixed from the beginning to the end of the simulation.


2. Possible coupling relations between two components

The coupler will co-ordinate the execution of several major climate component models. It is first important to analyse the coupling relations that can exist between any two of these components.

Two components can be sequential by nature or concurrent by nature. It is also possible to force two components sequential by nature to run concurrently, or two components concurrent by nature to run sequentially; we will refer to two components having one of these relations respectively as concurrent by construction, or sequential by construction.
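The difference between the sequential and the concurrent ordering can be sketched with two toy update loops. This is an illustration only, not PRISM code: step_a and step_b are hypothetical stand-ins for one timestep of each component model, each taking the other component's coupling field as forcing.

```python
# Toy illustration (not PRISM code) of the two temporal orderings.
# step_a and step_b are hypothetical placeholders for one timestep of
# each component; each takes the other's coupling field as forcing.

def step_a(t, forcing):
    return forcing + 1.0   # placeholder for component A's physics

def step_b(t, forcing):
    return forcing * 2.0   # placeholder for component B's physics

def run_sequential(nsteps):
    # Sequential coupling: B always sees the field A has just produced.
    a_out, b_out = 0.0, 0.0
    for t in range(nsteps):
        a_out = step_a(t, b_out)
        b_out = step_b(t, a_out)
    return a_out, b_out

def run_concurrent(nsteps):
    # Concurrent coupling: both components advance from the previous
    # exchange, so each one uses coupling data that is one step old.
    a_out, b_out = 0.0, 0.0
    for t in range(nsteps):
        new_a = step_a(t, b_out)
        new_b = step_b(t, a_out)
        a_out, b_out = new_a, new_b
    return a_out, b_out
```

Even with identical placeholder physics, the two orderings give different trajectories, which is why forcing two components sequential by nature to run concurrently (or vice-versa) is a modelling choice and not only a technical one.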

 

Part II. Possible requirements and design options for the PRISM coupler

The main constituents of the coupler are: the Driver, the Transformer, and the PRISM System Model Interface Library (PSMILe), which interfaces the model with the rest of the coupled system and therefore includes the Data Exchange library (DEL).

1. General requirements

2. Driver requirements and design options

The Driver manages the whole coupled application. It may launch the component models, monitor their execution and termination, orchestrate the exchanges of coupling data, centralise and distribute simulation parameters which require a consistent definition among all component models, and centralise and distribute information on the component model status during the simulation.

A design option is to decentralize the coupling functionalities as much as possible in the Data Exchange library included in the different model interface libraries and in the Transformer, and therefore to reduce as much as possible the role of the Driver. This option is probably applicable only for  static coupled simulations and allows an easier evolution toward heterogeneous coupling (different component models running on different machines).

Model execution and control:


Information management:

Coupling exchange management:

Termination and restart:

3. Transformer requirements and design options

The Transformer performs on the coupling data all transformations required between two component models.

3.1 List of possible transformations and other requirements

The Transformer may provide the following transformations:
3.2 Design options for the Transformer location

3.3 Design options for the Transformer parallelisation
If the option of performing point-wise and local transformations directly in the PSMILe is chosen, full parallelisation of the local transformation routines is required, as the PSMILe will in some cases be linked to fully parallel component models.

For the separate Transformer entity performing non-local transformations, the following parallelisation options are possible:

Pseudo-parallelisation:

Between any two models, there is more than one separate sequential Transformer process, each one treating an equal number of fields exchanged in both directions between the two models. This approach may ensure a better load balance, but it implies that each process has to calculate and store the information required for the transformation, e.g. the interpolation matrix of weights and addresses, in both directions. Furthermore, it also implies that each model's information has to be duplicated in the different related Transformer processes.

4. PRISM System Model Interface Library & Data Exchange library requirements and design options

The PRISM System Model Interface Library (PSMILe) is the set of routines implemented in the model code to interface it with the rest of the PRISM System (other component models or additional coupling processes).
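The pseudo-parallelisation scheme described above can be sketched as a simple round-robin assignment of the coupling fields to the Transformer processes. This is a toy illustration; the field names are hypothetical.

```python
# Toy sketch of the pseudo-parallelisation option: several sequential
# Transformer processes, each treating an equal share of the coupling
# fields exchanged between two models. Field names are hypothetical.

def assign_fields(fields, n_procs):
    """Round-robin assignment of coupling fields to Transformer processes."""
    shares = [[] for _ in range(n_procs)]
    for i, name in enumerate(fields):
        shares[i % n_procs].append(name)
    return shares

# Fields exchanged in both directions between an ocean and an atmosphere:
fields = ["sst", "heat_flux", "wind_stress_u", "wind_stress_v"]
shares = assign_fields(fields, 2)
# Each process must compute and store the interpolation weights for its
# own fields, in both directions, so grid metadata is duplicated per process.
```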

4.1 PSMILe general requirements

The possible requirements specific to the PSMILe are the following:

4.2 Data Exchange Library requirements

The Data Exchange library (DEL) performs the exchanges of coupling data between the component models, or between the component models and the separate Transformer entity. The DEL must therefore be included as the most external layer in the PSMILe.

The possible characteristics of the coupling data exchanges are:

  • "End-point" data exchange: when producing coupling data, the source model does not know what other model will consume it; when asking for coupling data a target model does not know what other model produces it.
  • The coupling data can be of different types: integer, real, character, 1D-2D-3D-xD arrays, structures, operators, functions, etc.
  • The exchanges bear on the coupling data but possibly also on their metadata, i.e. the description of the data (e.g. units, grid coordinates, mask, distribution, ...).
  • The coupling fields' characteristics, and therefore the associated metadata, may change over time as the simulation develops (grid, resolution, ...).
  • Coupling data produced by a source model can be consumed by more than one target model.
  • Coupling data produced by one model may be only partially consumed by the target model; extraction of subspaces, hyperslabs or indexed grid points may be required before the exchange.
  • Different coupling data produced by one model may have to be combined before the exchange.
  • Algebraic operations may have to be performed on the coupling data before the exchange.
  • Coupling data produced by a source model can be consumed by the target model at a different frequency (i.e. one "put" will not necessarily match one "get"; time integration/interpolation will be required).
  • Occurrence of the exchange can be different for the different coupling fields.
  • Occurrence of exchange is flexible (exchange can occur at a fixed frequency, at different pre-defined timesteps, on given dates of a physical calendar -Julian, Gregorian, ...-, etc.).
  • Coupling data produced from one model at a particular time may be required as input coupling data for another model at another time.
  • Occurrence of the exchange is not necessarily defined initially by the user; it can depend on parameters dynamically calculated during the simulation (conditional occurrence).
  • Exchange points can be placed anywhere in the source and target codes, possibly at different locations for the different coupling fields.
  • The exchange can occur directly between two component models without going through additional coupling processes.
  • When the component models are parallel and have different data partitioning, repartitioning associated with direct communication is required; all types of distributions usually used in component model codes must be supported. In a static coupled simulation, the characteristics of the repartitioning required between any two component models are fixed, while in a PM (process management) or CF (coupling field) dynamic coupled simulation, they may change during the simulation.
  • Other specific requirements are:
  • Data exchange implementation:

  • The DEL offers efficient data exchange implementations for loose and strong coupling. Loose coupling is the configuration in which the two component models are run sequentially or concurrently as two separate executables. Strong coupling is the configuration in which the two component models are run within the same executable.
     
  • I/O and access to data files:

  • In some cases, input coupling data will not be provided by another model but should be read from a file indicated by the user in the coupling configuration file. This should be transparent for the component model and managed automatically.
    The format of these data files could be a standard PRISM fixed format. At a later stage, different formats could be supported for these data files; this would imply that the instance reading the file can interpret their content.
    For parallel component models, the I/O library will have to address the parallel I/O issue. One option is simply to avoid parallel I/O by doing regional selection for input data and by doing post-processing operations after a simulation to recombine the multiple output files produced by the parallel execution. MPI-IO is another option. Finally, a third option is to set up a dummy application or I/O daemon, which simply acts as a data source by reading the file and behaves like a regular model with respect to the coupled system. This last option is particularly interesting when the data present in the file need transformation, interpolation or repartitioning before being used by the model, and is therefore particularly interesting for parallel models. It is also interesting from the performance point of view if the I/O daemon can perform the disk access concurrently with the model calculations. However, it supposes that a Driver and an external I/O daemon are active even for a component model running in a totally uncoupled mode.
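The I/O daemon option can be sketched as follows. This is a minimal illustration, not PRISM code: the one-record-per-line file layout and the put_field callback are hypothetical stand-ins for the forcing file format and for the DEL "put" routine.

```python
# Minimal sketch (not PRISM code) of the I/O daemon idea: a dummy process
# that reads forcing data from a file and "puts" it toward the coupled
# system exactly as a regular source model would. The file layout and the
# put_field callback are hypothetical.
import csv
import io

def io_daemon(stream, put_field):
    # Each record: coupling date, field name, field value.
    for date, name, value in csv.reader(stream):
        put_field(name, date, float(value))

# Stand-ins for a forcing file and for the DEL "put" routine:
sent = []
forcing_file = io.StringIO("2002-01-01,sst,271.5\n2002-01-02,sst,271.9\n")
io_daemon(forcing_file, lambda name, date, value: sent.append((name, date, value)))
```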
     
  • Matching of output and input coupling data from different component models

  • As discussed above, the DEL could perform, for static simulations, the matching between output coupling data produced by one model and input coupling data requested by another model. The matching may be based on the user's choices indicated in the configuration file, or may be done automatically when there is only one matching possibility. For dynamic simulations, information coming from the Driver is required.
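The "end-point" exchange principle and the matching of puts and gets described in this section can be sketched as follows. The Broker class is a toy stand-in for the matching normally performed by the DEL or the Driver from the configuration file; all names are hypothetical.

```python
# Toy sketch of "end-point" data exchange: the producer puts a field
# under a name without knowing the consumer, and the consumer gets it
# by name without knowing the producer. The Broker stands in for the
# matching done by the DEL/Driver from the configuration file.
class Broker:
    def __init__(self):
        self._store = {}

    def put(self, field_name, data):
        # Source side: publish under a field name only, no target named.
        self._store.setdefault(field_name, []).append(data)

    def get(self, field_name):
        # Target side: request by field name only, no source named.
        return self._store[field_name].pop(0)

broker = Broker()
broker.put("sst", [271.3, 272.1])   # ocean model side
field = broker.get("sst")           # atmosphere model side
```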
    4.3 An important design option: a common Model Interface Library for I/O and coupling data

    I/O and coupling data present many common characteristics and should therefore, in principle, share a common Model Interface Library. It should be evaluated further if this ideal concept can in fact be implemented without too many constraints.

    The following list of characteristics shared by both I/O and coupling data was established:

  • Data requested or made available by a model. Some data may be I/O and coupling data at the same time.
  • For available data, not all will be effectively delivered by the model. For each particular simulation, the user has to activate some of them externally through a configuration file created with a GUI or any other means.
  • Data for which the "end-point data exchange" principle is applicable. The model itself does not know where the data come from or where  they go to. The source/target models (for coupling data) or the source/target files (for I/O data) are defined externally by the user for each particular simulation.
  • Data for which transformations may be required. These transformations are prescribed externally by the user.
  • Some data required from another model in the coupled mode, may in fact be forcing data read directly from files. In that case,  the coupling library is faced with the same parallel I/O and metadata interpretation difficulties.

    The following list of differences was established:

  • The list of coupling fields is generally smaller than the list of diagnostic outputs.
  • One PRISM objective is to define a standard physical coupling interface between any two components, i.e. the nature of the coupling fields exchanged; standardisation of the nature of the diagnostic output will be much more limited.
  • Some local transformations required for I/O may not be required for coupling, and vice-versa.
  • I/O may require more or different metadata to be transferred from the model.
  • I/O data needs some mechanism to translate metadata given by the model into CF-style description. This is required for coupling data only if the coupler is asked to generate its own coupling diagnostics files.

Part III. Coupler review

    This section surveys existing couplers or coupling applications, whether or not targeted at climate:

  • OASIS from CERFACS
  • Palm from CERFACS
  • MpCCI from FhG-SCAI
  • Calcium from EDF
  • CCSM Coupler 6 from NCAR
  • Distributed Data Broker from UCLA
  • Flexible Modeling System from GFDL
  • Coumehy from LTHE and IDRIS
    For additional information on projects targeting or involving coupling aspects, on potential underpinning technologies, and on model developments related to coupling, the reader is invited to consult the document entitled "Met Office FLUME project - Model Coupling Review" (http://www.metoffice.com/research/interproj/flume/1_d3_r8.pdf), written by R. W. Ford and G. D. Riley from the University of Manchester.
     

    1. OASIS (http://www.cerfacs.fr/globc/software/oasis/oasis.html)

    OASIS is the coupler developed at CERFACS, primarily designed for coupling climate models, which will be the basis of the PRISM coupler developments.

    The initial work on OASIS began in 1991, when the "Climate Modelling and Global Change" team at CERFACS was commissioned to build a French coupled model from existing General Circulation Models (GCMs) developed independently by several laboratories (LODYC, CNRM, LMD). Quite clearly, the only way to meet these specifications was to create a very modular and flexible tool.

    OASIS is a complete, self-consistent and portable set of Fortran 77, Fortran 90 and C routines. It can run on any usual target for scientific computing (IBM RS6000 and SPs, SPARCs, SGIs, CRAY series, Fujitsu VPP series, NEC SX series, COMPAQ, etc.). OASIS can couple any number of models and exchange an arbitrary number of fields between these models at possibly different coupling frequencies. All the coupling parameters of the simulation (models, coupling fields, coupling frequencies, etc.) are defined by the user in an input file, namcouple, read at run-time by OASIS. Each component model of the coupled system remains a separate, possibly parallel, executable and is unchanged with respect to its own main options (like I/O or multitasking) compared to the uncoupled mode. OASIS handles only static simulations, in the sense that all component models are started from the beginning and run for the entire simulation. The models need to include only a few low-level coupling routines to deal with the export and import of coupling fields to/from OASIS.

    The main tasks of OASIS are:

  • Communication between the models:

  • To exchange the coupling fields between the models and the coupler in a synchronized way, four different types of communication are included in OASIS. In the PIPE technique, named CRAY pipes are used for the synchronization of the models, and the coupling fields are written to and read from simple binary files. In the CLIM technique, the synchronization and the transfer of the coupling data are done by message passing based on PVM 3.3 or MPI-2; in particular, this technique allows heterogeneous coupling. In the SIPC technique, based on UNIX System V Inter Process Communication facilities, the synchronization is ensured by semaphores, and shared memory segments are used to exchange the coupling fields. The GMEM technique works similarly to the SIPC one but is based on the NEC global memory concept.
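The semaphore-plus-shared-buffer idea behind the SIPC technique can be illustrated with the following minimal sketch. It is not the actual SIPC code: threads and a Python list stand in for separate processes and a System V shared memory segment.

```python
# Minimal illustration (not the actual SIPC code) of synchronization by
# semaphores around a shared buffer. Threads and a Python list stand in
# for separate processes and a System V shared memory segment.
import threading

buffer = [None]                  # stand-in for a shared memory segment
empty = threading.Semaphore(1)   # writer may fill the buffer
full = threading.Semaphore(0)    # reader may drain the buffer

received = []

def source(fields):
    for f in fields:
        empty.acquire()          # wait until the buffer is free
        buffer[0] = f
        full.release()           # signal that data is ready

def target(n):
    for _ in range(n):
        full.acquire()           # wait until data is ready
        received.append(buffer[0])
        empty.release()          # free the buffer for the next field

writer = threading.Thread(target=source, args=([1.0, 2.0, 3.0],))
reader = threading.Thread(target=target, args=(3,))
writer.start(); reader.start()
writer.join(); reader.join()
```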
     
  • Transformation and interpolation of the coupling fields:

  • The fields given by one model to OASIS have to be processed and transformed so that they can be read and used directly by the receiving model. These transformations, or analyses, can differ from one field to another. First, a pre-processing takes place, which deals with rearranging the arrays according to the OASIS convention, treating possible sea-land mismatch, and correcting the fields with external data if required. Then follows the interpolation required to go from one model grid to the other model grid. Many interpolation schemes are available: nearest neighbour, bilinear, bicubic, mesh averaging, Gaussian. Additional transformations, ensuring for example field conservation, occur afterwards if required. Finally, the post-processing puts the fields into the receiving model format.
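Interpolation driven by a precomputed matrix of weights and addresses, as mentioned above, amounts to a sparse matrix-vector product: each target point is a weighted sum of a few source points. A minimal sketch, with hypothetical stencils and grids:

```python
# Sketch of interpolation through precomputed weights and addresses:
# each target point is a weighted sum of a few source points, so the
# remapping is a sparse matrix-vector product. Stencils are hypothetical.

def apply_remap(src, addresses, weights):
    """dst[i] = sum_k weights[i][k] * src[addresses[i][k]]."""
    return [sum(w * src[a] for a, w in zip(addr, wgt))
            for addr, wgt in zip(addresses, weights)]

src = [10.0, 20.0, 30.0]              # field on the source grid
addresses = [[0, 1], [1, 2]]          # 2-point stencil per target point
weights = [[0.5, 0.5], [0.25, 0.75]]
dst = apply_remap(src, addresses, weights)
# A 1-point stencil with weight 1.0 reduces this to nearest neighbour.
```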


    2. Palm (http://www.cerfacs.fr/globc/PALM_WEB/)

    The PALM project, currently underway at CERFACS, aims to provide a coupler allowing a modular implementation of a data assimilation system. In this system, a data assimilation algorithm is split up into elementary "units" such as the observation operator, the computation of the correlation matrix of observational errors, the forecast model, etc. PALM ensures the synchronization of the units, drives the communication of the fields they exchange, and performs elementary algebra if required.

    This goal has to be achieved without a significant loss of performance compared to a standard data assimilation implementation. It is therefore necessary to design the PALM software with the following objectives and constraints in view:

  • modularity: PALM provides a mechanism for synchronization of pre-defined functional units that can be executed in sequence, in concurrence, or in a mix of these two modes. One key aspect of PALM is also that dynamic execution of the units (i.e. units can be launched and stopped at any point during the simulation) or conditional executions of the units are allowed. PALM also performs the required exchange of information between these units.
  • portability: PALM aims to run on all the existing high-performance platforms and, if possible, on the next generation of supercomputers. This effort of "clairvoyance" can be accomplished only through the adoption of standards.
  • performance: PALM will be used in two modes: research and operational. The research mode will be used for the design of new algorithms and will prioritise flexibility; on the contrary, the operational mode will work with a fixed configuration of the algorithm and will prioritise performance optimisation and the monitoring of the process.

    PALM is a very flexible and efficient, but somewhat complex, tool. For PRISM, it remains to be evaluated whether this flexibility, and the associated complexity, are required for coupled climate modelling.
     

    3. MpCCI (http://www.mpcci.org/)

    The Mesh-based parallel Code Coupling Interface (MpCCI) is a coupler written for multidisciplinary applications by the Fraunhofer Gesellschaft Institute for Algorithms and Scientific Computing (FhG SCAI). MpCCI enables different industrial users and code owners to combine different simulation tools. MpCCI can be used for a variety of coupled applications such as fluid-structure, fluid-fluid, structure-thermo, fluid-acoustics-vibration, but it was not explicitly designed for geophysical applications.

    MpCCI is based on COCOLIB, developed during the CISPAR project funded by the European Commission, and on GRISSLi-CI, developed during the GRISSLi project funded by the German Federal Ministry for Education and Research. MpCCI is not an open-source product, but the compiled library offering the basic functionality can be downloaded for free from the web site; special agreements apply for add-on features such as more sophisticated interpolation schemes.

    MpCCI is written in C++ and is based on MPI-1. MpCCI is mainly a parallel model interface library which provides the usual coupling functionality: 1- the interpolation of the coupling fields (including the neighbourhood search and the calculation of weights and addresses), and 2- the exchange of coupling data between the codes (including data repartitioning when required). MpCCI also includes a separate control process which performs only simple monitoring of the different codes, as MpCCI handles only static couplings.

    The coupling is performed by placing MPI-like sending and receiving instructions in the coupled codes. MpCCI does not fully adhere to the principle of "end-point data exchange", as each model has to know the target/source of its sending/receiving instructions. However, each code simply works with its own local mesh and needs no specific knowledge of the other code's characteristics.

    As MpCCI is based on MPI-1, heterogeneous coupling is supported as long as the MPI-1 implementations of the different platforms involved in the coupling allow it.
     

    4. Calcium (http://www.irisa.fr/orap/Publications/Forum8/berthou.pdf)

    Calcium is a coupler of codes developed by Electricité de France (EDF) and written, like MpCCI, for multidisciplinary applications. Calcium ensures the exchanges of coupling data between the codes in a synchronized way. The exchanges are based on PVM, and heterogeneous coupling is allowed. Furthermore, Calcium automatically performs the temporal interpolation of the coupling data when the sending frequency of the source code does not match the receiving frequency of the target code. Calcium is used by about 10 different research or industrial groups, mainly in France, and is implemented in about 20 codes.
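The temporal interpolation mentioned above can be sketched as a simple linear interpolation between the two source samples bracketing the time requested by the target code. This is an illustration of the idea, not Calcium code.

```python
# Illustration (not Calcium code) of temporal interpolation between the
# two source samples bracketing the time requested by the target code.

def time_interp(samples, t):
    """samples: (time, value) pairs sorted by time; linear interpolation."""
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t0 <= t <= t1:
            w = (t - t0) / (t1 - t0)
            return (1.0 - w) * v0 + w * v1
    raise ValueError("requested time outside the sampled interval")

# The source code sends every hour; the target asks for the half hour:
samples = [(0.0, 10.0), (3600.0, 16.0)]
value = time_interp(samples, 1800.0)
```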
     

    5. CCSM Coupler 6 (http://www.ccsm.ucar.edu/models/cpl6)

    The Next Generation Coupler (NGC), also called cpl6, is the coupler currently being developed at NCAR for the next version of the Community Climate System Model (CCSM), in the framework of the Accelerated Climate Prediction Initiative (ACPI) Avant Garde project. The NCAR team has completed the requirement capture, has described a design, and is presently in the development phase.

    The main characteristics of the NGC are:

  • The coupler is written in F90 and explicitly designed to couple four models: atmosphere, land, ocean, and sea ice. No flexibility concerning the number of models or their nature is included. Due to the nature of these components, only the exchange of 2D fields is supported. These four models can run concurrently, sequentially or in a mix of these two strategies. Each component and the coupler are separate executables (MPMD paradigm).
  • The coupler can run in parallel, decomposed over an arbitrary number of processors, and supports the following types of parallelism: pure shared-memory, pure message-passing, and hybrid parallelism incorporating threading on multiprocessor nodes and message passing between the nodes.
  • The transformations performed by the coupler include interpolation (conservative remapping using the SCRIP library), merging of coupling data originating from multiple source grids, time accumulation and time averaging, diagnostic computation, writing of history data sets, and also the computation of certain interfacial fluxes between components. This choice was made because the fluxes need to be calculated at the higher resolution AND at the higher required frequency, which may be characteristics belonging to different models.
  • All coupling data exchanges are performed with MPI. Parallel exchange of coupling fields and repartitioning are possible. However, all coupling fields exchanged between any two components have to go through the coupler, where the transformations are performed; direct component-to-component exchanges are not allowed.
  • One can note here that the goal of achieving efficient vector processing performance was not identified as a mandatory requirement for the NGC.

    6. UCLA Distributed Data Broker (http://www.atmos.ucla.edu/~drummond/DDB/)

    The Distributed Data Broker (DDB) is a software tool designed to handle distributed data exchanges between the UCLA Earth System Model (ESM) components. These components are: Atmosphere General Circulation Model, Atmospheric Chemistry Model, Ocean General Circulation Model, Ocean Chemistry Model, and are run as separate executables. The DDB is composed of the Registration Broker (RB) and of three libraries linked to the component models: the Model Communication Library (MCL), the Communication Library (CL), and the Data Translation Library (DTL).

    The Registration Broker is a process that collects model information from the models initially. The RB is only active at the beginning of the coupled run; thus, any model process involved in the coupling can take this role and later resume normal model operation. The DDB follows the "end-point data exchange" principle, which the DDB authors call the producer-consumer paradigm. In the registration phase, the different models register their "production" and "consumption" of coupling data, and the RB performs the appropriate matching, which will be effective at run-time.

    The MCL contains a set of callable routines that are used by the different component models to register at the beginning of a run and to perform the exchanges of data at run-time.

    The CL is a set of routines used by the DDB to manage data exchanges based on the communication libraries available on the computer platforms. At run-time, the model producing the data sends the data directly to the consuming model at a given time interval; the consuming model will later receive the data at a rate dictated by its internal computations. Heterogeneous coupling is allowed as long as the communication libraries available on the computer platforms support it.

    The DTL transforms data on a given grid domain to the domain of the requesting model. This library will include routines ranging from simple linear interpolation to high-order data translation routines.
     

    7. GFDL Flexible Modeling System (FMS) (http://www.gfdl.noaa.gov/~fms/)

    The design of the FMS is geared toward coupled climate models running as a single executable. The component models that can be included in the FMS are atmosphere, land, ocean and ice models. In the FMS, the coupler is the main program which drives the component models. To interact, these components communicate only with the coupler. They may be on different grids and have different data decompositions, and the coupler manages the transformations required between them. Recently, some parallelisation concepts were experimented with on the component models themselves, using abstract parallel dynamical kernels: the parallelism is in fact built into basic operators invoked in the model, including arithmetic operators as well as differential operators such as curl, div, grad and laplacian.

    8. COUMEHY (contact: C. Messager, messager@hmg.inpg.fr)

    COUMEHY is an ongoing French coupling project involving the "Laboratoire des Transferts en Hydrologie et Environnement", "Hydrosciences Montpellier", the MEVHySA team of the "Institut de Recherche pour le Developpement" in Montpellier, and the "Institut du Developpement et des Ressources en Informatique Scientifique". The objective of this scientific and technical project is to couple one atmospheric model to different hydrological models running on different platforms, in order to evaluate the importance of coupling processes between atmospheric and continental hydrological cycles in the perspective of global climate change. The interoperability of different codes running on different machines was in this case a strong requirement; the choice was therefore made to base the communication on CORBA (Common Object Request Broker Architecture).

     

    Annexe I

    Technical advantages and disadvantages of a dynamic Driver




    Technical advantages and disadvantages of a dynamic Driver with respect to the process management (PM dynamic Driver) are discussed here. A PM dynamic Driver may technically be required for the following reasons:

  • To avoid waste of computing resources:

  • Two component models, being two different executables, are run sequentially. In a static configuration, when the first model runs, the other is simply waiting, and vice-versa, this alternation occurring at each coupling timestep. On computing platforms where the Operating System (OS) cannot efficiently swap out the waiting model, this results in a waste of computing resources.

    To avoid this waste, a dynamic Driver could launch each model in turn at the coupling frequency. This option could especially be considered for models or coupling processes having a comparatively small computing load, as illustrated below. This option implies that the temporal coupling loop is controlled by the Driver. The disadvantage is that the restart procedure of the model and the loading of the appropriate data have to be performed each time.
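This launch-in-turn scheme can be sketched as follows. The sketch is illustrative only: harmless shell no-ops stand in for the model executables, and the restart and data-loading logic is omitted.

```python
# Sketch of a PM dynamic Driver alternating two sequential components by
# relaunching each executable once per coupling period. Commands are
# hypothetical; harmless no-ops ("true") stand in for model executables.
import subprocess

def run_coupled(n_periods, model_a_cmd, model_b_cmd):
    launches = []
    for period in range(n_periods):
        # Each launch implies the model's restart procedure: it must
        # reload its state and the coupling data for this period.
        subprocess.run(model_a_cmd, check=True)
        launches.append(("A", period))
        subprocess.run(model_b_cmd, check=True)
        launches.append(("B", period))
    return launches

log = run_coupled(2, ["true"], ["true"])
```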

         
  • To run a coupled model whose total memory requirement is greater than the memory available:

  • The memory required for a coupled model including all PRISM components may be greater than the total memory available, but not all models may be active at the same time. Once again, the static option is sufficient if an efficient OS swap functionality exists on the machine; if this is not the case, a dynamic Driver, launching the components when required, may be useful. As above, the disadvantage is that the restart procedure of the model and the loading of the appropriate data have to be performed each time; furthermore, it will probably be difficult in most cases to predict precisely which components will be active at the same time or at different times.
  • If the Driver has to manage dynamically its own buffering processes:

  • If message passing is used to exchange the coupling data, and if two components exchanging a high number of 3D fields are run sequentially or are simply not well synchronised, the messages will pile up in the message passing mailbox, whose capacity may then be exceeded. In that situation, the Driver may have to manage its own buffering processes dynamically. However, on most platforms, the message passing mailbox can be as large as the available memory; if this is the case, this third argument is no longer valid.