01/07/2002 - Version 1.1
S. Valcke, CERFACS
Summary
The coupler drives the whole coupled model, ensuring the synchronisation of the different component models and the exchange of the coupling fields directly between the components or via additional coupling processes. When needed, the coupler performs transformations on the coupling fields. Another important part of the coupler is the model interface library linked to each component model which interfaces it to the rest of the coupled model. As I/O and coupling data share many characteristics, it was decided to develop one common model library for both purposes.
The different constituents of the PRISM coupler and I/O library are therefore the Driver, the Transformer, and the PRISM System Model Interface Library (PSMILe). The PSMILe includes the Data Exchange Library, which performs the exchanges of coupling data, the I/O library, and some coherence check and local transformation routines.
In this document, the high-level architecture of the PRISM coupled model is presented first. The functionalities of each constituent and their development priorities are detailed in the second section.
Outline
1. Coupled model high level architecture
2. Detailed functionalities for the PRISM coupler and I/O library
2.1. General requirements
2.2. Driver functionalities
2.3. Transformer functionalities and parallelisation
2.4. PSMILe functionalities
2.4.0 PSMILe general characteristics
2.4.1 Data Exchange Library (DEL) functionalities
2.4.2 I/O library
2.4.3 Coherence check routines
2.4.4 Local transformation routines
1. Coupled model high level architecture
An overview of a coupled model is presented here. The following graphical view details the different parts of the system:
The elements of a coupled model are the following:
The different elements of a coupled model are detailed hereafter by describing the three basic phases of its construction and execution:
A - Definition phase
In the definition phase, the different elements of the coupled system are prepared:
Each component model has to include specific PRISM System Model Interface
Library (PSMILe) instructions that will allow the component model to interact
with the rest of the PRISM System at run-time. The PSMILe, represented
here by the blue and purple squares, includes the Data Exchange Library,
which performs the exchanges of coupling data directly between the component
models or between the component models and other coupling processes, the
I/O library, and some coherence check and local transformation routines.
For each model, the Potential Model Input and Output Description (PMIOD)
describes the relations the model is able to establish with its external
environment. The PMIOD contains a short description of the model, its grids,
and the list of all data requested or produced by that particular component
model and their description (the metadata). The input and output data can
be divided into 3 categories.
Each model and its respective PMIOD are made available to potential
PRISM users by the model administrator.
Note: one code may implement different interfaces under certain conditions.
But one PMIOD is not related to one implementation (one executable)
All input files containing data required for the run have to be generated.
- The coupler Driver and Transformer separate entity:
The Driver, which monitors the whole coupled simulation, and the Transformer separate entity, which performs the required transformations on the data, have to be available.
B - Composition phase
In the composition phase, a particular user assembles a particular coupled model.
C - Deployment phase
Selection of component models:
The user first chooses the component models he wants to couple for one particular experiment.
Input file selections
The user selects the input files containing information that will be used during the simulation, such as forcing fields.
Driver and Transformer separate entity selection:
The user selects the PRISM Driver and Transformer separate entity.
Constitution of each model Specific Model Input and Output Configuration (SMIOC):
Based on each model's PMIOD, the user generates for each model a Specific Model Input and Output Configuration (SMIOC). The SMIOC describes the relations the model will effectively have with its external environment through inputs and outputs for a specific experiment.
For transient input and output variables, the user may decide that a particular data item will 1- have no role in the simulation, 2- be read from a file or written to a file (I/O data), or 3- be exchanged between two component models (coupling data). For I/O data, the user indicates in the SMIOC the name(s) of the respective file(s), the input or output frequency, and possibly the local and non-local transformations required on the data (see 2.3). For coupling data, the user just refers to the Specific Coupling Configuration (SCC).
Note: The Transformer will never directly perform the reading/writing of a transient input/output variable from/to a disk file. Pseudo-models performing the I/O will have to be written for transient input or output I/O variables requiring transformation(s) to be performed by the Transformer. These I/O variables will therefore become coupling variables exchanged between pseudo-models and "real" models, and will be treated as such. Like a "real" model, each pseudo-model will come with its PMIOD. The only case in which the Transformer may be asked to perform an I/O directly is to read a field that it will combine with a standard coupling field. The separation between coupling variable information in the SCC and I/O variable information in the SMIOC therefore holds.
For restart variables, the user is only allowed to indicate the name of the restart file and, possibly, the restart saving frequency. However, this last parameter should have the same value for all component models and should therefore be treated as a universal parameter in the SCC.
The values of persistent input parameters are read at run-time in the SMIOC. The user may be allowed to change the default values therein. For persistent input parameters which are also universal parameters, the value taken into consideration is the one indicated in the SCC.
Constitution of the Specific Coupling Configuration (SCC): The user constitutes only one Specific Coupling Configuration (SCC) for each particular coupled model simulation. The SCC centralises the description of all activated coupling fields and all related coupling parameters chosen by the user (source and target models, coupling frequencies, local and non-local transformations, etc.) for one particular experiment. The SCC also contains the universal parameters prescribed by the user.
SL 13/06/02: However, parameters like the gravitational constant are also called universal constants. Perhaps I am wrong? If not: for these parameters I prefer an f90-module solution where they are declared either by
REAL, PARAMETER :: g=9.81
or by
REAL :: g=9.81
depending on the case, and where this module MUST (coding rule) be USEd in the model (or IS USEd in PRISM_Init). This of course means that recompilation is necessary when we calculate the climate on Jupiter. However, passing e.g. g through the coupler is overdoing it, to my feeling.
These universal parameters will be present in each model SMIOC and in the SCC. In stand-alone mode, the appropriate PSMILe function will automatically read this information in the SMIOC. In non-stand-alone mode, the PSMILe will automatically request the information from the Driver, which will have read it in the SCC. One universal parameter is the initial date of the simulation. Another universal parameter is the initial date of the run, as one simulation may have to be split into many runs to fit the limits of the job queuing system. The PRISM System will automatically chain the different runs and will automatically change the value of the initial date of the run in the SCC or in the SMIOC. Note that this is different from what was concluded at the wp4b meeting in May.
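The lookup rule for universal parameters and the chaining of runs can be illustrated with a minimal Python sketch. It is not PSMILe code: the function names `resolve_parameter` and `run_initial_dates`, the use of plain dictionaries in place of the parsed SMIOC and SCC, and the fixed run length in days are all assumptions made for illustration.

```python
from datetime import date, timedelta


def resolve_parameter(name, smioc, scc=None):
    """Return the value of a parameter for one component model.

    smioc and scc are plain dictionaries standing in for the parsed
    configuration containers (an assumption of this sketch). scc is None
    in stand-alone mode; otherwise it holds what the Driver read in the SCC.
    """
    if scc is not None and name in scc:
        # Universal parameter in coupled mode: the SCC value is the one used.
        return scc[name]
    # Stand-alone mode, or a parameter that is not universal.
    return smioc[name]


def run_initial_dates(simulation_start, simulation_end, run_days):
    """List the initial date of each run when a simulation is split into runs."""
    if run_days <= 0:
        raise ValueError("run_days must be positive")
    dates = []
    current = simulation_start
    while current < simulation_end:
        dates.append(current)
        current += timedelta(days=run_days)
    return dates
```

In this sketch the initial date of the run is simply recomputed for each run; how the PRISM System actually rewrites it in the SCC or SMIOC is not specified here.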
It is proposed that the PMIOD, SMIOC, and SCC containers be implemented as XML files.
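To give an idea of what an XML-based SMIOC could look like, the following Python sketch parses a small fragment. The element and attribute names (`smioc`, `transient`, `persistent`, `role`, `file`, `frequency`) are purely hypothetical; no PRISM schema is defined in this document. Only the three possible roles of a transient variable (no role, I/O data, coupling data) are taken from the text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical SMIOC fragment: element and attribute names are illustrative
# assumptions, not a PRISM schema.
SMIOC_EXAMPLE = """
<smioc model="ocean">
  <transient name="sst" role="coupling"/>
  <transient name="runoff" role="io" file="runoff.nc" frequency="86400"/>
  <transient name="salinity_diag" role="none"/>
  <persistent name="albedo_ice" value="0.6"/>
</smioc>
"""

ALLOWED_ROLES = ("none", "io", "coupling")


def read_smioc(text):
    """Return (transients, persistents) dictionaries from an SMIOC string."""
    root = ET.fromstring(text)
    transients = {}
    for node in root.findall("transient"):
        entry = dict(node.attrib)
        name = entry.pop("name")
        if entry.get("role") not in ALLOWED_ROLES:
            raise ValueError("unknown role for variable " + name)
        transients[name] = entry
    persistents = {
        node.get("name"): float(node.get("value"))
        for node in root.findall("persistent")
    }
    return transients, persistents
```

A coupling variable carries no file or frequency here because, as stated above, the user just refers to the SCC for coupling data.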
Note: Some deployment information will have to be in the SCC.
See also http://www.cerfacs.fr/PRISM/MTCI/notes_PSMILe_V1.html
SV 28/10/2002:
Usually one code represents one component model (ocean, biogeo, atmos, ice, land, atmos chemistry) and will come with one PMIOD.
However, two component models (e.g. ocean + biogeo) could be assembled into one code which would then come with two PMIODs. In that case, there will be two SMIOCs and each component model would have to call prism_init_comp with its own "model_name" as argument.
It could even be that two components, formally defined as such in PRISM, are assembled into one code which would come with only one PMIOD in which all Potential Input and Output of both components would be described. In that case, there will be one related SMIOC and all processes of that code would have to call prism_init_comp with the same "model_name" as argument. In other words, a prism_init_comp with one particular "model_name" refers to one particular SMIOC.
As soon as coupling relation(s) have to be established between different PMIODs, whether these PMIODs are associated with the same code or not, an SCC will be created and the PRISM Driver will be required.
If forcing data need interpolation by the Transformer before being used by a component model, then a pseudo-model reading these data will have to be present; in this case, the component model and the pseudo-model will each come with their PMIOD, which will be used to write the SMIOCs and one SCC (the Driver is required). If the forcing data need no interpolation, the PSMILe will read the data directly in the files. In this case, there is only one PMIOD, and therefore one SMIOC (neither an SCC nor the Driver is required).
We could therefore have the following situations (and many others, of course!):
-One code, two component models, two PMIODs -> two SMIOCs, no coupling relation between the components, therefore no SCC, no Driver.
-One code, two component models, two PMIODs -> two SMIOCs, coupling relations between the components, therefore an SCC and a Driver.
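The rule behind these situations (one SMIOC per PMIOD; an SCC and the Driver as soon as a coupling relation links two different PMIODs) can be written as a small Python sketch. The function name and the representation of coupling relations as (source, target) pairs of PMIOD names are assumptions made for illustration.

```python
def required_infrastructure(pmiods, coupling_relations):
    """Return what a given assembly needs.

    pmiods: list of PMIOD names (one SMIOC is written per PMIOD).
    coupling_relations: list of (source_pmiod, target_pmiod) pairs.
    """
    known = set(pmiods)
    for source, target in coupling_relations:
        if source not in known or target not in known:
            raise ValueError("coupling relation refers to an unknown PMIOD")
    # An SCC and the Driver are needed as soon as two different PMIODs
    # are linked, whether or not they belong to the same code.
    coupled = any(source != target for source, target in coupling_relations)
    return {"smiocs": len(pmiods), "scc": coupled, "driver": coupled}
```

For the two situations listed above, the sketch returns two SMIOCs in both cases, with `scc` and `driver` set to True only when a relation such as `("ocean", "biogeo")` is present.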
At run-time, the different parts of the system will play different roles. A more detailed description of the functionalities of each constituent is presented in part 2.
The Driver: launches the component models and monitors their execution and termination.
The Transformer separate entity T: performs required transformations on the I/O and coupling data.
The PRISM System Model Interface Library (PSMILe):
The PSMILe includes the Data Exchange Library, which performs the exchanges of coupling data directly between the component models or between the component models and the separate Transformer entity, the I/O library, and some coherence check and local transformation routines. At run-time, specific PSMILe instructions will perform the following actions:
Initialisation:
Declaration of PSMILe internal data structure. Message passing initialisation. I/O initialisation. Initialisation of persistent input parameters, read directly in the SMIOC. Initialisation of universal parameters, either received from the Driver or read directly in the SCC.
Metadata declaration and initialisation:
Definition of the metadata describing input or output data (for example the grid coordinates, mesh areas, mask, partitioning), and definition of the associated identifiers.
Declaration of transient and restart variables:
Association to the relevant metadata identifiers (see below). Access to user-defined data information: for each data declaration, the PSMILe consults the SMIOC and identifies the user's choice for that particular experiment (coupling or I/O data, input or output frequency, source and target models, source or target file, transformations, etc.)
Sending and receiving of data:
The actions performed by the PSMILe below each sending or receiving instruction depend on the user's choices read in the declaration phase in the SMIOC and in the SCC: the library may simply return, or perform local transformations, and/or perform the exchanges between the models, and/or perform the reading or writing into files, etc.
Coupling termination:
All actions related to finalizing the run.
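The way a sending instruction depends on the user's choices can be sketched in Python as follows. This is only an illustration of the dispatch idea: the function name, the dictionary keys `role` and `local_transformations`, and the action labels are assumptions, and real exchanges or file writes are replaced by a list of action names.

```python
def actions_for_put(choice):
    """List the actions triggered below one sending instruction.

    choice: dictionary describing the user's choice for one field, as it
    might be gathered from the SMIOC and the SCC (hypothetical keys).
    """
    role = choice.get("role", "none")
    if role == "none":
        # The field has no role in this experiment: the library simply returns.
        return []
    actions = []
    if choice.get("local_transformations"):
        actions.append("local_transformation")
    if role == "coupling":
        actions.append("exchange")
    elif role == "io":
        actions.append("write_file")
    else:
        raise ValueError("unknown role: " + str(role))
    return actions
```

The component model code stays identical in all three cases; only the configuration read at declaration time changes what happens below the call.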
2. Detailed functionalities for the PRISM coupler and I/O library
As detailed above, the different constituents of the PRISM coupler are: the Driver; the Transformer; and the PRISM System Model Interface Library (PSMILe), linked to the component models and which interfaces the component model with the rest of the coupled model. The PSMILe includes the Data Exchange Library, the I/O library, and some coherence check and local transformation routines.
For each of these constituents, the list of possible requirements established in the REDOC II.2 paragraph was revised and choices of functionalities that should be implemented in the different versions of the PRISM coupler were made, considering the answers to the REDOC I.4 template. These choices are detailed below.
For each functionality, a priority of implementation is given: "1" means that the functionality should be provided for the PRISM coupler first version (D3a1, month 12), "2" for PRISM coupler second version to be used in the demonstration runs (D3a2, month 24), and "3" means that the functionality may be provided for the PRISM coupler final version (D3a3, month 36).
Note: See also REDOC I.3.2
2.1. General requirements
- The overhead associated with the global system modularity and flexibility is acceptable. (2, 3)
- The whole system is portable and efficient on the different hardware architectures used for climate modelling, on dedicated or shared hardware resources. Standard and portable solutions should be preferred. However, for critical issues for which a portable solution would not exist or would lead to very low efficiency, machine-dependent options could be offered. (3)
- The design and implementation lead to code that is easy to maintain and can be easily modified to support future model or coupling functionalities. (2, 3)
- The design reflects a clear separation of responsibilities for the different parts of the coupler. (2, 3)
- The PRISM System infrastructure can be used to technically assemble a coupled system based on any component models, even if these models do not conform to the PRISM physical interfaces, provided that they include the PRISM System Model Interface Library. (1, 2, 3)
- The PRISM System infrastructure can be used to couple an arbitrary number of component models; any component can be one-way or two-way coupled with any other component. (1, 2, 3)
2.2. Driver functionalities
The Driver manages the whole coupled application. It launches the component models, monitors their execution and termination, centralises and distributes the universal parameters which require a consistent definition among all component models, and centralises and distributes information on the component model status during the simulation.
The Driver could keep a central role during the whole simulation and also manage the exchanges of coupling data. The preferred design option here is to decentralise the coupling functionalities as much as possible in the Data Exchange Library and in the Transformer, and therefore to reduce the role of the Driver as much as possible. This option is probably applicable only for static coupled simulations and allows an easier evolution toward heterogeneous coupling (different component models running on different machines).
As detailed below, the choice of a static Driver was also made. The workload of a static Driver is likely to be small, even more so if the decentralising option is followed. The Driver could be one separate process used only for that purpose, but could also sit in one separate coupling process also used for the separate Transformer entity, or could even be part of the PSMILe master process of a master model started initially by the user. The first two options are still open regarding the Driver implementation.
Model execution and control:
see also plan_inter_230402.html
2.3. Transformer functionalities and parallelisation
This section first gives some definitions. In the second section, the preferred design options for the PRISM coupler Transformer location and parallelisation are presented. In the third section, an exhaustive list of transformations and grids on which these transformations should be performed is presented, together with other specific requirements, and associated priority and calendar.
2.3.1 - Definitions
Location for the different types of transformations
The preferred design option is the one in which non-local transformations are performed in the separate Transformer entity (T), as they require information coming from different models. Point-wise and local transformations will be available in the PRISM System Model Interface Library (PSMILe) linked to the model, before sending or after receiving the data. However, point-wise and local transformations will also be available in the separate Transformer entity T, for example, to combine coupling fields coming from different source models after their interpolation on the target grid.
The same rules apply for two component models assembled into one executable: all point-wise and local transformations will be performed directly in the PSMILe, while the data will have to be treated by the separate Transformer entity T if non-local transformations are required. This last case however is not likely to happen, as two components assembled into one executable will in most cases share the same grid and same partitioning.
Ideally, the choice of whether the transformation is performed by the PSMILe or in the separate Transformer entity T should be decided automatically by the coupler and this should be transparent for the user (3).
Note 04/2002: For coupling variables, the transformations will be listed in the SCC by the user. Non-local transformations will necessarily be performed by the Transformer. Local transformations can be performed in the PSMILe or by the Transformer; for each simulation, the Driver will decide where they will be performed, based on efficiency criteria.
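The location rule described in this section can be summarised by a short Python sketch. The function name is hypothetical, and the boolean `prefer_psmile` merely stands in for the Driver's efficiency criteria, which are not specified in this document.

```python
def transformation_location(kind, prefer_psmile=True):
    """Return where a transformation of the given kind is performed."""
    if kind == "non-local":
        # Requires information from different models: always the Transformer.
        return "Transformer"
    if kind in ("point-wise", "local"):
        # Either location is possible; the Driver decides for each simulation.
        return "PSMILe" if prefer_psmile else "Transformer"
    raise ValueError("unknown transformation kind: " + str(kind))
```

For example, combining fields from several source models after interpolation on the target grid is a local operation that would be placed in the Transformer, which corresponds to calling the function with `prefer_psmile=False`.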
Transformer parallelisation
As detailed above, the transformation routines included in the PSMILe will perform local transformations, and not only point-wise transformations; their full parallelisation is therefore required when the PSMILe is linked to a fully parallel component model (3).
Non-local transformations will be performed in the separate Transformer entity T. Different options of parallelisation are possible. The "one-executable full parallelisation" option presented in REDOC II.2, Section 3.3 is the preferred one (3). A fall-back solution would be a simpler parallelisation of the separate Transformer entity T as one executable with OpenMP.
Note on I/O 04/2002: The Transformer will never directly read/write a transient input/output variable from/to a disk file. Pseudo-models performing the I/O will have to be written for transient input or output I/O variables requiring transformation(s) to be performed by the Transformer. These I/O variables will therefore become coupling variables exchanged between pseudo-models and "real" models, and will be treated as such. Like a "real" model, each pseudo-model will come with its PMIOD. The only case in which the Transformer will perform an I/O directly is to read a field that will be combined with a standard coupling field.
2.3.3 - List of transformations, grids, and associated priority and calendar
List of transformations
A list of relevant transformations is given hereafter. For each transformation, it is specified whether the transformation is "point-wise", "local" or "non-local".
All these transformations are non-local.
The following grids should be supported for the above scheme. These grids have the following common characteristics:
2D grids
The following paragraph gives the priority of development (1, 2, or 3; the meaning of each number is given in the introductory paragraph of section 2) for the different transformations on the different grids listed above. When two numbers are given, it means that parts of the functionality will be provided for the respective coupler versions.
Transformations on 2D scalar coupling fields
| | H1 - lat-lon | H2 - log. rect. | H3 - reduced | H4 - unstruc. |
| --- | --- | --- | --- | --- |
| S1 - near. neigh. | | | | |
| S2 - Gaussian | | | | |
| S3 - 1st O interp. | | | | |
| S4 - 2nd O interp. | | | | |
| S8 - conservation | | | | |
| S9 - combination | | | | |
| S10 - masking | | | | |
| S11 - scattering | | | | |
| S12 - gathering | | | | |
| S13 - collapse | | | | |
| S14 - subspace | | | | |
| S15 - algebra | | | | |
| S16a - 1st O extrap. | | | | |
| S16b - 2nd O extrap. | | | | |
| T1 - time operation | | | | |
2.4. PSMILe functionalities
The PRISM System Model Interface Library (PSMILe) is the set of routines implemented in a component model code to interface it with the rest of the coupled model. The classes of PSMILe instructions that will be invoked in the component model code at run-time are described in Section 1, C - Deployment phase. Here, the functionalities of the different PSMILe constituents (i.e. the Data Exchange Library, the I/O library, and the coherence check and local transformation routines) are presented in more detail.
2.4.0 PSMILe general characteristics
2.4.1 Data Exchange Library (DEL) functionalities
The Data Exchange Library (DEL) performs the exchanges of coupling data between the component models, or between the component models and the separate transformation entity. The DEL must therefore be included as the most external layer in the PSMILe.
Data transfer between separate processes will be implemented using the Message Passing Interface (MPI), which is a widely used and portable standard. MPI implementations completely supporting the MPI standard are available for every architecture used by the climate modelling community, either as open source public domain code or as proprietary software optimised and installed on high performance computer systems. Furthermore, MPI is best suited for the close coupling between separate processes, as in climate system modelling, since individual MPI implementations are designed to use the most efficient network on a specific architecture.
Since all parallel climate model codes support communication via MPI, the introduction of alternative approaches like CORBA would require additional software such as Fortran ORBs. Another possibility is wrapping the Fortran codes using a C++ ORB, which can require major changes to the Fortran codes involved as well. (For experiences gained with wrapping Fortran code see http://accl.grc.nasa.gov/IPG/CORBA/wrap_fortran.html).
In addition, alternative approaches such as CORBA handle data transfer via TCP/IP, which is not well suited for a fast and efficient parallel data transfer. MPI processes, in contrast, may communicate simultaneously without interfering with the communication of other processes, while the same kind of communication will cause conflicts on a TCP/IP connection. Transfer rates between two processes can differ by a factor of 10^5 to 10^6 when comparing CORBA with MPI. Furthermore, a complete CORBA implementation is not available for every architecture.
The DEL detailed functionalities are:
For the glossary: a "partition" is the set of grid points treated by one process. One partition may be composed of one or several segments, or of one or several blocks.
Each model process will initially define (or re-define during the run) its grid and partition by calling one PRISM_DefineBlock2D (or PSMILe_def_horigrid) function. The partition will be described in a vector of integers, indexList. PRISM_DefineBlock2D (or PSMILe_def_horigrid) will return a handle for the grid and partition, gridId (or hori_grid_id).
Now the question is: which type(s) of partition should be supported and therefore possibly described in indexList?
In OASIS, 3 types of partitioning are supported (see documentation p.49-50-51,
http://www.cerfacs.fr/globc/software/oasis/doc_oasis2.4.ps):
0) Serial: no partitioning
1) Apple: each process partition is composed of one segment
2) Box: each process partition is composed of one block
3) Orange: each process partition is composed of many segments.
In Palm, 2 types of partitioning are supported (see http://www.cerfacs.fr/globc/PALM_WEB/doc/external/QIKGUIDE/qikgu260.html):
0) SINGLE_PROC: no partitioning
1) scaLAPAK REGULAR: each process partition is composed of many
blocks of same shape
2) CUSTOM: each process partition is composed of many blocks,
each one having possibly its own shape.
I think that Rene and I agreed that the PSMILe should at least support the CUSTOM partition, including a description of each block halo (this is not included in Palm). This means that one element of indexList has to be the number of blocks; the description of each block, given by the tuple "i-halo-low,i-halo-high,j-halo-low,j-halo-high,i-comp-low,i-comp-high,j-comp-low,j-comp-high", will then form the other elements of indexList. (If more than one type of partition is supported, the first element of indexList should be the type of partition.)
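A possible layout of indexList for the CUSTOM partition can be sketched in Python. The layout used here (partition type, then number of blocks, then eight integers per block in the order given above) is one reading of the proposal, and the numeric type code `CUSTOM = 2`, the `Block` type, and the function names are assumptions made for illustration.

```python
from typing import List, NamedTuple, Tuple

CUSTOM = 2  # hypothetical numeric code for the CUSTOM partition type


class Block(NamedTuple):
    """One block: halo bounds followed by compute-domain bounds."""
    i_halo_low: int
    i_halo_high: int
    j_halo_low: int
    j_halo_high: int
    i_comp_low: int
    i_comp_high: int
    j_comp_low: int
    j_comp_high: int


def encode_index_list(blocks: List[Block], partition_type: int = CUSTOM) -> List[int]:
    """Flatten a partition into a vector of integers."""
    index_list = [partition_type, len(blocks)]
    for block in blocks:
        index_list.extend(block)
    return index_list


def decode_index_list(index_list: List[int]) -> Tuple[int, List[Block]]:
    """Rebuild the partition type and the blocks from the integer vector."""
    partition_type, nblocks = index_list[0], index_list[1]
    body = index_list[2:]
    if len(body) != 8 * nblocks:
        raise ValueError("indexList length does not match the number of blocks")
    blocks = [Block(*body[8 * k:8 * k + 8]) for k in range(nblocks)]
    return partition_type, blocks
```

A process owning two blocks would thus pass a vector of 2 + 2 x 8 = 18 integers.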
One detail: I think that there is no need to associate one handle per block, as practically all blocks of one partition will be transferred at once. If this is not the case, more than one grid/partition could be defined by one model process (each grid/partition would then be composed of the blocks being transferred at once).
Should other types of partition be supported?
Now, concerning repartitioning (transferring the data between two models whose partitionings do not match):
I would like to draw your attention to the fact that all calculations needed for repartitioning between Apple <-> Serial, Box <-> Serial, Orange <-> Serial are already in Oasis (easy, I know!). More important: all calculations needed for repartitioning between two CUSTOM partitionings that do not match are already in Palm (not so easy!). These calculations are described in the document that you will find here attached. Please note that this document was written some time ago and that things have slightly evolved since then, but it can give you a first overview.
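The core of repartitioning between two block partitionings that do not match is the computation of the overlaps between source and target blocks. The following Python sketch shows this idea only; it is not the Oasis or Palm algorithm, it ignores halos, and the representation of a block as an inclusive `(i_low, i_high, j_low, j_high)` rectangle of global indices is an assumption.

```python
def overlap(a, b):
    """Intersection of two inclusive rectangles (i_low, i_high, j_low, j_high)."""
    i_low, i_high = max(a[0], b[0]), min(a[1], b[1])
    j_low, j_high = max(a[2], b[2]), min(a[3], b[3])
    if i_low > i_high or j_low > j_high:
        return None  # the two blocks share no grid point
    return (i_low, i_high, j_low, j_high)


def exchange_plan(source, target):
    """List the messages needed to move a field between two partitionings.

    source, target: dictionaries mapping a process rank to its list of blocks.
    Returns a list of (source_rank, target_rank, rectangle) entries.
    """
    plan = []
    for s_rank, s_blocks in sorted(source.items()):
        for t_rank, t_blocks in sorted(target.items()):
            for s_block in s_blocks:
                for t_block in t_blocks:
                    common = overlap(s_block, t_block)
                    if common is not None:
                        plan.append((s_rank, t_rank, common))
    return plan
```

With a source model split in two along i and a target model split in two along j on an 8 x 4 grid, each source process has to send one rectangle to each target process, giving four messages.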
2.4.2 I/O library
The I/O library performs the exchanges with files stored on disk. Activated variables, regional selection, temporal and geographical transformation, and file names are chosen by the user in the SMIOC (see Section 1, B - Composition phase). Metadata and run time information are provided at run-time by the component model through the PSMILe. The data and the associated metadata will be read or written to the disk files.
For data access, calls to the NetCDF (http://www.unidata.ucar.edu/packages/netcdf/) library will be implemented. Support of formats other than NetCDF will not be implemented, but entry points for reading and writing other file formats will be provided.
Execution on parallel machines will have to work efficiently. MPI-IO is the standard solution and will be evaluated. In a first step, we will avoid parallel I/O by doing regional selection for input data and by doing a postprocessing operation after a simulation to combine the multiple output files provided by a parallel execution.
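The postprocessing step that combines the outputs of a parallel execution amounts to placing each process's block at its position in the global field. A minimal Python sketch of that assembly follows; file reading is deliberately left out, and the function name and the representation of each piece as an origin plus a nested list are assumptions.

```python
def assemble(global_shape, pieces):
    """Combine per-process blocks into one global 2D field.

    global_shape: (ni, nj) size of the global grid.
    pieces: list of ((i0, j0), block), where (i0, j0) is the zero-based
    position of the block's first point and block is a list of rows.
    """
    ni, nj = global_shape
    field = [[None] * nj for _ in range(ni)]
    for (i0, j0), block in pieces:
        for di, row in enumerate(block):
            for dj, value in enumerate(row):
                field[i0 + di][j0 + dj] = value
    return field
```

Points left at `None` after assembly indicate that some process output is missing, which is an easy consistency check before archiving.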
SL, 29/04/2002: Several runs make up an experiment and I would require that the partitioning of the experiment into runs is transparent to the user. This is in addition to the fact that the content of all output files be independent of the partitioning (restartability of all components). I consider it desirable, e.g., that the 'output period' of data contained in one model raw output file (when it is archived) (and not the length of a run) is part of the experiment definition. This is easy to accomplish with GRIB data (just concatenate when an 'output period' is finished). With NetCDF it is possible, but more complicated. This can be important for the ease of use of the data diagnostics.
Restart files: -If the user wants to restart his/her coupled model after an unforeseen termination, he/she will have to look up in the "restart log file" the last date for which all model restarts were saved and indicate it as the starting date in the coupled model configuration file (SCC). At the beginning of each run (yes, I think "run" is appropriate here!), the Driver automatically reads the initial date in the SCC and transfers this information to the model. When the model calls the PSMILe routine that reads in the restart, this date could be used to check that it reads the correct restart (the date of the restart could be in the file name, but preferably in the file header, so that a check of coherence is possible when reading the file).
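The two steps of this restart procedure (finding the last date for which all models saved a restart, then checking the date found in a restart file header) can be sketched in Python. The function names and the representation of the "restart log file" as a dictionary of saved dates per model are assumptions; the actual log format is not defined here.

```python
from datetime import date


def last_common_restart(restart_log):
    """Return the latest date for which every model saved a restart.

    restart_log: dictionary mapping a model name to the dates it saved.
    """
    if not restart_log:
        raise ValueError("empty restart log")
    common = set.intersection(*(set(dates) for dates in restart_log.values()))
    if not common:
        raise ValueError("no date for which all model restarts were saved")
    return max(common)


def check_restart_date(header_date, scc_initial_date):
    """Check that a restart file matches the initial date given in the SCC."""
    if header_date != scc_initial_date:
        raise ValueError(
            "restart dated %s does not match initial date %s"
            % (header_date, scc_initial_date)
        )
    return True
```

Storing the date in the file header, as suggested above, is what makes the second check possible without relying on file names.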
2.4.3 Coherence check routines
The PSMILe will perform some checks of coherence on coupling and I/O data, according to a coherence check level defined by the user in the Specific Coupling Configuration (SCC) file. The coherence check instance will:
2.4.4 Local transformation routines
The PSMILe will perform the following transformations locally. The priority for including these local transformations in the PSMILe is given here:
2.4.5 Error handler
3. Model requirements