The OASIS Coupler Forum


MPMD with Oasis3-mct


Posted by Anonymous at February 12 2014

Hi, 

First, thank you for the amazing Oasis3-mct. We are a terrestrial systems research team at IBG3, Research Center Juelich, Germany. We are currently trying to extend our CLM-Parflow-Cosmo-Oasis3-(mct) program to MPMD on our Bluegene/Q system JUQUEEN. Do you think the effort is closer to "simply change MPI_COMM_WORLD in oasis_init_comp to the split communicator passed in by the model" or to "you need to rewrite the whole Oasis3-mct"? Setting aside possible name conflicts, could you give us a hint as to whether the domain decomposition / rank / communicator management in Oasis3-mct is suitable for MPMD?

Many thanks! Guowei HE g.he@fz-juelich.de

Posted by Anonymous at February 14 2014

Hi Guowei,

OASIS3-MCT is designed to run in MPMD mode: when models are coupled with it, there must be one executable per model.

In a coupled model, MPI_COMM_WORLD becomes the communicator containing all the processes of the different models, so you do indeed have to replace the original MPI_COMM_WORLD of each model with the local communicator returned by oasis_get_localcomm (see the User Guide https://oasis.cerfacs.fr/wp-content/uploads/sites/114/2021/02/GLOBC-Valcke_TR_OASIS3-MCT_2.0_2013.pdf section 2.2.2).

Best regards, Laure

Posted by Anonymous at February 18 2014

Hi Laure, 

Thanks for the reply! 

Sorry, I didn't state the question clearly. We are doing data assimilation, which requires several sets of coupled models running under one MPI_COMM_WORLD.

For example, in the tutorial we have model1 and model2. In our case, however, we need several sets of model1 and model2 running independently and simultaneously. We tried a brute-force hack: we changed every MPI_COMM_WORLD in the oasis_init_comp subroutine to a new communicator argument, which we create with MPI_COMM_SPLIT and pass to the subroutine.

Our initial test seems to work: for example, with 16 processes, ranks 0-3 (model1) communicate with ranks 4-7 (model2), and ranks 8-11 (model1) with ranks 12-15 (model2). They produce the same results as the original run_tutorial with nproc_exe1=4 and nproc_exe2=4. But do you think this approach will work for more complicated cases? We also noticed that OASIS still uses MPI_COMM_WORLD in the inter_comm subroutine. Moreover, we are concerned about whether MCT also manipulates MPI_COMM_WORLD directly.

Many thanks! /Guowei

Posted by Anonymous at February 19 2014

Hi Guowei, 

There is a simpler way to define a new communicator for each sub-group of processes of each model: in model1 and model2, define a different comp_name for each sub-group of processes, and then call oasis_init_comp with that comp_name.

Let me know if that works better.

Best regards, Laure