The OASIS Coupler Forum

Problem with MPI_GATHER and MPI_COMM_SPLIT in oasis_init_comp

Posted by Anonymous at March 8 2018

Dear all,

I am facing an MPI problem when using OASIS3-MCT with AROME/SURFEX and a toy model. In oasis_init_comp:

call MPI_GATHER(oasis_coupled, 1, MPI_LOGICAL, coupledlist, 1, MPI_LOGICAL, 0, MPI_COMM_WORLD, ierr) -->ok

call MPI_GATHER(compnm, ic_lvar, MPI_CHARACTER, compnmlist, ic_lvar, MPI_CHARACTER, 0, MPI_COMM_WORLD, ierr) -->waiting indefinitely...

After reducing the length of the model names to gather, with:

character(len=LEN_TRIM(cdnam)),pointer :: mynmlist( : ) 

allocate(mynmlist(mpi_size_world))

call MPI_GATHER(TRIM(cdnam), LEN_TRIM(cdnam), MPI_CHARACTER, mynmlist, LEN_TRIM(cdnam), MPI_CHARACTER, 0, MPI_COMM_WORLD, ierr)

if (mpi_rank_world == 0) compnmlist( : )=ADJUSTL(mynmlist( : )) ... --> It works and the run goes further! (but it cannot stay this way)

So now, still in oasis_init_comp, I get the following error:

Fatal error in PMPI_Comm_split: Message truncated, error stack:
PMPI_Comm_split(552)................: MPI_Comm_split(MPI_COMM_WORLD, color=1, key=0, new_comm=0x9b7030) failed
PMPI_Comm_split(529)................:
MPIR_Comm_split_impl(268)...........:
MPIR_Get_contextid_sparse_group(941):
MPIR_Allreduce_impl(1240)...........:
MPIR_Allreduce_intra(1048)..........:
MPIDI_CH3U_Receive_data_found(131)..: Message from rank 1 and tag 14 truncated; 2052 bytes received but buffer size is 2048

that comes from:

…
ikey = 0
icolor = compid
call MPI_COMM_SPLIT(MPI_COMM_WORLD,icolor,ikey,mpi_comm_local,ierr) -->ok
ikey = 0
icolor = 1
if (.not.oasis_coupled) icolor = 0
call MPI_COMM_SPLIT(MPI_COMM_WORLD,icolor,ikey,mpi_comm_global,ierr) -->error
…

I am working on bullx (beaufix) in Météo-France. My compilation options follow…

Thanks for your help, Cindy

------------------------------------
WRAPDIR = ~martinezs/mkpack/support/wrapper
F90 = $(WRAPDIR)/impi-5.0.0.028_ifc-15.0.0.090 #mpiifort
F90FLAGS_1 = -c -convert big_endian -assume byterecl -align array64byte -traceback -fpic -qopenmp -qopenmp-threadprivate compat -fp-model source -qopt-report=5 -qopt-report-phase=vec -ftz -g -O2 -xAVX -finline-functions -finline-limit=500 -Winline -fast-transcendentals -list -r8
CC = $(WRAPDIR)/impi-5.0.0.028_icc-15.0.0.090 #mpiicc
CCFLAGS_1 = -c -qopenmp -qopt-report=2 -qopt-report-phase=vec -fpic

Posted by Anonymous at March 14 2018

Dear all,

In fact, it appears that the issue comes from a mismatch between the MPI version used at compile time (the library the codes were linked against, and its mpirun binary) and the MPI packages loaded by default at run time.

Running the correct 'module load ...' commands before execution now solves this problem.

All the codes must be compiled with the same MPI library version, and they must also be run, via mpirun, with that same MPI library version.
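For anyone hitting the same symptom, here is a minimal sketch of a sanity check that could go in the job script after the 'module load ...' lines. It only compares the installation prefix of the mpirun launcher with that of the MPI library the executable resolves to. The function names (mpi_prefix, same_mpi_install) and the executable name ./model.exe are hypothetical, and the assumption that both paths sit two levels below a common prefix (.../bin/mpirun and .../lib/libmpi.so*) may not hold for every MPI installation.

```shell
# Hypothetical helpers, not part of OASIS or of any MPI distribution.

# Strip the last two path components, e.g. /a/b/bin/mpirun -> /a/b
mpi_prefix() {
    dirname "$(dirname "$1")"
}

# Compare the prefix of the launcher ($1) with the prefix of the MPI library ($2).
# Prints a one-line verdict and returns 0 on match, 1 on mismatch.
same_mpi_install() {
    launcher_prefix=$(mpi_prefix "$1")
    library_prefix=$(mpi_prefix "$2")
    if [ "$launcher_prefix" = "$library_prefix" ]; then
        echo "match: $launcher_prefix"
    else
        echo "MISMATCH: launcher in $launcher_prefix, library in $library_prefix"
        return 1
    fi
}

# Possible use in a job script (assumes mpirun is on PATH and that ldd
# reports a libmpi entry for the executable):
#   same_mpi_install "$(command -v mpirun)" \
#       "$(ldd ./model.exe | awk '/libmpi/ {print $3; exit}')"
```

A mismatch reported here does not prove the run will fail, and a match does not guarantee identical versions; it is only a quick way to spot the kind of environment difference described above before launching the coupled run.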

Thank you, Cindy