The OASIS Coupler Forum

OASIS simulation abort due to GSSPOS CONSERV method error

Posted by Anonymous at July 27 2023

Dear support team,
I am running an AWICM3 coupled simulation (OIFS-43R3+OASIS-MCT4+FESOM2) in the TC0319L137-DART configuration on the DKRZ machine Levante. The simulation runs successfully for 5 months and then crashes in the middle of the sixth month with the message:

(oasis_init_comp) OASIS RUNNING
(oasis_init_comp) OPEN debug file for pe, unit :       0    9999
 (oasis_advance_map) ERROR: gsspos sumdst is zero but sumsrc is not
 (oasis_abort) ABORT: file      = mod_oasis_advance.F90
 (oasis_abort) ABORT: line      =         1996
 (oasis_abort) ABORT: on model  = oifs
 (oasis_abort) ABORT: on global rank =         5760
 (oasis_abort) ABORT: on local  rank =            0
 (oasis_abort) ABORT: CALLING ABORT FROM OASIS LAYER NOW

I tried to resubmit the simulation with a slight change in the ocean parameters, but the crash happens anyway, on the same day.

Do you have any suggestion for this issue?

Thank you, Martina

Posted by Anonymous at July 28 2023

Hi Martina,

I suppose that you are asking OASIS to perform a global CONSERV GSSPOS on your coupling field, right? In that case, the coupling field is integrated on the source grid before interpolation and on the target grid after interpolation, without considering values of masked points, and the residual (target - source) is uniformly distributed on the target grid proportionally to the value of the original field as a multiplicative term; with GSSPOS, the multiplicative term is computed separately for positive and negative values of the field.
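A minimal sketch of the two scalings described above, assuming a plain area-weighted sum as the integral and masked points already removed. The function names (`integral`, `glbpos`, `gsspos`) and the toy values are hypothetical illustrations, not the OASIS source code.

```python
def integral(field, area):
    """Area-weighted sum of a field (masked points assumed already removed)."""
    return sum(v * a for v, a in zip(field, area))


def glbpos(src, src_area, dst, dst_area):
    """Scale the whole interpolated field by one factor sum(src)/sum(dst)."""
    sum_src = integral(src, src_area)
    sum_dst = integral(dst, dst_area)
    if sum_dst == 0.0:
        raise ZeroDivisionError("sum over target grid is zero")
    factor = sum_src / sum_dst
    return [v * factor for v in dst]


def gsspos(src, src_area, dst, dst_area):
    """Same idea, but with separate factors for positive and negative values."""
    def part(field, area, sign):
        return sum(v * a for v, a in zip(field, area) if v * sign > 0)

    factors = {}
    for sign in (+1, -1):
        s = part(src, src_area, sign)
        d = part(dst, dst_area, sign)
        if d == 0.0 and s != 0.0:
            raise ZeroDivisionError("sumdst is zero but sumsrc is not")
        factors[sign] = s / d if d != 0.0 else 1.0
    return [v * factors[1 if v > 0 else -1] for v in dst]


# Toy source field and a hypothetical (non-conservative) interpolated field.
src, src_area = [1.0, 2.0, -1.0, 4.0], [1.0, 1.0, 1.0, 1.0]
dst, dst_area = [1.0, -0.5, 2.0], [1.0, 1.0, 2.0]

scaled_glb = glbpos(src, src_area, dst, dst_area)
scaled_gss = gsspos(src, src_area, dst, dst_area)
print(integral(src, src_area), integral(scaled_glb, dst_area),
      integral(scaled_gss, dst_area))
```

In this sketch both methods restore the source integral on the target grid; the sign-split variant additionally preserves the positive and negative parts separately, which is why it needs two divisions instead of one.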

Here it looks like the integral of your target field on the target grid equals zero (and this is not allowed, as the calculation of the multiplicative term involves a division by that integral). It seems very weird to me that the integral of your coupling field after interpolation equals 0 after 5 months. It would happen right away if it was a problem with the interpolation per se, I would say.

So I would suppose that something is going wrong with your simulation which leads to that symptom and message? Did you check the other results of your simulation after 5 months? Do they look reasonable?

Sorry for not being of much help here ...
  Sophie

Posted by Anonymous at August 1 2023

Hi Sophie, 

Thank you for your answer. 
Both the ocean and the atmosphere show physical results for the first months.
The same issue also happened in another simulation after the year 2100 (so after more than 300 years of simulation with reasonable results).
That is why we ruled out that something is wrong with the simulation itself. Also, in both cases, resubmitting the simulation with the GLBPOS method instead of GSSPOS did not result in the crash.

Is there maybe any data I can provide that would be useful for looking into this?
 
Thank you again, 
Martina

Posted by Anonymous at August 2 2023

Hi again Sophie, 

While trying to figure out how to solve the problem, we wondered about the following: given that in the GSSPOS method the conservation of the field is performed separately for the positive and negative fluxes, what would happen if the negative (or positive) component of a field is lost going from the source to the target grid, so that a non-zero integral on the source grid becomes a zero integral on the target grid?
Could that be what is happening here?

Martina

Posted by Anonymous at August 2 2023

Hi Martina,

I am not sure I can do anything specific about your problem. I suppose that GSSPOS can be trickier than GLBPOS (thanks for testing that), as the sum is computed separately for positive and negative values of the field. In your case, I suppose that the sum of one part of the field (the positive or the negative) becomes zero on the target grid, hence the message and the abort, even if of course the integral of the whole field is not zero.
I would recommend that you go on using GLBPOS instead of GSSPOS for the rest of the simulation.
A detailed report on the different options was written by Tony at 
https://cerfacs.fr/wp-content/uploads/2019/09/GLOBC-Craig_oasis_map_conserv_092019.pdf
please have a look!
  With best regards,
 Sophie

Posted by Anonymous at August 2 2023

I've been following this discussion. I think both GSSPOS and GLBPOS have critical failure modes. 

GSSPOS, it seems, can lead to a divide by zero when a single point on the source grid is negative while all other points are positive (or vice versa). After the remapping and associated smoothing, it is possible, perhaps even likely, that none of the target grid points will be negative. Thus sum_neg over the target grid is zero, and the fraction by which the negative target gridpoint values are multiplied would be sum_neg(src)/sum_neg(trg) = val/0. This error is caught by OASIS, which then aborts, as seen above.

GLBPOS has its critical failure mode when the total sums over the source and target grids are both close to zero. This can happen with fields like sensible heat flux that can be positive or negative. In this case, as sum(trg) -> 0, the fraction sum(src)/sum(trg) can become very large, so every gridpoint of the flux is multiplied by a huge value, leading to bad model results or a model explosion.
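To make the two failure modes concrete, here is a small toy calculation. All numbers are made up for illustration, the "remapping" is just a pairwise average or a hand-picked target field, and `signed_sums` is a hypothetical helper, not an OASIS routine.

```python
def signed_sums(field, area):
    """Return the area-weighted sums of the positive and negative parts."""
    pos = sum(v * a for v, a in zip(field, area) if v > 0)
    neg = sum(v * a for v, a in zip(field, area) if v < 0)
    return pos, neg


# GSSPOS case: a single negative source point is smoothed away.
src = [3.0, 2.0, -0.5, 4.0]
src_area = [1.0, 1.0, 1.0, 1.0]
# Toy remapping: average pairs of source cells onto two larger target cells.
dst = [(src[0] + src[1]) / 2, (src[2] + src[3]) / 2]
dst_area = [2.0, 2.0]

src_pos, src_neg = signed_sums(src, src_area)
dst_pos, dst_neg = signed_sums(dst, dst_area)
# The negative factor would be src_neg / dst_neg = -0.5 / 0.
gsspos_fails = (dst_neg == 0 and src_neg != 0)

# GLBPOS case: positive and negative parts nearly cancel on both grids.
src2 = [10.0, -10.0, 10.0, -9.875]
dst2 = [0.5, -0.4990234375]  # hypothetical interpolated values
sum_src2 = sum(signed_sums(src2, src_area))
sum_dst2 = sum(signed_sums(dst2, dst_area))
glbpos_factor = sum_src2 / sum_dst2  # one factor for every target point

print("GSSPOS would abort:", gsspos_fails)
print("GLBPOS factor:", glbpos_factor)
```

In the second case nothing aborts, but each target value is multiplied by a factor far from 1 even though both integrals are small, which is the silent version of the problem.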

Mercifully, these two failure modes occur at opposite ends of the range of the total (not sign-split) sum(src) over the field. GLBPOS fails when we have equal parts positive and negative values. GSSPOS fails when all values but one are of the same sign. I think it should be possible to create a fallback option for GSSPOS where, instead of terminating the simulation, we use GLBPOS for a single timestep and print out a warning.

What do you think?

Best, Jan Streffing

Posted by Anonymous at August 3 2023

Hi,

I totally share Jan's analysis. If the GSSPOS positive or negative sums are zero, it means that the positive or negative source grid points are few (and lost during the interpolation, I guess).

So the fallback option proposed seems relevant. A simple implementation could be to save the first (positive) sums and add them to the negative ones in case of problem (after checking that the new positive+negative sum on the destination grid is not zero).
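A rough sketch of what such a fallback could look like, assuming simple area-weighted sums and no masking. The function name `conserve_with_fallback` and the warning text are hypothetical; this is not how OASIS is implemented, only an illustration of the idea.

```python
import warnings


def conserve_with_fallback(src, src_area, dst, dst_area):
    """Sign-split conservation; fall back to one global factor if needed."""
    def split(field, area):
        pos = sum(v * a for v, a in zip(field, area) if v > 0)
        neg = sum(v * a for v, a in zip(field, area) if v < 0)
        return pos, neg

    src_pos, src_neg = split(src, src_area)
    dst_pos, dst_neg = split(dst, dst_area)

    pos_bad = dst_pos == 0 and src_pos != 0
    neg_bad = dst_neg == 0 and src_neg != 0
    if not (pos_bad or neg_bad):
        # Normal sign-split case: separate factors for each sign.
        f_pos = src_pos / dst_pos if dst_pos != 0 else 1.0
        f_neg = src_neg / dst_neg if dst_neg != 0 else 1.0
        return [v * (f_pos if v > 0 else f_neg) for v in dst]

    # Fallback: reuse the saved sums, added together, as a global factor.
    total_dst = dst_pos + dst_neg
    if total_dst == 0:
        raise ZeroDivisionError("combined target sum is zero, no fallback")
    warnings.warn("sign-split target sum is zero, using global factor "
                  "for this step")
    factor = (src_pos + src_neg) / total_dst
    return [v * factor for v in dst]


# Example: the single negative source point is lost on the target grid.
result = conserve_with_fallback([3.0, 2.0, -0.5, 4.0], [1.0, 1.0, 1.0, 1.0],
                                [2.0, 2.0], [2.0, 2.0])
print(result)
```

The check on the combined target sum matters: if that is zero as well, the global factor is undefined too, and aborting is still the only safe option in this sketch.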

Eric  Maisonnave