The OASIS Coupler Forum

  HOME

Wrong maxloops in lib/psmile/src/mod_oasis_mpi.F90

Up to Bugs and debugs

Posted by Anonymous at August 20 2020

Hello .

=> in lib/psmile/src/mod_oasis_mpi.F90: the maxloops computation is wrong !!!

!>-----------------------------------------------

Log2 reduction of linp over tasks to root :

maxloops = int(sqrt(float(commsize+1)))+1

do m = 1,maxloops

kfac = 2**m

recvid = commrank + kfac/2 ! task to recv from if (mod(commrank,kfac) /= 0

!>........................................................

It must be calculated as a Log base 2 value, like this :

maxloops = int(log(float(commsize+1))/log(2.))+1

=> The bug is apparent for high number of core. For example for commsize=988 -> maxloops=32 -> kfac = 2**32 = 0 as is a 32bit integer . ... and mod(commrank,kfac) , give a divide by zero error

Juan ESCOBAR MésoNH , Support Team

Posted by Anonymous at September 16 2020

Thanks for pointing this out. This will be fixed soon. The original SQRT implementation is not ideal, but as long as the integer math does not overflow, the results will be identical. The original implementation should have leveraged the log base 2 calculation.

Posted by Anonymous at September 21 2020

Hi,

Sorry for jumping in, but the results are not identical.

Try calculating by hand for commsize=(4**n)-1

n commsize current escobar

1 3 3 3

2 15 5 5

3 63 9 7

4 255 17 9

5 1023 33 11

This assumes that int(sqrt(float(4**n))) is 2**n, and int(log(float(4**n))/log(2.)) is 2*n. Rounding may affect the result.

All the best, Paul Leopardi

Posted by Anonymous at September 22 2020

Sorry if I was unclear. The value of maxloops will not be identical, but the way the model behaves will be identical. Within the maxloops loop, there are checks to skip values of m that are larger than needed. So maxloops could also be set to the machine largest int (huge(i)) and the model would get the same result. So maybe what I should have written is "as long as the integer math does not overflow, the model results will be identical". I think the point is that this is a bug fix, it does correct that computation of maxloops, but it does not change answers. Thanks.

Posted by Anonymous at October 12 2020

This fix was merged onto the Oasis3-MCT master in September, 2020. 
Thanks for bringing it to our attention.
Reply to this