The OASIS Coupler Forum


Constant memory increase


Posted by Anonymous at September 21 2018

Hi,

I wondered if anyone had ever come across any suggestion that there might be a memory leak in OASIS3-MCT (I'm specifically using version 3.0). We have an issue where the memory usage (or at least the resident set size) continually grows during a model run until the model is unable to allocate any more memory and crashes.

I have replicated this in a toy model and it seems that the act of calling oasis puts and gets causes the memory to increase (and never decrease). When data is actually exchanged between models the memory increases faster than when the puts and gets are called on a timestep when no coupling is performed, but they both still cause the memory to increase to some extent. So I'm wondering if there might be something in the OASIS3-MCT code which could account for this, or if there might be something in the underlying MPI libraries which could be responsible (I'm running on a Cray XC40) or if there might be something fundamentally wrong with the way our operating system is managing memory.

Any thoughts appreciated. Thanks, Richard
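For anyone wanting to watch the resident set size from inside a toy model, here is a minimal sketch. It assumes a Linux system that exposes /proc/self/status with a "VmRSS:" line reported in kB. The program name rss_probe, the function current_rss_kb and the three-step loop are hypothetical placeholders and are not part of OASIS3-MCT. The coupling calls themselves are left as a comment.

```fortran
program rss_probe
  implicit none
  integer :: step

  do step = 1, 3
    ! ... the model's coupling puts and gets would be called here ...
    print '(a,i0,a,i0)', 'step ', step, ' VmRSS_kB ', current_rss_kb()
  end do

contains

  ! Returns the resident set size in kB, or -1 if it cannot be read.
  ! Assumption: Linux-style /proc/self/status containing a "VmRSS:" line.
  function current_rss_kb() result(rss_kb)
    integer :: rss_kb
    integer :: u, ios, k
    character(len=256) :: line

    rss_kb = -1
    open(newunit=u, file='/proc/self/status', status='old', &
         action='read', iostat=ios)
    if (ios /= 0) return

    do
      read(u, '(a)', iostat=ios) line
      if (ios /= 0) exit
      if (line(1:6) == 'VmRSS:') then
        ! Replace tab characters with blanks so that the list-directed
        ! read below sees ordinary blank separators.
        do k = 1, len(line)
          if (line(k:k) == achar(9)) line(k:k) = ' '
        end do
        read(line(7:), *, iostat=ios) rss_kb
        if (ios /= 0) rss_kb = -1
        exit
      end if
    end do
    close(u)
  end function current_rss_kb

end program rss_probe
```

Printing this value once per timestep, on coupling and non-coupling steps alike, should make it easier to compare the growth rate in the two cases.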

Posted by Anonymous at September 22 2018

Hi Richard,

We have never heard of such a problem in OASIS3-MCT_3.0.

You can get information on the memory usage in each routine by setting NLOGPRT=1 or NLOGPRT=2 in your namcouple (see the User Guide https://oasis.cerfacs.fr/wp-content/uploads/sites/114/2021/02/GLOBC-TR-oasis3mct_UserGuide3.0_052015.pdf, section 3.2).

Best regards, Laure

Posted by Anonymous at October 15 2018

After extensive investigations, it turns out that part of the problem is due to a Cray compiler bug (at least on Cray XC40 and probably on other similar machines).

In mod_oasis_advance.F90, there's a perfectly legal format definition as follows:

    character(len=*),parameter :: F01 = '(a,i3.3)'

F01 is used in various places to perform internal write operations, e.g.

    write(tstring,F01) 'pcpy_',cplid

It seems that every time this write statement is executed, our heap memory increases by 1088 bytes. When this statement is put in an effectively infinite loop in a simple test program, the heap grows until the program runs out of memory and crashes. With the WRITE statement changed to use an explicit format:

    write(tstring,'(a,i3.3)') 'pcpy_',cplid

the heap never increases and the test program runs indefinitely.

Neither form of this code seems to be a problem with other compilers (e.g. IFORT); only Cray compilers seem to be affected. I'm told that this was reported to Cray in 2015 but is still awaiting a solution. Changing the form of the write statement in our local build of OASIS3-MCT improves things slightly, but it does not solve all our memory problems, so we obviously have further issues to resolve.
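For reference, a minimal sketch of the kind of test program described above. The program name format_leak_test and the iteration count n_iter are hypothetical choices and are not taken from the original test. The format constant and the two write statements are the ones quoted from mod_oasis_advance.F90.

```fortran
program format_leak_test
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none

  ! Named format constant, as quoted from mod_oasis_advance.F90
  character(len=*), parameter :: F01 = '(a,i3.3)'

  ! Hypothetical iteration count, chosen large enough that a leak of
  ! about 1 kB per call would exhaust memory long before the loop ends.
  integer(int64), parameter :: n_iter = 100000000_int64

  character(len=32) :: tstring
  integer :: cplid
  integer(int64) :: i

  cplid = 1
  do i = 1, n_iter
    ! Variant A: internal write through the named format constant
    ! (reported above to grow the heap with the Cray compiler).
    write(tstring, F01) 'pcpy_', cplid

    ! Variant B: explicit format string (reported above not to leak).
    ! To test it, comment out variant A and uncomment the next line.
    ! write(tstring, '(a,i3.3)') 'pcpy_', cplid
  end do

  ! Use the result so the loop is less likely to be optimised away.
  print '(a)', trim(tstring)
end program format_leak_test
```

Build it once with each variant and watch the process memory (for example with top) while it runs. With an unaffected compiler, both builds should hold a flat memory footprint.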