3.3 Software engineering
3.3.1 OpenMP (F. Loercher, S. Champagneux, L. Giraud)
A possible parallelisation of elsA with OpenMP has two major advantages
over the existing MPI parallelisation:
- The number of processors used could be chosen
independently of the number of blocks.
- With fewer blocks used, the global numerical behaviour
improves.
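The first point can be illustrated with a minimal sketch. It assumes an idealised model in which whole blocks are assigned to processors, communication costs are ignored, and a greedy largest-block-first heuristic is used for the assignment; the function name `block_level_speedup` and the block sizes are hypothetical and are not taken from elsA.

```python
import heapq


def block_level_speedup(block_sizes, nproc):
    """Idealised speedup when whole blocks are assigned to processors.

    Assumptions of this sketch (not taken from elsA): work is proportional
    to block size, communication is free, and blocks are placed with a
    greedy largest-block-first heuristic.
    """
    loads = [0] * nproc  # current amount of work on each processor
    heapq.heapify(loads)
    for size in sorted(block_sizes, reverse=True):
        lightest = heapq.heappop(loads)  # least loaded processor so far
        heapq.heappush(loads, lightest + size)
    # The most heavily loaded processor determines the elapsed time.
    return sum(block_sizes) / max(loads)


if __name__ == "__main__":
    # Six equal blocks on four processors: two processors hold two blocks.
    print(block_level_speedup([100] * 6, 4))
    # Four equal blocks on eight processors: four processors stay idle.
    print(block_level_speedup([100] * 4, 8))
```

In this model the speedup is limited by the most heavily loaded processor, so adding processors beyond the number of blocks brings nothing. A loop-level parallelisation inside each block is not subject to that limit.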
In this work, a strategy to parallelize elsA efficiently with OpenMP has been
developed in collaboration with the "Parallel Algorithms" team. Some of the most
CPU-time-consuming functions (the Lussor functions and OperGradIntGF.for) have already been parallelized. Fig. 3.16 shows the speedup of LussorSca5 on
the Compaq Alpha server of CERFACS
as a function of mesh size.
3.3.2 Management and support (M. Montagnac, J.-F. Boussuge, S. Champagneux)
Tasks related to software management and code engineering are of primary
importance in both research and industrial working environments. CERFACS'
industrial partners require short turnaround times, reliability and
robustness, among many other aspects. Furthermore, CERFACS researchers also ask
for simplicity in coding, for code clarity and for a highly tunable code.
These requests are reflected in the activities of the aerodynamics group. The elsA software comes with
procedures to enhance productivity in a multi-user and multi-platform
environment: a validation database, unit test cases, CVS management tools,
a software quality program, documentation and training.
Common tasks include the development of new features, the re-engineering of designs,
the improvement of verification and validation databases, contributions to
debugging and to quality reviews, and the writing of user's, developer's and
theoretical manuals.
Portability tests, optimization and benchmarking actions are also frequent
activities to ensure the reliability and the efficiency of the code and to
enable smooth transitions whenever industrial partners renew their computing
facilities.
Finally, researchers at CERFACS can take advantage of the industrial
environment delivered by Airbus and installed by the team members on CERFACS
computers. That enables a real synergy between the two partners.
Figure 3.16: Speedup of LussorSca5 on 4 processors of the Compaq Alpha.
Figure 3.17: CPU time of the function LussorSca before and after optimisation.
3.3.3 Parallelism with MPI (M. Montagnac, J.-F. Boussuge, S. Champagneux)
Since 2002, CERFACS has been involved in an ONERA project called ParelsA to parallelize elsA.
Unit test cases were developed to check the implementation of the MPI message
passing library so as to ensure the portability of the code. A re-engineering of
some aspects of the design has been proposed to reduce the number of synchronous
communications and the size of messages. A first indication of the benefit can be
seen in the following numbers. In a four-processor calculation with two domains
per processor, the speedup went from 2.86 in the initial version of the code to
3.55 in an enhanced version. Even better results are expected in the near
future. Asynchronous communications are also under investigation.
Those high performance computing activities are carried out with the
CERFACS computing facilities but also with supercomputers from the
French national center CINES.
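The speedup figures quoted above can be converted into parallel efficiencies. The sketch below assumes the usual definition, efficiency = speedup / number of processors; the function name `parallel_efficiency` is introduced here for illustration only.

```python
def parallel_efficiency(speedup, nproc):
    """Parallel efficiency, defined as speedup divided by processor count."""
    return speedup / nproc


if __name__ == "__main__":
    # Speedups on four processors, as quoted in the text.
    initial = parallel_efficiency(2.86, 4)
    enhanced = parallel_efficiency(3.55, 4)
    print(f"initial version:  {initial:.4f}")
    print(f"enhanced version: {enhanced:.4f}")
```

Under that definition, the efficiency on four processors rises from 71.5% to 88.75%.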
3.3.4 Code performance (M. Montagnac, J.-F. Boussuge, S. Champagneux)
As part of CERFACS's continuous effort to optimise the performance of its software, specific actions have been conducted.
For instance, CERFACS participated in a benchmark of several European multi-block codes (NSMB, FLOWer, RANS-MB, etc.) during autumn and winter 2002, in the framework of a rationalisation process conducted within Airbus. CERFACS contributed to an average speedup of a factor of three for elsA over a relevant range of industrial configurations on vector architectures such as the NEC SX series and the FUJITSU VPP series. This finally led to the selection of the elsA software as the single MB structured simulation tool across all Airbus sites in Europe. CERFACS is now focusing on SMP architectures.
Most of the latest high-performance computers have a superscalar
architecture. As elsA was initially optimized for vector computers, a
significant performance gain can be achieved by optimizing the code
for scalar machines without changing the numerical behaviour.
In collaboration with the "Parallel Algorithms" team and SMP vendors such as IBM, several optimisation strategies have been
examined in the context of elsA: changes to the array structure, blocking, prefetch stream
optimisation and changes to the loop order. The appearance
of significant CPU-time peaks has been explained and methods to avoid them
have been found. Fig. 3.17 shows the impact
of the optimisation on a very CPU-time-consuming function, LussorSca.
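To make the blocking strategy concrete, the following minimal sketch shows the index pattern of a blocked (tiled) loop nest on a matrix transpose. It is not taken from elsA: the function names, the tile size of 4 and the choice of a transpose are assumptions made for illustration. Python is used only to show the traversal order; any cache benefit would appear in compiled code.

```python
def transpose_naive(a, n):
    """Transpose an n x n matrix stored row-major in a flat list."""
    out = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            # Reads are contiguous, writes jump by n elements each step.
            out[j * n + i] = a[i * n + j]
    return out


def transpose_blocked(a, n, block=4):
    """Same transpose, traversed tile by tile (loop blocking).

    Each block x block tile is finished before moving on, so both the
    reads and the writes stay within a small region of memory.
    """
    out = [0.0] * (n * n)
    for ii in range(0, n, block):
        for jj in range(0, n, block):
            # min() handles tiles that overlap the matrix edge.
            for i in range(ii, min(ii + block, n)):
                for j in range(jj, min(jj + block, n)):
                    out[j * n + i] = a[i * n + j]
    return out


if __name__ == "__main__":
    n = 10
    a = [float(k) for k in range(n * n)]
    # Only the traversal order changes; the result is identical.
    print(transpose_blocked(a, n) == transpose_naive(a, n))
```

Both versions perform the same assignments in a different order, which is why this kind of transformation leaves the numerical behaviour unchanged.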