% An Automated Tool for Parallel Finite Element Methods % with Domain-Decomposed Unstructured Meshes % % Jonathan Richard Shewchuk and Omar Ghattas % Carnegie Mellon University % Pittsburgh, PA 15213-3891 % jrs@cs.cmu.edu oghattas@cs.cmu.edu % % This is a LaTeX file. % \documentstyle[12pt]{article} \newcommand{\newtitle}[1]{\title{\normalsize \bf #1}} \newcommand{\newauthor}[1]{\author{\normalsize\sc #1}} \newcommand{\address}[1]{\vskip 2pt\centerline{#1}\vskip 2pt} \textheight 9.5 true in \textwidth 6.5 true in \hoffset -1.5 true cm \voffset -2.5 true cm \begin{document} \newtitle{An Automated Tool for Parallel Finite Element Methods\\ with Domain-Decomposed Unstructured Meshes} \newauthor{Jonathan Richard Shewchuk} \date{} \maketitle \vskip -30pt \address{School of Computer Science} \address{Carnegie Mellon University} \address{Pittsburgh, PA 15213-3891} \vskip 10pt \setcounter{page}{0} \renewcommand{\thepage}{\alph{page}} Archimedes is an automated system for finite element analysis on unstructured meshes using distributed memory supercomputers. Its components include a mesh generator, a mesh partitioner, a program for placement of data and routing of communication on a multicomputer, and a code generator. The mesh generator creates quality two-dimensional meshes on complex straight-line domains. Three-dimensional meshes can also be generated, albeit of lesser quality. Mesh refinement based on {\em a posteriori} error estimates is also possible. Mesh partitioning is done by a fast geometric algorithm which produces provably good partitions. This algorithm can also be used to obtain a nested dissection ordering on each subdomain of an unstructured mesh. The subdomains of the partitioned mesh are automatically placed on the processors of the parallel machine in such a way as to reduce communication costs. Connections are routed between processors. I focus on Archimedes' code generator, whose input is C code with special machine-independent primitives for finite element analysis, and whose output is parallel code for a particular multicomputer. The code generator allows one to write parallel code without knowing the underlying communication mechanisms of the parallel architecture. This makes the task of writing parallel finite element solvers, or experimenting with iterative linear solvers and preconditioners, much easier and quicker. Furthermore, the generality of the language allows Archimedes to be used to solve problems from a wide variety of scientific domains. I describe an implementation of an iterative linear solver and a domain decomposition method. By eliminating the interior nodes of each subdomain (static condensation), each processor explicitly forms its Schur complement. Each local Schur complement is a dense matrix connecting the nodes on the boundary of one subdomain; however, the global Schur complement is sparse. An iterative method is used to solve the global system on all boundary nodes; then each processor uses a local iterative method or backsubstitution to solve for its interior nodes. There are two advantages of explicitly forming the Schur complement: it is easy to precondition, and dense matrix arithmetic is much faster than sparse matrix arithmetic on modern sequential processors. I compare execution times and memory use for conjugate gradients and domain decomposition, implemented on a 64-processor iWarp multicomputer. \end{document}