KATHOLIEKE UNIVERSITEIT LEUVEN FACULTEIT TOEGEPASTE WETENSCHAPPEN DEPARTEMENT COMPUTERWETENSCHAPPEN AFDELING NUMERIEKE ANALYSE EN TOEGEPASTE WISKUNDE Celestijnenlaan 200A – B-3001 Heverlee
KRYLOV CONVERGENCE ACCELERATION AND DOMAIN DECOMPOSITION METHODS FOR NONMATCHING GRIDS
Thesis presented in partial fulfilment of the requirements for the degree of Doctor in Applied Sciences
Advisors: Prof. Dr. ir. D. Roose, Prof. Dr. X.-C. Cai
by Serge GOOSSENS
June 2000
Jury: Prof. Dr. ir. R. Govaerts, chair; Prof. Dr. ir. D. Roose, advisor; Prof. Dr. X.-C. Cai, advisor (University of Colorado at Boulder, U.S.A.); Prof. Dr. ir. S. Vandewalle; Prof. Dr. ir. E. Toorman; Prof. Dr. ir. R. Piessens; Prof. Dr. ir. C. Vuik (Technische Universiteit Delft, The Netherlands)
U.D.C. 681.3G15
June 2000
Thesis presented in partial fulfilment of the requirements for the degree of Doctor in Applied Sciences by Serge GOOSSENS
© Katholieke Universiteit Leuven — Faculteit Toegepaste Wetenschappen Arenbergkasteel, B-3001 Heverlee, Belgium
All rights reserved. No part of the publication may be reproduced in any form by print, photoprint, microfilm or any other means without written permission from the publisher.
D/2000/7515/30 ISBN 90-5682-260-8 Rev. 1
Krylov Convergence Acceleration and Domain Decomposition Methods for Nonmatching Grids
Serge Goossens Departement Computerwetenschappen, K.U.Leuven Celestijnenlaan 200A, B-3001 Heverlee, België
Abstract
The numerical solution of partial differential equations is an important research topic in the field of scientific computing and numerical simulation. After discretisation, very large systems of linear equations have to be solved. In domain decomposition methods, the domain is split into a number of subdomains and the resulting subproblems are coupled via "artificial" boundary conditions. Krylov subspace methods are iterative methods for solving systems of linear algebraic equations. They are easy to implement but require a good preconditioner. In this thesis we develop efficient numerical solvers for partial differential equations based on the combination of Krylov subspace methods, such as Flexible GMRES, with domain decomposition preconditioning, and we extend the applicability of the developed techniques to (overlapping) nonmatching grids.
The first part of the thesis deals with improving the convergence of the iterative solver. The domain decomposition method can be optimised by determining an optimal coupling between the subdomains, realised through the boundary conditions imposed on the subdomain problems. Ritz and harmonic Ritz values and the corresponding vectors are studied to understand the convergence behaviour of GMRES and to extract important information about the eigenvalue spectrum of the preconditioned matrix. With the Ritz vectors corresponding to outlying eigenvalues we can speed up the solution process. This has been successfully applied in a solver for the shallow water equations. We also constructed an optimised nested Krylov method, which is an attractive way to extract a near-optimal approximation from a high-dimensional Krylov subspace while keeping memory and computational requirements reasonably low.
The second part of the thesis is devoted to the extension of the adopted domain decomposition method to nonmatching grids and focuses on discretisation techniques and error analysis. In the case of (overlapping) nonmatching grids, the transfer of information from one grid to the other is not trivial, since there is no global discretisation from which this transfer can be derived. We focus on interpolation formulae and modified discretisation stencils to construct a consistent and second-order accurate global discretisation. We also consider the mortar projection as an interpolation technique, and a coupling technique based on a finite difference discretisation is proposed as an alternative to interpolation.
Samenvatting
The numerical solution of partial differential equations is an important research topic within the fields of scientific computing and numerical simulation. After discretisation, very large systems of linear equations have to be solved. In domain decomposition methods, the domain is split into a number of subdomains and the resulting subproblems are coupled by means of "artificial" boundary conditions. Krylov subspace methods are iterative methods for solving systems of linear equations. They are easy to implement but require a good preconditioning technique. In this thesis we develop efficient numerical solution techniques for partial differential equations, based on the combination of Krylov subspace methods, such as Flexible GMRES, and domain decomposition preconditioning. We also extend the applicability of the developed techniques to (overlapping) nonmatching grids.
The first part of the thesis deals with improving the convergence of the iterative solution method. The domain decomposition method can be improved by imposing an optimal coupling between the subdomains, realised through the boundary conditions for the subproblems. The Ritz and harmonic Ritz values and the corresponding vectors are studied in order to gain more insight into the convergence behaviour of GMRES and to obtain important information about the eigenvalue spectrum of the preconditioned matrix.
Using the Ritz vectors belonging to a small number of isolated eigenvalues, we can accelerate the solution method. We have applied this successfully in a simulation program for the shallow water equations. We have also constructed an improved nested Krylov method. This is an attractive way to compute near-optimal approximations from a high-dimensional Krylov subspace while keeping the computational and memory costs reasonably low. In the second part of the thesis we study the extension of the developed domain decomposition method to nonmatching grids and concentrate mainly on discretisation techniques and error analysis. In the case of overlapping nonmatching grids, the transfer of information from one grid to the other is not trivial, since there is no global discretisation from which this transfer could be derived. We study interpolation formulae and modified discretisation stencils to obtain a consistent and globally second-order accurate discretisation. We also consider the mortar projection as an interpolation technique, and as an alternative to interpolation we propose a coupling technique based on a finite difference discretisation.
Preface We’re supposed to research our subject, write it up and present it to the class with a visual aid.1
This thesis is the result of nearly five years of "research", but it is clear that these five years resulted in a lot more than this thesis alone. First of all I would like to thank my thesis advisor, Prof. D. Roose, for the freedom I enjoyed while doing the research for this thesis. One of the most remarkable things I learned from him is how to handle deadlines. His "good words" and his signature have proved to be very useful on several applications. I enjoyed the collaboration with K. H. Tan (April 1st – June 21st, 1996) at the Waterloopkundig Laboratorium Delft Hydraulics (Delft, The Netherlands) and would like to thank Prof. G. Stelling for making this visit possible and E. de Goede for his help with the software. I would also like to thank R. W. Freund for bringing quasi kernel polynomials to my attention, and I am grateful to H. A. Van der Vorst for his valuable suggestions, which resulted in several improvements in Chapter 4. I spent some time (July 16th – August 27th, 1997, August 24th – September 23rd, 1998 and July 26th – August 20th, 1999) at the Department of Computer Science of the University of Colorado at Boulder (U.S.A.). I am very grateful to Prof. X.-C. Cai, not only for inviting me to Boulder and for the many discussions we had on nonmatching grid methods, but also for accepting to be my second thesis advisor. Prof. D. Keyes invited me to the Department of Computer Science of the Old Dominion University (U.S.A.) and to the Institute for Computer Applications in Science and Engineering (ICASE) at NASA Langley Research Center (U.S.A.). I still remember his enthusiastic reaction after my presentation at the Ninth International Conference on Domain Decomposition in Norway.
1 Bill Watterson, Something under the bed is drooling (A Calvin and Hobbes Collection).
I would like to thank Prof. R. Govaerts, Prof. R. Piessens, Prof. E. Toorman, Prof. S. Vandewalle and Prof. C. Vuik for reading this thesis and for agreeing to serve on the jury. The contribution of my family and friends is of course not visible in this thesis, but their support has been indispensable. Last but not least, I want to mention that living in the Kaboutermansstraat in Leuven has been very enjoyable these past few years. Serge Goossens June 2000
Acknowledgement Engineers and scientists can never earn as much as business executives and sales people.2
This thesis presents research results of the Belgian Programme on Interuniversity Poles of Attraction, initiated by the Belgian State, Prime Minister's Office for Science, Technology and Culture (IUAP P4/2). The scientific responsibility rests with its authors. The research presented in this thesis is also supported by the Research Council K.U.Leuven (OT/94/16). The financial support by the Flemish Institute for the Promotion of Scientific and Technological Research in Industry (Vlaams Instituut voor de bevordering van het Wetenschappelijk Technologisch onderzoek in de industrie (I.W.T.)) in the form of a specialisation scholarship is gratefully acknowledged. Part of the work for this thesis was carried out during a research visit (April 1st – June 21st, 1996) to the Waterloopkundig Laboratorium Delft Hydraulics (Delft, The Netherlands). The financial support for this visit by the E.C. MAST II Concerted Action "Application of High Performance Computing Techniques for the Modeling of Marine Ecosystems" (MMARIE) is gratefully acknowledged. Part of the work for this thesis was carried out during three research visits (July 16th – August 27th, 1997, August 24th – September 23rd, 1998 and July 26th – August 20th, 1999) to the Department of Computer Science of the University of Colorado at Boulder (U.S.A.). The financial support for these visits by the Fund for Scientific Research - Flanders (Fonds voor Wetenschappelijk Onderzoek - Vlaanderen (F.W.O.)) is gratefully acknowledged.
2 Dilbert's "Salary statement". This theorem can be supported by simple mathematics, based on the following two postulates: "knowledge is power" (1) and "time is money" (2). As every engineer knows: Power = Work / Time. Since Knowledge = Power (postulate 1) and Time = Money (postulate 2), we know that Knowledge = Work / Money. Solving for Money, we get Money = Work / Knowledge. Thus, as Knowledge approaches zero, Money approaches infinity, provided that even a very small amount of work is done. Conclusion: "The less you know, the more (money) you make".
The travel support by Vlaamse Leergangen Leuven is also gratefully acknowledged.
Dutch Summary

Krylov convergence acceleration and domain decomposition methods for nonmatching grids

Whatever I say or do, the euro keeps falling. Perhaps I should just say nothing at all any more.3

3 Wim Duisenberg, President of the European Central Bank (ECB).

Contents
1 Introduction
2 Krylov subspace methods
3 Domain decomposition methods
4 Ritz and harmonic Ritz values from Krylov subspaces
   4.1 Introduction
   4.2 Ritz values and the FOM residual polynomial
   4.3 Harmonic Ritz values and the GMRES residual polynomial
5 Nested Krylov subspace methods
6 A Krylov subspace method for the shallow water equations
7 A generalised additive Schwarz method for the shallow water equations
   7.1 Generalised additive Schwarz method
   7.2 Eigenvalue spectrum and convergence history
8 Domain decomposition methods for nonmatching grids
   8.1 Introduction
   8.2 Composite-grid difference method
9 The mortar projection in the composite-grid difference method
10 Coupling by finite difference discretisation
11 Conclusions and suggestions for future research
   11.1 Conclusions
   11.2 Suggestions for future research
1 Introduction
In this thesis we develop efficient numerical solution methods for partial differential equations, for which we use Krylov subspace methods with domain decomposition preconditioning. The aim is to solve a large, sparse linear system Ax = b, in which the matrix A represents the discretisation of a partial differential equation. This equation is posed on a domain that is partitioned into several subdomains. In the domain decomposition methods that we study, problems on (overlapping) subdomains are solved, and the coupling between the subdomains is realised through boundary conditions on the subdomain boundaries. This leads to an iterative scheme, the so-called Schwarz iteration. This technique offers advantages with respect to modelling flexibility and enables parallel processing. We use the domain decomposition method as a preconditioner for a Krylov subspace iterative method in order to accelerate the convergence. The systems of linear equations (obtained after discretisation and linearisation) are solved with a Krylov subspace iterative method based on the well-known GMRES method. To obtain fast convergence it is essential to apply this iterative method to a "preconditioned" problem. The preconditioner is based on an additive Schwarz domain decomposition method. Compared with "global" preconditioning techniques (such as ILU), domain decomposition methods have a number of important advantages: they are easy and efficient to parallelise; they allow a considerable simplification of problems with a complex geometry; and finally they exhibit (together with the related multigrid methods) superior convergence behaviour, i.e. the amount of computational work needed to find the solution grows linearly with the number of degrees of freedom. Because of this good scalability, domain decomposition methods (certainly when combined with Krylov subspace methods) are the key to large-scale numerical simulations.
1. The first part of the thesis deals with improving the convergence of the iterative solution technique.
   (a) The domain decomposition method can be optimised by determining an optimal coupling between the subdomains. This is done by imposing boundary conditions for the subdomain problems.
   (b) Ritz and harmonic Ritz values and the corresponding vectors are studied in order to better understand the convergence behaviour of GMRES.
   (c) We propose an improved nested Krylov method. This is an attractive way to compute near-optimal approximations from a high-dimensional Krylov subspace while keeping the computational and memory costs reasonably low.
2. In the second part of the thesis we study the extension of the developed domain decomposition method to nonmatching grids and concentrate mainly on discretisation techniques and error analysis.
   (a) In the case of overlapping nonmatching grids, the transfer of information from one grid to the other is not trivial, since there is no global discretisation from which this transfer could be derived. We study interpolation formulae and modified discretisation stencils to obtain a consistent and globally second-order accurate discretisation.
   (b) We also consider the mortar projection as an interpolation technique.
   (c) As an alternative to interpolation, we propose a coupling technique based on a finite difference discretisation.
The methods developed are used for solving the shallow water equations. These equations are derived from the general Navier–Stokes equations and describe the flow of water in shallow regions, such as rivers, estuaries and seas like the North Sea. This work was carried out in collaboration with the Waterloopkundig Laboratorium Delft Hydraulics (Delft, The Netherlands).
2 Krylov subspace methods
Discretisation (and possibly linearisation) of partial differential equations leads to large, sparse linear systems, i.e. the matrix contains very many zero elements. Iterative methods for solving large, sparse linear systems are used more and more. In the past, direct methods were always preferred over iterative ones in realistic applications and in commercial software because of their robustness and predictable behaviour (no convergence problems, accuracy determined entirely by the machine precision and the condition number). Meanwhile a number of very efficient iterative methods for solving linear systems have become available, and there is a clear shift towards the use of iterative techniques in application domains such as computational fluid dynamics, structural mechanics, etc. The sparsity of the matrices is the main motivation for using iterative methods, because these are able to exploit this sparse structure, whereas direct methods, such as Gaussian elimination, suffer greatly from the so-called fill-in of zero positions during the factorisation A = LU. Discretisation of 2D problems often leads to a pentadiagonal matrix whose bandwidth is proportional to 2n, where n is the number of grid points per line. Although the matrix A contains only 5 diagonals with nonzero elements, the lower and upper triangular factors L and U of the factorisation A = LU will each contain n diagonals with nonzero elements, so that the memory use often increases unacceptably. Iterative methods do not suffer from this fill-in effect, since they only use the matrix A in matrix-vector products of the form Ax. Besides the classical iterative methods, such as the Gauss–Seidel iteration, there is currently a great deal of interest in Krylov subspace iterative methods.
In these methods, a basis for the Krylov subspace is computed iteratively, and an approximate solution of the system is sought in this space according to a certain approximation criterion. The long-established conjugate gradient method is a Krylov subspace method. It is very memory-efficient, but can only be used for symmetric positive definite problems. The generalised minimal residual method (GMRES) is an optimal Krylov subspace method for nonsymmetric, non-positive-definite problems, which, however, requires a lot of memory. The great advantage of a Krylov subspace method is that application specialists can use it as a black box. One can choose from a range of methods, the choice being based on a number of simple criteria, such as memory use and properties of the matrix, e.g. being symmetric positive definite. Essential for fast convergence, however, is the use of a good preconditioner. Preconditioning means that, instead of the system Ax = b, one solves the system AM^{-1}y = b, where M represents an approximation of A. The approximation x = M^{-1}y is computed afterwards by applying the preconditioner to the solution y of the preconditioned system. A great deal of research is therefore devoted to good preconditioning techniques. Well-known examples are diagonal and ILU preconditioning. Recently, attention has turned to preconditioning based on domain decomposition. The combination of preconditioning and Krylov subspace methods can yield efficient and simple algorithms for general use. At present these Krylov subspace methods are competitive with the (classical) direct methods. This is enormous progress compared with earlier iterative methods, which were constructed with particular applications in mind and therefore contained all kinds of problem-dependent parameters. The importance and success of iterative methods is clearly illustrated by their use in software packages for problems from elasticity theory and structural mechanics. Until recently these software packages were all based on direct methods, mainly frontal methods, which amounts to applying Gaussian elimination in an order defined by (wave)fronts that sweep over the structure during the elimination process. Nowadays these software packages also contain iterative methods, especially Krylov subspace methods. Especially for three-dimensional problems, iterative techniques are indispensable. The current expectation is that we will soon routinely have to solve linear systems with 5·10^9 unknowns. The computing time required by the most economical direct method currently known is estimated at 520 040 years, provided the computations can be carried out at a speed of 1 Teraflop. On the other hand, the computing time required by an algorithm based on preconditioned conjugate gradients amounts to 575 seconds at the same computing speed. In practice the computing speed attainable by the iterative method will be somewhat lower than that of the direct one. Moreover, the memory requirements of direct methods are far too high for such problems.
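The fill-in effect described above is easy to observe numerically. The following sketch (grid size illustrative; uses scipy, which the thesis does not mention) builds the pentadiagonal matrix of the 2D five-point Laplacian and compares its number of nonzeros with that of its sparse LU factors in the natural ordering:

```python
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 50                                        # grid points per line
I = sp.identity(n, format="csc")
T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csc")
A = (sp.kron(I, T) + sp.kron(T, I)).tocsc()   # 2D five-point Laplacian, bandwidth ~ 2n

lu = spla.splu(A, permc_spec="NATURAL")       # no reordering: band fill-in
print(A.nnz, lu.L.nnz + lu.U.nnz)             # the factors hold far more nonzeros than A
```

An iterative method, in contrast, only ever touches A through products Ax and therefore keeps the original sparsity.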
3 Domain decomposition methods

"Divide et impera" is the philosophy behind the numerical cutting and pasting denoted by the term domain decomposition. In 1870 Schwarz published a paper that is now regarded as the seed from which the domain decomposition tree grew. Schwarz was intrigued by the question whether harmonic functions also exist on regions with a non-simple geometry. Starting from the known existence of harmonic functions on rectangles and circles, he proved that harmonic functions also exist on a region consisting of a rectangle and a circle that partially overlap. The proof he gave led to an algorithm now known as the alternating Schwarz method. Here the domain on which the partial differential equation has to be solved is split into a number of overlapping subdomains. This is illustrated in figure 3.1 for two subdomains Ω1 and Ω2. This splitting gives rise to artificial boundaries for the subdomains (Γ1 for Ω1 and Γ2 for Ω2). To solve the differential equation on a subdomain, an artificial boundary condition must be prescribed on the artificial boundary. Starting with an estimate of the solution on Γ1, the differential equation can be solved on Ω1, which yields an estimate of the solution on Γ2. This estimate can be used as an artificial boundary condition to solve the differential equation on Ω2. This leads to an iterative method that, under certain conditions, converges to the solution of the global problem. If the grids in the two subdomains match (continuous grid lines), the original method can be adapted and optimised. This leads to the multiplicative Schwarz method. Unfortunately this method (in its original form) is not suitable for execution on a parallel machine. A small modification of the algorithm makes it possible to treat all subdomains simultaneously, with the number of subdomains equal to the number of processors. This parallel variant is called the additive Schwarz method, because the solution of the problem on the whole domain is obtained by summing (the extensions of) the solutions of all subproblems. By now, domain decomposition methods have matured into practically usable algorithms that are frequently employed in scientific and engineering computing. Domain decomposition methods are also successfully used as good preconditioners for Krylov subspace methods.
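The alternating Schwarz iteration can be illustrated in a few lines for the 1D model problem -u'' = 1 on (0,1) with homogeneous Dirichlet boundary conditions. The grid size and overlap below are chosen purely for illustration; each sweep solves on one subdomain using the latest values from the other as artificial Dirichlet data:

```python
import numpy as np

def poisson_matrix(m, h):
    # standard second-order finite difference Laplacian with Dirichlet BCs
    return (np.diag(2.0 * np.ones(m)) - np.diag(np.ones(m - 1), 1)
            - np.diag(np.ones(m - 1), -1)) / h**2

n, h = 99, 1.0 / 100
x = np.linspace(h, 1 - h, n)
f = np.ones(n)
u_exact = x * (1 - x) / 2          # exact solution of -u'' = 1, u(0) = u(1) = 0

i1, i2 = 60, 40                    # Omega_1: nodes 0..59, Omega_2: nodes 40..98
A1 = poisson_matrix(i1, h)
A2 = poisson_matrix(n - i2, h)

u = np.zeros(n)
for it in range(50):
    # solve on Omega_1, artificial Dirichlet value taken from Omega_2 at node i1
    b1 = f[:i1].copy(); b1[-1] += u[i1] / h**2
    u[:i1] = np.linalg.solve(A1, b1)
    # solve on Omega_2, artificial Dirichlet value taken from Omega_1 at node i2-1
    b2 = f[i2:].copy(); b2[0] += u[i2 - 1] / h**2
    u[i2:] = np.linalg.solve(A2, b2)

print(np.max(np.abs(u - u_exact)))  # converges geometrically in the overlap ratio
```

Shrinking the overlap (moving i1 and i2 closer together) visibly slows the iteration, which is the classical motivation for generous overlap or for Krylov acceleration of the Schwarz iteration.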
4 Ritz and harmonic Ritz values from Krylov subspaces

This research was carried out in order to obtain estimates of the eigenvalues while solving the system with the Krylov subspace iterative method. In this way we get an idea of the eigenvalue spectrum of the preconditioned operator AM^{-1} for larger problems. For larger problems we cannot compute all eigenvalues of AM^{-1}, because this requires much more work than solving the system; for a number of small model problems we did do this. We know, however, that most eigenvalues of AM^{-1} are equal to 1, because every column that A and M have in common gives rise to an eigenvalue 1 of AM^{-1}. The remaining eigenvalues, which do not lie in this cluster, are so-called "outliers", and good approximations to these can be obtained from the Ritz and harmonic Ritz values.
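The claim about unit eigenvalues follows from a one-line argument: if column j of M equals column j of A, then AM^{-1}(Me_j) = Ae_j = Me_j, so Me_j is an eigenvector of AM^{-1} with eigenvalue 1. A minimal numpy sketch (matrices and sizes are illustrative, not from the thesis):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 8
A = rng.standard_normal((n, n))
M = rng.standard_normal((n, n))
M[:, :5] = A[:, :5]              # M shares its first five columns with A

# for each shared column j: A M^{-1} (M e_j) = A e_j = M e_j,
# so M e_j is an eigenvector of A M^{-1} with eigenvalue 1
eigs = np.linalg.eigvals(A @ np.linalg.inv(M))
print(np.sum(np.isclose(eigs, 1.0)))   # at least five unit eigenvalues
```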
4.1 Introduction

The Arnoldi process computes an orthonormal basis for $K_m(A; r_0)$, so the basis vectors $V_m = [v_1\ v_2\ \cdots\ v_m] \in \mathbb{R}^{n \times m}$ form an orthogonal matrix. During the orthogonalisation process the numbers $h_{i,j}$ are computed such that the Hessenberg matrices $H_m = (h_{i,j}) \in \mathbb{R}^{m \times m}$ and $\bar{H}_m \in \mathbb{R}^{(m+1) \times m}$ satisfy the fundamental relation

$$A V_m = V_m H_m + h_{m+1,m}\, v_{m+1} e_m^H = V_{m+1} \bar{H}_m. \qquad (1)$$

Both FOM and GMRES seek an approximate solution in $K_m(A; r_0)$. This approximation can be written as $x = \varphi_{m-1}(A)\, r_0$, with $\varphi_{m-1}(\lambda) = \gamma_{m-1}\lambda^{m-1} + \cdots + \gamma_1\lambda + \gamma_0 \in P_{m-1}$ a real polynomial of degree $m-1$. The corresponding residual is

$$r = b - Ax = \left(I - A\varphi_{m-1}(A)\right) r_0 = \tilde{\varphi}_m(A)\, r_0 \in K_{m+1}(A; r_0). \qquad (2)$$

FOM chooses the approximation such that the residual is orthogonal to $K_m(A; r_0)$:

$$V_m^H (r_0 - A V_m y_m) = 0 \iff H_m y_m = \beta e_1, \qquad (3)$$

with $\beta = \|r_0\|_2$. GMRES minimises the norm of the residual $\|r_{\mathrm{GMRES}}\|_2$:

$$\|r_{\mathrm{GMRES}}\|_2 = \|V_{m+1}^H (r_0 - A V_m y_m)\|_2 = \|\beta e_1 - \bar{H}_m y_m\|_2. \qquad (4)$$

The vector $y_m$ is computed by solving the overdetermined system $\bar{H}_m y_m = \beta e_1$, e.g. by means of the normal equations:

$$\bar{H}_m^H \bar{H}_m y_m = \beta H_m^H e_1 \iff \left(H_m + h_{m+1,m}^2\, f_m e_m^H\right) y_m = \beta e_1, \qquad (5)$$

with $f_m = H_m^{-H} e_m$.
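The Arnoldi relation (1) and the GMRES least-squares problem (4) can be verified in a few lines of numpy; the matrix and dimensions below are arbitrary test data, not from the thesis:

```python
import numpy as np

def arnoldi(A, r0, m):
    """Arnoldi process: orthonormal basis V of K_m(A, r0) and the (m+1) x m Hessenberg."""
    n = len(r0)
    V = np.zeros((n, m + 1)); H = np.zeros((m + 1, m))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):              # modified Gram-Schmidt orthogonalisation
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

rng = np.random.default_rng(0)
n, m = 50, 10
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))
b = rng.standard_normal(n)
V, Hbar = arnoldi(A, b, m)

# fundamental relation (1): A V_m = V_{m+1} Hbar_m
assert np.allclose(A @ V[:, :m], V @ Hbar)

# GMRES (4): minimise ||beta e1 - Hbar y||_2 over the Krylov subspace
beta = np.linalg.norm(b)
e1 = np.zeros(m + 1); e1[0] = beta
y, *_ = np.linalg.lstsq(Hbar, e1, rcond=None)
x = V[:, :m] @ y
print(np.linalg.norm(b - A @ x))  # equals the small least-squares residual norm
```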
4.2 Ritz values and the FOM residual polynomial

The classical Galerkin approach to computing approximate eigenvalues is as follows. An approximate eigenvector $x = V_m y_m$ is sought in $K_m(A; r_0)$ such that the residual of the eigenvalue equation is orthogonal to $K_m(A; r_0)$:

$$(Ax - \mu x) \perp K_m(A; r_0) \iff V_m^H (A V_m y_m - \mu V_m y_m) = 0. \qquad (6)$$

The approximate eigenvalues are found as the eigenvalues of $H_m = V_m^H A V_m$. The Ritz values $\vartheta_i^{(m)}$ are by definition the eigenvalues of this matrix $H_m$; they are thus the well-known "Arnoldi eigenvalue estimates". The FOM residual polynomial $\tilde{\varphi}_m^{\mathrm{FOM}}(\lambda)$ is a multiple of the characteristic polynomial of $H_m$. This implies that the Ritz values are the zeros of the FOM residual polynomial and that they characterise this method completely.
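As a sketch (the test spectrum is chosen for illustration), the Ritz values are obtained as the eigenvalues of the small Hessenberg matrix $H_m$; with one well-separated outlier eigenvalue, a low-dimensional Krylov subspace already captures it accurately:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 100, 20
# diagonal test matrix: a cluster in [1, 2] plus one well-separated outlier at 10
A = np.diag(np.concatenate([np.linspace(1, 2, n - 1), [10.0]]))
r0 = rng.standard_normal(n)

# Arnoldi process (modified Gram-Schmidt)
V = np.zeros((n, m + 1)); H = np.zeros((m + 1, m))
V[:, 0] = r0 / np.linalg.norm(r0)
for j in range(m):
    w = A @ V[:, j]
    for i in range(j + 1):
        H[i, j] = V[:, i] @ w; w -= H[i, j] * V[:, i]
    H[j + 1, j] = np.linalg.norm(w); V[:, j + 1] = w / H[j + 1, j]

ritz = np.linalg.eigvals(H[:m, :m])   # eigenvalues of H_m = V_m^H A V_m
print(max(ritz.real))                 # approximates the outlier eigenvalue 10
```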
4.3 Harmonic Ritz values and the GMRES residual polynomial
The harmonic Ritz values $\tilde{\vartheta}_i^{(m)}$ are by definition the reciprocals of the (ordinary) Ritz values of $A^{-1}$ computed from $A K_m(A; r_0)$. The motivation for this definition is the fact that the reciprocals of the harmonic Ritz values lie in the field of values of $A^{-1}$, whereas the Ritz values lie in that of $A$. The harmonic Ritz values are approximate eigenvalues according to the minimal residual criterion. An approximate eigenvector $x = V_m y_m$ is sought in $K_m(A; r_0)$. The residual of the eigenvalue equation must be orthogonal to $A K_m(A; r_0)$:

$$(Ax - \mu x) \perp A K_m(A; r_0) \iff (A V_m)^H (A V_m y_m - \mu V_m y_m) = 0. \qquad (7)$$

With (2.16) we find the equivalent eigenvalue problems

$$\bar{H}_m^H \bar{H}_m y_m = \mu H_m^H y_m \iff \left(H_m + h_{m+1,m}^2\, f_m e_m^H\right) y_m = \mu y_m. \qquad (8)$$

We can bound the norm of the rank-one update in (4.15):

$$\left\|h_{m+1,m}^2\, f_m e_m^H\right\|_2 \le \frac{h_{m+1,m}^2}{\sigma_{\min}(H_m)}. \qquad (9)$$

The harmonic Ritz values are equal to the Ritz values when an invariant subspace has been found, because then $h_{m+1,m} = 0$; they are then also eigenvalues of $A$. Equation (4.17) shows that the differences between the harmonic Ritz values and the Ritz values can only be large if $h_{m+1,m}$ is large and $\sigma_{\min}(H_m)$ is small. This is the case when GMRES stagnates. Just as for the Ritz values and the FOM residual polynomial, we have proved that the harmonic Ritz values are the zeros of the GMRES residual polynomial, by showing that the latter is a multiple of the characteristic polynomial of a matrix whose eigenvalues are these harmonic Ritz values.
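The right-hand side of (8) is a computable small eigenproblem for the harmonic Ritz values. The self-contained sketch below (test spectrum chosen for illustration) computes them from the Arnoldi quantities and checks the rank-one bound (9):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 100, 20
# diagonal test matrix: a cluster in [1, 2] plus one outlier eigenvalue at 10
A = np.diag(np.concatenate([np.linspace(1, 2, n - 1), [10.0]]))
r0 = rng.standard_normal(n)

# Arnoldi process (modified Gram-Schmidt)
V = np.zeros((n, m + 1)); H = np.zeros((m + 1, m))
V[:, 0] = r0 / np.linalg.norm(r0)
for j in range(m):
    w = A @ V[:, j]
    for i in range(j + 1):
        H[i, j] = V[:, i] @ w; w -= H[i, j] * V[:, i]
    H[j + 1, j] = np.linalg.norm(w); V[:, j + 1] = w / H[j + 1, j]

Hm = H[:m, :m]
em = np.zeros(m); em[-1] = 1.0
fm = np.linalg.solve(Hm.conj().T, em)             # f_m = H_m^{-H} e_m
rank1 = H[m, m - 1]**2 * np.outer(fm, em)         # rank-one update in (8)
harmonic = np.linalg.eigvals(Hm + rank1)          # harmonic Ritz values
ritz = np.linalg.eigvals(Hm)                      # ordinary Ritz values
print(max(harmonic.real), max(ritz.real))         # both approximate the outlier 10
```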
5 Nested Krylov subspace methods

To solve the system $Ax = b$ we use a Krylov subspace method (usually GMRES or FGMRES) with (7.1) as preconditioner. As mentioned above, we also use a Krylov subspace method to solve the subproblems, which means that we replace the inverse $M_i^{-1}$ by $p_k(M_i)$. We study this approach when only one subdomain is used; in other words, the preconditioner is obtained by solving the problem approximately with a Krylov subspace method. We then speak of "nested Krylov subspace methods".
The theorem of Faber and Manteuffel tells us that for most non-Hermitian problems no short recurrence exists that generates the optimal approximations from successive Krylov subspaces. This implies that either short recurrences are used, as in BiCG, CGS, QMR, TFQMR, Bi-CGstab and BiCGstab(l), or optimality is retained at the price of large memory requirements. In practice, restarted or truncated variants of optimal methods such as GMRES are used. Recently, two nested Krylov subspace iterative methods, FGMRES/GMRES and GMRESR, have been proposed. Both FGMRES and GCR allow the preconditioner to differ in every step of the iteration. GMRES then acts as a variable preconditioner, because the polynomial it constructs can be different in each step. GMRESR, proposed by Van der Vorst and Vuik, is based on the GCR (Generalized Conjugate Residual) method described by Eisenstat et al. We also mention the Generalized Conjugate Gradient method of Axelsson and Vassilevski, which is closely related to the GMRESR method of Van der Vorst and Vuik. There are two different ways to obtain a nested iteration based on FGMRES or GCR. The first is based on the residual of the approximate solution in each step of the outer iteration; the second is based on the last vector generated in the Arnoldi process. This vector is in fact also a residual vector, namely the normalised residual of the associated Galerkin projection method. Since the residual is defined as r_j = b - A x_j, with x_j the approximate solution, it is clear that if the solution of the system

    A z = r_j    (10)

can be found, the solution of the original system is obtained as x = x_j + z. This residual-based approach is found in GMRESR, where GCR is used in the outer iteration and GMRES in the inner one. It is easy to show that if the preconditioner is exact in one step, i.e. if the inner iteration solves the system

    A z = v_j    (11)

exactly, then a minimal residual method such as FGMRES finds the exact solution of the original problem. The search directions in these methods are different, but the convergence behaviour is comparable. The goal is to compute quasi-optimal approximations while storing only a limited number of vectors. These methods can be improved by noting that the right-hand side of the inner iteration is orthogonal to a subspace generated in the outer iteration. It is therefore desirable to orthogonalise against this subspace in the inner iteration, which accelerates the convergence.
NEDERLANDSE SAMENVATTING (Dutch Summary)
Orthogonalising in this way leads to our FGMRES/EGMRES method and to GCRO. For the same number of matrix-vector products, these methods always yield a better approximation than the original method without the extra inner orthogonalisations. These nested schemes can break down before the solution has been found, and the improved schemes can break down in the same way. This is clearly visible in GCRO, where a singular matrix is used to construct a Krylov subspace in the inner iteration. In practice this breakdown is rare, since it can only occur after the number of matrix-vector products has exceeded the dimension of the Krylov subspace, in other words after the inner iteration has swept through the entire Krylov subspace. The classical remedy for this breakdown is to use the LSQR search direction, which always decreases the residual norm. This is in essence one step of an algorithm based on the normal equations. It is well known that the normal equations can be badly conditioned and that methods based on them converge slowly. Moreover, the use of the transpose of the matrix is preferably avoided. We have constructed a cure for the breakdown problem that does not require the transpose of the matrix, by combining FGMRES and GMRESR. In our approach we ensure that the inner Arnoldi process always continues with the last computed vector and hence certainly contains a component in the one-dimensional set K_{m+1}(A; r_0) \ K_m(A; r_0), i.e. along A^m b, where m is the total number of matrix-vector products. In this way we can sweep through the whole Krylov subspace and avoid the breakdown that can occur in GCRO.
By projecting onto the residual in the inner iteration, we implicitly solve (5.1), so that we avoid the problem that arises when stagnation occurs and the residual has no component along v_j; in that case solving (5.2) obviously does not help.
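The residual-based nesting described above can be sketched in a few lines: a GCR outer loop whose "preconditioner" is a small number of inner GMRES steps applied to the current residual, in the spirit of GMRESR. This is a minimal illustrative sketch (all names, sizes and parameters are our own choices), not the solver developed in the thesis.

```python
import numpy as np

def inner_gmres(A, r, m):
    """Approximate solution z of A z = r from m Arnoldi steps,
    minimising ||r - A z|| over the Krylov subspace K_m(A, r)."""
    n = len(r)
    V = np.zeros((n, m + 1)); H = np.zeros((m + 1, m))
    beta = np.linalg.norm(r)
    V[:, 0] = r / beta
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):                 # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    e1 = np.zeros(m + 1); e1[0] = beta
    y, *_ = np.linalg.lstsq(H, e1, rcond=None)  # small least-squares problem
    return V[:, :m] @ y

def gmresr(A, b, m_inner=4, maxouter=30, tol=1e-10):
    """GCR outer iteration; the inner GMRES is a variable preconditioner."""
    x = np.zeros_like(b); r = b.copy()
    Cs, Us = [], []                             # pairs (c_k = A u_k, u_k)
    for _ in range(maxouter):
        u = inner_gmres(A, r, m_inner)          # approximately solve A z = r
        c = A @ u
        for ci, ui in zip(Cs, Us):              # orthogonalise against earlier c_i
            g = ci @ c
            c -= g * ci; u -= g * ui
        nc = np.linalg.norm(c)
        c /= nc; u /= nc
        alpha = c @ r
        x += alpha * u; r -= alpha * c          # minimal-residual update
        Cs.append(c); Us.append(u)
        if np.linalg.norm(r) < tol:
            break
    return x

rng = np.random.default_rng(1)
n = 60
A = np.eye(n) + 0.3 * rng.standard_normal((n, n)) / np.sqrt(n)
b = rng.standard_normal(n)
x = gmresr(A, b)
```

Note that the improved variants discussed above additionally orthogonalise the inner iteration against the outer subspace; the sketch omits this refinement.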
6 A Krylov Subspace Method for the Shallow Water Equations

In the DELFT3D-FLOW software, time integration is performed with an "Alternating Operator Implicit" (AOI) method. In this approach, the ordering of explicit and implicit steps within each time step leads to a system of equations for the water elevation. Until recently this system was solved with an "Alternating Direction Implicit" (ADI) iteration, which no longer converges (well) for large time steps and small mesh sizes. We have implemented a robust and efficient solver by using a Krylov subspace method with the original ADI method as preconditioner.
This solver is also used as the subdomain solver in a domain decomposition method, which is itself accelerated by a Krylov subspace method. In that setting, selected vectors of the subspace constructed during the solution process can be reused for the solution of the subsequent linear systems, which clearly increases the efficiency of the method. The domain decomposition method used is an additive preconditioner and is therefore particularly well suited to parallel computers.
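The role of the ADI sweep can be made concrete on a model problem. The sketch below implements one Peaceman-Rachford ADI double sweep for the 2D Poisson equation, a stand-in for the water-elevation system (which we do not reproduce here); it is exactly this kind of sweep that is wrapped as the preconditioner M^{-1} inside the Krylov method. Grid size and shift parameter alpha are illustrative choices of ours.

```python
import numpy as np

m, h = 40, 1.0 / 41                       # interior grid points, mesh size
T = (np.diag(2 * np.ones(m)) - np.diag(np.ones(m - 1), 1)
     - np.diag(np.ones(m - 1), -1)) / h**2    # 1D -u'' stencil
I = np.eye(m)

# For an m-by-m grid array U, the operator is A u = T U + U T
# (x-sweep plus y-sweep).
F = np.ones((m, m))                       # right-hand side f = 1
alpha = 2.0 * np.pi / h                   # heuristic ADI shift parameter

def adi_sweep(U):
    """One Peaceman-Rachford double sweep: implicit in x, then in y."""
    Uh = np.linalg.solve(alpha * I + T, F + U @ (alpha * I - T))
    return np.linalg.solve((alpha * I + T).T,
                           (F + (alpha * I - T) @ Uh).T).T

U = np.zeros((m, m))
res = [np.linalg.norm(F - (T @ U + U @ T))]
for _ in range(20):
    U = adi_sweep(U)
    res.append(np.linalg.norm(F - (T @ U + U @ T)))
# res[k] decreases; the Krylov accelerator treats the map
# r -> (one sweep applied with right-hand side r) as M^{-1} r
```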
7 A Generalised Additive Schwarz Method for the Shallow Water Equations

We describe the domain decomposition preconditioner that we use in the solution of the linear system Ax = b.
7.1 Generalised Additive Schwarz Method

The domain decomposition preconditioner used is based on a Generalised Additive Schwarz Method (GASM). We denote by R_i : Ω → Ω_i the linear restriction operator that selects the components belonging to subdomain i. The matrix M_i = R_i A R_i^T is the submatrix of A associated with subdomain i. The result of applying the GASM can be written as a sum of the solutions of independent subproblems that can be solved simultaneously, namely

    M^{-1} = \sum_{i=1}^{p} R_i^T M_i^{-1} R_i .    (12)
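The action of (12) is easy to sketch for a 1D Poisson matrix with two overlapping index sets playing the role of the restrictions R_i; assembling M^{-1} densely lets us inspect the preconditioned spectrum. Sizes and index sets are our own illustrative choices.

```python
import numpy as np

n = 50                                    # 1D Poisson test matrix
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

# Two overlapping index sets: the restrictions R_i select these components.
doms = [np.arange(0, 30), np.arange(20, 50)]

# M^{-1} = sum_i R_i^T M_i^{-1} R_i, assembled densely for inspection.
Minv = np.zeros((n, n))
for idx in doms:
    Mi = A[np.ix_(idx, idx)]              # M_i = R_i A R_i^T
    Minv[np.ix_(idx, idx)] += np.linalg.inv(Mi)

w = np.linalg.eigvals(Minv @ A)
# For SPD A the preconditioned spectrum is real and positive, and since no
# point lies in more than two subdomains the eigenvalues stay in (0, 2].
```

In the solver itself the subdomain solves are of course applied matrix-free inside FGMRES rather than assembled densely.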
We give the structure of the GASM for the case of two subdomains separated by the interface Γ, as shown in Figure 7.1. The extension to more subdomains is straightforward. The subdomains are extended until a small overlap is obtained. This overlap is chosen such that the restriction operators can be defined without splitting the discretisation stencil across the subdomain operators M_i (i.e. the overlap and the restrictions are defined so that the difference stencil only uses points present in that subdomain). This is important for parallel processing, where each subdomain is stored in the memory of a different processor. Figure 7.2 illustrates the extension process. The points in subdomain Ω_1 are connected by the discretisation stencil only to points in Ω_1 or in Ω_l. Similar statements hold for points in Ω_l, Ω_r and Ω_2. This leads to a block structure in the system of linear equations.
After the necessary overlap has been obtained by extension, the (small) subdomains Ω_l and Ω_r are duplicated into Ω_l~ and Ω_r~ respectively. We then obtain an extended system of linear equations, in which the relation between the "overlapping" unknowns still has to be specified. The simplest choice is to require that the values in the duplicated subdomains Ω_l~ and Ω_r~ are copies of those in the original subdomains Ω_l and Ω_r. This is the so-called Dirichlet-Dirichlet (DD) coupling, because it amounts to imposing Dirichlet boundary conditions on the subproblems. The extended system of linear equations with this DD coupling can be written as

    [ A_11  A_1l   0     0     0     0   ] [ ζ_1  ]   [ f_1 ]
    [ A_l1  A_ll  A_lr   0     0     0   ] [ ζ_l  ]   [ f_l ]
    [  0     0     I     0    -I     0   ] [ ζ_r~ ] = [  0  ]    (13)
    [  0    -I     0     I     0     0   ] [ ζ_l~ ]   [  0  ]
    [  0     0     0    A_rl  A_rr  A_r2 ] [ ζ_r  ]   [ f_r ]
    [  0     0     0     0    A_2r  A_22 ] [ ζ_2  ]   [ f_2 ]
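The structure of (13) can be checked on a small 1D Poisson problem with one-point interface strips: the sketch below assembles the extended matrix with DD coupling and verifies that the duplicated unknowns coincide and that the extended solution reproduces the direct solution of the original system. All index choices are illustrative.

```python
import numpy as np

n = 12                                    # 1D Poisson, tridiagonal A
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)

L, R = 5, 6                               # one-point strips Omega_l, Omega_r
i1, i2 = np.arange(0, L), np.arange(R + 1, n)

# extended unknowns: [zeta_1 (0..4), zeta_l (5), zeta_r~ (6),
#                     zeta_l~ (7), zeta_r (8), zeta_2 (9..13)]
N = n + 2
E = np.zeros((N, N))
f = np.zeros(N)
E[np.ix_(i1, i1)] = A[np.ix_(i1, i1)]
E[i1, L] = A[i1, L]
E[L, :L] = A[L, i1]
E[L, L] = A[L, L]
E[L, L + 1] = A[L, R]                     # subdomain 1 sees the copy zeta_r~
E[L + 1, L + 1], E[L + 1, L + 3] = 1.0, -1.0   # zeta_r~ = zeta_r
E[L + 2, L], E[L + 2, L + 2] = -1.0, 1.0       # zeta_l~ = zeta_l
E[L + 3, L + 2] = A[R, L]                 # subdomain 2 sees the copy zeta_l~
E[L + 3, L + 3] = A[R, R]
E[L + 3, L + 4:] = A[R, R + 1:]
E[L + 4:, L + 4:] = A[np.ix_(i2, i2)]
E[L + 4:, L + 3] = A[i2, R]
f[:L], f[L], f[L + 3], f[L + 4:] = b[i1], b[L], b[R], b[i2]

zeta = np.linalg.solve(E, f)
x = np.linalg.solve(A, b)
# zeta reproduces x, with the duplicated unknowns equal to the originals
```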
Fast convergence is obtained by choosing a good splitting of the extended Schwarz matrix, rather than by enlarging the overlap. The spectral radius of the preconditioned operator A M^{-1}, and hence the convergence properties of a Krylov subspace method preconditioned with the GASM (7.1), can be improved by multiplying the system of linear equations by a well-chosen nonsingular matrix P. This can be interpreted as imposing more general boundary conditions on the interface Γ. The submatrices C_lr, C_ll, C_rr and C_rl represent the discretisation of the coupling equations (transmission conditions) and can be chosen such that the eigenvalues of the preconditioned operator A M^{-1} cluster. These submatrices can be chosen freely under the condition that

    C = [ C_ll  C_lr ]
        [ C_rl  C_rr ]    (14)

is nonsingular. This condition implies that the matrix P in (7.4) that transforms (7.3) into (7.6) is nonsingular. This yields the so-called Locally Optimised Block Jacobi (LOBJ) preconditioners, which are based on the extended system of linear equations A ζ = f:

    [ A_11  A_1l    0      0      0      0   ] [ ζ_1  ]   [ f_1 ]
    [ A_l1  A_ll   A_lr    0      0      0   ] [ ζ_l  ]   [ f_l ]
    [  0    C_ll   C_lr  -C_ll  -C_lr    0   ] [ ζ_r~ ] = [  0  ]    (15)
    [  0    C_rl   C_rr  -C_rl  -C_rr    0   ] [ ζ_l~ ]   [  0  ]
    [  0     0      0     A_rl   A_rr   A_r2 ] [ ζ_r  ]   [ f_r ]
    [  0     0      0      0     A_2r   A_22 ] [ ζ_2  ]   [ f_2 ]
The Generalised Additive Schwarz Method differs from the classical additive Schwarz preconditioner in that the transmission conditions across the interfaces, i.e. the boundary conditions of the subdomain problems, can be chosen so as to improve the spectral properties of the preconditioned operator. We have also paid attention to solving the subproblems in (7.1) approximately. This means that the small system with matrix M_i that has to be solved in subdomain Ω_i is not solved exactly but approximately, by applying an iterative method and stopping it when a certain tolerance is reached or when a preset number of iteration steps has been performed. In (7.1) the inverse M_i^{-1} is replaced by p_k(M_i), a polynomial in M_i that approximates M_i^{-1}. The motivation is that it is unnecessary to solve the subproblems exactly as long as the boundary conditions on the artificial boundaries are not yet correct. In this way much computing time can be saved without degrading the quality of the preconditioner M^{-1}. Care must be taken that p_k(M_i) is a reasonable approximation of M_i^{-1}; otherwise the number of global iterations will grow, because too little work is done locally in the subdomains.
7.2 Eigenvalue Spectrum and Convergence History

Information about the eigenvalue spectrum makes it possible to select suitable vectors to include in the approximation space used in FGMRES (Flexible GMRES). This accelerates the iterative method even further. This technique, which we developed, has recently been adopted in the field of nonlinear elasticity computations. We use the GASM as preconditioner in a Krylov subspace method and investigate the relation between the boundary conditions imposed on the artificial boundaries of the subdomains and the eigenvalue spectra of the preconditioned operator A M^{-1}. For larger problems this is done via the Ritz and harmonic Ritz values. For small problems we can compute all eigenvalues of A M^{-1}. With this information it is possible to predict the convergence acceleration. Moreover, the Krylov subspace method can be accelerated further when approximations of the eigenvectors corresponding to extreme eigenvalues of A M^{-1} are explicitly added to the search space. After solving the system with GMRES we compute the Ritz values. This gives information about the extreme eigenvalues present in the spectrum of A M^{-1}. A typical Ritz spectrum obtained with the generalised additive Schwarz preconditioner is shown in Figure 7.4. Characteristic of this spectrum is that a large number of eigenvalues cluster around 1 and that a small number of isolated eigenvalues are clearly separated from this cluster. The eigenvectors corresponding to the outlying eigenvalues of this spectrum are the components that have to be removed from the initial residual, because they impede fast convergence.
For comparison, Figure 7.3 shows the Ritz spectrum of the domain decomposition method for the same problem when Dirichlet-Dirichlet coupling is used. This spectrum shows no clear separation between a few outlying eigenvalues and a cluster around 1. Worse, the eigenvalues are spread over the open interval (0, 2) and many of them lie close to 0 or close to 2. This eigenvalue spectrum explains the slow convergence of the domain decomposition method with Dirichlet-Dirichlet coupling when it is used as a solver. The spectral radius of the matrix (I - A M^{-1}) is almost 1, and this spectral radius is an upper bound on the convergence factor. The convergence history of FGMRES is shown in Figures 7.5 and 7.6. In these computations we always use 6 Ritz vectors, computed from the search space used for the solution of the first system. These 6 Ritz vectors do not by themselves reduce the residual norm, but they do make the convergence faster in the later iterations.
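The extraction of Ritz values from the Arnoldi Hessenberg matrix can be illustrated on a synthetic matrix whose spectrum mimics Figure 7.4: a cluster around 1 plus one isolated eigenvalue. The eigenvalues of H_m (the Ritz values) locate the outlier long before the cluster is resolved. Sizes and values are illustrative choices of ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 20                            # problem size, Arnoldi steps
# Synthetic "preconditioned" matrix: cluster near 1 plus one outlier at 5.
d = np.concatenate([1 + 0.05 * rng.standard_normal(n - 1), [5.0]])
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(d) @ Q.T

V = np.zeros((n, m + 1)); H = np.zeros((m + 1, m))
b = rng.standard_normal(n)
V[:, 0] = b / np.linalg.norm(b)
for j in range(m):                        # Arnoldi process
    w = A @ V[:, j]
    for i in range(j + 1):
        H[i, j] = V[:, i] @ w
        w -= H[i, j] * V[:, i]
    H[j + 1, j] = np.linalg.norm(w)
    V[:, j + 1] = w / H[j + 1, j]

ritz = np.linalg.eigvals(H[:m, :m])       # Ritz values = eigenvalues of H_m
# the isolated eigenvalue 5.0 is reproduced to high accuracy
```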
8 Domain Decomposition Methods for Nonmatching Grids

8.1 Introduction

We have developed a finite difference method with a modified discretisation at the interface that allows the use of a lower-dimensional interpolation operator on the interface. The advantage is that the coupling between the different grids requires less computational work. The results obtained with it show that it is possible to obtain a consistent and globally second-order scheme on overlapping nonmatching grids by using a local coupling with a lower-dimensional interpolation that uses only 4 grid points along the interface. This is an improvement over the classical theory, which requires a higher-dimensional interpolation and hence uses more grid points in the coupling equations.
8.2 Composite Mesh Difference Method

We give a short description of a composite mesh difference method (CMDM) for solving the second-order elliptic partial differential equation L u = f in Ω with a Dirichlet boundary condition u = g on ∂Ω. Given a domain Ω consisting of p nonoverlapping subdomains Ω_i such that Ω̄ = ∪_{i=1}^p Ω̄_i, we construct a grid with mesh size h_i in each of the extended subdomains Ω'_i of Ω_i. As a result of the extension of the subdomains, these grids overlap.
We write Γ_i = ∂Ω'_i ∩ ∂Ω.

Assumption 8.1 The truncation error α_i(x) = (L_{h_i} - L) u(x) is of order r_i:

    ‖α_i(x)‖_∞ ≤ C_{α_i} h_i^{r_i} ‖u‖_{r_i+2,∞,Ω'_i} .    (16)

The constant C_{α_i} is independent of the mesh size h_i, and ‖u‖_{k,∞,Ω'_i} is the Sobolev norm of the space W^k_∞(Ω'_i).
Assumption 8.2 The interpolation operator I_i uses only values at grid points in ∪_{j≠i} Ω̄_j and uses no values at grid points in Ω'_i.

Assumption 8.3 The interpolation error β_i(x) = (u - I_i u)(x) is of order s_i:

    ‖β_i(x)‖_∞ ≤ C_{β_i} h_i^{s_i} ‖u‖_{s_i,∞,Ω_{Γ^c_i}} .    (17)

We only need the interpolation operator I_i in a region Ω_{Γ^c_i} around Γ^c_i. The interpolation constant σ_i = ‖I_i‖_∞ is the norm of the interpolation matrix. The largest interpolation constant is σ = max_i σ_i. For linear interpolation σ_i = 1, while for quadratic or cubic interpolation σ_i = 5/4 in 1D and σ_i = 25/16 in 2D. The global discretisation u_h = (u_{h_1}, u_{h_2}, ..., u_{h_p}) on the composite grid is obtained by coupling the local discretisations, demanding that the solution on one grid agrees with the interpolation of the solutions on the neighbouring grids. The resulting system of equations consists of p subproblems of the form

    L_{h_i} u_{h_i} = f_{h_i}            in Ω'_i,
    u_{h_i} = g_{h_i}                    on Γ_i,        (18)
    u_{h_i} = z_{h_i} = I_i u_h          on ∂Ω'_i \ Γ_i.
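The interpolation constants quoted above can be checked numerically. The sketch below evaluates the Lebesgue function of cubic Lagrange interpolation on four equispaced nodes over the central interval; its maximum is the 1D constant 5/4, and the 2D tensor-product value is (5/4)^2 = 25/16. The node placement is our own illustrative choice.

```python
import numpy as np

nodes = np.array([-1.0, 0.0, 1.0, 2.0])   # four equispaced nodes
x = np.linspace(0.0, 1.0, 1001)           # central interval, includes x = 0.5

def lagrange_weights(x, nodes):
    """Cardinal basis values ell_j(x) for all nodes."""
    w = np.ones((len(x), len(nodes)))
    for j, xj in enumerate(nodes):
        for k, xk in enumerate(nodes):
            if k != j:
                w[:, j] *= (x - xk) / (xj - xk)
    return w

lebesgue = np.abs(lagrange_weights(x, nodes)).sum(axis=1)
sigma_1d = lebesgue.max()                 # 1.25, attained at the midpoint
sigma_2d = sigma_1d**2                    # 1.5625 = 25/16
```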
The subproblems (8.6) must satisfy the following assumptions.

Assumption 8.4 The local finite difference discretisations (8.6) are stable in the maximum norm, i.e. there exists a constant K_i independent of h_i such that

    ‖u_{h_i}‖_{∞,Ω'_i} ≤ K_i ( ‖f_{h_i}‖_{∞,Ω'_i} + max{ ‖g_{h_i}‖_{∞,Γ_i} , ‖z_{h_i}‖_{∞,∂Ω'_i\Γ_i} } ) .    (19)
Assumption 8.5 The discretisations (8.6) satisfy a strong discrete maximum principle, i.e. the solution u_{h_i} of (8.6) with f_{h_i} = 0 and g_{h_i} = 0, restricted to Ω̄_i, satisfies

    ‖u_{h_i}‖_{∞,Ω̄_i} ≤ ρ_i ‖z_{h_i}‖_{∞,∂Ω'_i\Γ_i} .    (20)
The contraction factor 0 ≤ ρ_i < 1 measures the reduction of the error.

Assumption 8.6 The product of interpolation constant and contraction factor is smaller than 1:

    τ = max_i (ρ_i σ_i) < 1 .    (21)
Global Error Analysis Under these assumptions the stability of the composite scheme can be proved, and we obtain the following error bound.

Theorem 8.1 The error of the discrete solution satisfies

    \sum_{i=1}^{p} ‖e_{h_i}‖_∞ ≤ ( 1 + σ/(1-τ) ) ( \sum_{i=1}^{p} K_i ‖α_i‖_∞ + \sum_{i=1}^{p} ‖β_i‖_∞ ) .    (22)
Schwarz Alternating Method The resulting system can be solved by an iteration in which the p subproblems (8.6) are solved simultaneously and independently of one another. The convergence factor of this iteration is bounded by the contraction factor τ.

Theorem 8.2 The sequence {u_h^{(n)}} converges to the exact discrete solution u_h, and

    d(u_h^{(n)}, u_h) ≤ τ^n d(u_h^{(0)}, u_h) .    (23)

The distance is defined as d(w_h, v_h) = max_i { ‖w_{h_i} - v_{h_i}‖_{∞,Ω̄'_i} }.
This is a parallel variant of the Schwarz alternating method. We naturally use it as a preconditioner in a Krylov subspace method.

Consistent Interpolation The following definition of consistent interpolation contains an important condition on the combinations of discretisation and interpolation formulas.

Definition 8.1 Let I_i be the interpolation operator from Ω_{Γ^c_i} to Γ^c_i, and let L be the differential operator approximated by a finite difference operator D_i(L_{h_i}, I_i), which depends on the usual finite difference operator L_{h_i} and on I_i. We say that the discretisation and interpolation pair is consistent on Ω'_{h_i} if, for all x ∈ Ω'_{h_i},

    (L - D_i(L_{h_i}, I_i)) u(x) = O(h_i) .    (24)

Equation (8.22) states that the truncation error of the combined discretisation and interpolation pair must tend to 0 as the mesh size h_i tends to 0.
Local Error Analysis We denote an index pair by p = (i, j). J_Ω is the set of index pairs of grid points in the domain Ω. We make the following assumptions.

Assumption 8.7 For all p ∈ J_Ω, L_h has the form L_h u_p = c_p u_p - \sum_k c_k u_k with positive coefficients, where the sum over k runs over all grid points connected to p.

Assumption 8.8 For all p ∈ J_Ω: c_p ≥ \sum_k c_k.

Assumption 8.9 The set J_Ω is connected. By definition, a point is connected to all points that occur in (8.15) with a nonzero coefficient. By definition, a set is connected if for any two points p and q in J_Ω there exists a sequence of points p = p_0, p_1, ..., p_{m+1} = q such that each point p_i is connected to p_{i-1} and p_{i+1}, for i = 1, 2, ..., m.

Assumption 8.10 At least one equation contains a Dirichlet boundary condition.

The maximum principle can then be formulated as follows.

Lemma 8.1 (Maximum Principle) Suppose that L_h, J_Ω and J_∂Ω satisfy all the assumptions above and that there exists a grid function u_p with L_h u_p ≤ 0 for all p ∈ J_Ω. Then u_p cannot attain a nonnegative maximum at an interior point:

    max_{p∈J_Ω} u_p ≤ max{ max_{a∈J_∂Ω} u_a , 0 } .    (25)
Theorem 8.3 Suppose that a nonnegative grid function Φ_p is defined on J_Ω ∪ J_∂Ω such that L_h Φ_p ≥ 1 for all p ∈ J_Ω, and that all the assumptions above are satisfied. Under these conditions the approximation error is bounded by

    |e_p| ≤ max_{a∈J_∂Ω} Φ_a · max_{p∈J_Ω} |T_p| ,    (26)

with T_p the local truncation error. To prove second-order accuracy in the L_∞ norm, we must verify that the discretisation equations and the coupling equations satisfy the assumptions of the maximum principle.
Comparison of Some Schemes

The classical 5-point stencil with bilinear interpolation Since both the classical 5-point stencil and bilinear interpolation are second-order accurate, it follows from (8.11) that the resulting CMDM is also second order. This scheme does not, however, satisfy the consistent interpolation condition defined in Section 8.5. We describe the inconsistency. We choose a local coordinate system such that (0, 0) are the coordinates of a point next to the interface and the point with coordinates (h, 0) lies on the interface. To obtain a finite difference discretisation, the value at (h, 0) must be obtained by interpolation of values from the other grid. The stencil S has the form

    S = -4u(0,0) + u(0,h) + u(0,-h) + u(-h,0) + v .    (27)

The value v is computed as a bilinear interpolant of u(h, 0), i.e.

    v = (1-ξ)(1-η) u(h-ξk, -ηk) + (1-ξ)η u(h-ξk, (1-η)k) + ξ(1-η) u(h+(1-ξ)k, -ηk) + ξη u(h+(1-ξ)k, (1-η)k).

Figure 8.3 shows the local coordinates (ξ, η) used in the interpolation formula. A series expansion of (8.23) shows that the discretisation is inconsistent at the grid points where the bilinear interpolation is used:

    S/h² - (u_xx + u_yy) = (γ_k²/2) ( ξ(1-ξ) u_xx + η(1-η) u_yy ) + O(h) .    (28)
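Because both the 5-point stencil and the bilinear interpolant (8.23) are exact on quadratics, the leading term in (28) can be reproduced to machine precision with a quadratic test function. The sketch below does so for u = x² + y², for which the right-hand side of (28) reduces to γ_k²(ξ(1-ξ) + η(1-η)); the parameter values are arbitrary illustrative choices.

```python
import numpy as np

u = lambda x, y: x**2 + y**2              # u_xx = u_yy = 2, Laplacian = 4
h, gamma_k = 0.1, 0.7                     # mesh sizes h and k = gamma_k * h
xi, eta = 0.3, 0.6                        # local coordinates of (h, 0)
k = gamma_k * h

# bilinear interpolant v for u(h, 0) from the four surrounding points
v = ((1 - xi) * (1 - eta) * u(h - xi * k, -eta * k)
     + (1 - xi) * eta * u(h - xi * k, (1 - eta) * k)
     + xi * (1 - eta) * u(h + (1 - xi) * k, -eta * k)
     + xi * eta * u(h + (1 - xi) * k, (1 - eta) * k))

S = -4 * u(0, 0) + u(0, h) + u(0, -h) + u(-h, 0) + v
defect = S / h**2 - 4.0                   # consistency defect of the stencil
predicted = gamma_k**2 * (xi * (1 - xi) + eta * (1 - eta))
# defect equals predicted up to rounding: the O(1) inconsistency of (28)
```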
Here γ_k = k/h is the ratio of the mesh sizes. Note that this scheme is only consistent if ξ and η are both 0 or 1, which implies that the grids match on the interface.

Theorem 8.4 The classical 5-point stencil with bilinear interpolation yields a second-order scheme in every extended subdomain Ω'_i:

    |e_p| ≤ C_{Φ_a} max{ C_I , max_{p_j∈J} ( ξ(1-ξ) u_xx + η(1-η) u_yy ) γ_k² / (2 Ẽ_2) } h² + C_D h² + O(h³) ,    (29)

for all p ∈ Ω'_i, with C_I = (M_xxxx + M_yyyy)/(48 E_1) a constant depending on the derivatives u_xxxx and u_yyyy. The constant C_{Φ_a} is the maximum of the nonnegative function Φ, defined in (8.31), over the grid points where a Dirichlet boundary condition is used for the subdomain Ω'_i, and C_D ≥ 0 is the constant appearing in the error bound for the accuracy of the Dirichlet boundary conditions on the interior boundaries ∂Ω'_i \ Γ_i. The set J contains all points where the inconsistent discretisation (8.25) is used. The constants E_1 and Ẽ_2 are lower bounds for L_{h_i} Φ.
Second-Order Scheme on a Modified Stencil To obtain a consistent discretisation, we construct a second-order scheme on the modified stencil shown in Fig. 8.5. The mesh size k ≤ h is chosen such that the grid point on the right lies on a grid line of the other grid. The point u(-2h, 0) is needed to obtain a second-order scheme with the discretisation formula

    S = c_{0,0} u(0,0) + c_{0,-1} u(0,-h) + c_{0,1} u(0,h) + c_{-1,0} u(-h,0) + c_{1,0} u(k,0) + c_{-2,0} u(-2h,0) .    (30)

The coefficients are

    c_{-2,0} = (γ_k - 1)/(γ_k + 2),
    c_{-1,0} = 2(2 - γ_k)/(γ_k + 1),
    c_{0,-1} = 1,
    c_{0,0}  = -1 - 3/γ_k,
    c_{0,1}  = 1,
    c_{1,0}  = 6/( γ_k (γ_k + 1)(γ_k + 2) ).    (31)-(37)

The local discretisation error is

    S/h² - (u_xx + u_yy) = (h²/12) ( (3γ_k - 2) u_xxxx + u_yyyy ) + O(h³) .    (38)
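The coefficients above can be verified directly: since the truncation error (38) starts at the fourth derivatives, applying the stencil to any cubic polynomial must reproduce h²(u_xx + u_yy) exactly. A small check with illustrative values of γ_k and h:

```python
import numpy as np

gamma_k, h = 1.5, 0.2
k = gamma_k * h

c = {(-2, 0): (gamma_k - 1) / (gamma_k + 2),
     (-1, 0): 2 * (2 - gamma_k) / (gamma_k + 1),
     (0, -1): 1.0,
     (0, 0): -1.0 - 3.0 / gamma_k,
     (0, 1): 1.0,
     (1, 0): 6.0 / (gamma_k * (gamma_k + 1) * (gamma_k + 2))}

def point(i, j):
    """Physical coordinates of stencil node (i, j); i = 1 sits at x = k."""
    return (k if i == 1 else i * h, j * h)

u = lambda x, y: x**3 + 2 * y**3 - 3 * x**2 + x * y   # cubic test function
# (u_xx + u_yy)(0, 0) = (6x - 6) + 12y evaluated at the origin = -6
S = sum(cij * u(*point(i, j)) for (i, j), cij in c.items())
# S equals h^2 * (u_xx + u_yy)(0, 0) = -6 h^2 up to rounding
```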
Theorem 8.7.1 shows that second-order accuracy is obtained when this discretisation formula is used for the coupling between the subdomains and the classical 5-point stencil is used inside each subdomain.
Modified Stencil with Linear Interpolation We study the effect of a linear interpolation along the y-axis for u(k, 0), i.e. v = (1-η) u(k, -ηl) + η u(k, (1-η)l), and show that a consistent approximation exists. Choose the coefficients in

    S = c_{0,0} u(0,0) + c_{0,-1} u(0,-h) + c_{0,1} u(0,h) + c_{-1,0} u(-h,0) + c_v v    (39)

as

    c_{-1,0} = 2/(γ_k + 1),
    c_{0,-1} = c_{0,1} = 1 - η(1-η) γ_l² / ( γ_k (γ_k + 1) ),
    c_{0,0}  = -2 - 2/γ_k + 2η(1-η) γ_l² / ( γ_k (γ_k + 1) ),
    c_v      = 2/( γ_k (γ_k + 1) ),    (40)-(43)

with γ_l = l/h. This results in

    S/h² - (u_xx + u_yy) = O(h) .    (44)
Note that the difference formula has to be modified to compensate for the effect of the low-order interpolation. If the additional Assumption 8.8.1 holds, Theorem 8.8.1 shows second-order accuracy when this discretisation formula is used for the coupling between the subdomains and the classical 5-point stencil is used inside each subdomain.

Modified Stencil with Cubic Interpolation The interpolation along the x-axis can be avoided by using a modified stencil. We now show that cubic interpolation along the y-axis yields a second-order scheme. This is equivalent to constructing a second-order difference formula on the stencil shown in Figure 8.8.

Theorem 8.5 It is not possible to obtain a second-order accurate discretisation of (u_xx + u_yy) at the point (0, 0) on the stencil shown in Figure 8.7, unless at least 4 points along the line x = k are used or one of these points has y-coordinate equal to 0.

Theorem 8.6 The unique second-order accurate discretisation of (u_xx + u_yy) at the point (0, 0) on the stencil shown in Figure 8.8, using only 4 points along the line x = k none of which has y-coordinate equal to 0, is the second-order scheme (8.39) on the modified stencil with cubic interpolation along the line x = k for the point (k, 0). We seek the coefficients in the formula

    S = c_{0,0} u(0,0) + c_{0,-1} u(0,-h) + c_{0,1} u(0,h) + c_{-1,0} u(-h,0) + c_{-2,0} u(-2h,0)
        + c_{1,-1} u(k, -(1+η)l) + c_{1,0} u(k, -ηl) + c_{1,1} u(k, (1-η)l) + c_{1,2} u(k, (2-η)l) .    (45)
The coefficients found are

    c_{-2,0} = (γ_k - 1)/(γ_k + 2),
    c_{-1,0} = 2(2 - γ_k)/(γ_k + 1),
    c_{0,0}  = -1 - 3/γ_k,
    c_{0,-1} = c_{0,1} = 1,
    c_{1,-1} = - η(1-η)(2-η) / ( γ_k (γ_k + 1)(γ_k + 2) ),
    c_{1,0}  = 3 (1-η)(2-η)(η+1) / ( γ_k (γ_k + 1)(γ_k + 2) ),
    c_{1,1}  = 3 (2-η) η (η+1) / ( γ_k (γ_k + 1)(γ_k + 2) ),
    c_{1,2}  = - η(1-η)(η+1) / ( γ_k (γ_k + 1)(γ_k + 2) ).    (46)-(53)

This results in a second-order accurate discretisation

    S/h² - (u_xx + u_yy) = O(h²) .    (54)

The local truncation error of this discretisation is

    α = [ (3γ_k - 2)/12 · u_xxxx + ( 1/12 - (2-η)(1-η)η(η+1) γ_l⁴ / ( 4 γ_k (γ_k + 1)(γ_k + 2) ) ) u_yyyy ] h² + O(h³) .    (55)
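The four coefficients on the line x = k in (46)-(53) are, term by term, the coefficient c_{1,0} = 6/(γ_k(γ_k+1)(γ_k+2)) of the modified stencil multiplied by the cubic Lagrange weights at y = 0. This can be checked numerically; the values of η and γ_k below are illustrative.

```python
import numpy as np

eta, gamma_k = 0.3, 1.2
denom = gamma_k * (gamma_k + 1) * (gamma_k + 2)
c10_mod = 6.0 / denom                     # c_{1,0} of the modified stencil

# y-nodes of the cubic interpolation, in units of l
nodes = np.array([-(1 + eta), -eta, 1 - eta, 2 - eta])

def lagrange_at_zero(nodes):
    """Cubic Lagrange cardinal weights evaluated at y = 0."""
    w = np.ones(len(nodes))
    for j, xj in enumerate(nodes):
        for k, xk in enumerate(nodes):
            if k != j:
                w[j] *= (0.0 - xk) / (xj - xk)
    return w

c1 = c10_mod * lagrange_at_zero(nodes)    # candidate c_{1,-1}, ..., c_{1,2}

closed = np.array([
    -eta * (1 - eta) * (2 - eta) / denom,
    3 * (1 - eta) * (2 - eta) * (eta + 1) / denom,
    3 * (2 - eta) * eta * (eta + 1) / denom,
    -eta * (1 - eta) * (eta + 1) / denom,
])
# c1 and closed agree to rounding error
```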
It is clear that the coefficients c_{1,-1}, c_{1,0}, c_{1,1} and c_{1,2} equal the product of the coefficient c_{1,0} in (8.39) and the cubic Lagrange interpolation polynomials.

Numerical Results The numerical results show that our scheme, based on the modified discretisation stencil and one-dimensional cubic interpolation, is as accurate as the classical approach with two-dimensional bicubic interpolation, in which 16 points are used in the interpolation formula. We use only 4 points for the interpolation. The consequences of using the classical 5-point stencil with bilinear interpolation as an inconsistent discretisation (8.25) are also clearly demonstrated by the numerical results. For the test case studied, the dominant term in the error bound (8.27) equals ‖e‖ ≈ (ξ(1-ξ) c_1 + c_2) h², where c_1 and c_2 are constants independent of ξ and h. With this expression we can estimate the ratio γ_e between two successive error norms. The grid is refined by dividing the mesh sizes by 2, i.e. h_{i+1} = h_i/2. We find

    γ_e = ‖e_{Ω'_{h_i}}‖_∞ / ‖e_{Ω'_{h_{i+1}}}‖_∞
        = ( c_1 (ξ_i(1-ξ_i)) + c_2 ) h_i² / ( ( c_1 (ξ_{i+1}(1-ξ_{i+1})) + c_2 ) h_{i+1}² )
        = 4 ( ξ_i(1-ξ_i) + γ_c ) / ( ξ_{i+1}(1-ξ_{i+1}) + γ_c ) ,    (56)

where γ_c = c_2/c_1. This explains the results for bilinear interpolation in Tables 8.1, 8.8 and 9.1. When the mortar projection is used as interpolation operator we obtain similar results, as shown in Table 9.2. For a second-order scheme with a constant coefficient of h² in the error bound, this ratio is of course 4. The scheme is indeed second order (see Theorem 8.6.1), but this dependence on the relative position of the interface in the other grid is undesirable. A second drawback of the classical 5-point stencil with bilinear interpolation is that the accuracy depends on the size of the overlap. The overlap must be very large (50% or more) to obtain results as accurate as those of a consistent method.
9 Mortar Projection in the Composite Mesh Difference Method

We have studied various finite element discretisations. In that setting, the other classical method for solving problems on nonmatching grids, namely the mortar element method, is used. For overlapping nonmatching grids, however, the mortar element method is not trivial, because not only a global projection along the interface is needed, but also a modified bilinear form that weights the bilinear forms of the different subdomains in the part of the domain where the subdomains overlap. The theory shows that this is necessary for the resulting global discretisation to be consistent. This of course complicates the implementation, because there is now a coupling in the overlapping part of the domain and not only on the boundary. Since the mortar element method is theoretically very well founded, we have carried out a comparative study of both methods and tried a combination of the two. We have investigated the possibility of using this global mortar projection along the interface as the interpolation technique in the finite difference methods mentioned above (the so-called composite mesh difference methods). This is possible because the mortar projection is also a second-order accurate interpolation and can therefore be compared with bilinear interpolation. The results are of course different, since a different approximation criterion is used. This comparison has led to a negative result for the mortar projection. The numerical results show many similarities, in the sense that for the mortar projection too the error bound depends on the relative position of the interface in the other grid (which is undesirable) and the accuracy depends on the size of the overlapping part of the domain (which is far more undesirable). The explanation lies in the fact that a linear interpolation (in the case of P1 or Q1 elements) is used in the direction normal to the interface. In the case of the mortar projection this interpolation is hidden in the computation of the master side of the projection. This means that for overlapping nonmatching grids the mortar element method is only usable if the weighted bilinear form is used to obtain a consistent global discretisation.
10 Coupling by Finite Difference Discretisation

We examine whether the partial differential equation can be discretised on the stencil formed by the points u_1, u_2, u_3, u_4 and v_1 shown in Figure 10.1. The points u_1, u_2, u_3, u_4 are the 4 corners of the square in which the point v_1 lies. We define the local coordinates α = (x_{v_1} - x_{u_1})/(x_{u_2} - x_{u_1}) and β = (y_{v_1} - y_{u_1})/(y_{u_3} - y_{u_1}) and seek the coefficients γ_0, γ_1, γ_2, γ_3 and γ_4 in

    L(α, β) = γ_0 u(0,0) + γ_1 u((1-α)h, (1-β)h) + γ_2 u(-αh, (1-β)h) + γ_3 u(-αh, -βh) + γ_4 u((1-α)h, -βh)    (57)
such that a consistent discretisation of (u_xx + u_yy) h²/2 at v_1 = u(0, 0) is obtained. For this we use the Taylor expansion of u(x, y) around the origin: u(x, y) = u + u_x x + u_y y + u_xx x²/2 + u_xy xy + u_yy y²/2 + O(h³). We assume that |x| ≤ h and |y| ≤ h, so that the remainder can be bounded by C h³. Requiring that the coefficients of u, u_x, u_y and u_xy vanish, together with the requirement that the coefficients of u_xx h²/2 and u_yy h²/2 equal 1 in the Taylor expansion of (10.2), yields an overdetermined system C g = c:

    [ 1      1            1          1        1       ] [ γ_0 ]   [ 0 ]
    [ 0     1-α          -α         -α       1-α      ] [ γ_1 ]   [ 0 ]
    [ 0     1-β          1-β        -β       -β       ] [ γ_2 ] = [ 0 ]    (58)
    [ 0    (1-α)²         α²         α²     (1-α)²    ] [ γ_3 ]   [ 1 ]
    [ 0    (1-β)²        (1-β)²      β²       β²      ] [ γ_4 ]   [ 1 ]
    [ 0   (1-α)(1-β)   -α(1-β)      αβ     -(1-α)β    ]           [ 0 ]
NEDERLANDSE SAMENVATTING
xxxiv
This overdetermined system Cg = c can only have a solution if the determinant of the extended matrix C̃ = (C | c) is zero:

    det C̃ = det (C | c) = (β − α)(α + β − 1).    (59)
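The solvability condition above can be checked numerically by assembling the extended matrix of the system (58). The following sketch is an illustration added here (it is not part of the thesis software); NumPy is assumed, and the function name `extended_det` is ours:

```python
import numpy as np

def extended_det(alpha, beta):
    """Determinant of the 6x6 extended matrix (C | c) of the system (58)."""
    # Stencil points (x, y) in units of h, relative to v1 at the origin.
    pts = [(0.0, 0.0),
           (1 - alpha, 1 - beta), (-alpha, 1 - beta),
           (-alpha, -beta), (1 - alpha, -beta)]
    # Columns correspond to gamma_0..gamma_4; rows to the coefficients of
    # u, ux*h, uy*h, uxx*h^2/2, uyy*h^2/2, uxy*h^2 in the Taylor expansion.
    C = np.array([[1.0, x, y, x * x, y * y, x * y] for (x, y) in pts]).T
    c = np.array([[0.0], [0.0], [0.0], [1.0], [1.0], [0.0]])
    return np.linalg.det(np.hstack([C, c]))

# The determinant vanishes exactly when v1 lies on a diagonal of the square:
assert abs(extended_det(0.3, 0.3)) < 1e-12   # alpha = beta
assert abs(extended_det(0.3, 0.7)) < 1e-12   # alpha + beta = 1
assert abs(extended_det(0.5, 0.3)) > 1e-8    # off the diagonals: no solution
```

The zero/nonzero pattern of this determinant is what the condition (59) expresses: the right-hand side c only lies in the column space of C when v1 is on a diagonal.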
A solution therefore exists only if α = β or α + β = 1, i.e. if the point v1 lies on one of the diagonals of the square formed by u1, u2, u3 and u4. The solution is

    g = ( −2 − (1−α)/α − α/(1−α),  α/(1−α),  1,  (1−α)/α,  1 )ᵀ    (60)

if α = β, and if α + β = 1 the solution is

    g = ( −2 − (1−α)/α − α/(1−α),  1,  (1−α)/α,  1,  α/(1−α) )ᵀ.    (61)
The local truncation error is found by substituting this solution into (10.2):

    L(α, β) = h²(uxx + uyy)/2 + Cα h³(uxxx + uyyy)/6 + O(h⁴),    (62)

with Cα = 1 − 2α if α = β and Cα = 2α − 1 if α + β = 1. This is a first order O(h) discretisation of uxx + uyy. A second order O(h²) discretisation is only obtained for α = β = 1/2:

    L(α, β) = h²(uxx + uyy)/2 + h⁴(uxxxx + 6uxxyy + uyyyy)/96 + O(h⁶).    (63)

Theorem 10.3.1 establishes the second order accuracy when this discretisation formula is used for the coupling between the subdomains. Because of its limited applicability, this finite difference discretisation coupling technique is not acceptable.
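The closed-form weights (60) for the case α = β can be checked directly against the six moment conditions that define the system (58). The sketch below is an illustration (NumPy assumed; the function names are ours, not from the thesis):

```python
import numpy as np

def coupling_weights(alpha):
    """Closed-form stencil weights (60) for the case alpha = beta."""
    g0 = -2 - (1 - alpha) / alpha - alpha / (1 - alpha)
    return np.array([g0, alpha / (1 - alpha), 1.0, (1 - alpha) / alpha, 1.0])

def check_consistency(alpha):
    """Verify that the weights reproduce (uxx + uyy) h^2 / 2 exactly for
    all quadratic polynomials, i.e. satisfy the six moment conditions."""
    beta = alpha
    # Stencil points (x, y) in units of h, relative to v1 at the origin.
    pts = np.array([(0.0, 0.0),
                    (1 - alpha, 1 - beta), (-alpha, 1 - beta),
                    (-alpha, -beta), (1 - alpha, -beta)])
    g = coupling_weights(alpha)
    x, y = pts[:, 0], pts[:, 1]
    moments = np.array([g.sum(), g @ x, g @ y, g @ x**2, g @ y**2, g @ (x * y)])
    # Required: 0, 0, 0, 1, 1, 0 for the coefficients of
    # u, ux*h, uy*h, uxx*h^2/2, uyy*h^2/2, uxy*h^2.
    return np.allclose(moments, [0, 0, 0, 1, 1, 0])

assert check_consistency(0.3)
assert check_consistency(0.5)   # alpha = 1/2 gives the standard five-point stencil
```

For α = 1/2 the weights reduce to (−4, 1, 1, 1, 1), the classical five-point Laplacian stencil, consistent with the second order case (63).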
11 Conclusions and Suggestions for Future Research

11.1 Conclusions

We have clearly succeeded in improving the convergence of the iterative solution technique, at least for the application considered in this thesis, namely the solution of the Shallow Water Equations. The results obtained with the iterative method based on FGMRES and the Generalised Additive Schwarz Method are very good. The numerical methods proposed in this thesis for this application have been integrated in the software developed by Waterloopkundig Laboratorium Delft Hydraulics (Delft, The Netherlands) for solving the Shallow Water Equations in realistic applications.

The study of Ritz and Harmonic Ritz values in Chapter 4, and of nested Krylov methods in Chapter 5, has led to new insights, such as relations in Simpler GMRES and a better choice of the starting vector in nested Krylov methods. These new insights, however, can only improve the convergence properties of the algorithms when stagnation occurs. For the problem considered in the second part of this thesis, the extension of the method to nonmatching grids, we have proposed an acceptable solution, at least for 2D elliptic problems. Our lower dimensional interpolation formulae and modified discretisation schemes result in a consistent and second order accurate global discretisation. We have also investigated a number of alternative techniques. The results in Chapter 9 show that using the mortar projection in a composite mesh difference method yields approximations comparable to those obtained with bilinear interpolation. But since the mortar projection is harder to implement, this option is not retained. The finite difference discretisation coupling technique of Chapter 10 is not acceptable because of its limited applicability.
11.2 Suggestions for Future Research

Since there can be no single best Krylov method, further research in this field will mainly have to address preconditioning. Several successful techniques are already available, e.g. domain decomposition, multigrid methods, construction of a sparse approximate inverse operator, incomplete factorisations, etc.

In the ideal scheme for solving partial differential equations on nonmatching grids, the accuracy should be of optimal order and independent of the size of the overlap and of the ratio of the different mesh sizes. To enable the use of fast solvers for the subproblems, it is desirable not to have weights in the discretisation on the overlapping parts of the subdomains. Much research remains to be done in this field; we mention a number of research topics. It would be very interesting to extend our lower dimensional interpolation formulae and modified discretisation schemes to grids without parallel grid lines, and to grids computed by an elliptic grid generator. For 3D applications this technique will be very advantageous, since 3D cubic interpolation requires 64 points, whereas 2D cubic interpolation needs only 16. The question remains, of course, whether the discretisation formula can be modified such that a 1D interpolation suffices for a 3D problem. Our experiments in Chapter 9 show the effect of using the mortar projection in a composite mesh difference method. We propose a few further experiments to gain more insight into the mortar element method, namely examining whether bilinear interpolation can be used when the weighted bilinear form is employed.
Contents

Preface
Acknowledgement
Nederlandse Samenvatting
Contents
Notations and Abbreviations
List of Figures
List of Tables

1 Introduction
  1.1 Context and General Motivation
  1.2 Overview and Contribution of the Thesis

2 Krylov Subspace Methods
  2.1 Introduction
  2.2 Overview of Krylov subspace methods
    2.2.1 Conjugate Gradient (CG)
    2.2.2 MINRES and SYMMLQ
    2.2.3 CG on the Normal Equations
    2.2.4 GMRES and FOM
    2.2.5 Flexible GMRES (FGMRES) and FOM (FFOM)
    2.2.6 Generalised Conjugate Residual (GCR)
    2.2.7 Short Recurrence Methods
      BiConjugate Gradient (BiCG)
      Quasi Minimal Residual (QMR)
      Conjugate Gradient Squared (CGS)
      Transpose Free Quasi Minimal Residual (TFQMR)
      BiConjugate Gradient Stabilized (BiCGStab)
  2.3 Optimal Krylov Subspace Methods
  2.4 Convergence Properties

3 Domain Decomposition Methods
  3.1 Introduction
  3.2 One Level Algorithms
    3.2.1 Alternating Schwarz Method: Continuous Formulation
    3.2.2 Alternating Schwarz Method: Discrete Formulation
    3.2.3 Schwarz Method for Matching Grids
    3.2.4 Symmetric Multiplicative Schwarz Method
    3.2.5 Additive Schwarz Method
    3.2.6 Many Subdomains
    3.2.7 Convergence Behaviour
  3.3 Two Level Algorithms
    3.3.1 Subdomain solves are not sufficient
    3.3.2 Basic Two Level Method
    3.3.3 Two Level Schwarz Methods
      Two level additive Schwarz preconditioner
      Two level multiplicative Schwarz preconditioners
      Two level hybrid overlapping Schwarz preconditioner
    3.3.4 Convergence Behaviour
  3.4 Multilevel Algorithms
    3.4.1 Additive Multilevel Schwarz Methods
      Multilevel Diagonal Scaling
      BPX Method
      Hierarchical Basis Method
    3.4.2 Multiplicative Multilevel Schwarz Methods
      Multiplicative between Levels, Additive in Levels
      Multiplicative Multilevel Diagonal Scaling
      Multiplicative between Levels, Multiplicative in Levels
    3.4.3 Multigrid Methods

4 Ritz and Harmonic Ritz Values
  4.1 Introduction
  4.2 Ritz Values and FOM Residual Polynomial
  4.3 Simpler GMRES and the Transformation Matrix
  4.4 Harmonic Ritz Values
  4.5 GMRES Residual Polynomial
  4.6 Example 1: Stagnation of GMRES
  4.7 Example 2: A Convection-Diffusion Problem
  4.8 Conclusion

5 Nested Krylov Subspace Methods
  5.1 Introduction
  5.2 Nested Iterations
  5.3 Extended GMRES
  5.4 Breakdown and Stagnation
    5.4.1 Types of breakdown
    5.4.2 LSQR switch
    5.4.3 Transpose-free cure for breakdown
      Description
      Example: Stagnation of GMRES
  5.5 Numerical Results
  5.6 Conclusion

6 Shallow Water Equations Solver
  6.1 Introduction
  6.2 Shallow Water Equations
  6.3 Alternating Operator Implicit Method
  6.4 Alternating Direction Implicit
    6.4.1 Basic Iterative Method
    6.4.2 Relaxation Parameters in DELFT3D-FLOW
    6.4.3 ADI Solver for Water Elevation Equation
  6.5 Preconditioning
    6.5.1 Jacobi Preconditioning: Diagonal Row Scaling
    6.5.2 One Step ADI Preconditioning
    6.5.3 Cycle ADI Preconditioning
  6.6 ADI preconditioning in DELFT3D-FLOW
  6.7 The Benqué test case
  6.8 The Clyde river model
  6.9 Conclusion

7 Shallow Water Equations Solver
  7.1 Introduction
  7.2 Generalised Additive Schwarz Method
  7.3 Krylov Convergence Acceleration
  7.4 Reuse strategies
  7.5 Rectangular basin
  7.6 Conclusion

8 Composite Mesh Difference Methods
  8.1 Introduction
  8.2 Composite Mesh Difference Method
  8.3 A Parallel Schwarz Iterative Method
  8.4 Subdomain Error Analysis
  8.5 Consistence of Grid Interpolations
  8.6 Five-point Stencil with Bilinear Interpolation
    8.6.1 Inconsistency in Discretisation
    8.6.2 Subdomain Error Analysis
  8.7 Second Order Scheme on a Modified Stencil
    8.7.1 Discretisation Stencil
    8.7.2 Subdomain Error Analysis
  8.8 Modified Stencil with Linear Interpolation
    8.8.1 Discretisation Stencil
    8.8.2 Subdomain Error Analysis
  8.9 Modified Stencil with Cubic Interpolation
  8.10 Numerical results
    8.10.1 Smooth function
      Standard stencil with 2D interpolation
      Modified stencil with 1D interpolation
      Effect of overlap
    8.10.2 Function with a peak
      Standard stencil with 2D interpolation
      Modified stencil with 1D interpolation
      Effect of overlap
    8.10.3 Quadratic Function
  8.11 Conclusion

9 Mortar Projection in CMDM
  9.1 Introduction
  9.2 2D Linear Finite Elements
    9.2.1 Triangular P1 Linear Finite Element
    9.2.2 Quadrilateral Q1 Bilinear Finite Element
  9.3 Overlapping Nonmatching Grid Mortar Method
  9.4 Mortar Projection in CMDM
    9.4.1 Maximum Principle
    9.4.2 Inconsistent Discretisation
    9.4.3 Computation of Master Side
    9.4.4 Size of the overlap
  9.5 Numerical results
    9.5.1 Quadratic Function
      Inconsistent Schemes
      Second Order Consistent Scheme
      Effect of Overlap
  9.6 Conclusion

10 Finite Difference Coupling
  10.1 Introduction
  10.2 Finite Difference Coupling
  10.3 Error Analysis
  10.4 Numerical Results
  10.5 Conclusion

11 Conclusions and Suggestions for Future Research
  11.1 Conclusion
  11.2 Suggestions for Future Research

A Ritz Values and FOM Residual Polynomial
  A.1 Proof of Lemma 4.2.1
  A.2 Proof of Lemma 4.2.2
  A.3 Proof of Theorem 4.2.1

B Convection–Diffusion Problem
  B.1 Formulation and Discretisation
  B.2 Testcase
  B.3 Software

Bibliography
Index
Notations and Abbreviations

"Never express yourself more clearly than you are able to think." (Niels Bohr)

List of Symbols

O           : big "O" order symbol
P           : polynomials
R           : real numbers
K_m(A, r0)  : Krylov subspace
W_p^k(Ω)    : Sobolev space

List of Abbreviations

ADI        : Alternating Direction Implicit
AOI        : Alternating Operator Implicit
BVP        : Boundary Value Problem
Bi-CGStab  : Biconjugate Gradient Stabilised
BiCG       : Biconjugate Gradient
CG         : Conjugate Gradient
CGNE       : Conjugate Gradient on the Normal Equations
CGNR       : Conjugate Gradient on the Normal Equations
CGS        : Conjugate Gradient Squared
CMDM       : Composite Mesh Difference Method
FFOM       : Flexible FOM
FGMRES     : Flexible GMRES
FOM        : Full Orthogonalisation Method
GCR        : Generalized Conjugate Residual
GMRES      : Generalised Minimal Residual
GS         : Gauss–Seidel
ILU        : Incomplete LU factorisation
KSM        : Krylov Subspace Method
MINRES     : Minimal Residual
PCG        : Preconditioned Conjugate Gradient
PDE        : Partial Differential Equation
QMR        : Quasi-Minimal Residual
RBGS       : Red–Black Gauss–Seidel
SGS        : Symmetric Gauss–Seidel
SOR        : Successive Over-Relaxation
SSOR       : Symmetric Successive Over-Relaxation
SWE        : Shallow Water Equations
SYMMLQ     : Symmetric LQ
TFQMR      : Transpose-Free Quasi-Minimal Residual
List of Figures

3.1 Overlapping subdomains in the Alternating Schwarz Method
4.1 Ritz and Harmonic Ritz (+) values from K_10(A, r0) for example 1
4.2 Ritz and Harmonic Ritz (+) values from K_19(A, r0) for example 1
4.3 Norm of the residual ||r_GMRES||_2 as a function of m for example 2
4.4 Ritz and Harmonic Ritz (+) values from K_130(A, r0) for example 2
4.5 Ritz and Harmonic Ritz (+) values from K_41(A, r0) for example 2
4.6 Ritz and Harmonic Ritz (+) values from K_80(A, r0) for example 2
5.1 Norm of the residual ||r_m||_2 as a function of the dimension m of the Krylov subspace for FG and FEG compared to full GMRES
6.1 Benqué model: grid and depth
6.2 Clyde model: depth
6.3 Clyde model: grid
7.1 Grid before partitioning
7.2 Grid after partitioning
7.3 Spectrum of H_m: Dirichlet-Dirichlet Coupling
7.4 Spectrum of H_m: Locally Optimised Coupling
7.5 Convergence histories of the preconditioned FGMRES method
7.6 Convergence histories of the preconditioned FGMRES method
8.1 The standard five-point stencil
8.2 The standard five-point stencil with bilinear interpolation
8.3 A composite mesh with overlapping and nonmatching grids. The scaled local coordinates (ξ, η) are used in the interpolation.
8.4 The mesh width k is selected so that a grid line in the other mesh is matched. The mesh on Ω1 has γ_k = k/h = 1.2, while for the mesh on Ω2 this factor is 2.
8.5 Modified stencil for the second order scheme
8.6 Stencil for the first order scheme with 1D linear interpolation
8.7 Stencil for the scheme with only 3 points for the 1D interpolation
8.8 Stencil for the second order scheme with 1D cubic interpolation
8.9 The solution u_s on Ω′1 and Ω′2
8.10 The grids on the overlapping subdomains Ω′1 = [0, 1.4] × [0, 1] with h1 = 0.2 and Ω′2 = [0.75, 2] × [0, 1] with h2 = 0.25 (points marked +)
8.11 The solution u_p on Ω′1
8.12 The solution u_p on Ω′2
8.13 The solution u_q on Ω′1 and Ω′2
9.1 Uniform grid and P1 finite element mesh
9.2 Uniform grid and Q1 finite element mesh
9.3 A mortar projection interface test function
9.4 The 3 types of basis functions for the interface test function space
9.5 The mortar projection does not satisfy the maximum principle
9.6 The mortar master function φ1 on the interface γ1 is constructed from the P1 finite element representation on Ω2
9.7 The mortar master function φ1 on the interface γ1 is constructed from the Q1 finite element representation on Ω2
10.1 Stencil for interpolation and nonmatching grid
B.1 Solution of the convection–diffusion problem described in Sec. B.2
B.2 GMRES convergence history for the solution of the convection–diffusion problem described in Sec. B.2
List of Tables

3.1 Ordering of subspaces in several multilevel methods
4.1 Estimated and computed norm of the Ritz and Harmonic Ritz values from K_m(A, r0) for example 1
5.1 Performance Results for GMRES, FG and FEG
5.2 The number of outer iterations m_o and the dimension k of the subspace used as a function of the dimension i in the inner iteration
6.1 The ADI-preconditioned FGMRES algorithm
6.2 CPU times of DELFT3D-FLOW AOI(ADI) and AOI(FGMRES-ADI) for the Benqué test case
6.3 CPU times of DELFT3D-FLOW AOI(ADI) and AOI(FGMRES-ADI) for the Clyde model
7.1 Number of preconditioning steps needed to solve the second linear system when the reuse of vectors in the subspace is done by truncation, assembling (rank k) or assembling of preconditioned Ritz vectors (k outliers) for the rectangular basin partitioned in 4 strips
7.2 CPU time for the test problem, with and without assembling and with different tolerances for inner systems
8.1 Norm of the error in the overlapping domains Ω′1 and Ω′2 for the smooth function u_s using the standard stencil with bilinear interpolation
8.2 Norm of the error in the overlapping domains Ω′1 and Ω′2 for the smooth function u_s using the standard stencil with bicubic interpolation
8.3 Norm of the error in the overlapping domains Ω′1 and Ω′2 for the smooth function u_s using the modified stencil with 1D linear interpolation
8.4 Norm of the error in the overlapping domains Ω′1 and Ω′2 for the smooth function u_s using the modified stencil with 1D cubic interpolation
8.5 The effect of overlap on the norm of the error in the nonoverlapping domains Ω1 and Ω2 for the smooth function u_s using the standard stencil with 2D interpolation
8.6 The effect of overlap on the norm of the error in the nonoverlapping domains Ω1 and Ω2 for the smooth function u_s using the modified stencils with 1D linear interpolation
8.7 The effect of overlap on the convergence rate of the Schwarz method for the smooth function u_s
8.8 Norm of the error in the overlapping domains Ω′1 and Ω′2 for the function with a peak u_p using the standard stencil with bilinear interpolation
8.9 Norm of the error in the overlapping domains Ω′1 and Ω′2 for the function with a peak u_p using the standard stencil with bicubic interpolation
8.10 Norm of the error in the overlapping domains Ω′1 and Ω′2 for the function with a peak u_p using the modified stencil with 1D linear interpolation
8.11 Norm of the error in the overlapping domains Ω′1 and Ω′2 for the function with a peak u_p using the modified stencil with 1D cubic interpolation
8.12 The effect of overlap on the norm of the error in the nonoverlapping domains Ω1 and Ω2 for the function with a peak u_p using the standard stencil with 2D interpolation
8.13 The effect of overlap on the norm of the error in the nonoverlapping domains Ω1 and Ω2 for the function with a peak u_p using the modified stencils with 1D interpolation
8.14 The effect of overlap on the convergence rate of the Schwarz method for the function with a peak u_p
9.1 Effect of inconsistent discretisation: results for P1 stencil with bilinear interpolation
9.2 Effect of inconsistent discretisation: results for P1 stencil with mortar projection
9.3 Effect of overlap on the convergence rate of the Schwarz method and on the accuracy for the standard P1 stencil with mortar projection. The same results are obtained with bilinear interpolation.
10.1 Norm of the error in the domains Ω0 and Ω1 for the function u1
10.2 Norm of the error in the domains Ω0 and Ω1 for the function u2
10.3 Norm of the error in the domains Ω0 and Ω1 for the function u2, when bilinear interpolation is used
Chapter 1
Introduction

"This report, by its very length, defends itself against the risk of being read." (Winston Churchill)

1.1 Context and General Motivation

The numerical solution of partial differential equations is an important research topic in the field of scientific computing and numerical simulation. After discretisation, very large systems of linear equations have to be solved. In domain decomposition methods, the domain is split into a number of subdomains and the resulting subproblems are coupled via "artificial" boundary conditions. Krylov subspace methods are iterative methods to solve systems of linear algebraic equations. They are easy to implement but require a good preconditioner. In this thesis we develop efficient numerical solvers for partial differential equations, based on the combination of Krylov subspace methods, such as Flexible GMRES, with domain decomposition preconditioning, and we extend the applicability of the developed technique to (overlapping) nonmatching grids.

1. The first part of the thesis deals with improving the convergence of the iterative solver.

   (a) The domain decomposition method can be optimised by determining an optimal coupling between the subdomains. This is done by imposing boundary conditions for the subdomain problems.
   (b) Ritz and Harmonic Ritz values and the corresponding vectors are studied to understand the convergence behaviour of GMRES.

   (c) We propose an optimised nested Krylov method, which is an attractive way to extract a near-optimal approximation from a high dimensional Krylov subspace while keeping memory and computational requirements reasonably low.

2. The second part of the thesis is devoted to the extension of the adopted domain decomposition method to nonmatching grids and focuses on discretisation techniques and error analysis.

   (a) In the case of (overlapping) nonmatching grids information transfer from one grid to the other grid is not trivial since there is no global discretisation from which this information transfer can be derived. We focus on interpolation formulae and modified discretisation stencils to construct a consistent and second order accurate global discretisation.

   (b) Instead of an interpolation formula, the mortar projection can be used.

   (c) As an alternative to interpolation, a coupling technique based on a finite difference discretisation can be considered.
1.2 Overview and Contribution of the Thesis

In Chapter 2 we give a short overview of Krylov subspace methods, since many popular iterative methods for the solution of linear systems and the computation of (selected) eigenvalues make use of Krylov subspaces.

Some domain decomposition methods are briefly discussed in Chapter 3. Only Schwarz methods are considered here, because we are not using substructuring (Schur) methods. All the prerequisites to understand our Generalised Additive Schwarz Method can be found here.

In Chapter 4, we prove that the Ritz and Harmonic Ritz values are the zeroes of the residual polynomials of FOM and GMRES respectively, and show that the Walker–Zhou interpretation of GMRES enables us to formulate the relation between the Harmonic Ritz values and GMRES in the same way as the relation between the Ritz values and FOM. We present an upper bound for the norm of the difference between the matrices from which the Ritz and Harmonic Ritz values are computed. The differences between the Ritz and Harmonic Ritz values enable us to describe breakdown of FOM and stagnation of GMRES. These results have been published in [GR99].

In Chapter 5 we present a nested Krylov method, which provides an attractive way to extract a near-optimal approximation from a high dimensional Krylov subspace while keeping memory and computational requirements reasonably low. It is comparable to the GCRO [dS96a] method, but differs from it since we use
another vector to start the inner iteration. This results in a transpose-free cure for breakdown. This method was presented in [Goo98].

A solver for the Shallow Water Equations based on FGMRES and the Generalised Additive Schwarz Method is discussed in Chapters 6 and 7. The numerical methods introduced in these two chapters have been integrated in the software developed by Waterloopkundig Laboratorium Delft Hydraulics (Delft, The Netherlands) for the solution of the Shallow Water Equations on real-life applications. In the first chapter on the Shallow Water Equations (Chapter 6) we show how the convergence of the original ADI iterative method is accelerated with GMRES. The application of a domain decomposition preconditioner in combination with FGMRES to solve the equation for the water elevation is described in the second chapter on the Shallow Water Equations (Chapter 7). We show that preconditioning GMRES with our Generalised Additive Schwarz Method leads to a clustered eigenvalue spectrum with only a few outlying eigenvalues and that introduction of Ritz vectors corresponding to the outlying eigenvalues in the subsequent FGMRES iterations speeds up the solution process. These results have been published in [GTR98a] and [GTR98b].

Chapter 8 is the first of three chapters devoted to nonmatching grids methods. In this chapter interpolation formulae and modified discretisation stencils are considered to construct a consistent and second order accurate global discretisation. We propose a modified Composite Mesh Difference Method (CMDM) in which a lower dimensional interpolation can be used along the interface of the nonmatching grids. The advantage of this approach is that fewer interpolation points are needed while the same order of global accuracy is preserved. We have published this method in [GC99].

In Chapter 9 we study experimentally the effect of the mortar projection in an overlapping composite mesh difference method for two-dimensional elliptic problems. This is motivated by the fact that the overlapping mortar element method has several desirable properties: the discretisation is consistent, the accuracy is of optimal order and the error is independent of the size of the overlap, and also independent of the ratio of the mesh sizes. However, a major disadvantage of the method is that it needs weights in the bilinear form. Our experiments reveal the effect of dropping the weights in the discretisation. These results will be published in [GC00].

Finally we consider in Chapter 10 a finite difference method on overlapping nonmatching grids for a two-dimensional Poisson problem. Instead of using an interpolation formula to transfer information from one grid to the other, i.e. to compute Dirichlet boundary conditions on the interior boundary, we propose a coupling technique based on a finite difference discretisation. The idea is to discretise the given PDE on the modified stencil consisting of the point on the interior boundary and its neighbours in the other mesh. This method is published in [GCR98].
Chapter 2
Krylov Subspace Methods

She says we don't really understand it, but there are many things we don't understand, and we just have to do the best we can with the knowledge we have.¹
In this chapter we give a short overview of Krylov subspace methods. Krylov subspace methods are one of the two ingredients on which the work presented in this thesis builds.
2.1 Introduction

Many iterative methods for the solution of linear systems and the computation of (selected) eigenvalues make use of Krylov subspaces. The iterative methods we consider for the solution of the large sparse system of linear equations

$$A x = b \qquad (2.1)$$

where $A \in \mathbb{R}^{n \times n}$ is the given real matrix and $b \in \mathbb{R}^n$ is the right-hand side vector, produce a sequence of approximations to the exact solution $A^{-1} b$. We denote by $x_m$ the approximation after $m$ iterations and by $r_m = b - A x_m$ the corresponding residual. The error $e_m = A^{-1} b - x_m$ is defined as the exact solution minus the approximation and satisfies the residual equation

$$A e_m = r_m. \qquad (2.2)$$

¹ Bill Watterson, Something under the bed is drooling (A Calvin and Hobbes Collection).

The initial guess is denoted by $x_0$ and $r_0 = b - A x_0$ is the corresponding initial residual, which is used to construct the Krylov subspaces.

Definition 2.1.1 Given a real matrix $A \in \mathbb{R}^{n \times n}$ and a vector $r_0 \in \mathbb{R}^n$, the Krylov subspaces are defined by

$$K_m(A, r_0) = \mathrm{span}\{r_0, A r_0, \ldots, A^{m-1} r_0\}. \qquad (2.3)$$

For $m = 1, 2, \ldots, n$ they form a nested sequence of subspaces.
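To make Definition 2.1.1 concrete, the following minimal numpy sketch (our own illustration, not code from the thesis) builds the matrix of Krylov vectors $[r_0, A r_0, \ldots, A^{m-1} r_0]$ and checks that the subspaces are nested, with the dimension growing by one per step for a generic starting vector:

```python
import numpy as np

def krylov_basis(A, r0, m):
    """Return the n x m matrix [r0, A r0, ..., A^{m-1} r0] of Krylov vectors."""
    K = np.empty((r0.size, m))
    v = r0.copy()
    for j in range(m):
        K[:, j] = v
        v = A @ v                      # next Krylov vector
    return K

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
r0 = rng.standard_normal(6)

# Nested subspaces: the rank grows by at most one per step, and for a
# generic r0 it grows by exactly one until the full space is reached.
ranks = [np.linalg.matrix_rank(krylov_basis(A, r0, m)) for m in range(1, 7)]
```

For an $r_0$ lying in a proper invariant subspace of $A$, the rank would stagnate before reaching $n$; that is exactly the "happy breakdown" situation exploited later by GMRES.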
2.2 Overview of Krylov subspace methods

A general reference for numerical linear algebra problems is the well-known book by Golub and Van Loan [GVL92]. Several books [BBC+94, Bru95, Gre97, Saa96] cover the classical Krylov subspace methods quite well. An overview of the state of the art in polynomial-based iteration methods for symmetric linear systems is given in a book by Fischer [Fis96]. A brief overview of the most commonly used techniques and standard Krylov subspace methods by Saad [Saa98], and one by Van der Vorst and Sleijpen [VdVS98], can be found in the proceedings of the NATO Advanced Study Institute on Algorithms for Large Scale Linear Algebraic Systems [WAS98].
2.2.1 Conjugate Gradient (CG)

The Conjugate Gradient (CG) method is an effective Krylov subspace method for symmetric positive definite linear systems. The paper by Hestenes and Stiefel [HS52] is the classical description of the CG method. Hestenes and Stiefel presented the method as a direct method because of its finite termination property, i.e. in exact arithmetic the exact solution $A^{-1} b$ is found after $n$ iterations. However, they also report error-reduction properties and experiments showing premature convergence. Since the 1970s CG has been considered an iterative method. Its most important application area is sparse linear systems.

CG selects an update $u_m \in K_m(A, r_0)$ from the Krylov subspace so that the approximation $x_m = x_0 + u_m$ minimises $e_m^H A e_m$. This minimum is guaranteed to exist only if $A$ is symmetric and positive definite. In this case the so-called A-norm $\|e_m\|_A^2 = e_m^H A e_m$ can be defined. The CG method can be preconditioned with any symmetric positive definite matrix $M$. The preconditioned version of CG uses $M^{-1} A$ to construct the Krylov subspace and satisfies the same minimisation property, but over a different subspace. The above minimisation of the A-norm of the error results in the corresponding residuals being $M^{-1}$-orthogonal:

$$r_i^H M^{-1} r_j = 0 \quad \text{for } i \ne j. \qquad (2.4)$$
The convergence behaviour of the CG method is difficult to predict, but useful bounds can be obtained. If CG with a symmetric positive definite preconditioner $M$ is used to solve the linear system (2.1) with a symmetric positive definite matrix $A$, then it can be shown that

$$\|e_m\|_A \le 2 \|e_0\|_A \, \rho^m. \qquad (2.5)$$

The characteristic number $\rho$ is given by

$$\rho = \frac{\sqrt{\kappa(M^{-1} A)} - 1}{\sqrt{\kappa(M^{-1} A)} + 1} \qquad (2.6)$$

where $\kappa(M^{-1} A) = \lambda_{\max}(M^{-1} A) / \lambda_{\min}(M^{-1} A)$ is the ratio of the largest eigenvalue to the smallest eigenvalue. Another important result concerning the convergence of CG is the so-called superlinear convergence, i.e. convergence at a rate that increases per iteration. The explanation for this phenomenon is that CG tends to eliminate components of the error in the direction of the eigenvectors associated with extremal eigenvalues first. After these have been eliminated, CG proceeds as if these eigenvalues were not present in the system and as if the convergence were governed by a smaller condition number. This convergence behaviour of CG has been analysed by e.g. Van der Sluis and Van der Vorst [VdSVdV86].
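The preconditioned CG iteration described above can be sketched in a few lines. The following numpy implementation with a Jacobi (diagonal) preconditioner is our own illustrative sketch, not code from the thesis; the test problem is an assumed 1D Laplacian:

```python
import numpy as np

def pcg(A, b, M_inv_diag, x0, maxit=300, tol=1e-10):
    """Preconditioned CG; M is the diagonal (Jacobi) preconditioner here."""
    x = x0.copy()
    r = b - A @ x
    z = M_inv_diag * r                 # z = M^{-1} r
    p = z.copy()
    rz = r @ z
    for _ in range(maxit):
        Ap = A @ p
        alpha = rz / (p @ Ap)          # minimises the A-norm of the error
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p      # M^{-1}-orthogonal residuals, cf. (2.4)
        rz = rz_new
    return x

# SPD test matrix: 1D Laplacian on 50 unknowns
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = pcg(A, b, 1.0 / np.diag(A), np.zeros(n))
```

With a better preconditioner $M$ the effective condition number $\kappa(M^{-1}A)$ in (2.6) shrinks and the iteration count drops accordingly.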
2.2.2 MINRES and SYMMLQ

While CG is a variant of the Lanczos method that can only be applied to symmetric positive definite systems, MINRES and SYMMLQ can be applied to symmetric indefinite systems. For a symmetric matrix an orthogonal basis for the Krylov subspace can be constructed by a three-term recurrence

$$A v_m = t_{m-1,m} v_{m-1} + t_{m,m} v_m + t_{m+1,m} v_{m+1}. \qquad (2.7)$$

This means that Krylov subspace methods for symmetric matrices are based on the fundamental relation

$$A V_m = V_m T_m + t_{m+1,m} v_{m+1} e_m^H = V_{m+1} \bar{T}_m. \qquad (2.8)$$

The rectangular tridiagonal matrix $\bar{T}_m \in \mathbb{R}^{(m+1) \times m}$ is the symmetric ($t_{i+1,i} = t_{i,i+1}$) tridiagonal matrix $T_m$ supplemented with an extra row $(0 \; \ldots \; 0 \; t_{m+1,m})$. MINRES selects the approximate solution $x_m = x_0 + V_m y_m$ that minimises the 2-norm of the residual

$$\|b - A x_m\|_2 = \|r_0 - A V_m y_m\|_2 \qquad (2.9)$$
$$= \|r_0 - V_{m+1} \bar{T}_m y_m\|_2 \qquad (2.10)$$
$$= \|\beta e_1 - D_{m+1} \bar{T}_m y_m\|_2, \qquad (2.11)$$

where $\beta = \|r_0\|_2$ and $D_{m+1} = \mathrm{diag}(\|v_1\|_2, \|v_2\|_2, \ldots, \|v_{m+1}\|_2)$ is the diagonal matrix containing the norms of the basis vectors. The vector $y_m$ is computed by solving the tridiagonal system in the least squares sense, e.g. by annihilating the subdiagonal elements of the tridiagonal matrix by means of Givens rotations and solving the resulting upper bidiagonal system.

SYMMLQ is the extension of CG to symmetric indefinite systems and solves the reduced system

$$T_m y_m = \beta e_1 \qquad (2.12)$$

for the coefficients $y_m$ using an LQ-decomposition (into a lower triangular matrix $L$ and a unitary matrix $Q$) of $T_m$. Equation (2.12) is exactly the same Galerkin condition as in CG, i.e. the residual is made orthogonal to the subspace the update is taken from. Hence the residual is a scalar multiple of $v_{m+1}$. In CG the matrix $T_m$ is symmetric positive definite because $A$ is symmetric positive definite and hence a Cholesky factorisation of $T_m$ is done as the algorithm proceeds.
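The three-term recurrence (2.7) and the fundamental relation (2.8) are easy to verify numerically. The sketch below is our own illustration (not from the thesis), using normalised basis vectors so that $D_{m+1} = I$; it runs the Lanczos process on an assumed symmetric indefinite test matrix:

```python
import numpy as np

def lanczos(A, r0, m):
    """Lanczos process: orthonormal V_{m+1} and tridiagonal Tbar_m with
    A V_m = V_{m+1} Tbar_m, cf. (2.8)."""
    n = r0.size
    V = np.zeros((n, m + 1))
    T = np.zeros((m + 1, m))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(m):
        w = A @ V[:, j]
        if j > 0:
            T[j - 1, j] = T[j, j - 1]       # symmetry: t_{j-1,j} = t_{j,j-1}
            w -= T[j - 1, j] * V[:, j - 1]  # three-term recurrence (2.7)
        T[j, j] = V[:, j] @ w
        w -= T[j, j] * V[:, j]
        T[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / T[j + 1, j]
    return V, T

rng = np.random.default_rng(0)
n, m = 60, 10
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]
A = Q @ np.diag(np.linspace(-1.0, 2.0, n)) @ Q.T   # symmetric, indefinite
r0 = rng.standard_normal(n)
V, T = lanczos(A, r0, m)
```

MINRES and SYMMLQ both build on exactly this pair $(V_{m+1}, \bar{T}_m)$ and differ only in how $y_m$ is extracted from it.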
2.2.3 CG on the Normal Equations

The CGNE and CGNR methods are based on the straightforward idea of applying CG to the symmetric positive definite normal equations. CGNR stands for Conjugate Gradient on the normal equations with optimal residual. In this method CG is applied to the linear system

$$A^H A x = A^H b, \qquad (2.13)$$

while in the CGNE method CG is applied to the linear system

$$A A^H u = b, \qquad (2.14)$$

which results from setting $x = A^H u$. CGNE stands for Conjugate Gradient on the normal equations with optimal error. The approximations in the CGNE and CGNR methods are in the same subspace but the optimality properties are different. CGNR computes the minimal residual approximation while CGNE produces the minimal error approximation. The rate of convergence of CG on the normal equations may be slow since the convergence now depends on the square of the condition number

$$\kappa_2(A^H A) = \kappa_2(A A^H) = \kappa_2^2(A). \qquad (2.15)$$
Therefore these methods are seldom used in practice.
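The squaring of the condition number in (2.15) is easy to observe numerically; the following small experiment is our own illustration with an assumed random test matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 30))

# kappa_2(A^H A) = kappa_2(A)^2: forming the normal equations squares
# every singular value of A, and hence squares the condition number.
k_A = np.linalg.cond(A)
k_normal = np.linalg.cond(A.T @ A)
```

A matrix with $\kappa_2(A) = 10^4$, mild for GMRES, already gives $\kappa_2(A^H A) = 10^8$, which explains why CGNR/CGNE are seldom competitive.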
2.2.4 GMRES and FOM

The Full Orthogonalisation Method (FOM), also known as Arnoldi's method for linear systems [Arn51], is the extension of CG to nonsymmetric systems. The Generalised Minimal Residual method (GMRES) proposed by Saad and Schultz [SS86] is the extension of MINRES to nonsymmetric systems. A theoretical comparison of FOM and GMRES is given by Brown [Bro91] and more recently by Cullum and Greenbaum [CG96].

Both for FOM and GMRES, the computational kernel is the Arnoldi process to compute an orthonormal basis for $K_m(A, r_0)$. The Arnoldi basis vectors $V_m = (v_1 \; v_2 \; \ldots \; v_m) \in \mathbb{R}^{n \times m}$ form an orthogonal matrix. In the orthogonalisation process the scalars $h_{i,j}$ are computed so that the square upper Hessenberg matrix $H_m = (h_{i,j}) \in \mathbb{R}^{m \times m}$ satisfies the fundamental relation

$$A V_m = V_m H_m + h_{m+1,m} v_{m+1} e_m^H = V_{m+1} \bar{H}_m. \qquad (2.16)$$

The rectangular upper Hessenberg matrix $\bar{H}_m \in \mathbb{R}^{(m+1) \times m}$ is the square upper Hessenberg matrix $H_m$ supplemented with an extra row $(0 \; \ldots \; 0 \; h_{m+1,m})$. In the general nonsymmetric case, an orthogonal basis for the Krylov subspace can only be obtained by an explicit orthogonalisation, while in the symmetric case a three-term recursion is sufficient. Hence all previously computed vectors in the orthogonal sequence have to be stored in memory. For this reason "full" GMRES is often impractical and restarted versions of GMRES are used in practice. GMRES selects the approximate solution $x_m = x_0 + V_m y_m$ that minimises the 2-norm of the residual

$$\|b - A x_m\|_2 = \|r_0 - A V_m y_m\|_2 \qquad (2.17)$$
$$= \|r_0 - V_{m+1} \bar{H}_m y_m\|_2 \qquad (2.18)$$
$$= \|\beta e_1 - \bar{H}_m y_m\|_2, \qquad (2.19)$$

where $\beta = \|r_0\|_2$. The computation of the vector $y_m$ in (2.19) is done in the least squares sense as in MINRES. First the subdiagonal elements of the upper Hessenberg matrix are annihilated, e.g. with Givens rotations, and next the resulting upper triangular system is solved. FOM selects the approximate solution such that the residual is orthogonal to $K_m(A, r_0)$ by solving the reduced system

$$V_m^H (r_0 - A V_m y_m) = 0 \;\Leftrightarrow\; H_m y_m = \beta e_1. \qquad (2.20)$$

We do not elaborate on implementation details here, but we refer to the classical implementation of GMRES by Saad and Schultz [SS86] using the (modified) Gram–Schmidt orthogonalisation process and the implementation of GMRES by Walker [Wal88] using Householder transformations. An implementation of GMRES which has the current residual vector available at every step of the iteration is the Simpler GMRES algorithm described by Walker and Zhou [WZ94]. The superlinear convergence behaviour of GMRES is described by e.g. Van der Vorst and Vuik [VdVV93].
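A minimal unrestarted GMRES, combining the Arnoldi process (2.16) with the small least squares problem (2.19), can be sketched as follows. This is our own illustrative code, not from the thesis; a production implementation would use Givens rotations instead of a dense least squares solve:

```python
import numpy as np

def gmres(A, b, x0, m):
    """Plain (unrestarted) GMRES: Arnoldi + least squares, cf. (2.16)-(2.19)."""
    n = b.size
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / beta
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):            # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-14:           # happy breakdown: exact solution in K_m
            m = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
    e1 = np.zeros(m + 1)
    e1[0] = beta
    # minimise ||beta e_1 - Hbar_m y||_2, eq. (2.19)
    y, *_ = np.linalg.lstsq(H[:m + 1, :m], e1, rcond=None)
    return x0 + V[:, :m] @ y

rng = np.random.default_rng(2)
A = np.eye(40) + 0.1 * rng.standard_normal((40, 40))
b = rng.standard_normal(40)
x = gmres(A, b, np.zeros(40), 40)
```

Replacing the least squares solve by the Galerkin system $H_m y_m = \beta e_1$ of (2.20) would turn this sketch into FOM.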
When $A$ is non-Hermitian and indefinite, the convergence of (full) GMRES may stagnate for many iterations. Consequently, it may be necessary to make the restart frequency substantially larger to overcome near-stagnation. The impact of the restart parameter of GMRES has been studied by Joubert [Jou94a]. Restarting slows down the convergence. Based on the observation that the convergence of restarted GMRES can be much faster when approximate eigenvectors corresponding to a few of the smallest eigenvalues are added to the GMRES search space, Morgan [Mor95, Mor97] has developed Implicitly Restarted GMRES. Restarted GMRES preconditioned by deflation has also been described by Erhel et al. [EBP94]. Saad [Saa97] has given an analysis of Krylov methods augmented with nearly invariant subspaces. Chapman and Saad [CS97] have also described GMRES with deflation. Another way to avoid the slow convergence of (explicitly) restarted GMRES is to use the truncation strategies developed by de Sturler [dS96b]. A robust GMRES-based adaptive polynomial preconditioning algorithm for nonsymmetric linear systems is given by Joubert [Jou94b].
2.2.5 Flexible GMRES (FGMRES) and FOM (FFOM)

The Flexible GMRES (FGMRES) method developed by Saad [Saa93] is a flexible variant of GMRES, which allows the introduction of a set of well-chosen vectors in the search space and computes the minimum norm approximate solution. The fundamental relation in FGMRES is similar to (2.16). We have that

$$A Z_m = V_m H_m + h_{m+1,m} v_{m+1} e_m^H = V_{m+1} \bar{H}_m \qquad (2.21)$$

where $Z_m = (z_1 \; z_2 \; \ldots \; z_m)$ is the matrix containing the search directions and the matrix $V_m = (v_1 \; v_2 \; \ldots \; v_m)$ is defined by its columns. The matrix $H_m$ is a square $m \times m$ upper Hessenberg matrix whose elements are computed during the orthogonalisation process of the v-vectors; consequently $V_m^H V_m = I$. Using a fixed preconditioner with FGMRES is equivalent to using right-preconditioned GMRES with this preconditioner. In this case, the matrix $Z_m$ can be computed by applying the fixed preconditioner $M^{-1}$ to the matrix $V_m$:

$$Z_m = M^{-1} V_m \qquad (2.22)$$

and the only difference is the explicit storage of all preconditioned search vectors in FGMRES. In GMRES one additional right preconditioning step is required to compute the update to the initial guess. As a result, GMRES is cheaper in terms of memory requirements at the cost of one extra preconditioning step.

Flexible FOM (FFOM) can be defined as the Galerkin-like method corresponding to (2.21), i.e. find an approximate solution in the range of $Z_m$ so that the corresponding residual is orthogonal to $V_m$. Hence the FFOM residual vector is a scalar multiple of $v_{m+1}$, the last vector generated in the Arnoldi process. FFOM has also been mentioned by Vuik [Vui95].
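A sketch of FGMRES based on relation (2.21) is given below; it is our own illustration, not thesis code. The preconditioner is passed as a function that may change from iteration to iteration; with the fixed Jacobi preconditioner used here it reduces to right-preconditioned GMRES, as in (2.22):

```python
import numpy as np

def fgmres(A, b, prec, x0, m):
    """Flexible GMRES: prec(v, j) may vary with the iteration index j.
    Stores Z_m explicitly and updates x = x0 + Z_m y, cf. (2.21)."""
    n = b.size
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    V = np.zeros((n, m + 1))
    Z = np.zeros((n, m))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / beta
    for j in range(m):
        Z[:, j] = prec(V[:, j], j)        # flexible: preconditioner may change
        w = A @ Z[:, j]
        for i in range(j + 1):            # orthogonalise the v-vectors only
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    e1 = np.zeros(m + 1)
    e1[0] = beta
    y, *_ = np.linalg.lstsq(H, e1, rcond=None)
    return x0 + Z @ y

rng = np.random.default_rng(3)
n = 30
A = np.eye(n) + 0.05 * rng.standard_normal((n, n))
b = rng.standard_normal(n)
d = np.diag(A)
prec = lambda v, j: v / d                 # fixed Jacobi preconditioner, eq. (2.22)
x = fgmres(A, b, prec, np.zeros(n), 25)
```

In the thesis the role of `prec` is played by a domain decomposition preconditioner that is itself applied inexactly, which is exactly the situation in which the flexibility of FGMRES matters.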
2.2.6 Generalised Conjugate Residual (GCR)

The Generalised Conjugate Residual (GCR) method is due to Eisenstat et al. [EES83]. They describe the method as a conjugate-gradient-like descent method for nonsymmetric systems of linear equations. The approximate solution $x_i$ is updated in every step using the search direction $p_i$:

$$x_{i+1} = x_i + \alpha_i p_i \qquad (2.23)$$

where the coefficient

$$\alpha_i = \frac{(r_i, A p_i)}{(A p_i, A p_i)} \qquad (2.24)$$

is chosen so that the residual norm

$$\|r_{i+1}\|_2 = \|b - A x_{i+1}\|_2 = \|b - A (x_i + \alpha_i p_i)\|_2 \qquad (2.25)$$

as a function of $\alpha_i$ is minimised. Different variants of GCR exist. They differ in the technique used to determine the vector $p_i$. The new search direction is computed from the current residual and the previous search directions:

$$p_i = r_i + \sum_{j=0}^{i-1} \beta_j^{(i)} p_j. \qquad (2.26)$$

Several criteria can be used to determine the coefficients $\beta_j^{(i)}$ in (2.26). The classical choice is to compute a set of $A^H A$-orthogonal search vectors, by setting

$$\beta_j^{(i)} = -\frac{(A r_i, A p_j)}{(A p_j, A p_j)}. \qquad (2.27)$$

A truncated variant results when the new search vector is made $A^H A$-orthogonal only with respect to the last $k$ search vectors. A disadvantage of GCR is that both $p_i$ and $A p_i$ are stored. Consequently it requires twice the storage GMRES requires. The advantage is that GCR is a flexible method, i.e. any vector $p_i$ can be used to update the approximate solution. Therefore the memory requirements are similar to those of FGMRES.
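The GCR recurrences (2.23)-(2.27) translate directly into code. The following numpy sketch is our own illustration (test matrix assumed); note that, as remarked above, both $p_i$ and $A p_i$ are kept:

```python
import numpy as np

def gcr(A, b, x0, maxit):
    """GCR sketch: A^H A-orthogonal search directions, eqs. (2.23)-(2.27)."""
    x = x0.copy()
    r = b - A @ x
    P, AP = [], []                        # both p_i and A p_i are stored
    for _ in range(maxit):
        p = r.copy()
        Ap = A @ p
        for pj, Apj in zip(P, AP):        # beta_j^{(i)}, eq. (2.27)
            beta = -(Ap @ Apj) / (Apj @ Apj)
            p += beta * pj
            Ap += beta * Apj              # A(p + beta p_j) = Ap + beta A p_j
        alpha = (r @ Ap) / (Ap @ Ap)      # eq. (2.24): minimise ||r - alpha A p||_2
        x += alpha * p                    # eq. (2.23)
        r -= alpha * Ap
        P.append(p)
        AP.append(Ap)
        if np.linalg.norm(r) < 1e-12:
            break
    return x

rng = np.random.default_rng(4)
n = 30
A = np.eye(n) + 0.05 * rng.standard_normal((n, n))
b = rng.standard_normal(n)
x = gcr(A, b, np.zeros(n), n)
```

A truncated variant would simply replace the loop over all of `P` by a loop over the last $k$ entries; the flexible variants replace `p = r.copy()` by an arbitrary (e.g. preconditioned) direction.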
2.2.7 Short Recurrence Methods

From the theorem by Faber and Manteuffel [FM84], we know that for most non-Hermitian problems one cannot expect to find short recurrence formulae that generate the optimal approximations from successive Krylov subspaces. This implies that either short recurrences, as in BiCG [Lan50, Lan52], CGS [Son89], QMR [FN91], TFQMR [Fre93], Bi-CGStab [VdV92], Bi-CGStab2 [Gut93] and BiCGstab(l) [SVdVF94], are used or the optimality is kept at the cost of high storage demands. In practice only restarted or truncated versions of optimal methods, such as GMRES, are used.

Several Krylov subspace methods based on short recurrence formulae exist for general nonsymmetric systems. These methods do not have the desired optimality property because the orthogonality is sacrificed. Consequently we cannot prove convergence properties for these methods using the techniques in the next section and therefore we do not consider these methods in the theoretical results. However, the performance of these methods may be quite good.

BiConjugate Gradient (BiCG)

The CG method is not suitable for nonsymmetric systems because the residuals cannot be made orthogonal with short recurrences. In the BiConjugate Gradient (BiCG) method the update relations for the residuals in the CG method are augmented with similar relations based on the transpose $A^H$ of the matrix $A$. The residuals $\tilde{r}_i$ in the sequence based on $A^H$ and the residuals $r_j$ in the sequence based on $A$ are bi-orthogonal, i.e. they satisfy

$$\tilde{r}_i^H r_j = 0 \quad \text{for } i \ne j. \qquad (2.28)$$

Few theoretical results are known about the convergence of BiCG. For symmetric positive definite matrices BiCG reduces to CG at twice the cost per iteration. For nonsymmetric matrices it has been shown that in phases of the process where there is significant reduction of the norm of the residual, the method is more or less comparable to full GMRES. In practice this is often confirmed, but the convergence behaviour may be quite irregular.

Quasi Minimal Residual (QMR)

The main idea behind the Quasi Minimal Residual (QMR) method is to solve the reduced tridiagonal system in a least squares sense, similar to the approach followed in GMRES. QMR minimises a quasi-residual, so the convergence behaviour is a lot smoother than that of BiCG. The least squares approach for the reduced tridiagonal system in the QMR method also overcomes the breakdown that may occur in BiCG due to the implicit LU-factorisation of this reduced tridiagonal matrix.

Conjugate Gradient Squared (CGS)

The Conjugate Gradient Squared (CGS) method is based on the observation that it is possible to apply the square of the residual polynomial in BiCG without using $A^H$. If
BiCG computes a residual

$$r_i^{(\mathrm{BiCG})} = \tilde{\varphi}_i(A) r_0 \qquad (2.29)$$

which is characterised by the residual polynomial $\tilde{\varphi}_i(z) \in P_i$ with $\tilde{\varphi}_i(0) = 1$, then CGS computes

$$r_i^{(\mathrm{CGS})} = \tilde{\varphi}_i^2(A) r_0. \qquad (2.30)$$

The observed speed of convergence of CGS is often twice that of BiCG, which is in agreement with the observation that the same "contraction" operator is applied twice. However, there is no reason why the "contraction" operator, even if it reduces the initial residual $r_0$, should reduce the vector $r_i^{(\mathrm{BiCG})} = \tilde{\varphi}_i(A) r_0$. It is clear that CGS inherits the irregular convergence behaviour and the breakdown conditions from BiCG. A consequence of the irregular convergence behaviour is that the corrections to the approximate solution may be so large that cancellation can occur, leading to a solution that is less accurate than suggested by the updated residual. This phenomenon was studied by Van der Vorst [VdV92] and was one of the reasons to develop BiCGStab.

Transpose Free Quasi Minimal Residual (TFQMR)

The Transpose Free Quasi Minimal Residual (TFQMR) method is also a squared method. It is made transpose-free by squaring the QMR method. It is quite competitive with CGS while having a much smoother convergence behaviour due to the quasi minimal residual property. The TFQMR method can be considered the QMR version of CGS.

BiConjugate Gradient Stabilized (BiCGStab)

The BiConjugate Gradient Stabilized (BiCGStab) method was developed to avoid the irregular convergence behaviour of the BiCG and CGS methods. It can be interpreted as the product of BiCG with GMRES(1) in every step. The residuals satisfy

$$r_i^{(\mathrm{BiCGStab})} = \tilde{\psi}_i(A) \tilde{\varphi}_i(A) r_0, \qquad (2.31)$$

where the polynomial $\tilde{\psi}_i(z) \in P_i$ with $\tilde{\psi}_i(0) = 1$ is a product of $i$ minimal residual polynomials of degree 1. Since in every step a residual is minimised (in the GMRES(1) substep after the BiCG part) a much smoother convergence behaviour is obtained.

Gutknecht [Gut93] pointed out that the polynomial $\tilde{\psi}_i(z)$ can have only real zeroes since it is constructed in the GMRES(1) part of BiCGStab and therefore is
unable to handle complex conjugate eigenvalues. He suggested using a polynomial of degree 2 to overcome this problem, and GMRES(2) is used in the variant of BiCGStab he proposed. This method is referred to as BiCGStab2. It is clearly advantageous to increase the degree of the minimal residual polynomial computed in every substep of BiCGStab. In the BiCGStab(l) method proposed by Sleijpen et al. [SVdVF94] even longer recurrences are computed in the GMRES part of the algorithm. The parameter l refers to the degree of the minimal residual polynomial constructed in every substep, i.e. GMRES(l) is used.
2.3 Optimal Krylov Subspace Methods

For a symmetric positive definite linear system, CG is optimal in the sense that the A-norm of the error $e_m = A^{-1} b - x_m$ is minimised in every step of the iteration. For general, nonsymmetric linear systems optimality is defined in terms of the 2-norm of the residual.

Definition 2.3.1 An optimal Krylov subspace method is a Krylov subspace method that minimises the residual norm $\|r_m\|_2 = \|b - A x_m\|_2$ for each $m$:

$$\|r_m\|_2 = \min_{\tilde{\varphi}_m \in P_m, \, \tilde{\varphi}_m(0) = 1} \|\tilde{\varphi}_m(A M^{-1}) r_0\|_2. \qquad (2.32)$$

The best known optimal Krylov subspace methods are GMRES, its flexible variant FGMRES and the mathematically equivalent GCR method for general nonsymmetric matrices, and MINRES for symmetric matrices. The paper by Nachtigal et al. [NRT92a] shows that for any group of methods there is a class of problems for which the given method outperforms the other methods by a factor of $O(\sqrt{n})$ or even $O(n)$. Hence there is no best overall Krylov subspace method. This is due to the fact that Krylov subspace methods are fundamentally different from each other, e.g. the convergence behaviour of methods based on the normal equations, like CGNE and CGNR, is governed by the singular values, while the convergence behaviour of other methods, like GMRES and CGS, is governed by eigenvalues or pseudo-eigenvalues.
2.4 Convergence Properties

In this section we briefly discuss some convergence properties of optimal Krylov subspace methods. The first theorem is the so-called finite termination property. It states that in exact arithmetic the solution is found in no more than $n$ iterations. In practice this bound is of no use, since $n$ is usually quite large and performing that many iterations is highly undesirable. Moreover, for a nonsymmetric matrix the memory requirements would be unacceptably high.
The second theorem states that for normal matrices the convergence of optimal Krylov subspace methods is determined by the eigenvalues of the preconditioned operator $A M^{-1}$ via the complex approximation problem of minimising $|\tilde{\varphi}_m(z)|$ over the eigenvalue spectrum subject to the condition $\tilde{\varphi}_m(0) = 1$. Finally, Theorem 2.4.3 will prove to be very useful for our domain decomposition preconditioning technique, since this technique results in an eigenvalue spectrum clustered around 1 with a small number of outliers.

Theorem 2.4.1 In exact arithmetic an optimal Krylov subspace method will converge in no more than $n$ steps.

The next theorem was established by Eisenstat et al. [EES83] for the GCR method. It was repeated for the GMRES method by Saad and Schultz [SS86].

Theorem 2.4.2 Suppose $A = X D X^{-1}$ is diagonalisable and define $\rho_m$ as the minimum over all polynomials $\tilde{\varphi}_m(\lambda) \in P_m$ of degree $m$, satisfying $\tilde{\varphi}_m(0) = 1$, of the maximum of this polynomial evaluated at every eigenvalue $\lambda_j$ ($j = 1, 2, \ldots, n$) of the matrix $A$:

$$\rho_m = \min_{\tilde{\varphi}_m(\lambda) \in P_m, \, \tilde{\varphi}_m(0) = 1} \; \max_{1 \le j \le n} |\tilde{\varphi}_m(\lambda_j)|. \qquad (2.33)$$

The residual $r_m = b - A x_m$ after $m$ iterations of an optimal Krylov subspace method satisfies

$$\|r_m\|_2 \le \kappa_2(X) \, \rho_m \, \|r_0\|_2 \qquad (2.34)$$

where $\kappa_2(X) = \|X\|_2 \|X^{-1}\|_2$ is the 2-norm condition number of the matrix whose columns are the eigenvectors of the matrix $A$.

Our next theorem is a modification of Theorem 5 in [SS86], tailored to the eigenvalue spectrum of the preconditioned operator $A M^{-1}$ we observe when our domain decomposition preconditioner is used. In this case the preconditioned operator $A M^{-1}$ has a small number of well-isolated eigenvalues, which we refer to as outliers, and the remaining eigenvalues are clustered.

Theorem 2.4.3 Suppose that there are a small number $n_o$ of well-isolated eigenvalues of the preconditioned operator $A M^{-1}$, the so-called outliers, $\lambda_j$ for $j = 1, 2, \ldots, n_o$, and that the remaining eigenvalues of the preconditioned operator $A M^{-1}$ lie in a circle with radius $\rho$ and centre 1. For each outlier, $k_j$ denotes the smallest positive number $k$ such that

$$\ker\left(\lambda_j I - A M^{-1}\right)^k = \ker\left(\lambda_j I - A M^{-1}\right)^{k+1} \qquad (2.35)$$
and set $d = \sum_{j=1}^{n_o} k_j$. We define the distance $\delta$ as follows:

$$\delta = \max_{|z - 1| = \rho} \; \max_{1 \le j \le n_o} \frac{|\lambda_j - z|}{|\lambda_j|}. \qquad (2.36)$$

The residual $r_m = b - A x_m$ after $m > d$ iterations of an optimal Krylov subspace method with right preconditioner $M^{-1}$ satisfies

$$\|r_m\|_2 \le C \, \delta^d \, \rho^{m-d} \, \|r_0\|_2 \qquad (2.37)$$

where the constant $C$ is independent of $m$.

Proof. An optimal Krylov subspace method will compute a better residual polynomial than any polynomial we construct for comparison, because of the optimality property. In order to obtain the above result we include the factor $\prod_{j=1}^{n_o} \left(1 - \frac{z}{\lambda_j}\right)^{k_j}$ of degree $d$ in our comparison residual polynomial, since this factor zeroes out the residual components corresponding to the outliers. Our comparison polynomial is of the form

$$\tilde{\varphi}_m^{\mathrm{compare}}(z) = \tilde{\varphi}_{m-d}(z) \prod_{j=1}^{n_o} \left(1 - \frac{z}{\lambda_j}\right)^{k_j} \qquad (2.38)$$

where $\tilde{\varphi}_{m-d}(z) \in P_{m-d}$ and $\tilde{\varphi}_{m-d}(0) = 1$ in order for our comparison polynomial $\tilde{\varphi}_m^{\mathrm{compare}}(z)$ to be a valid residual polynomial. The factor $\rho^{m-d}$ in the bound is due to the fact that all the other eigenvalues are clustered around 1 and within a circle with radius $\rho$, and is obtained by setting $C = 1$ and $R = \rho$ in Theorem 5 in [SS86]. This result basically says that $d$ iterations are needed to capture the outliers and that after the outliers have been captured faster convergence will be observed.
Chapter 3
Domain Decomposition Methods

Gallia est omnis divisa in partes tres.¹
In this chapter we give a short overview of domain decomposition methods. Domain decomposition methods are the second of the two ingredients on which the work presented in this thesis builds.
3.1 Introduction

For an introduction to the subject we refer to the book by Smith et al. [SBG96] and the overview paper by Chan and Mathew [CM94]. We also mention the recent book by Quarteroni and Valli [QV99], which gives good formulations of domain decomposition methods for several problems. The latest developments in the field are covered in the proceedings of the 12 international conferences on domain decomposition methods [GGMP88, CGPW89, CGPW90, GKM+91, KCM+92, QPKW94, KX95, GPSW96, BEK98, MFC98, LBCW99, CKKP00].
3.2 One Level Algorithms

The earliest known domain decomposition method was introduced by Schwarz [Sch90] in 1870. The method introduced by Schwarz may be used to solve elliptic boundary value problems on domains that are the union of two subdomains by alternatingly solving the same elliptic boundary value problem restricted to the individual subdomains. Schwarz derived the alternating method to prove the existence of a solution to (3.1) in a domain for which there was no known analytic solution. P. Lions [Lio78, Lio88, Lio89, Lio90] did some of the early work on Schwarz methods.

¹ Julius Caesar, De bello Gallico.

[Figure 3.1: Overlapping subdomains in the Alternating Schwarz Method]
3.2.1 Alternating Schwarz Method: Continuous Formulation

We wish to solve the linear elliptic PDE

$$L u = f \quad \text{in } \Omega, \qquad u = g \quad \text{on } \partial\Omega. \qquad (3.1)$$

The simplest form of $L$ is minus the Laplacian:

$$L = -\nabla^2 = -\sum_{i=1}^{d} \frac{\partial^2}{\partial x_i^2}, \qquad (3.2)$$

where $d$ denotes the dimension. In this thesis we focus on 2D problems and set $d = 2$. For simplicity we restrict the discussion to Dirichlet boundary conditions. More general Neumann and Robin boundary conditions may also be dealt with easily.

We need to introduce some notation in order to describe the classical alternating Schwarz method. The domain $\Omega$ is the union of the two overlapping subdomains $\Omega_1$ and $\Omega_2$. The artificial boundaries $\Gamma_1$ and $\Gamma_2$ are the parts of the boundary of the subdomains that are interior to the domain $\Omega$. The geometry is shown in Fig. 3.1. Let $u_i^{(n)}$ denote the approximate solution on $\bar{\Omega}_i$ after $n$ iterations and $u_i^{(n)}|_{\Gamma_j}$ the trace of $u_i^{(n)}$ on $\Gamma_j$.

Given an initial guess for $u_2^{(0)}|_{\Gamma_1}$, the alternating Schwarz method proceeds as follows. Iteratively for $n = 1, 2, 3, \ldots$, in the first half-step one solves the BVP

$$\begin{cases} L u_1^{(n)} = f & \text{in } \Omega_1, \\ u_1^{(n)} = g & \text{on } \partial\Omega_1 \setminus \Gamma_1, \\ u_1^{(n)} = u_2^{(n-1)}|_{\Gamma_1} & \text{on } \Gamma_1, \end{cases} \qquad (3.3)$$

for $u_1^{(n)}$, and in the second half-step one solves the BVP

$$\begin{cases} L u_2^{(n)} = f & \text{in } \Omega_2, \\ u_2^{(n)} = g & \text{on } \partial\Omega_2 \setminus \Gamma_2, \\ u_2^{(n)} = u_1^{(n)}|_{\Gamma_2} & \text{on } \Gamma_2, \end{cases} \qquad (3.4)$$

for $u_2^{(n)}$. Thus in each half-step of the alternating Schwarz method one solves an elliptic BVP on the subdomain $\Omega_i$ with the given boundary conditions $g$ on the true boundary $\partial\Omega_i \setminus \Gamma_i$, and the most recent approximate solution in the other subdomain on the interior boundary.
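The alternating Schwarz method (3.3)-(3.4) can be illustrated on a 1D model problem $-u'' = f$ with two overlapping subdomains. The sketch below is our own (the grid splitting and problem data are assumed for illustration); it converges to the solution of the undecomposed discrete problem:

```python
import numpy as np

# 1D model problem: -u'' = f on (0,1), u(0) = u(1) = 0, n interior grid points.
n = 99
h = 1.0 / (n + 1)
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
f = np.ones(n)
u_exact = np.linalg.solve(A, f)

# Two overlapping subdomains: indices 0..i2-1 and i1..n-1 (overlap i1..i2-1).
i1, i2 = 40, 60
u = np.zeros(n)
for _ in range(30):
    # First half-step (3.3): solve on Omega_1 with the Dirichlet value u[i2]
    # taken from the current iterate on Omega_2.
    rhs1 = f[:i2].copy()
    rhs1[-1] += u[i2] / h**2          # coupling to the node just outside Omega_1
    u[:i2] = np.linalg.solve(A[:i2, :i2], rhs1)
    # Second half-step (3.4): Dirichlet value u[i1-1] from the updated iterate.
    rhs2 = f[i1:].copy()
    rhs2[0] += u[i1 - 1] / h**2
    u[i1:] = np.linalg.solve(A[i1:, i1:], rhs2)
```

The geometric convergence rate improves as the overlap region is enlarged, which is the behaviour analysed for the matrix form of the method below.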
3.2.2 Alternating Schwarz Method: Discrete Formulation

The Alternating Schwarz Method (3.3) and (3.4) can be written in discretised form as

$$\begin{cases} A_1 u_1^{(n)} = f_1 & \text{in } \Omega_1, \\ u_{\partial\Omega_1 \setminus \Gamma_1}^{(n)} = g_1 & \text{on } \partial\Omega_1 \setminus \Gamma_1, \\ u_{\Gamma_1}^{(n)} = I_{\Omega_2 \to \Gamma_1} u_2^{(n-1)} & \text{on } \Gamma_1, \end{cases} \qquad (3.5)$$

and

$$\begin{cases} A_2 u_2^{(n)} = f_2 & \text{in } \Omega_2, \\ u_{\partial\Omega_2 \setminus \Gamma_2}^{(n)} = g_2 & \text{on } \partial\Omega_2 \setminus \Gamma_2, \\ u_{\Gamma_2}^{(n)} = I_{\Omega_1 \to \Gamma_2} u_1^{(n)} & \text{on } \Gamma_2. \end{cases} \qquad (3.6)$$

The matrix $A_i$ is the discrete form of the operator $L_i$ restricted to $\Omega_i$ and $I_{\Omega_i \to \Gamma_j}$ denotes a discrete operator that interpolates values from the nodes in the interior of $\Omega_i$ to the nodes on the boundary $\Gamma_j$. The discrete equivalents of the r.h.s. $f$ and the boundary values $g$ are $f_i$ and $g_i$ respectively. The vector of coefficients $u_i$ is partitioned in three parts:

$$u_i = \begin{pmatrix} u_{\Omega_i} \\ u_{\partial\Omega_i \setminus \Gamma_i} \\ u_{\Gamma_i} \end{pmatrix}. \qquad (3.7)$$
The coefficients in $u_{\partial\Omega_i \setminus \Gamma_i}$ are known since they are given by the Dirichlet boundary conditions. We rewrite (3.5) and (3.6) using the decomposition of the matrix $A_i$ as follows:

$$A_i = \begin{pmatrix} A_{\Omega_i} & A_{\partial\Omega_i \setminus \Gamma_i} & A_{\Gamma_i} \end{pmatrix}. \qquad (3.8)$$

The Dirichlet boundary conditions are eliminated, giving

$$\tilde{f}_i = f_i - A_{\partial\Omega_i \setminus \Gamma_i} g_i. \qquad (3.9)$$

The discrete Alternating Schwarz Method can be written in the form (after adding and subtracting $A_{\Omega_1} u_{\Omega_1}^{(n-1)}$ and $A_{\Omega_2} u_{\Omega_2}^{(n-1)}$):

$$u_{\Omega_1}^{(n)} = u_{\Omega_1}^{(n-1)} + A_{\Omega_1}^{-1} \left( \tilde{f}_1 - A_{\Omega_1} u_{\Omega_1}^{(n-1)} - A_{\Gamma_1} I_{\Omega_2 \to \Gamma_1} u_2^{(n-1)} \right), \qquad (3.10)$$
$$u_{\Omega_2}^{(n)} = u_{\Omega_2}^{(n-1)} + A_{\Omega_2}^{-1} \left( \tilde{f}_2 - A_{\Omega_2} u_{\Omega_2}^{(n-1)} - A_{\Gamma_2} I_{\Omega_1 \to \Gamma_2} u_1^{(n)} \right). \qquad (3.11)$$

This is the block Gauss–Seidel method for the linear system

$$\begin{pmatrix} A_{\Omega_1} & A_{\Gamma_1} I_{\Omega_2 \to \Gamma_1} \\ A_{\Gamma_2} I_{\Omega_1 \to \Gamma_2} & A_{\Omega_2} \end{pmatrix} \begin{pmatrix} u_{\Omega_1} \\ u_{\Omega_2} \end{pmatrix} = \begin{pmatrix} \tilde{f}_1 \\ \tilde{f}_2 \end{pmatrix}. \qquad (3.12)$$
In general, this system is not symmetric, even if the submatrices $A_{\Omega_1}$ and $A_{\Omega_2}$ are symmetric, since $(A_{\Gamma_1} I_{\Omega_2 \to \Gamma_1}) \ne (A_{\Gamma_2} I_{\Omega_1 \to \Gamma_2})^H$, due to the interpolation operators $I_{\Omega_2 \to \Gamma_1}$ and $I_{\Omega_1 \to \Gamma_2}$.

A block Jacobi method can also be used to solve (3.12) instead of block Gauss–Seidel:

$$u_{\Omega_1}^{(n)} = u_{\Omega_1}^{(n-1)} + A_{\Omega_1}^{-1} \left( \tilde{f}_1 - A_{\Omega_1} u_{\Omega_1}^{(n-1)} - A_{\Gamma_1} I_{\Omega_2 \to \Gamma_1} u_2^{(n-1)} \right), \qquad (3.13)$$
$$u_{\Omega_2}^{(n)} = u_{\Omega_2}^{(n-1)} + A_{\Omega_2}^{-1} \left( \tilde{f}_2 - A_{\Omega_2} u_{\Omega_2}^{(n-1)} - A_{\Gamma_2} I_{\Omega_1 \to \Gamma_2} u_1^{(n-1)} \right). \qquad (3.14)$$

The advantage of this method when applied to many subdomains is that it allows parallel computation of (3.13) and (3.14), while in the block Gauss–Seidel method the computation of (3.11) can only be started after completion of (3.10) because of the presence of $u_1^{(n)}$ in (3.11).
3.2.3 Schwarz Method for Matching Grids

The Schwarz methods can be optimised for matching grids in the overlap region. This is the case in many applications, e.g. when the PDE is first discretised on the entire domain to obtain the linear system

$$A u = f \qquad (3.15)$$

and this system is solved with the aid of a Schwarz method. Clearly, on matching grids the interpolation operator $I_{\Omega_i \to \Gamma_j} = I$ can be omitted since it is the identity matrix. When there is no direct coupling in the matrices between nodes on opposite sides of the artificial boundaries $\Gamma_i$ we have that

$$A_{\Gamma_i} u_{\Gamma_i} = A_{\Omega \setminus \Omega_i} u_{\Omega \setminus \Omega_i}. \qquad (3.16)$$

We define a new iteration (even when there is a direct coupling and (3.16) is not valid) by replacing those terms to obtain

$$u_{\Omega_1}^{(n)} = u_{\Omega_1}^{(n-1)} + A_{\Omega_1}^{-1} \left( f_1 - A_{\Omega_1} u_{\Omega_1}^{(n-1)} - A_{\Omega \setminus \Omega_1} u_{\Omega \setminus \Omega_1}^{(n-1)} \right), \qquad (3.17)$$
$$u_{\Omega_2}^{(n)} = u_{\Omega_2}^{(n-1)} + A_{\Omega_2}^{-1} \left( f_2 - A_{\Omega_2} u_{\Omega_2}^{(n-1)} - A_{\Omega \setminus \Omega_2} u_{\Omega \setminus \Omega_2}^{(n)} \right). \qquad (3.18)$$
The explicit dependency on the artificial boundary Γi has been removed. Note that we have actually changed the iteration when (3.16) is not valid, i.e. when there is a direct coupling between nodes on opposite sides of the artificial boundaries, which is the case e.g. when high order finite differences are used. (n 1) (n) In (3.18) we could update uΩ2 in the overlap region with values from uΩ1 , but this does not matter if the subdomain problems AΩi1 are solved exactly, only the updated values at the boundary are important. The terms
(n
f1
AΩ1 uΩ1
f2
AΩ2 uΩ2
1)
(n
1)
AΩnΩ1 uΩnΩ
(3.19)
1
and
(n
1)
(n )
AΩnΩ2 uΩnΩ
(3.20)
2
calculate the new residual in the interior of Ω_i. Based on these observations we rewrite the multiplicative Schwarz method in a more compact matrix form as follows:

u^{(n+1/2)} = u^{(n)} + \begin{pmatrix} A_{Ω_1}^{-1} & 0 \\ 0 & 0 \end{pmatrix} ( f − A u^{(n)} ),    (3.21)

u^{(n+1)} = u^{(n+1/2)} + \begin{pmatrix} 0 & 0 \\ 0 & A_{Ω_2}^{-1} \end{pmatrix} ( f − A u^{(n+1/2)} ).    (3.22)
The most recent artificial boundary values are incorporated in the right hand side via the residual vectors ( f − Au^{(n)} ) and ( f − Au^{(n+1/2)} ). Consequently the subdomain problems in (3.21) and (3.22) use a zero Dirichlet boundary condition on the artificial boundary, unlike in the nonmatching grids alternating Schwarz method where these boundary values are handled explicitly.
22
CHAPTER 3. DOMAIN DECOMPOSITION METHODS
We introduce some notation that allows a compact formulation of the algorithms. Let R_i be the rectangular restriction matrix that returns the vector of coefficients defined in the interior of Ω_i, i.e.

u_{Ω_1} = R_1 u = \begin{pmatrix} I & 0 \end{pmatrix} \begin{pmatrix} u_{Ω_1} \\ u_{Ω∖Ω_1} \end{pmatrix}    (3.23)

and

u_{Ω_2} = R_2 u = \begin{pmatrix} 0 & I \end{pmatrix} \begin{pmatrix} u_{Ω∖Ω_2} \\ u_{Ω_2} \end{pmatrix}.    (3.24)

We have reordered the vector u to show that R_i is just a permutation of a rectangular identity matrix, which is of course never formed in practice. With these restriction matrices a compact notation for the subdomain matrix is

A_{Ω_i} = R_i A R_i^H.    (3.25)
The multiplicative Schwarz method can be written as

u^{(n+1/2)} = u^{(n)} + R_1^H ( R_1 A R_1^H )^{-1} R_1 ( f − A u^{(n)} ),    (3.26)

u^{(n+1)} = u^{(n+1/2)} + R_2^H ( R_2 A R_2^H )^{-1} R_2 ( f − A u^{(n+1/2)} ).    (3.27)
The matrix B_i is defined by

B_i = R_i^H ( R_i A R_i^H )^{-1} R_i    (3.28)

and is also never formed explicitly. The action of this matrix on a residual vector is straightforward: it restricts the residual to one subdomain, solves the problem on the subdomain to generate a correction and extends this correction back to the entire domain. We can now rewrite the multiplicative Schwarz method in a single step as

u^{(n+1)} = u^{(n)} + ( B_1 + B_2 − B_2 A B_1 ) ( f − A u^{(n)} ).    (3.29)
This is a Richardson iteration with preconditioner

B_{mult} = B_1 + B_2 − B_2 A B_1.    (3.30)

A Krylov subspace acceleration technique can be used to improve the convergence rate of the basic Richardson procedure. The application of the preconditioner B_{mult} to a vector v is one iteration of the multiplicative Schwarz method with a zero initial guess and a right-hand side equal to v. The result z = B_{mult} v can be computed in two steps:

y = B_1 v,    (3.31)
z = y + B_2 ( v − A y ).    (3.32)
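The two-step application (3.31)–(3.32) is easy to sketch in NumPy. The fragment below is an illustration, not a construction from the thesis: the 1D Poisson matrix and the two overlapping index sets are assumed choices. It realises z = B_mult v through the subdomain solves of (3.28) and uses it in the Richardson iteration (3.29):

```python
import numpy as np

def subdomain_solve(A, idx, r):
    """Apply B_i = R_i^H (R_i A R_i^H)^{-1} R_i: solve on subdomain idx, extend by zero."""
    c = np.zeros_like(r)
    c[idx] = np.linalg.solve(A[np.ix_(idx, idx)], r[idx])
    return c

def apply_B_mult(A, idx1, idx2, v):
    """z = B_mult v via (3.31)-(3.32): y = B_1 v; z = y + B_2 (v - A y)."""
    y = subdomain_solve(A, idx1, v)
    return y + subdomain_solve(A, idx2, v - A @ y)

n = 20
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # 1D Poisson (illustrative)
idx1 = np.arange(0, 12)   # unknowns interior to Omega_1
idx2 = np.arange(8, n)    # unknowns interior to Omega_2 (overlap of 4 points)

# Richardson iteration (3.29) with the multiplicative Schwarz preconditioner
f = np.ones(n)
u = np.zeros(n)
for _ in range(20):
    u = u + apply_B_mult(A, idx1, idx2, f - A @ u)
print(np.linalg.norm(f - A @ u))  # residual norm shrinks with each sweep
```

Note that A is only ever applied to vectors and only the subdomain blocks are solved with; the dense matrices here are purely for compactness of the sketch.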
In the Richardson iteration (3.29) this preconditioner is applied to the residual vector f − Au^{(n)}. In some Krylov subspace methods, such as GMRES, it is applied to vectors generated by the Arnoldi process instead of the current residual vector.
3.2.4 Symmetric Multiplicative Schwarz Method

The multiplicative Schwarz preconditioner (3.30) is not symmetric, even when the matrix A is symmetric. Hence it cannot be used in the conjugate gradient method. This problem can be overcome by including a third step:

u^{(n+1/3)} = u^{(n)} + B_1 ( f − A u^{(n)} ),    (3.33)
u^{(n+2/3)} = u^{(n+1/3)} + B_2 ( f − A u^{(n+1/3)} ),    (3.34)
u^{(n+1)} = u^{(n+2/3)} + B_1 ( f − A u^{(n+2/3)} ),    (3.35)
which makes the preconditioner symmetric. The symmetry can be seen when we write the iteration as a single step method

u^{(n+1)} = u^{(n)} + ( B_1 + B_2 − B_2 A B_1 − B_1 A B_2 + B_1 A B_2 A B_1 ) ( f − A u^{(n)} ).    (3.36)
The symmetric multiplicative Schwarz preconditioner can be written in factored form as

B_{sym,mult} = B_1 + ( I − B_1 A ) B_2 ( I − A B_1 ).    (3.37)
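For a symmetric A the difference between (3.30) and (3.37) can be checked numerically. The sketch below is illustrative (the test matrix, sizes and index sets are assumptions, and the dense matrices are formed only for the check); it confirms that the factored form is symmetric while the plain multiplicative preconditioner is not:

```python
import numpy as np

n = 12
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # symmetric test matrix

def explicit_B(A, idx):
    """Dense B_i = R_i^H (R_i A R_i^H)^{-1} R_i, formed only for this check."""
    B = np.zeros_like(A)
    B[np.ix_(idx, idx)] = np.linalg.inv(A[np.ix_(idx, idx)])
    return B

B1 = explicit_B(A, np.arange(0, 8))
B2 = explicit_B(A, np.arange(5, n))
I = np.eye(n)

B_mult = B1 + B2 - B2 @ A @ B1                    # (3.30)
B_sym = B1 + (I - B1 @ A) @ B2 @ (I - A @ B1)     # (3.37)

print(np.allclose(B_mult, B_mult.T), np.allclose(B_sym, B_sym.T))  # False True
```

Expanding the factored form also reproduces the single-step operator in (3.36), which is a quick way to convince oneself that (3.36) and (3.37) agree.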
3.2.5 Additive Schwarz Method

The additive Schwarz method was introduced by Dryja and Widlund [DW87]. While the multiplicative Schwarz methods can be interpreted as generalisations of the block Gauss–Seidel method, the additive Schwarz method is a generalisation of the block Jacobi method, which can be written as

u^{(n+1)} = u^{(n)} + \left( \begin{pmatrix} A_{Ω_1}^{-1} & 0 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & A_{Ω_2}^{-1} \end{pmatrix} \right) ( f − A u^{(n)} ),    (3.38)

or with the notation introduced above

u^{(n+1)} = u^{(n)} + ( B_1 + B_2 ) ( f − A u^{(n)} ).    (3.39)

This Richardson iteration with the additive Schwarz preconditioner

B_{add} = B_1 + B_2    (3.40)
may have convergence problems. Hence in all implementations a Krylov subspace method is used to ensure and accelerate the convergence. Another straightforward way to derive the additive Schwarz preconditioner is to consider modifications of the multiplicative Schwarz preconditioner in order to allow parallel application of B1 and B2 . The term B2 AB1 in (3.30) prevents the parallel application of B1 and B2 , so we merely drop that term from the preconditioner to obtain (3.40). It is clear that the convergence rate of the additive Schwarz method is slower than that of the multiplicative Schwarz method. For many problems, the multiplicative Schwarz method requires approximately half as many iterations as the additive Schwarz method. This is similar to the classical convergence results for the Jacobi and Gauss–Seidel methods.
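The convergence problems of the plain additive Richardson iteration can be observed directly in the error propagation factors. In the sketch below (illustrative 1D Poisson problem; dense matrices only for the eigenvalue check) the multiplicative iteration contracts, while the undamped additive iteration has factor 1 because errors supported in the overlap are corrected twice — exactly why Krylov acceleration is used in practice:

```python
import numpy as np

n = 20
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def explicit_B(A, idx):
    B = np.zeros_like(A)
    B[np.ix_(idx, idx)] = np.linalg.inv(A[np.ix_(idx, idx)])
    return B

B1 = explicit_B(A, np.arange(0, 12))
B2 = explicit_B(A, np.arange(8, n))
I = np.eye(n)

rho = lambda M: np.abs(np.linalg.eigvals(M)).max()
rho_mult = rho(I - (B1 + B2 - B2 @ A @ B1) @ A)  # multiplicative: below 1
rho_add = rho(I - (B1 + B2) @ A)                 # additive: 1 here (overlap modes)
print(rho_mult, rho_add)
```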
3.2.6 Many Subdomains

Extension of Schwarz methods to more than two subdomains is straightforward. The additive Schwarz preconditioner on many subdomains can be written as

B_{add} = \sum_{i=1}^{p} B_i,    (3.41)

where p is the number of subdomains. The explicit form of the multiplicative Schwarz preconditioner on many subdomains is

B_{mult} = ( I − ( I − B_p A ) ⋯ ( I − B_1 A ) ) A^{-1}    (3.42)
         = A^{-1} ( I − ( I − A B_p ) ⋯ ( I − A B_1 ) ).    (3.43)
This explicit form of the multiplicative Schwarz preconditioner is quite confusing because of the presence of A^{-1}, which is of course not needed in the implementation. The purely multiplicative Schwarz method has very little potential for parallelism. In the case of many subdomains, however, there are often many subdomains which share no common grid points, and the solution on all of these subdomains can be updated simultaneously in parallel. Hence some parallelism can be introduced in the multiplicative Schwarz method by “colouring” the subdomains so that two subdomains which share a common point have a different colour.
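A minimal sketch of such a colouring: a greedy heuristic over a subdomain adjacency list (both the heuristic and the strip-of-four-subdomains layout are illustrative assumptions, not from the thesis). Subdomains with the same colour share no grid points, so their updates can run in parallel within one multiplicative sweep:

```python
def colour_subdomains(adjacency):
    """Greedy colouring: adjacency[i] lists the subdomains sharing points with i."""
    colour = {}
    for i in sorted(adjacency):
        used = {colour[j] for j in adjacency[i] if j in colour}
        c = 0
        while c in used:          # smallest colour not used by a neighbour
            c += 1
        colour[i] = c
    return colour

# Four subdomains in a strip: each overlaps only its neighbours (assumed layout)
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(colour_subdomains(adjacency))  # {0: 0, 1: 1, 2: 0, 3: 1}
```

With two colours, the multiplicative sweep becomes two parallel half-sweeps, one per colour.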
3.2.7 Convergence Behaviour

Assume that the size of the domain Ω is O(1), that the mesh diameter is O(h) and that the subdomains are of diameter O(H) and overlap with a width of O(δ). The number of nodes across a subdomain is O(H/h) and the number of nodes across the entire domain is O(1/h). The number of nodes across an overlap region is O(δ/h).
For linear elliptic operators where the diffusion is dominant, the overlapping Schwarz methods, when used as a preconditioner in a Krylov subspace method, have the following convergence behaviour properties.

- The number of iterations increases as 1/H.
- If δ is kept proportional to H, the number of iterations is bounded independently of h and H/h.
- The number of iterations for the multiplicative Schwarz method is roughly half of that needed for the additive Schwarz method.
- The convergence is poor for δ = 0, but improves rapidly as the overlap δ is increased.
3.3 Two Level Algorithms

For elliptic (diffusion dominated) PDEs the one level methods outlined in the previous Section 3.2 are effective only for a small number of subdomains.
3.3.1 Subdomain solves are not sufficient

The problem with one level methods is that information is propagated only to neighbouring subdomains and that there is no global exchange of information over the entire domain. The iterative linear system solver must have some mechanism for global communication. The best known techniques to achieve this overall communication are multigrid methods.

Subdomain solves are not sufficient because a constant nonzero error on a subdomain is not corrected. This can be understood in different ways. For the standard finite difference and finite element discretisation of a second order elliptic PDE, e.g. the Laplacian L = −∇², the row sum of the matrix A corresponding to a point away from the boundary is zero, since there is no zero-order term in the PDE. A constant nonzero error on a subdomain Ω_i produces a zero residual on this subdomain. Hence the local correction

c_i = \begin{pmatrix} 0 & 0 & 0 \\ 0 & A_{Ω_i}^{-1} & 0 \\ 0 & 0 & 0 \end{pmatrix} ( f − A u^{(n)} ) = \begin{pmatrix} 0 & 0 & 0 \\ 0 & A_{Ω_i}^{-1} & 0 \\ 0 & 0 & 0 \end{pmatrix} A e^{(n)}    (3.44)

is zero as well.

An explanation can also be given in terms of projections onto the subspaces V_i^h corresponding to the subdomains. All functions in the subspaces V_i^h are zero on the boundary, so these spaces do not contain a nonzero constant function. Hence the projection of a constant error onto this subspace, which is what happens when
a subdomain solve is done, results in a zero correction. This implies that “smooth” errors are damped slowly, since they can be decomposed in a large constant part and a small part representing the difference between the actual error and its average. Technically speaking, “smooth” functions are actually low energy functions. For the bilinear form

a(u, v) = ∫ ∇u · ∇v dx    (3.45)

induced by the Laplacian L = −∇², the low energy functions are the intuitive “smooth” functions. It is clear that any nonzero constant function has a zero energy norm.

The solution of an elliptic PDE at any point depends on the right hand side and on the boundary conditions for the entire domain. For the model problem (3.1) with zero Dirichlet boundary conditions g = 0, we can write the explicit solution as

u(x) = ∫ G(x, x′) f(x′) dx′,    (3.46)

where G(x, x′) is the appropriate Green’s function. For a fixed point x₀ the function G(x₀, x′) has a peak at x′ = x₀ and decays rapidly for x′ far from x₀. Hence the solution at x₀ depends strongly on the value of f(x′) for x′ near x₀ and weakly on f(x′) for x′ far from x₀. This characteristic of elliptic PDEs explains why subdomain solves are good but not sufficient to solve (3.1).
3.3.2 Basic Two Level Method

The idea behind a coarse grid correction is to restrict the residual on the fine grid to the coarse grid, solve the coarse grid problem and interpolate this correction back to the fine grid. A simple coarse grid correction is thus given by

c_C = R_C^H A_C^{-1} R_C ( f_F − A_F u_F ),    (3.47)

where R_C is the restriction (matrix) operator which maps from the fine to the coarse grid and R_C^H is the matrix representation of an interpolation from the coarse to the fine grid. We define

B_C = R_C^H A_C^{-1} R_C    (3.48)

as the coarse part of the preconditioner. Since the rank of B_C is equal to the dimension of A_C, which is much less than the dimension of A_F, it is clear that B_C cannot be used as the only preconditioner because it has a huge null space.
If B_F is a full rank preconditioner for A_F, we can define an iterative method as follows:

u^{(n+1/2)} = u^{(n)} + B_C ( f − A u^{(n)} ),    (3.49)
u^{(n+1)} = u^{(n+1/2)} + B_F ( f − A u^{(n+1/2)} ).    (3.50)

This iteration scheme can be written as a one step method

u^{(n+1)} = u^{(n)} + ( B_C + B_F − B_F A B_C ) ( f − A u^{(n)} ).    (3.51)

This is a Richardson iteration with preconditioner

B_{two} = B_C + B_F − B_F A B_C.    (3.52)

The convergence rate of the basic Richardson procedure can be improved by using Krylov subspace acceleration techniques. The application of the preconditioner B_{two} to a vector v is one iteration of the multiplicative Schwarz method with a zero initial guess. The result z = B_{two} v can be computed in two steps:

y = B_C v,    (3.53)
z = y + B_F ( v − A y ).    (3.54)
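The two-step application (3.53)–(3.54) mirrors (3.31)–(3.32), with the coarse solve playing the role of the first subdomain solve. A NumPy sketch follows; everything concrete in it is an assumption for illustration (1D Poisson, full-weighting restriction with its transpose as interpolation, damped Jacobi as the full rank B_F):

```python
import numpy as np

def poisson(n):
    return 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

n, nc = 15, 7                 # fine and coarse grid sizes (illustrative)
A = poisson(n)

# Full-weighting restriction R_C; interpolation is its transpose R_C^H
RC = np.zeros((nc, n))
for k in range(nc):
    j = 2 * k + 1             # coarse point k coincides with fine point j
    RC[k, j - 1:j + 2] = [0.25, 0.5, 0.25]

AC = RC @ A @ RC.T                              # Galerkin coarse grid operator
BC = RC.T @ np.linalg.inv(AC) @ RC              # coarse preconditioner (3.48)
BF = (2.0 / 3.0) * np.diag(1.0 / np.diag(A))    # damped Jacobi as B_F (assumed)

def apply_B_two(v):
    """z = B_two v via (3.53)-(3.54): y = B_C v; z = y + B_F (v - A y)."""
    y = BC @ v
    return y + BF @ (v - A @ y)

# Error propagation operator of the Richardson iteration (3.49)-(3.50)
E = (np.eye(n) - BF @ A) @ (np.eye(n) - BC @ A)
print(np.abs(np.linalg.eigvals(E)).max())       # two-grid factor, well below 1
```

Even this crude smoother plus an exact coarse solve already gives a contraction factor bounded away from 1, which a one level diagonal preconditioner cannot achieve.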
For nonsymmetric problems there is no algorithmic need for the interpolation operator to be the transpose of the restriction operator. The coarse part of the preconditioner may therefore be of the more general form B_C = I A_C^{-1} R, where I is the interpolation operator and R is the restriction operator.
3.3.3 Two Level Schwarz Methods

The basic two level method introduced above can be generalised and a multitude of algorithms can be designed. In classical multigrid, the local solvers B_F are simple iterative schemes such as Jacobi or Gauss–Seidel. They are referred to as smoothers because they remove the high frequency components of the error. From this point of view, domain decomposition methods are generalisations of multigrid where the simple iterative local solvers are replaced by more general and more robust local solvers. The Schwarz smoothers are simply one level additive or multiplicative overlapping Schwarz preconditioners. Jacobi and Gauss–Seidel smoothers correspond to the special case where the restriction operators R_k return a single component of the vector. Block Jacobi and block Gauss–Seidel are special cases in which there is no overlap between the subdomains. As for the alternating Schwarz method, the multiplicative two level method has a corresponding additive form

u^{(n+1)} = u^{(n)} + ( B_C + B_F ) ( f − A u^{(n)} ).    (3.55)
This additive two level method would not be used directly, but as a preconditioner in a Krylov subspace iterative method. In two level Schwarz methods the local preconditioner B_F is either a multiplicative or additive overlapping Schwarz preconditioner which is combined with either an additive or multiplicative two level method. We denote the fine grid component of the preconditioner acting on the subdomain Ω_i by B_i = R_i^H A_{Ω_i}^{-1} R_i for i = 1, 2, …, p and reserve the index zero for the coarse grid part of the preconditioner B_0 = B_C = R_C^H A_C^{-1} R_C.

Two level additive Schwarz preconditioner

The two level overlapping additive Schwarz method is due to Dryja and Widlund [DW87] and can be written as

z = B_{two,add} v = \left( R_C^H A_C^{-1} R_C + \sum_{i=1}^{p} B_i \right) v = \sum_{i=0}^{p} B_i v.    (3.56)
This formulation clearly shows that both the two levels as well as the subdomains are treated in an additive way. Hence all subproblems (including the coarse grid problem) can be solved in parallel.

Two level multiplicative Schwarz preconditioners

In the two level multiplicative Schwarz method the coarse grid correction and all the subdomain solves are done in a multiplicative way. The explicit form of the preconditioner is given by

B_{two,mult} = ( I − ( I − B_p A_F ) ⋯ ( I − B_1 A_F ) ( I − B_0 A_F ) ) A_F^{-1}.    (3.57)

Note that although A_F^{-1} appears in the explicit form of the preconditioner, it is not needed in the implementation.

Two level hybrid overlapping Schwarz preconditioner

Cai [Cai93] proposed the two level hybrid I overlapping Schwarz preconditioner. This is a combination of a multiplicative smoother with an additive coarse grid correction. The two level hybrid II overlapping Schwarz preconditioner was introduced by Mandel [Man93]. This variant uses an additive smoother and a multiplicative coarse grid correction. Various other combinations are possible as well as symmetrised versions. We refer to a paper by Holst and Vandewalle [HV97] on how to symmetrise these preconditioners. They also illustrate why symmetrising may be a bad idea.
3.3.4 Convergence Behaviour

Using the same notations as in Sec. 3.2.7, the two level Schwarz methods have the following convergence behaviour.

- If δ is kept proportional to H, the number of iterations is bounded independently of h, H and H/h.
- The number of iterations for the multiplicative Schwarz method is roughly half of that needed for the additive Schwarz method.
- The convergence improves rapidly as the overlap δ is increased.

The major difference with the results in Sec. 3.2.7 is that the number of iterations is bounded independently of H and does not increase like 1/H. For nonsymmetric problems the same satisfactory convergence behaviour is observed if H is sufficiently small depending on the strength of the convection term. With the addition of a coarse grid problem, the rate of convergence can be made independent of the size of the problem. This reflects the fact that the Green’s function, which is nonzero everywhere in the domain, is quite small away from its peak. The local subdomain solves capture the behaviour near the peak and the behaviour in the tail of the Green’s function can be approximated with fewer degrees of freedom.
3.4 Multilevel Algorithms

There are at least two reasons to consider multilevel algorithms. The first reason is that two level algorithms may be inadequate for large problems (on fine grids) because of memory considerations. In the case of a large number of small subdomains, the coarse grid problem is large, and in the case of a small number of large subdomains, the subdomain problems may be too large to be solved on a single processor. The second reason to consider multilevel algorithms is that they may dramatically reduce the total amount of computational work needed to solve a given problem to a particular accuracy.
3.4.1 Additive Multilevel Schwarz Methods

Additive multilevel Schwarz methods are natural extensions of the two level additive Schwarz methods. Essentially the coarse grid solve is replaced by a two level preconditioner recursively. The additive multilevel Schwarz preconditioner is due to Dryja and Widlund [DW91].

Assume we have a possibly nonnested family of grids Ω^{(i)} on the domain Ω. The superscript (i) refers to the level, where Ω^{(0)} denotes the coarsest grid and Ω^{(j)} is the finest grid. Let A^{(i)} be the matrix obtained by discretising the PDE on
CHAPTER 3. DOMAIN DECOMPOSITION METHODS
30
the grid Ω^{(i)}. The matrix R^{(i)} is the restriction operator to the ith level, i.e. from the next finer level (i+1) to level i, and is often defined by the requirement that ( R^{(i)} )^H is an interpolation from grid i to grid (i+1). The coarse level matrices A^{(i)} are often derived from the fine grid matrix using the relation

A^{(i)} = R^{(i)} A^{(i+1)} ( R^{(i)} )^H,    (3.58)
which is called the Galerkin condition.

Every grid Ω^{(i)} is partitioned in p_i (possibly overlapping) subdomains Ω_k^{(i)}, for k = 1, 2, …, p_i. Let R_k^{(i)} be the restriction operator for subdomain Ω_k^{(i)}. The matrix

A_k^{(i)} = R_k^{(i)} A^{(i)} ( R_k^{(i)} )^H    (3.59)
is the submatrix of the matrix A^{(i)} corresponding to the subdomain Ω_k^{(i)}. The standard two level additive Schwarz preconditioner with a large coarse grid problem is given by

B^{(j)} = ( R^{(j−1)} )^H ( A^{(j−1)} )^{-1} R^{(j−1)} + \sum_{k=1}^{p_j} ( R_k^{(j)} )^H ( A_k^{(j)} )^{-1} R_k^{(j)}.    (3.60)
The subproblems involving A_k^{(j)} are small and can be solved quickly. The coarse grid problem involving A^{(j−1)} may be quite large and we may wish not to solve it exactly. We approximate ( A^{(j−1)} )^{-1} by a two level additive Schwarz preconditioner

( A^{(j−1)} )^{-1} ≈ B^{(j−1)},    (3.61)

which is defined in a similar way as (3.60). This technique is applied recursively until we reach the coarsest grid, on which
( A^{(0)} )^{-1} can be applied exactly, i.e. the corresponding linear system is so small that it can be solved to machine precision quite fast.

Using the interpolation R^{(i)} and restriction R_k^{(i)} operators defined above, we can define the restriction operator R̄_k^{(i)}, which maps directly from the finest grid Ω^{(j)} to the kth subdomain Ω_k^{(i)} on the ith level. The additive multilevel Schwarz preconditioner can then be written as

B_{add,mult} = \sum_{i=0}^{j} \sum_{k=1}^{p_i} ( R̄_k^{(i)} )^H ( A_k^{(i)} )^{-1} R̄_k^{(i)}.    (3.62)
Hence applying the additive multilevel Schwarz preconditioner is applying the additive Schwarz smoother on all levels simultaneously.
Multilevel Diagonal Scaling

The multilevel diagonal scaling preconditioner is a special case of the additive multilevel Schwarz preconditioner when minimal size subdomains are used, i.e. every single node is in a separate subdomain. In this case the matrix A^{(i)} is approximated by its diagonal D^{(i)}, since the only nonzero element in (3.59) is the diagonal element. The restriction operator R̄^{(i)} maps directly from the finest grid Ω^{(j)} to the grid Ω^{(i)} on the ith level and can be constructed easily from the interpolation operators R^{(i)}. The multilevel diagonal scaling preconditioner can then be written as

B_{mult,diag} = ( R̄^{(0)} )^H ( A^{(0)} )^{-1} R̄^{(0)} + \sum_{i=1}^{j−1} ( R̄^{(i)} )^H ( D^{(i)} )^{-1} R̄^{(i)} + ( D^{(j)} )^{-1}.    (3.63)

This preconditioner only needs the diagonal elements of the matrices A^{(i)}, except on the coarsest level where the entire matrix A^{(0)} is needed. It is possible to use a simple diagonal solve on the coarse grid as well, but this may lead to an inferior rate of convergence.

BPX Method

The BPX preconditioner is named after Bramble, Pasciak and Xu [BPX90]. For the model problem L = −∇² and linear or multilinear finite elements on a quasi-uniform grid in d dimensions, the elements in the matrix D^{(i)} scale like O( ( h^{(i)} )^{d−2} ). This captures the most essential properties of a second order uniformly elliptic problem. The BPX preconditioner or multilevel nodal basis preconditioner is obtained
by replacing ( D^{(i)} )^{-1} in (3.63) by ( h^{(i)} )^{2−d} I:

B_{BPX} = ( R̄^{(0)} )^H ( A^{(0)} )^{-1} R̄^{(0)} + \sum_{i=1}^{j−1} ( R̄^{(i)} )^H ( h^{(i)} )^{2−d} R̄^{(i)} + ( h^{(j)} )^{2−d} I.    (3.64)

Hierarchical Basis Method

The hierarchical basis method is another multilevel preconditioner that can be derived from the multilevel nodal basis preconditioner by dropping some terms. In order to reduce the amount of computation per iteration, every node is treated only once, on the coarsest grid in which it appears. It is essentially a change of basis to a hierarchical basis followed by a diagonal preconditioning.
Yserentant [Yse86a, Yse86b, Yse86c] proved convergence rate estimates for the hierarchical basis method. See [Yse90] for a comparison of the hierarchical basis method with the BPX method.
3.4.2 Multiplicative Multilevel Schwarz Methods

The difference between additive and multiplicative methods is that in the multiplicative version the residual is updated between the substeps of the preconditioner. The basic building blocks are of course the same for the additive and multiplicative versions.

Multiplicative between Levels, Additive in Levels

This is a straightforward method where all the updates on a particular level are done in parallel, since the additive Schwarz smoother is used on every level and the levels are traversed in a sequential manner. If the smoothing is done before the coarse grid correction, it is called presmoothing; if it is done after the coarse grid correction, it is called postsmoothing. It is clear that this algorithm can be symmetrised by reversing the sequence in which the levels were traversed. One can also apply the additive correction on a level more than once before moving on to the next level.

Multiplicative Multilevel Diagonal Scaling

In this method the levels are traversed sequentially and a diagonal scaling is done on each level. The symmetrised version of this algorithm is known in the multigrid literature as V-cycle multigrid, with one pre- and postsmoothing step of (damped) Jacobi relaxation. If the recursive call to the next coarser level is done twice in succession, the algorithm is equivalent to W-cycle multigrid.

Multiplicative between Levels, Multiplicative in Levels

This combination has the best numerical convergence rate, but it is not necessarily the fastest method in terms of wall clock time on a parallel computer, since all corrections are applied in a sequential manner within each level and all the levels are visited sequentially. Some parallelism can be introduced within one level by colouring the subdomains in this level, as in the case of the one level multiplicative Schwarz method.
3.4.3 Multigrid Methods

The early work on multigrid was done by Brandt [Bra72, Bra77]. A good introduction to multigrid methods is the tutorial by Briggs [Bri87]. Several books cover
ordering of subspaces                                         | methods
purely additive                                               | multilevel diagonal scaling; hierarchical basis method; BPX
multiplicative through the levels, additive within the levels | V-cycle multigrid with (damped) Jacobi smoothing
additive between the levels, multiplicative within the levels | multilevel Gauss–Seidel
purely multiplicative                                         | hierarchical basis multigrid; V-cycle multigrid with Gauss–Seidel smoothing

Table 3.1: Ordering of subspaces in several multilevel methods
the multigrid methods quite well and contain lots of references. We mention the books by Hackbusch [Hac85], McCormick [McC87, McC89], Wesseling [Wes92], Bramble [Bra93], Rüde [Rüd93], Griebel [Gri94] and Oswald [Osw94]. The relationship between multigrid methods and multiplicative multilevel Schwarz methods was observed by McCormick and Ruge [MR86] and also stated by Bramble et al. [BPWX91]. An extensive study of multilevel methods and some generalisations can be found in a paper by Xu [Xu92]. We restrict ourselves here to the overview in Table 3.1, which is taken from [SBG96].
Chapter 4
Ritz and Harmonic Ritz Values and the Convergence of FOM and GMRES

Then Renton was hit by a wave of shock which threatened to knock him incoherent. A girl came into the room. As he watched her, a coldness came over him. She was the double of Diane, but this girl looked barely secondary school age.¹
The Ritz and Harmonic Ritz values are approximate eigenvalues, which can be computed cheaply within the FOM and GMRES Krylov subspace iterative methods for solving nonsymmetric linear systems. We prove that they are the zeroes of the residual polynomials of FOM and GMRES respectively. In this chapter we show that the Walker–Zhou interpretation of GMRES enables us to formulate the relation between the Harmonic Ritz values and GMRES in the same way as the relation between the Ritz values and FOM. We present an upper bound for the norm of the difference between the matrices from which the Ritz and Harmonic Ritz values are computed. The differences between the Ritz and Harmonic Ritz values enable us to describe breakdown of FOM and stagnation of GMRES.

¹ Irvine Welsh, Trainspotting, p. 145
CHAPTER 4. RITZ AND HARMONIC RITZ VALUES
36
4.1 Introduction

A description of GMRES and FOM is given in Section 2.2.4. In this chapter, we assume that the initial guess x_0 to the solution of the linear system Ax = b is zero, hence r_0 = b, and that the system has been scaled so that the initial residual has unit length. Further we assume that the dimension of K^m(A, r_0) is m. If this is not the case, then this subspace is invariant under A and the approximations discussed in this chapter are exact. Both FOM and GMRES select an approximate solution from K^m(A, r_0) which can be written as x = ϕ_{m−1}(A) r_0, where ϕ_{m−1}(λ) = γ_{m−1} λ^{m−1} + ⋯ + γ_1 λ + γ_0 ∈ P_{m−1} is a real polynomial of degree (m−1). The residual corresponding to this approximate solution is
r = b − Ax = ( I − A ϕ_{m−1}(A) ) r_0 = ϕ̃_m(A) r_0 ∈ K^{m+1}(A, r_0).    (4.1)

FOM selects the approximate solution V_m y_m, where y_m ∈ R^m, such that the residual r_{FOM} = ϕ̃_m^{FOM}(A) r_0 is orthogonal to K^m(A, r_0):

V_m^H ( r_0 − A V_m y_m ) = 0 ⇔ H_m y_m = e_1.    (4.2)
Here e_1 denotes the first column of the identity matrix. GMRES selects the approximate solution which minimises the norm of the residual ‖r_{GMRES}‖_2:

‖r_{GMRES}‖_2 = ‖ V_{m+1}^H ( r_0 − A V_m y_m ) ‖_2 = ‖ e_1 − H̄_m y_m ‖_2.    (4.3)

This is equivalent to the requirement that the residual is orthogonal to the image of the Krylov subspace A K^m(A, r_0). The overdetermined linear system can be solved using the normal equations, where the vector f_m = H_m^{−H} e_m is only defined when H_m is nonsingular:

H̄_m^H H̄_m y_m = H_m^H e_1 ⇔ ( H_m + h_{m+1,m}^2 f_m e_m^H ) y_m = e_1.    (4.4)
The eigenvalues of H_m are called Ritz values and approximate the eigenvalues of A. We show in section 4.2 that these Ritz values are the zeroes of the FOM residual polynomial. In section 4.3 we give an outline of the Simpler GMRES algorithm of Walker and Zhou. The upper triangular matrix which is computed in this algorithm is used in our subsequent theory. We define a transformation matrix which is also used in the proofs. In section 4.4 we briefly recall the definition of Harmonic Ritz values given by Sleijpen and Van der Vorst (§5.1 in [SVdV96]) and the fact that they are eigenvalue approximations according to the minimal residual criterion. We present an upper bound for the norm of the difference between the matrices from which the Ritz and Harmonic Ritz values are computed. We prove in section 4.5 that the zeroes of the GMRES residual polynomial are the Harmonic Ritz values, based on the Simpler GMRES algorithm. In section 4.6 we relate breakdown of FOM and stagnation of GMRES to the differences between the Ritz and Harmonic Ritz values. Numerical results are presented for a problem
which nearly causes breakdown and stagnation. In section 4.7 we give numerical results for a convection-diffusion problem. Some remarks concerning related work conclude this chapter.
4.2 Ritz Values and FOM Residual Polynomial

The classical Galerkin approach for computing approximate eigenpairs has been discussed by several authors, see e.g. [Par80, Saa92]. An approximate eigenvector x = V_m y_m is sought in K^m(A, r_0) so that the residual of the eigenvalue equation is orthogonal to K^m(A, r_0):

( A x − µ x ) ⊥ K^m(A, r_0) ⇔ V_m^H ( A V_m y_m − µ V_m y_m ) = 0.    (4.5)
The approximate eigenvalues can thus be computed from the matrix H_m = V_m^H A V_m.

Definition 4.2.1 The Ritz values are the eigenvalues of the Hessenberg matrix H_m = V_m^H A V_m, defined in (2.16).

Hence the Ritz values, denoted by ϑ_i^{(m)}, are the well-known “Arnoldi eigenvalue estimates”. We shall prove that the FOM residual polynomial ϕ̃_m^{FOM}(λ) is a multiple of the characteristic polynomial of H_m. This implies that the Ritz values are the zeroes of the FOM residual polynomial. We restrict ourselves here to formulating the lemmas and the theorem. That theorem is a straightforward generalisation of a result by Paige et al. [PPdV95] for symmetric matrices. The proofs can be found in Appendix A.

Lemma 4.2.1 Let χ_m(λ) = γ_m λ^m + ⋯ + γ_1 λ + γ_0 be a polynomial of strict degree m, then χ_m(A) r_0 ∈ K^{m+1}(A, r_0) and χ_m(A) r_0 ∉ K^m(A, r_0) and we have that

χ_m(A) r_0 = V_m χ_m(H_m) e_1 + γ_m ζ_m v_{m+1},    (4.6)

where ζ_m = h_{m+1,m} ⋯ h_{3,2} h_{2,1}.

Lemma 4.2.2 All nonzero vectors in K^{m+1}(A, r_0) that are orthogonal to K^m(A, r_0) can be written as α ψ_m(A) r_0 for some α ≠ 0, where ψ_m(λ) = det( λI − H_m ) is the characteristic polynomial of H_m.

Theorem 4.2.1 The FOM residual polynomial is a multiple of the characteristic polynomial of the Hessenberg matrix H_m.

Hence the FOM residual polynomial ϕ̃_m^{FOM}(λ) is uniquely defined by the fact that the m Ritz values ϑ_i^{(m)} are its zeroes and by the normalisation ϕ̃_m^{FOM}(0) = 1. Saad [Saa83] has proved that the characteristic polynomial of H_m is the polynomial that minimises ‖ψ(A) v_1‖_2 over all monic polynomials.
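Definition 4.2.1 is easy to exercise numerically. The sketch below is an illustration with an assumed test matrix and start vector, not an experiment from the thesis: it runs m Arnoldi steps and takes the eigenvalues of H_m as the Ritz values. For a symmetric A these lie inside the spectral interval of A, and the extreme ones move towards the extreme eigenvalues as m grows:

```python
import numpy as np

def arnoldi(A, r0, m):
    """m steps of the Arnoldi process: A V_m = V_{m+1} Hbar_m."""
    n = len(r0)
    V = np.zeros((n, m + 1))
    Hbar = np.zeros((m + 1, m))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):              # modified Gram-Schmidt
            Hbar[i, j] = V[:, i] @ w
            w = w - Hbar[i, j] * V[:, i]
        Hbar[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / Hbar[j + 1, j]
    return V, Hbar

n, m = 50, 10
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # symmetric test matrix
r0 = np.ones(n) + np.linspace(0.0, 1.0, n)             # arbitrary start vector
V, Hbar = arnoldi(A, r0, m)
ritz = np.sort(np.linalg.eigvals(Hbar[:m, :m]).real)

exact = np.sort(np.linalg.eigvalsh(A))
print(ritz[0], ritz[-1], exact[0], exact[-1])
```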
4.3 Simpler GMRES and the Transformation Matrix

Walker and Zhou [WZ94] pointed out that by starting the Arnoldi process with Ar_0 instead of r_0 a simpler GMRES is obtained which does not require the factorisation of an upper Hessenberg matrix. First an orthonormal basis for the image of the Krylov subspace A K^m(A, r_0) is computed by starting the Arnoldi process with Ar_0. The upper triangular matrix R_m = (ρ_{i,j}) ∈ R^{m×m} is defined by the scalars ρ_{i,j} computed in the orthogonalisation process. We have assumed that the dimension of K^m(A, r_0) is m; this implies that R_m is nonsingular. The columns of the matrix Z_m = ( z_0  z_1  ⋯  z_{m−1} ) ∈ R^{n×m} span K^m(A, r_0). The upper triangular matrix R_m, computed in the Arnoldi process, satisfies

A Z_m = W_m R_m.    (4.7)
The columns of the orthogonal matrix W_m = ( z_1  z_2  ⋯  z_m ) ∈ R^{n×m} span A K^m(A, r_0). The orthogonalisation of the residual vector r_{GMRES} with respect to the subspace A K^m(A, r_0) can be done by orthogonalisation against z_j for j = 1, 2, …, m. The residual vector r_m in Simpler GMRES satisfies W_m^H r_m = 0 and

z_0 = r_0 = r_m + \sum_{j=1}^{m} ξ_j z_j = r_m + W_m w_m,    (4.8)

where the vector w_m = ( ξ_1  ξ_2  ⋯  ξ_m )^H is defined by the scalars that have been computed during the orthogonalisation. The corresponding approximate solution x_m is then given by Z_m R_m^{-1} w_m. We assume that ξ_m is nonzero. However if ξ_m = 0, the last vector z_m is orthogonal to the residual r_m. This means that GMRES stagnates and that the residual polynomial is not changed.

In the remainder of this chapter we make use of the transformation matrix T_m ∈ R^{m×m}, which is given by

T_m = \begin{pmatrix} 0 & & & & 1/ξ_m \\ 1 & 0 & & & −ξ_1/ξ_m \\ & 1 & \ddots & & −ξ_2/ξ_m \\ & & \ddots & 0 & \vdots \\ & & & 1 & −ξ_{m−1}/ξ_m \end{pmatrix}.    (4.9)

This matrix transforms w_m into e_1 and shifts e_{j−1} to e_j for j = 2, 3, …, m. The transformation matrix T_m is only required in combination with the upper triangular matrix R_m constructed in the Simpler GMRES algorithm. We define the square upper Hessenberg matrix R̃_m ∈ R^{m×m} as the product of R_m and T_m. Its explicit
representation

R̃_m = R_m T_m = \begin{pmatrix} ρ_{1,2} & ρ_{1,3} & ⋯ & ρ_{1,m} & τ_1 \\ ρ_{2,2} & ρ_{2,3} & ⋯ & ρ_{2,m} & τ_2 \\ & ρ_{3,3} & ⋯ & ρ_{3,m} & τ_3 \\ & & \ddots & \vdots & \vdots \\ & & & ρ_{m,m} & τ_m \end{pmatrix}    (4.10)
only requires the computation of the m scalars τ_i (i = 1, 2, ..., m). The determinant of R̃_m can easily be computed:

    det R̃_m = det R_m det T_m = (-1)^{m+1} (1/ξ_m) ∏_{j=1}^{m} ρ_{j,j}.    (4.11)

The results in the next two sections show that R̃_m is related to H̄_m as follows:

    R̃_m = H_m^{-H} H̄_m^H H̄_m = H_m + h_{m+1,m}^2 f_m e_m^H.    (4.12)
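The construction above is compact enough to verify numerically. The following sketch is an illustrative reconstruction, not code from the thesis; all function and variable names are my own. It builds the Simpler GMRES basis for a small random matrix, forms T_m from the coefficients ξ_j, and checks relation (4.7), the property T_m w_m = e_1, the Hessenberg structure of R̃_m = R_m T_m and the determinant formula (4.11).

```python
import numpy as np

def simpler_gmres_basis(A, r0, m):
    # Walker-Zhou process: z_0 = r0; for j = 1..m orthonormalise A z_{j-1}
    # against z_1, ..., z_{j-1}.  The coefficients rho_{i,j} fill the upper
    # triangular R_m, so that A Z_m = W_m R_m with Z_m = (z_0 ... z_{m-1})
    # and W_m = (z_1 ... z_m).
    n = len(r0)
    Z = np.zeros((n, m + 1))
    R = np.zeros((m, m))
    Z[:, 0] = r0
    for j in range(1, m + 1):
        u = A @ Z[:, j - 1]
        for i in range(1, j):
            R[i - 1, j - 1] = Z[:, i] @ u
            u -= R[i - 1, j - 1] * Z[:, i]
        R[j - 1, j - 1] = np.linalg.norm(u)
        Z[:, j] = u / R[j - 1, j - 1]
    return Z[:, :m], Z[:, 1:], R

def transformation_matrix(w):
    # T_m maps w_m to e_1 and shifts e_{j-1} to e_j (cf. eq. 4.9).
    m = len(w)
    T = np.zeros((m, m))
    T[1:, :m - 1] = np.eye(m - 1)
    T[0, -1] = 1.0 / w[-1]
    T[1:, -1] = -w[:-1] / w[-1]
    return T

rng = np.random.default_rng(0)
n, m = 12, 5
A = rng.standard_normal((n, n)) + 4 * np.eye(n)
r0 = rng.standard_normal(n)
Zm, Wm, Rm = simpler_gmres_basis(A, r0, m)
wm = Wm.T @ r0                      # expansion coefficients of r0 (eq. 4.8)
Tm = transformation_matrix(wm)
Rt = Rm @ Tm                        # the upper Hessenberg matrix R~_m (eq. 4.10)

assert np.allclose(A @ Zm, Wm @ Rm)            # eq. (4.7)
assert np.allclose(Tm @ wm, np.eye(m)[:, 0])   # T_m w_m = e_1
assert np.allclose(np.tril(Rt, -2), 0)         # R~_m is upper Hessenberg
```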
4.4 Harmonic Ritz Values

The definition of Harmonic Ritz values has been given by Sleijpen and Van der Vorst (§5.1 in [SVdV96]) and is motivated by the fact that the reciprocals of the Harmonic Ritz values are in the field of values of A^{-1}, whereas the Ritz values are in the field of values of A. This observation has been used by Manteuffel and Starke [MS96] to construct estimates of the spectrum. Since the Harmonic Ritz values arise from an implicit application of a Rayleigh-Ritz [Par80, Saa92] procedure to the inverted operator, it is clear that they can be used successfully for the computation of interior eigenvalues. This approach is followed in the modified Rayleigh-Ritz procedure for interior eigenvalues proposed by Morgan and Zeng [MZ98].

Definition 4.4.1 The Harmonic Ritz values are the reciprocals of the (ordinary) Ritz values of A^{-1} computed from AK_m(A; r_0).

The Harmonic Ritz values, denoted ϑ̃_i^{(m)}, are the eigenvalues of R̃_m, defined in (4.10), as can be seen as follows. We look for an approximate eigenvector x = W_m y_m of A^{-1} in AK_m(A; r_0) and, since W_m^H Z_m = T_m^{-1} is the inverse of the transformation matrix T_m, the projected eigenvalue problem

    (A^{-1} x - µx) ⊥ AK_m(A; r_0)  ⟺  W_m^H (A^{-1} W_m y_m - µ W_m y_m) = 0    (4.13)

results in R̃_m y_m = µ^{-1} y_m. The Harmonic Ritz values provide approximations to the eigenvalues, as shown in Theorem 5.1 in [SVdV96], which is reformulated here.
Theorem 4.4.1 The Harmonic Ritz values are eigenvalue approximations according to the minimal residual criterion.

Proof. We seek an approximate eigenvector x = Z_m y_m of A in K_m(A; r_0), according to the minimal residual criterion

    (Ax - µx) ⊥ AK_m(A; r_0)  ⟺  T_m R_m y_m = µ y_m.    (4.14)

The eigenvalues of T_m R_m are also eigenvalues of the similar matrix R_m T_m.

The Harmonic Ritz values can also be computed from (2.16) using the eigenvalue problem (see [Fre92, SVdV96])

    H̄_m^H H̄_m y_m = µ H_m^H y_m,    (4.15)

which is equivalent to

    (H_m + h_{m+1,m}^2 f_m e_m^H) y_m = µ y_m.    (4.16)
Since f_m = H_m^{-H} e_m we have that ||f_m||_2 ≤ 1/σ_min(H_m), where σ_min(H_m) is the smallest singular value of H_m. Hence we can bound the norm of the rank one update in (4.16):

    ||h_{m+1,m}^2 f_m e_m^H||_2 ≤ h_{m+1,m}^2 / σ_min(H_m).    (4.17)
The Harmonic Ritz values equal the Ritz values when an invariant subspace has been found, since in this case h_{m+1,m} = 0. Equation (4.17) shows that the differences between the Ritz and Harmonic Ritz values can only be large when h_{m+1,m} is large and σ_min(H_m) is small, which is the case when GMRES stagnates. Paige et al. [PPdV95] showed that for a symmetric matrix the Ritz values interlace the Harmonic Ritz values, and since both sets of values converge to the eigenvalues of the matrix, we expect the differences between the Ritz and Harmonic Ritz values to be small in the case of a real, symmetric matrix. Below we show that the differences between the Ritz and Harmonic Ritz values can be large when GMRES stagnates.
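The equivalence between the rank-one update (4.16) and the generalised eigenproblem (4.15) can be checked directly. The sketch below is illustrative only (the thesis gives no code and all names are my own): it runs a standard Arnoldi process on a random matrix and computes the Harmonic Ritz values both ways.

```python
import numpy as np

def arnoldi(A, v, m):
    # standard Arnoldi process: A V_m = V_{m+1} Hbar_m
    n = len(v)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

rng = np.random.default_rng(0)
n, m = 30, 8
A = rng.standard_normal((n, n)) + 5 * np.eye(n)
V, Hbar = arnoldi(A, rng.standard_normal(n), m)
Hm = Hbar[:m, :]
em = np.eye(m)[:, -1]
ritz = np.linalg.eigvals(Hm)
# Harmonic Ritz values via the rank-one update (eq. 4.16), f_m = H_m^{-H} e_m:
fm = np.linalg.solve(Hm.T, em)
harmonic = np.linalg.eigvals(Hm + Hbar[m, m - 1] ** 2 * np.outer(fm, em))
# ... and via the generalised eigenproblem (eq. 4.15), rewritten as an
# ordinary eigenproblem for H_m^{-H} Hbar_m^H Hbar_m:
mu = np.linalg.eigvals(np.linalg.solve(Hm.T, Hbar.T @ Hbar))
assert np.allclose(np.sort_complex(harmonic), np.sort_complex(mu))
```

For this well-conditioned test matrix the Ritz and Harmonic Ritz values come out close, in line with the discussion above; the gap opens up when h_{m+1,m} is large and σ_min(H_m) is small.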
4.5 GMRES Residual Polynomial

In this section we show that the GMRES residual polynomial ϕ̃_m^GMRES(λ) is a multiple of the characteristic polynomial of R̃_m. This implies that the Harmonic Ritz values are the zeroes of the GMRES residual polynomial.

Lemma 4.5.1 If the polynomial χ_m(λ) = γ_m λ^m + ... + γ_1 λ + γ_0 has strict degree m, then χ_m(A) r_0 ∈ K_{m+1}(A; r_0) and χ_m(A) r_0 ∉ K_m(A; r_0), and we have that

    χ_m(A) r_0 = W_m χ_m(R̃_m) w_m + γ_0 r_m,    (4.18)

where W_m and w_m are defined by (4.8).
Proof. We define u_1 = R_m e_1. Then u_1 = R_m T_m w_m = R̃_m w_m and u_j = R̃_m u_{j-1} = R̃_m^{j-1} u_1 = R̃_m^j w_m for j = 2, 3, ..., m. Since R̃_m is upper Hessenberg, u_j has nonzero entries only in its first j positions. An expression for Ar_0 is readily available, as this is the first vector computed in the Arnoldi process in Simpler GMRES:

    A r_0 = z_1 ρ_{1,1} = W_m u_1 = W_m R̃_m w_m.    (4.19)

By induction we prove that

    A^k r_0 = W_m u_k = W_m R̃_m^k w_m    (4.20)

for k = 1, 2, ..., m. Equation (4.19) shows that (4.20) is valid for k = 1. We assume that (4.20) is valid for k = 1, 2, ..., j and prove that it is also valid for k = j + 1. Using (4.7) we have for j < m

    A^{j+1} r_0 = W_m R_m e_2 e_1^H u_j + W_m R_m e_3 e_2^H u_j + ... + W_m R_m e_{j+1} e_j^H u_j
                = W_m R_m T_m u_j = W_m R̃_m u_j = W_m u_{j+1} = W_m R̃_m^{j+1} w_m.    (4.21)

We use (4.20) in combination with the expansion (4.8) of the initial residual to find an expression for χ_m(A) r_0 in terms of W_m and r_m:

    χ_m(A) r_0 = γ_m W_m u_m + ... + γ_1 W_m u_1 + γ_0 (r_m + W_m w_m)    (4.22)
               = W_m χ_m(R̃_m) w_m + γ_0 r_m.    (4.23)

Equation (4.23) is the desired result.

The last nonzero entry in u_j is in position j and equals ζ̃_j = ρ_{1,1} ρ_{2,2} ... ρ_{j,j} ≠ 0. Hence from (4.20) we can conclude that

    A^j r_0 = W_m u_j = ζ̃_j z_j + ẑ_j  with  ẑ_j ∈ span{z_1, z_2, ..., z_{j-1}}  for j = 1, 2, ..., m.    (4.24)
Lemma 4.5.2 All nonzero vectors in K_{m+1}(A; r_0) that are orthogonal to AK_m(A; r_0) can be written as α ψ̃_m(A) r_0 for some α ≠ 0, where ψ̃_m(λ) = det(λI - R̃_m) is the characteristic polynomial of R̃_m.

Proof. Let the polynomial χ_j(λ) = γ̃_j λ^j + ... + γ̃_1 λ + γ̃_0 have strict degree j < m. If this polynomial has a nonzero constant term γ̃_0 ≠ 0, then we can see from (4.22) and (4.24) that χ_j(A) r_0 has a nonzero component γ̃_0 ξ_m z_m along z_m ∈ AK_m(A; r_0) and thus is not orthogonal to AK_m(A; r_0). Recall that we have assumed that GMRES does not stagnate in the last step, which implies that ξ_m ≠ 0. On the other hand, if γ̃_0 = 0, then we know from (4.24) that χ_j(A) r_0 has a nonzero component γ̃_j ζ̃_j z_j along z_j ∈ AK_m(A; r_0) and thus is not orthogonal to AK_m(A; r_0). Since ψ̃_m(λ) = det(λI - R̃_m) = γ_m λ^m + ... + γ_1 λ + γ_0 is the characteristic polynomial of R̃_m, we have by the Cayley-Hamilton theorem that ψ̃_m(R̃_m) = 0. Setting
χ_m(λ) = α ψ̃_m(λ) in (4.18), we deduce that α ψ̃_m(A) r_0 = α γ_0 r_m is orthogonal to AK_m(A; r_0). Any polynomial ϕ_m(λ) of degree m that is not a scalar multiple of ψ̃_m(λ) can be written as ϕ_m(λ) = α ψ̃_m(λ) + χ_j(λ) with χ_j(λ) a nonzero polynomial of degree j < m. We have that

    ϕ_m(A) r_0 = α ψ̃_m(A) r_0 + χ_j(A) r_0 = α γ_0 r_m + χ_j(A) r_0,    (4.25)

which is not orthogonal to AK_m(A; r_0).
Theorem 4.5.1 The GMRES residual polynomial is a multiple of the characteristic polynomial of the Hessenberg matrix R̃_m.

Proof. Since the GMRES residual polynomial ϕ̃_m^GMRES(λ) = γ_m λ^m + ... + γ_1 λ + γ_0 has degree m, (4.18) yields an expression for r^GMRES ∈ K_{m+1}(A; r_0):

    r^GMRES = ϕ̃_m^GMRES(A) r_0 = W_m ϕ̃_m^GMRES(R̃_m) w_m + γ_0 r_m.    (4.26)

Thus ϕ̃_m^GMRES(R̃_m) = 0 is necessary to have r^GMRES ⊥ AK_m(A; r_0). By Lemma 4.5.2 we know that the GMRES residual polynomial ϕ̃_m^GMRES(λ) = α ψ̃_m(λ) must be a scalar multiple of the characteristic polynomial of R̃_m in order to eliminate all the components of the residual in AK_m(A; r_0). The constant α can be determined from ϕ̃_m^GMRES(0) = α ψ̃_m(0) = 1, since ψ̃_m(0) ≠ 0 unless the dimension of K_m(A; r_0) is less than m or GMRES stagnates in the last step. The value of ψ̃_m(0) can easily be computed from (4.11):

    ψ̃_m(0) = (-1)^m det R̃_m = -(1/ξ_m) ∏_{j=1}^{m} ρ_{j,j}.    (4.27)

We obtain the following expression for the residual polynomial:

    ϕ̃_m^GMRES(λ) = ψ̃_m(λ) / ψ̃_m(0).    (4.28)

This completes the proof.

Hence the GMRES residual polynomial is uniquely defined by the fact that its m zeroes are the Harmonic Ritz values ϑ̃_i^{(m)} and by the normalisation ϕ̃_m^GMRES(0) = 1. Related theoretical results have been proved by Freund [Fre92, FGN92] and by Manteuffel and Otto [MO94]. The GMRES residual polynomial is a standard kernel polynomial and its zeroes can be computed as the eigenvalues of the generalised problem (4.15), as was shown by Freund [Fre92]. However, Theorem 4.5.1 shows that these zeroes can be computed efficiently within the Simpler GMRES algorithm. The point made in this chapter is that the Walker-Zhou interpretation of GMRES leads to a simpler analysis for the Arnoldi matrices involved.
Figure 4.1: Ritz (×) and Harmonic Ritz (+) values from K_10(A; r_0) for example 1.
4.6 Example 1: Stagnation of GMRES

We describe breakdown of FOM and stagnation of GMRES in terms of the Ritz and Harmonic Ritz values. To illustrate the crucial points we use the following example. Consider the linear system A_n x_n = b_n where

    A_n = [ 0  1          ]
          [    0  1       ]
          [       .   .   ]   ∈ R^{n×n}   and   b_n = (0  ...  0  1)^H ∈ R^n,    (4.29)
          [          0  1 ]
          [ 1           0 ]

i.e. A_n is the cyclic shift matrix with ones on the superdiagonal and in the lower left corner. The Arnoldi process computes (2.16) by generating an orthonormal basis for the Krylov subspace starting from v_1 = (0  ...  0  1)^H. After the first step we have v_2 = (0  ...  0  1  0)^H and FOM breaks down. Hence the Ritz value ϑ^{(1)} = 0 equals zero. The FOM residual polynomial evaluated for nonzero λ shows that the norm of the residual ||r^FOM||_2 may increase without bound when FOM encounters a breakdown. In this case GMRES stagnates. The Harmonic Ritz value ϑ̃^{(1)} = ∞ is set equal to infinity. Hence the GMRES residual polynomial evaluated for finite λ shows the stagnation. In Simpler GMRES the stagnation can be seen from the fact that the corresponding projection ξ_1 = 0 is zero.
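The stagnation mechanism is easy to reproduce numerically: for the cyclic shift matrix in (4.29) the right-hand side is orthogonal to A^k b_n for k = 1, ..., n-1, so the GMRES residual cannot decrease before step n. A minimal sketch (illustrative; it assumes the reconstruction of A_n as the cyclic permutation with ones on the superdiagonal):

```python
import numpy as np

n = 20
A = np.diag(np.ones(n - 1), 1)   # ones on the superdiagonal ...
A[-1, 0] = 1.0                   # ... and in the lower left corner (eq. 4.29)
b = np.zeros(n)
b[-1] = 1.0                      # b_n = (0, ..., 0, 1)^H

# K_m(A, b) is spanned by e_n, e_{n-1}, ..., so b is orthogonal to
# A K_m(A, b) for every m < n: the GMRES residual norm stays at ||b||.
v = b.copy()
for k in range(1, n):
    v = A @ v
    assert abs(b @ v) < 1e-14    # b is orthogonal to A^k b for k = 1..n-1

# the eigenvalues satisfy lambda^n = 1 (A is a cyclic permutation)
assert np.allclose(np.linalg.matrix_power(A, n), np.eye(n))
```

The first assertion with k = 1 is exactly the FOM breakdown h_{1,1} = v_1^H A v_1 = 0 described above.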
Figure 4.2: Ritz (×) and Harmonic Ritz (+) values from K_19(A; r_0) for example 1.
In the following numerical experiment we are concerned with the eigenvalue estimates when GMRES nearly stagnates. The matrix A_n is given in (4.29) and the right-hand side is b_n(ε) = (ε  ...  ε  1+ε)^H ∈ R^n. The initial guess x_0 = 0 is the zero vector. Our interest is in the convergence behaviour for small positive ε > 0. The eigenvalues of A_n satisfy λ^n = 1. Hence in the complex plane they form a regular n-polygon on the unit circle. The numerical results in this section were obtained with ε = 10^{-6} and n = 20. The norm of the residual ||r^GMRES||_2 decreases from 1 - 2.0·10^{-12} to 1 - 3.8·10^{-11} in 19 steps. For m < n the matrices H̄_m and V_m can be approximated by H̄_m ∈ R^{(m+1)×m} and V_m ∈ R^{n×m}:

    H̄_m ≈ [ 2ε  2ε  ...  2ε ]
           [ 1              ]
           [     1          ]       and    V_m ≈ (v_1  v_2  ...  v_m),    (4.30)
           [        ...     ]
           [             1  ]

where v_j ≈ (ε, ..., ε, 1+ε, ε, ..., ε)^H has the entry 1+ε in position n+1-j.
Table 4.1: Estimated and computed norm of the Ritz and Harmonic Ritz values from K_m(A; r_0) for example 1.

             Ritz values            Harmonic Ritz values
    m      min    est.   max        min    est.   max
    10     0.263  0.269  0.278      3.595  3.714  3.802
    19     0.491  0.501  0.521      1.919  1.995  2.037

With these approximations we see that the Ritz values satisfy |ϑ^{(m)}| ≈ (2ε)^{1/m}. It is easy to show that the Ritz values approximately form a regular m-polygon in the complex plane. In this case the Harmonic Ritz values are the reciprocals of the complex conjugates of the ordinary Ritz values ϑ^{(m)}. The Harmonic Ritz values satisfy |ϑ̃^{(m)}| ≈ 1/(2ε)^{1/m}, going off to infinity as ε tends to zero. Table 4.1 shows the quality of these estimates, which increases with decreasing ε and decreasing m. From (4.30) we estimate σ_min(H_m) ≈ 2ε and h_{m+1,m} ≈ 1. With this result we see that ||h_{m+1,m}^2 f_m e_m^H||_2 ≈ 1/(2ε), showing that the bound in (4.17) can be reached. We have ||H_m||_2 ≈ 1 and ||H_m + h_{m+1,m}^2 f_m e_m^H||_2 ≈ 1/(2ε). Hence we cannot expect the Ritz values and the Harmonic Ritz values to be equal. Since GMRES stagnates, the norm of the FOM residual ||r^FOM||_2 ≈ 1/(2ε) is large. These results show that the differences between the Ritz and Harmonic Ritz values are significantly large when GMRES (nearly) stagnates. To illustrate this we show in Fig. 4.1 the Ritz and Harmonic Ritz values computed from K_10(A_20; b_20(10^{-6})). The Harmonic Ritz values are plotted using a plus sign (+), while the Ritz values are shown with a times sign (×). An asterisk (*) is used to plot the eigenvalues. Figure 4.2 shows the Ritz and Harmonic Ritz values computed from K_19(A_20; b_20(10^{-6})).
4.7 Example 2: A Convection-Diffusion Problem

The aim is to compute the steady-state solution of a linearised convection-diffusion problem. The problem is formulated as follows: given a divergence-free (∇·w⃗ = 0) convective velocity field w⃗, find a scalar variable u satisfying

    -ν∇²u + w⃗·∇u = f  in Ω    (4.31)

with a Dirichlet boundary condition u(x) = g(x) on ∂Ω. A 2D convection-dominated convection-diffusion problem is solved on a uniform square grid with mesh width h = 1/64. The constant velocity field w⃗ = (1/√2, 1/√2)^T has a 45° inclination with the grid lines. The diffusion parameter was set to ν = 10^{-6}, resulting in a "mesh Peclet number" of Pe = ||w||_2 h/(2ν) = 7812.5 ≫ 1, which shows that the convection is dominant.
Figure 4.3: Norm of the residual ||r^GMRES||_2 as a function of m for example 2.

After SUPG finite element discretisation a linear system

    Au = f    (4.32)

is obtained, in which the unknown u is the discretised version of the unknown in (4.31) and the vector f corresponds to the right-hand side. The nonsymmetric matrix A = νL + N is the sum of the discretisation of the diffusive term and the skew-symmetric discrete version of the convection operator N. The matrix L = -∇²_h + S is the sum of the discretised Laplacian and a matrix S corresponding to stabilisation terms, since streamline upwinding is used. Details about the discretisation can be found in [ESW96]. We also give a more detailed description in Appendix B.

We use unpreconditioned GMRES to solve the linear system (4.32). Figure 4.3 shows the residual norm ||r^GMRES||_2 as a function of the dimension m of K_m(A; r_0). In Figs. 4.4-4.6 the Ritz value ϑ_i^{(m)} = 1 and the Harmonic Ritz value ϑ̃_j^{(m)} = 1 are not shown. The Harmonic Ritz values are plotted using a plus sign (+), while the Ritz values are shown with a times sign (×). The Krylov
Figure 4.4: Ritz (×) and Harmonic Ritz (+) values from K_130(A; r_0) for example 2.
subspace K_130(A; r_0) of dimension m = 130 yields an accurate solution. Hence the differences between the Ritz and Harmonic Ritz values are small. Figure 4.4 shows the Ritz and Harmonic Ritz values computed from K_130(A; r_0). To illustrate that the Ritz and Harmonic Ritz values differ significantly when GMRES (nearly) stagnates, we show in Fig. 4.5 the Ritz and Harmonic Ritz values computed from K_41(A; r_0). Figure 4.6 shows the Ritz and Harmonic Ritz values computed from K_80(A; r_0).
4.8 Conclusion

Nachtigal et al. [NRT92b] have presented a number of arguments why the GMRES residual polynomial should be used instead of Arnoldi eigenvalue estimates. They compute the coefficients of the polynomial explicitly by transforming back to the Krylov (power) basis and incorporate a root-finding step in their hybrid algorithm. However, finding the roots of a polynomial from its coefficients is an ill-conditioned problem. The zeroes of the GMRES residual polynomial can instead be computed by solving an eigenvalue problem, which is not only a cheap but also a stable procedure. Saylor and Smolarski [SS91] have also presented an adaptive algorithm which is a hybrid combination of GMRES and Richardson's method, but they prefer using roots that are in the field of values of A and point out that this is the crucial difference between their algorithm and the one by Nachtigal et al. [NRT92b]. Toh and Trefethen [TT96] advocate the use of H̄_m because it bypasses the usual
Figure 4.5: Ritz (×) and Harmonic Ritz (+) values from K_41(A; r_0) for example 2.
consideration of Ritz values or “Arnoldi eigenvalue estimates”. For highly nonnormal matrices we cannot expect the Arnoldi iteration to be effective at determining eigenvalues. On the other hand, Greenbaum et al. [GPS96] have shown that eigenvalues cannot be used to predict the convergence of GMRES for highly nonnormal matrices. Toh and Trefethen have pointed out that plotting the Ritz values in order to analyse the convergence of nonsymmetric Krylov solvers is not sufficient. We suggest that both the Ritz spectrum and the Harmonic Ritz spectrum should be considered when analysing the convergence of nonsymmetric Krylov solvers. The cost of computing the Harmonic Ritz values only depends on m, which in practice is small, while the computation of pseudospectra and lemniscates is usually far too expensive.
Figure 4.6: Ritz (×) and Harmonic Ritz (+) values from K_80(A; r_0) for example 2.
Chapter 5
Nested Krylov Subspace Methods

To iterate is human, to recurse, divine.
In this chapter we present our nested Krylov subspace method.
5.1 Introduction

An attractive way to extract a near-optimal approximation from a high dimensional Krylov subspace, while keeping memory and computational requirements reasonably low, is to use nested Krylov subspace methods. Recently two nested Krylov methods, FGMRES/GMRES and GMRESR, have been proposed. FGMRES is proposed by Saad [Saa93] and allows the use of a preconditioner which can be different in every iteration. GMRESR has been proposed by Van der Vorst and Vuik [VdVV94], and is based on the Generalized Conjugate Residual (GCR) method described by Eisenstat et al. [EES83]. In both these nested methods GMRES is used as the variable preconditioner. In general the search directions in these methods are different, but the convergence behaviour is nearly the same. The objective is to compute quasi-optimal approximations by storing only a few well-chosen vectors. We also mention the Generalized Conjugate Gradient method by Axelsson and Vassilevski [AV91], which is very close to the GMRESR method. The flexible method (FGMRES or GCR) used to solve the given system is referred to as the outer iteration, while the preconditioner (GMRES) is referred to as the inner iteration. The schemes can be improved by observing that the right-hand side in the inner iteration is orthogonal with respect to some subspace generated
in the outer iteration, and therefore it is possible to orthogonalise with respect to this subspace in the inner iteration. This leads to FGMRES/EGMRES, which we describe in this chapter, and GCRO [dS96a]. For the same number of matrix-vector multiplications these new methods always yield a solution which is at least as accurate as the one obtained in the absence of the additional orthogonalisations. In terms of CPU time, the methods with the additional orthogonalisations slow down as the number of outer iterations increases. The reason is that the cost of the additional orthogonalisations increases with the number of vectors kept in the outer iteration, while the gain in residual norm reduction is limited. Therefore it is only advantageous to perform the additional orthogonalisations in the first few steps of the outer iteration, when the overhead is small. Due to the additional orthogonalisations, breakdown may occur. This can be seen in GCRO, where a singular operator is used to construct a Krylov subspace in the inner iteration. Breakdown is rare in practice, since it cannot occur before the total number of matrix-vector multiplications in the nested Arnoldi process exceeds the dimension of the Krylov space K_m(A; r_0).
5.2 Nested Iterations

There are two different ways to construct a nested iteration based on FGMRES or GCR. The first is based on the residual of the approximate solution in every step of the outer iteration, and the second is based on the last vector generated in the Arnoldi process. It is clear that the solution to the original system can easily be obtained as x = x_j + z, if the linear system

    Az = r_j,    (5.1)

where r_j = b - Ax_j is the residual corresponding to the approximate solution x_j, can be solved exactly. This residual vector based approach is followed in GMRESR, where the outer iteration is GCR and the inner iteration is GMRES. The second approach is based on the observation that an optimal method such as FGMRES finds the exact solution if the preconditioning is exact in one step of FGMRES, i.e. if the linear system

    Az = v_j    (5.2)

is solved exactly by the preconditioner. A proof can be found in a paper by Saad [Saa97]. A shorter proof is as follows. Since the FFOM residual vector is a scalar multiple of this vector v_j, it is clear that FFOM will find the exact solution to the original system if the preconditioner returns z = A^{-1} v_j, by the same residual equation argument as mentioned above. If FFOM finds the exact solution, so does FGMRES, because it is an optimal method.
Constructing a nested iteration based on (5.2) would be the obvious approach when FGMRES is used in the outer iteration. A nested iteration using FGMRES based on (5.1) requires the availability of the residual vector r_j. In the classical implementations by Saad of GMRES [SS86] and FGMRES [Saa93] only the norm of the residual

    ||r_j||_2 = |s_1 s_2 ... s_j| ||r_0||_2    (5.3)

is available during the iteration. This norm is computed from the matrix elements s_i of the Givens rotations used to zero out the subdiagonal elements of the upper Hessenberg matrix H̄_j. A derivation of (5.3) can be found in [Bro91] and in [Saa96]. The residual vector can be computed in GMRES and FGMRES without forming the approximate solution x_j and without a matrix-vector multiplication. This is done by applying the Givens rotations mentioned above to the initial residual vector and scaling this vector appropriately. Since r_0 = ||r_0||_2 v_1, we can start by setting

    t_1 = v_1 = r_0 / ||r_0||_2    (5.4)

and update this vector during the iteration by applying the Givens rotation:

    t_j = -s_j t_{j-1} + c_j v_{j+1}.    (5.5)

The residual vector is then given by

    r_j = r_0 - AZ_j y_j = |s_1 s_2 ... s_j| ||r_0||_2 t_j.    (5.6)

Hence the residual vector is available during the iteration at the cost of one extra vector update. We remark that in the Simpler GMRES variant due to Walker and Zhou [WZ94] the residual vector is available during the iteration (see Section 4.3). The same idea can be used to construct a Simpler FGMRES variant, which also results in the availability of the residual vector during the iteration.
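The update (5.4)-(5.6) can be sketched as follows. This is an illustrative reconstruction with my own naming and sign conventions for the Givens rotations, not code from the thesis; at the end the tracked residual vector is compared with b - Ax.

```python
import numpy as np

def gmres_track_residual(A, b, m):
    # GMRES with x_0 = 0 that keeps the residual VECTOR available during
    # the iteration at the cost of one extra vector update: the vector t
    # is updated with the Givens coefficients (cf. eqs. 5.4-5.6).
    n = len(b)
    beta = np.linalg.norm(b)
    V = np.zeros((n, m + 1)); H = np.zeros((m + 1, m))
    V[:, 0] = b / beta
    cs = np.zeros(m); sn = np.zeros(m)
    g = np.zeros(m + 1); g[0] = beta        # rhs of the least squares problem
    t = V[:, 0].copy()                      # t_1 = r_0 / ||r_0||_2
    for j in range(m):
        w = A @ V[:, j]                     # Arnoldi (modified Gram-Schmidt)
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
        for i in range(j):                  # apply previous rotations
            H[i, j], H[i + 1, j] = (cs[i] * H[i, j] + sn[i] * H[i + 1, j],
                                    -sn[i] * H[i, j] + cs[i] * H[i + 1, j])
        rho = np.hypot(H[j, j], H[j + 1, j])
        cs[j], sn[j] = H[j, j] / rho, H[j + 1, j] / rho
        H[j, j], H[j + 1, j] = rho, 0.0
        g[j + 1], g[j] = -sn[j] * g[j], cs[j] * g[j]
        t = -sn[j] * t + cs[j] * V[:, j + 1]      # the one extra vector update
    y = np.linalg.solve(H[:m, :m], g[:m])
    x = V[:, :m] @ y
    return x, g[m] * t          # approximate solution and residual vector

rng = np.random.default_rng(1)
n = 40
A = rng.standard_normal((n, n)) + 6 * np.eye(n)
b = rng.standard_normal(n)
x, r = gmres_track_residual(A, b, 10)
assert np.allclose(r, b - A @ x, atol=1e-8)   # tracked residual is exact
```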
5.3 Extended GMRES

At the beginning of step j, the outer iteration has computed (2.21) for m = j - 1. The inner iteration solves (5.2) approximately to find z_j in order to obtain (2.21) for m = j. This is done using GMRES with additional orthogonalisations, so that the vectors in the Arnoldi process are kept orthogonal with respect to the columns of the matrix V_{j-1}. We call this inner GMRES iteration Extended GMRES (EGMRES). The inner EGMRES iteration performs m_i steps and we end up with the relation

    A Z_{j+m_i-1}^{(j)} = V_{j+m_i}^{(j)} H̄_{j+m_i-1}^{(j)},    (5.7)
where H̄_{j+m_i-1}^{(j)} is upper Hessenberg and

    Z_{j+m_i-1}^{(j)} = ( z_1  z_2  ...  z_{j-1} | z_j^{(j)}  ...  z_{j+m_i-1}^{(j)} ),    (5.8)
    V_{j+m_i}^{(j)}   = ( v_1  v_2  ...  v_j | v_{j+1}^{(j)}  ...  v_{j+m_i}^{(j)} ),    (5.9)
where v_i^{(j)} = v_i for i = 1, ..., j. The vertical bar indicates the separation between vectors from the outer and the inner iteration. The matrix V_{j+m_i}^{(j)} is orthogonal due to the additional orthogonalisations. Using (5.7), EGMRES computes the approximate solution to (5.2) from a least squares problem

    min_z ||Az - v_j||_2 = min_y ||H̄_{j+m_i-1}^{(j)} y - e_j||_2,    (5.10)

where z = Z_{j+m_i-1}^{(j)} y. The least squares problem in (5.10) has dimension (j + m_i) × (j + m_i - 1). We only have to determine the last m_i components of its solution, since we are only interested in the component of z_j orthogonal to the columns of Z_{j-1}, so we do not have to compute the first (j - 1) components of the vector y. We need to introduce some notation before stating the next theorem. The components of the vector y = (η_1  η_2  ...  η_{j+m_i-1})^T are denoted by η_i. We define the vector

    z_j^E = ∑_{i=j}^{j+m_i-1} η_i z_i^{(j)}.    (5.11)

The next theorem states that this vector z_j^E contains the essential components of the approximate solution computed in the inner iteration. We denote by P_W^⊥(x) the orthogonal projection of the vector x onto the subspace W^⊥ (the orthogonal complement of the subspace W).
Theorem 5.3.1 Suppose FGMRES is preconditioned by EGMRES. The approximate solution to (5.2) computed by EGMRES is z_j = Z_{j+m_i-1}^{(j)} y. The next vector generated by the outer FGMRES iteration is v_{j+1} = P_{V_j}^⊥(A z_j). If, instead of continuing the outer iteration with z_j, we were to continue it with z_j^E, the next vector computed by the outer FGMRES iteration would be v_{j+1}^E = P_{V_j}^⊥(A z_j^E). We have that

    v_{j+1}^E = v_{j+1}.    (5.12)
Proof. From (5.8) and (5.11) we have that z_j - z_j^E = ∑_{i=1}^{j-1} η_i z_i ∈ Z_{j-1}. After multiplication of this expression by A, we deduce from (2.21) that A(z_j - z_j^E) ∈ AZ_{j-1} ⊂ V_j. Consequently the orthogonal projection of this vector onto the orthogonal complement of the subspace V_j is zero: P_{V_j}^⊥(A(z_j - z_j^E)) = 0. This implies that z_j^E and z_j generate the same vector v_{j+1}.

The main implication of this theorem is that in the computation of the solution vector in (5.10) we can save 2(j - 1)n flops. We can also save on some Givens rotations, since we do not need the complete solution of the least squares problem, but the work associated with solving this problem is negligible, since it only depends on the number of iterations and not on the problem size n. Compared to GMRES, (j - 1) additional orthogonalisations are required in the EGMRES preconditioning step. Hence the extra cost is 4 m_i (j - 1) n flops, where m_i is the number of steps done in the inner iteration. If m_o is the number of steps in the outer iteration, the total overhead of using EGMRES instead of GMRES is O(2 m_i m_o² n) flops. The numerical results show that in general FGMRES(EGMRES) requires fewer iterations and matrix-vector multiplications than FGMRES(GMRES), thus making the overall algorithm faster for modest sizes of m_o, the number of outer iterations. When the number of outer iterations becomes large, it turns out to be advantageous to switch from FGMRES(EGMRES) to FGMRES(GMRES).
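The additional orthogonalisations that distinguish EGMRES from a plain inner GMRES can be sketched as follows. This is a simplified stand-in, not the thesis algorithm: Vout plays the role of the outer basis V_{j-1}, the start vector is arbitrary rather than v_j, and all names are my own.

```python
import numpy as np

def extended_arnoldi(A, Vout, v_start, m):
    # Inner Arnoldi process of an EGMRES-like scheme: every new vector is
    # orthogonalised against the outer basis Vout IN ADDITION to the inner
    # vectors, so the combined basis [Vout, W] stays orthonormal.
    n = len(v_start)
    W = np.zeros((n, m + 1))
    w0 = v_start - Vout @ (Vout.T @ v_start)
    W[:, 0] = w0 / np.linalg.norm(w0)
    H = np.zeros((m + 1, m))
    for j in range(m):
        w = A @ W[:, j]
        w -= Vout @ (Vout.T @ w)       # the additional orthogonalisations
        for i in range(j + 1):         # usual inner orthogonalisation
            H[i, j] = W[:, i] @ w
            w -= H[i, j] * W[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        W[:, j + 1] = w / H[j + 1, j]
    return W, H

rng = np.random.default_rng(2)
n, k, m = 30, 4, 6
A = rng.standard_normal((n, n)) + 5 * np.eye(n)
Vout, _ = np.linalg.qr(rng.standard_normal((n, k)))   # stand-in outer basis
W, H = extended_arnoldi(A, Vout, rng.standard_normal(n), m)
assert np.allclose(Vout.T @ W, 0, atol=1e-12)         # inner basis ⟂ outer basis
assert np.allclose(W.T @ W, np.eye(m + 1), atol=1e-12)
```

The per-step cost of the extra projection is proportional to the number of outer vectors kept, which is the overhead quantified above.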
5.4 Breakdown and Stagnation

5.4.1 Types of breakdown

In GMRES and FGMRES breakdown occurs when

    h_{m+1,m} = 0.    (5.13)

The algorithm breaks down because no vector v_{m+1} is computed. In GMRES we can only have a so-called lucky breakdown, since nonsingularity of the matrix A implies nonsingularity of the Hessenberg matrix H_m. Hence we have found an invariant subspace and we can compute the exact solution from it. In FGMRES the nonsingularity of H_m depends on the search space Z_m, and we have a lucky breakdown if H_m is nonsingular. In this case the exact solution can be computed. A true breakdown occurs if and only if H_m is singular, and this can be avoided by ensuring that Z_m has full rank. It is clear that we have to enrich the search space in every iteration. If we take this into account, breakdown is not really a problem in FGMRES.
5.4.2 LSQR switch

A straightforward approach to avoid breakdown is to use the LSQR [PS82] switch, which always leads to a decrease of the residual norm. In the outer iteration the vector

    z_j = A^H t_j    (5.14)

is added to the search space, where t_j is the normalised residual vector computed by (5.5). Introducing z_j = A^H v_j would not be useful, because we cannot guarantee that adding this vector leads to a reduction of the residual norm in the outer FGMRES iteration. In the case of the example discussed in Section 5.4.3, introducing this vector in the outer FGMRES iteration would propagate the stagnation to the outer iteration. This is clearly undesirable. Another reason for introducing the LSQR switch could be that one would like to avoid stagnation. This may occur when the inner iteration is not able to reduce the residual norm sufficiently and it cannot be guaranteed that a better approximation can be computed in the outer iteration. The LSQR switch is essentially one step of an algorithm based on the normal equations. It is well known that the normal equations can be ill-conditioned and that methods based on the normal equations often converge slowly. In general one tries to avoid the use of the transpose of the matrix. The combination of FGMRES and GMRESR allows the construction of a transpose-free cure for breakdown.
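That one LSQR-switch step always decreases the residual norm can be seen from a single minimisation along the direction z = A^H r: since the optimal step reduces ||r||² by ||A^H r||⁴/||Az||², the decrease is strict whenever r ≠ 0 and A is nonsingular. A minimal sketch (illustrative names only):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 15
A = rng.standard_normal((n, n))       # nonsingular with probability one
b = rng.standard_normal(n)
x = rng.standard_normal(n)            # an arbitrary current iterate

r = b - A @ x                         # current residual
z = A.T @ r                           # the LSQR-switch direction (eq. 5.14)
a = (r @ (A @ z)) / np.linalg.norm(A @ z) ** 2   # optimal step length
# r . Az = ||A^H r||^2 > 0, so the optimal step strictly decreases ||r||_2
assert np.linalg.norm(b - A @ (x + a * z)) < np.linalg.norm(r)
```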
5.4.3 Transpose-free cure for breakdown

Description

Our transpose-free cure for breakdown is to start the Arnoldi process in the inner iteration with v_{j+m_i}^{(j)}, the vector in the highest dimensional Krylov subspace sought so far, i.e. the result of the last matrix-vector multiplication in the previous inner iteration, orthogonalised with respect to the other columns of the matrix V_{j+m_i}^{(j)}. The inner EGMRES iteration solves the normalised residual equation

    A z_j = t_j,    (5.15)

where t_j is the normalised residual vector computed by (5.5), and we make sure Z_j has full rank.

Example: Stagnation of GMRES

We consider the linear system (4.29), which was used to describe stagnation of GMRES in Section 4.6. For this system the standard FGMRES/GMRES,
Table 5.1: Performance results for GMRES (G), FG and FEG.

                        n = 32                  n = 64                  n = 128
                    G      FG     FEG       G      FG     FEG       G      FG     FEG
  ν = 10⁻²     Pe = 1.56 (α = 0.18)    Pe = 0.781 (α = 0)      Pe = 0.391 (α = 0)
  MFLOP           10.9   9.82   8.67     124    55.8   52.1     1642   363    436
  iterations      64     15     11       114    21     15       215    33     25
  ν = 10⁻⁴     Pe = 156 (α = 0.497)    Pe = 78.1 (α = 0.494)   Pe = 39.1 (α = 0.487)
  MFLOP           16.8   11.2   10.9     152    61.5   67.6     1642   375    463
  iterations      81     17     13       127    23     18       215    34     26
  ν = 10⁻⁶     Pe = 15625 (α = 0.5)    Pe = 7813 (α = 0.5)     Pe = 3906 (α = 0.5)
  MFLOP           16.8   11.2   10.9     159    64.4   67.6     1763   400    519
  iterations      81     17     13       131    24     18       223    36     28
GMRESR and GCRO algorithms break down and can only be fixed with the LSQR switch. The FGMRES/GMRES algorithm solves (5.2). The inner GMRES iteration stagnates after m_i < n iterations, no nonzero vector z_1 is returned, and hence no vector v_2^{(1)} can be computed. Continuing on Az = v_{m_i+1}^{(1)} is useless, since this system is equivalent to the original problem (4.29) after m_i shifts. The GMRESR and GCRO algorithms solve (5.1). The inner GMRES iteration also stagnates after m_i < n iterations, resulting in no nonzero vectors z_1 and v_2. The problem for the next outer iteration would of course be the same residual equation again. Our strategy forces the inner GMRES iteration to go through the entire Krylov space and finds the solution without using the transpose of the matrix.
5.5 Numerical Results

We have used (full) GMRES, FGMRES/GMRES and FGMRES/EGMRES to solve the linear system (4.32), which results after SUPG finite element discretisation of the 2D convection-diffusion testcase introduced in Section 4.7. This testcase is described in more detail in Appendix B. We denote the nested schemes as FG (shorthand for FGMRES/GMRES) and FEG (shorthand for FGMRES/EGMRES). In Table 5.1 we compare performance results for GMRES, FG and FEG. The mesh size is h = 1/n, for n = 32, n = 64 and n = 128. The diffusion parameter
was set to ν = 10⁻², ν = 10⁻⁴ and ν = 10⁻⁶, resulting in different mesh Peclet numbers, so both convection dominated and diffusion dominated problems were solved.

Figure 5.1: Norm of the residual ||r_m||_2 as a function of the dimension m of the Krylov subspace for FG and FEG compared to full GMRES.

The convergence criterion was set to
||r_m||_2 <= 10^-12 ||r_0||_2.    (5.16)
We list the number of iterations required to satisfy (5.16) and the number of floating point operations performed (in millions, i.e. MFLOP). The inner iteration is GMRES(10) with no restart, i.e. 10 matrix-vector multiplications are done in the inner iteration. We list the number of outer iterations mo for the nested schemes, so if FGMRES requires 15 iterations, a Krylov subspace of dimension 15 × 10 = 150 is used. Clearly the nested schemes search a larger Krylov subspace than full GMRES. This can be seen in Table 5.1 by observing that the number of iterations GMRES needs is always smaller than 10 times the number of iterations FGMRES needs. However, in terms of floating point operations GMRES loses: both nested schemes need fewer floating point operations to find a very accurate solution than full GMRES. These results also illustrate that FEG does a better job approximating GMRES
Table 5.2: The number of outer iterations mo and the dimension k of the subspace used as a function of the dimension i in the inner iteration.

              GMRES(i)                   EGMRES(i)
  i      mo   k = mo·i   MFLOP      mo   k = mo·i   MFLOP
  1     215     215       1624     215     215       1624
  2     119     238        808     108     216       1327
  3     108     324        765      73     219        864
  4      75     300        502      56     224        678
  5      60     300        419      46     230        586
  8      40     320        362      30     240        455
  9      36     324        359      27     243        437
 10      33     330        363      25     250        436
 11      30     330        362      23     253        428
 12      28     336        371      21     252        414
 13      26     338        377      20     260        425
 15      23     345        396      18     270        437
 20      19     380        482      15     300        499
 25      17     425        598      13     325        569
than FG, since the number of iterations for FEG is always smaller than for FG. However, if the number of outer iterations mo becomes large, FG beats FEG in terms of floating point operations. Figure 5.1 shows the residual norm ||r||_2 as a function of the dimension m of K_m(A, r_0) for the problem with n = 128 and ν = 10^-2. These convergence histories clearly show that FEG approximates the convergence of full GMRES better than FG does: the FG method stagnates for a longer time, i.e. the nearly horizontal plateau in the convergence history is longer, and its convergence factor is larger, resulting in slower convergence. In Table 5.2 we vary the number of matrix-vector multiplications, i.e. the dimension of the Krylov subspace, in the inner iteration for the same linear system. The table shows the number of outer FGMRES iterations mo, the dimension k = mo·i of the Krylov subspace sought and the number of floating point operations. We compare FG, with inner iteration GMRES(i) with no restart, and FEG, with inner iteration EGMRES(i) with no restart. The case i = 1 corresponds to full GMRES and no nested iteration. It is interesting to see that for small values of i the space searched by FEG is a lot smaller than the space searched by FG and very close to the space searched by full GMRES. These results also support our claim that FEG approximates full GMRES better than FG. They also illustrate that in terms of floating point operations FG beats FEG unless the number of outer iterations is small.
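The nesting used in these experiments can be sketched in a few lines. The following is a minimal illustration of the FG scheme (not the implementation used for the experiments above): an outer flexible GMRES loop in which each preconditioned vector z_j is produced by a restarted inner GMRES run. The function name, parameter defaults and the test matrix at the end are our own choices.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import gmres

def fgmres(A, b, inner=10, outer=30, tol=1e-12):
    """Flexible GMRES (FG scheme): the j-th preconditioned vector z_j is
    produced by a restarted inner GMRES run, so the 'preconditioner' is
    allowed to change from one outer step to the next."""
    n = b.size
    beta = np.linalg.norm(b)            # x0 = 0, so r0 = b
    V = np.zeros((n, outer + 1))
    Z = np.zeros((n, outer))
    H = np.zeros((outer + 1, outer))    # the Hessenberg matrix H_bar
    V[:, 0] = b / beta
    y = np.zeros(1)
    for j in range(outer):
        # inner iteration: 'inner' GMRES steps on A z = v_j
        z, _ = gmres(A, V[:, j], restart=inner, maxiter=1)
        Z[:, j] = z
        w = A @ z
        for i in range(j + 1):          # modified Gram-Schmidt
            H[i, j] = w @ V[:, i]
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        # small least-squares problem min_y ||beta*e1 - H_bar*y||_2
        e1 = np.zeros(j + 2)
        e1[0] = beta
        y = np.linalg.lstsq(H[:j + 2, :j + 1], e1, rcond=None)[0]
        if np.linalg.norm(e1 - H[:j + 2, :j + 1] @ y) <= tol * beta:
            break
        if H[j + 1, j] < 1e-14:         # (happy) breakdown
            break
        V[:, j + 1] = w / H[j + 1, j]
    return Z[:, :y.size] @ y

# illustrative 1D convection-diffusion-like test matrix (our own choice)
n = 50
A = diags([-1.2, 2.5, -0.8], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)
x = fgmres(A, b)
```

Note that the approximate solution is built from the columns of Z, not of V: this is exactly the flexibility that allows a different inner iteration (such as EGMRES) in each outer step.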
5.6 Conclusion
We have shown that using a nested Krylov subspace method is an attractive way to extract a near-optimal approximation from a high dimensional Krylov subspace while keeping memory and computational requirements reasonably low. The FGMRES/EGMRES nested Krylov method we proposed in this chapter is comparable to the GCRO [dS96a] method, but differs from it since we use another vector to start the inner iteration. The inner GMRES iteration approximately solves the normalised residual equation (5.15) instead of (5.2), which would be expected for an outer FGMRES iteration. These choices also provide a transpose-free remedy for breakdown. Our FGMRES/EGMRES is a better approximation to full GMRES than FGMRES/GMRES, but when the number of outer iterations becomes large, the cost of the additional orthogonalisations increases while the gain in residual norm reduction is limited. Therefore it is only advantageous to use the EGMRES inner iteration in the first few steps of the outer iteration, when the overhead is small, and to switch to the inner GMRES iteration when the number of outer iterations becomes large.
Chapter 6
A Krylov Subspace Iterative Solver for the Shallow Water Equations

355/113 - Not the famous irrational number PI, but an incredible simulation!
This chapter describes research by the author on a Krylov subspace method in the context of DELFT3D.
6.1 Introduction
DELFT3D is Delft Hydraulics' fully-integrated simulation program for the modelling of water flows, water quality, sediment transport, waves, morphology and ecology. The DELFT3D-FLOW submodule simulates tidal and wind-driven flow in rivers, lakes, estuaries, shallow seas and coastal areas. The equations underlying DELFT3D-FLOW are the shallow water equations in both two (2D, depth-averaged) and three (3D) dimensions. In the DELFT3D-FLOW software, time integration is done by an Alternating Operator Implicit (AOI) method, in which at every time step the ordering of explicit and implicit steps leads to a system of equations for the water elevation. Until recently this linear system was solved by an Alternating Direction Implicit (ADI) iteration process, which does not converge very well for large time steps and small mesh widths. Note that ADI iterative methods for the solution of a linear system are not to be confused with ADI time integration methods. We implemented a
robust solver by using a Krylov subspace method with the ADI method acting as a preconditioner. As mentioned, the equations solved in DELFT3D-FLOW are the 2D and 3D shallow water equations. From a physical point of view, there is no reason to carry out the simulation with small time steps for applications in which the hydrodynamics shows a weakly dynamic behaviour. Ideally, time steps should only be limited by accuracy demands and not by constraints imposed by the numerical method, so that time steps can be large when the behaviour is weakly dynamic. In practice, numerical methods do have time step constraints. This is also the case for DELFT3D-FLOW, which has several physical and numerical time step constraints. The strongest of these constraints determines the largest time step which can be used in the simulation. If the time step does not satisfy all the constraints, numerical instabilities might occur. The aim of the research presented in this chapter was to improve the robustness of the numerical methods in the DELFT3D-FLOW code, by alleviating the numerical time step constraint which becomes dominant in the case of a long-term simulation of a weakly dynamic system. This was achieved by implementing a different algorithm for the solution of the systems of equations occurring in DELFT3D-FLOW. Long-term simulations showed that our new method allows the use of larger time steps, which results in a substantial decrease of the required CPU time. A similar technique was used to accelerate the convergence and to improve the robustness of a domain decomposition method, in which the above-mentioned algorithm is used to solve the subdomain problems.
6.2 Shallow Water Equations
The Shallow Water Equations (SWE) are a set of nonlinear partial differential equations describing waves that are long relative to the water depth. Physical phenomena such as tidal waves in rivers and seas, breaking of waves on shallow beaches and even harbour oscillations can be modelled successfully with the SWE. The 3D SWE (6.1)-(6.3) given below are based on the hydrostatic assumption, i.e. that the influence of the vertical component of the acceleration of the water particles on the pressure can be neglected. In Cartesian (ξ, η) coordinates they read:

∂u/∂t + u ∂u/∂ξ + v ∂u/∂η + (ω/H) ∂u/∂σ - fv + g ∂ζ/∂ξ - Δu = 0    (6.1)

∂v/∂t + u ∂v/∂ξ + v ∂v/∂η + (ω/H) ∂v/∂σ + fu + g ∂ζ/∂η - Δv = 0    (6.2)

∂ζ/∂t + ∂(Hu)/∂ξ + ∂(Hv)/∂η + ∂ω/∂σ = 0    (6.3)
where

Δu = ν_H (∂²u/∂ξ² + ∂²u/∂η²) + (1/H²) ∂/∂σ (ν_V ∂u/∂σ).    (6.4)
We denote by ζ the water elevation above some plane of reference, hence the total water depth is given by

H = d + ζ,    (6.5)

where d is the depth below this plane of reference. The scaled vertical coordinate

σ = (z - ζ)/(d + ζ)    (6.6)

varies between -1 at the bottom and 0 at the free surface. The velocities in the ξ- and η-directions are denoted by u and v respectively, while ω represents the transformed vertical velocity. The parameter f accounts for the Coriolis force due to the rotation of the Earth. The viscosity is modelled using ν_H and ν_V. In each σ-plane ν_H models the "horizontal" viscosity, while ν_V describes the viscosity in the vertical (σ) direction. For a more detailed description of the SWE as solved by DELFT3D-FLOW we refer to the report by de Goede [dG95].
6.3 Alternating Operator Implicit Method
For the time integration we use the two-stage Alternating Operator Implicit (AOI) time splitting method, which has been developed at Delft Hydraulics and is described by de Goede [dG93]. This method is unconditionally stable and second order accurate in time. In the first stage (most of) the advection and diffusion terms in the momentum equations are handled implicitly, while the continuity equation is integrated explicitly. The resulting linear systems for the intermediate u and v are solved by Red-Black Gauss-Seidel iterations. During the second stage the continuity equation is treated implicitly. Substitution into the continuity equation of the momentum equations, in which the velocity components are now handled explicitly, leads to a nonlinear system for the water elevation ζ. For each time step n, we perform Q fixed point iterations to solve this nonlinear system. Introducing an iteration counter q (q = 1, 2, ..., Q) and multiplying the pressure terms in the momentum equations by H^(n,q)/H^(n,q+1), we obtain

(I - ν_ξ ∂²/∂ξ² - ν_η ∂²/∂η²) ζ^(n,q) = f,    (6.7)

where the right-hand side f involves previously computed values and where ζ^(n,q) denotes the water elevation at iteration q of time step n. In the remainder of the chapter we drop the superscripts. The imposed boundary conditions might be of Neumann type (e.g. a closed wall), which could lead to a nonsymmetric linear system after discretisation. The pseudo viscosities ν_ξ and ν_η mainly depend on the time step and the total depth, which makes the linear system nonsymmetric. Since the classical five point star stencil is used, a discrete equation of the form

(b_{i,j} + b^x_{i,j} + b^y_{i,j}) ζ_{i,j} + a_{i,j} ζ_{i-1,j} + c_{i,j} ζ_{i+1,j} + d_{i,j} ζ_{i,j-1} + e_{i,j} ζ_{i,j+1} = f_{i,j}    (6.8)

is obtained for each grid point (i, j) and the resulting linear system has a pentadiagonal structure. In practice it often suffices to take Q = 2. Until recently ADI iterative methods were used in DELFT3D-FLOW-AOI to solve system (6.7). This requires the definition of H- and V-operators. The H operator is defined as the operator working along the i (x or, more generally, ξ) direction and the V operator works along the j (y or, more generally, η) direction. Thus the H operator uses b, b^x, a, c and the V operator uses b, b^y, d, e. For large time steps or small mesh widths ν_ξ and ν_η become large and the ADI iteration method might fail to converge. The use of Krylov subspace methods overcomes this convergence problem. The main topic addressed in the next sections of this chapter is the application of a Krylov subspace method to solve (6.7). The original ADI iterative method is used as a preconditioner in the Krylov solver.
6.4 Alternating Direction Implicit
6.4.1 Basic Iterative Method
The Alternating Direction Implicit (ADI) method was introduced by Peaceman and Rachford [PR55] for solving linear systems arising from finite difference discretisations of elliptic and parabolic partial differential equations. Consider the elliptic PDE on a rectangular domain with Dirichlet boundary conditions

-∂/∂x (a(x, y) ∂u/∂x) - ∂/∂y (b(x, y) ∂u/∂y) = f(x, y).    (6.9)

After discretisation with finite differences a linear system of the form

(H + V) u = b    (6.10)

is obtained. H and V are matrices representing the discretisation of -∂/∂x(a(x, y) ∂u/∂x) and -∂/∂y(b(x, y) ∂u/∂y) respectively. Hence H can be interpreted as a horizontal operator, since it only works along the x-direction, and V is the corresponding vertical operator working along the y-direction.
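For constant coefficients the splitting (6.10) can be written down explicitly with Kronecker products. The following is a small sketch; the grid size n and the unit mesh scaling are our own choices.

```python
import numpy as np
import scipy.sparse as sp

n = 8
# 1D second-difference operator with Dirichlet boundaries (constant a = b = 1)
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
I = sp.identity(n)
H = sp.kron(I, T)   # horizontal operator: differences along the x-direction
V = sp.kron(T, I)   # vertical operator: differences along the y-direction
A = H + V           # the familiar five-point stencil on the n-by-n grid
```

For constant coefficients H and V commute, which is precisely the setting in which the ADI convergence theory below gives sharp estimates.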
The ADI iteration method solves (6.10) by alternating H- and V-sweeps. Given an initial guess u_0, the iteration proceeds by repeatedly solving the tridiagonal systems

(H + ρ_k I) u_{k+1/2} = (ρ_k I - V) u_k + b,    (6.11)
(V + ρ_k I) u_{k+1} = (ρ_k I - H) u_{k+1/2} + b.    (6.12)

The positive numbers ρ_k are called the Wachspress relaxation parameters. If only one relaxation parameter ρ is used, it is determined by minimising the spectral radius of the iteration matrix G,

G = (V + ρI)^{-1} (H - ρI)(H + ρI)^{-1} (V - ρI).    (6.13)

Cyclic ADI methods make use of multiple relaxation parameters. The idea is that in each cycle i, ρ_i is chosen to quickly damp out a small range of eigenmodes. To this end the eigenspectrum of A is divided into k intervals. The classical choice is to define

ρ_i = α (β/α)^{i/k}    for i = 0, ..., k.    (6.14)

Here α and β are lower and upper bounds for the eigenvalues λ and µ of H and V respectively: α < λ, µ < β.
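The sweeps (6.11)-(6.12) with the geometric parameter choice (6.14) can be sketched as follows. This is a minimal illustration, not the DELFT3D-FLOW implementation: it uses a general sparse solve where a production code would use tridiagonal recursions, and the model problem and the bounds α, β are our own choices.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def adi_solve(H, V, b, alpha, beta, k=8, cycles=10, tol=1e-10):
    """Cyclic Peaceman-Rachford ADI for (H + V) u = b with the geometric
    parameter choice rho_i = alpha * (beta/alpha)**(i/k), i = 0, ..., k."""
    n = b.size
    I = sp.identity(n, format="csc")
    A = H + V
    rhos = [alpha * (beta / alpha) ** (i / k) for i in range(k + 1)]
    u = np.zeros(n)
    for _ in range(cycles):
        for rho in rhos:                 # the sweeps (6.11) and (6.12)
            u_half = spsolve((H + rho * I).tocsc(), (rho * I - V) @ u + b)
            u = spsolve((V + rho * I).tocsc(), (rho * I - H) @ u_half + b)
        if np.linalg.norm(b - A @ u) <= tol * np.linalg.norm(b):
            break
    return u

# model problem: 2D Laplacian split as H + V, as in Section 6.4.1
m = 16
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
H = sp.kron(sp.identity(m), T)
V = sp.kron(T, sp.identity(m))
# spectral bounds of T: 2 - 2cos(pi/(m+1)) <= lambda <= 2 + 2cos(pi/(m+1))
alpha = 2.0 - 2.0 * np.cos(np.pi / (m + 1))
beta = 2.0 + 2.0 * np.cos(np.pi / (m + 1))
b = np.ones(m * m)
u = adi_solve(H, V, b, alpha, beta)
```

With the k + 1 parameters spread geometrically over [α, β], each sweep strongly damps the eigenmodes with eigenvalues near its ρ_i, so a full cycle contracts the whole spectrum.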
6.4.2 Relaxation Parameters in DELFT3D-FLOW
In DELFT3D-FLOW the first relaxation parameter is not used:

ρ_i = α (β/α)^{i/k}    for i = 1, ..., k.    (6.15)

The relaxation parameters in DELFT3D-FLOW are based on an averaged pseudo viscosity ν. This has been described by de Goede [dG93]. An alternative would be to introduce local relaxation parameters computed from local values of the pseudo viscosities ν_ξ and ν_η, leading to sweeps of the form

(H + Σ) u_{k+1/2} = (Σ - V) u_k + b,    (6.16)
(V + Σ) u_{k+1} = (Σ - H) u_{k+1/2} + b,    (6.17)

where Σ is a diagonal matrix containing the local relaxation parameters. Whether this approach leads to a better ADI solver and/or preconditioner remains a topic of future research.
6.4.3 ADI Solver for the Water Elevation Equation
In DELFT3D-FLOW the ADI iterative method as described above was used to solve the water elevation equation (6.7). Starting from an initial guess ζ^(n,0) for the water elevation at time level n, the iteration proceeds for q = 1, 2, ..., Q by repeatedly solving

((1 + ρ_i) I - ν_ξ ∇²_ξξ) ζ^(n,q+1/2) = d + (ρ_i I + ν_η ∇²_ηη) ζ^(n,q),    (6.18)
((1 + ρ_i) I - ν_η ∇²_ηη) ζ^(n,q+1) = d + (ρ_i I + ν_ξ ∇²_ξξ) ζ^(n,q+1/2).    (6.19)

The iterative process described in (6.18) and (6.19) may fail to converge for large time steps and/or small mesh widths, which lead to large ν_ξ and ν_η. This problem is solved by using a more robust FGMRES solver in combination with ADI preconditioning. Applying GMRES or FGMRES to solve the linear systems arising in the AOI method is straightforward.
6.5 Preconditioning
6.5.1 Jacobi Preconditioning: Diagonal Row Scaling
The convergence criterion in FGMRES is based on the residual norm ||b - Ax||_2. Therefore it is important that all the equations have the same weight in this residual norm. The left preconditioner M_L was defined to be the Jacobi diagonal preconditioner

M_L = diag(A).    (6.20)

This amounts to scaling the linear system so that the diagonal entries are all equal to one. In terms of the DELFT3D-FLOW variables, this leads to

b + b^x + b^y = 1.    (6.21)

It is essential to realise that the relaxation parameters, if computed prior to this scaling, should correspondingly be modified after scaling.
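The scaling (6.20)-(6.21) amounts to a one-line operation on the assembled system. A sketch, with a hypothetical test matrix of our own:

```python
import numpy as np
import scipy.sparse as sp

def jacobi_scale(A, b):
    """Left Jacobi preconditioning: divide each equation by its diagonal
    entry so that every diagonal entry becomes 1, cf. (6.21)."""
    Dinv = sp.diags(1.0 / A.diagonal())
    return (Dinv @ A).tocsr(), Dinv @ b

# hypothetical test system standing in for the pentadiagonal system (6.8)
A = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(5, 5), format="csr")
A_scaled, b_scaled = jacobi_scale(A, np.ones(5))
```

As remarked above, if the ADI relaxation parameters are computed before this scaling they have to be rescaled accordingly.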
6.5.2 One Step ADI Preconditioning
An obvious approach is to isolate one sweep of the original ADI method and to use it as the preconditioning step M_j^{-1} v_j in FGMRES. Starting from an initial guess z_0, one step of ADI preconditioning z = M^{-1} v amounts to solving two sets of tridiagonal linear systems:

(H + ρ_i I) z_{1/2} = (ρ_i I - V) z_0 + v,    (6.22)
(V + ρ_i I) z = (ρ_i I - H) z_{1/2} + v.    (6.23)
An explicit expression for z can be obtained by eliminating z_{1/2}. This results in

z = (V + ρ_i I)^{-1} (H - ρ_i I)(H + ρ_i I)^{-1} (V - ρ_i I) z_0
  + (V + ρ_i I)^{-1} [I - (H - ρ_i I)(H + ρ_i I)^{-1}] v.    (6.24)
When a stationary iterative method is used as a preconditioner in a Krylov subspace method, the initial guess has to be equal to the zero vector:

z_0 = 0.    (6.25)

With this choice the one step ADI preconditioner reduces to

z = (V + ρ_i I)^{-1} [I - (H - ρ_i I)(H + ρ_i I)^{-1}] v.    (6.26)
Starke [Sta94] has shown that the distance of the preconditioned operator to the identity matrix is bounded by the spectral radius of the ADI iteration operator G defined in (6.13). Hence the classical choice to select the relaxation parameters to minimise this spectral radius, is also a good choice when ADI is used as a preconditioner in a Krylov subspace method. This analysis by Starke was motivated by an earlier paper of Chin et al. [CMdP84], who have applied ADI preconditioning in a Chebyshev iteration.
6.5.3 Cycle ADI Preconditioning
An alternative is to let the preconditioner be defined by a complete cycle of ADI sweeps. The implementation of this preconditioner is straightforward: we only have to loop over all the relaxation parameters ρ_i (i = 0, ..., k) and solve the tridiagonal linear systems (6.27) and (6.28):

(H + ρ_i I) z_{i+1/2} = (ρ_i I - V) z_i + v,    (6.27)
(V + ρ_i I) z_{i+1} = (ρ_i I - H) z_{i+1/2} + v.    (6.28)

The result of this cycle ADI preconditioner is obtained by simply setting z = z_{k+1} after completion of this loop. The explicit form of M^{-1} can be found by unrolling the loop above. Once more, the initial guess has to be equal to the zero vector z_0 = 0.
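A full cycle of sweeps with z_0 = 0 defines a fixed linear map v -> z that can be handed to a Krylov solver as the preconditioner. Below is a sketch using SciPy's GMRES; the model problem, the bounds and the number of parameters are our own choices, and a production code would again solve the tridiagonal systems directly.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, gmres, spsolve

def cycle_adi_preconditioner(H, V, rhos):
    """One full cycle of ADI sweeps (6.27)-(6.28) applied to v. Starting
    from z0 = 0 makes the map v -> z linear, as a fixed preconditioner
    for GMRES must be."""
    n = H.shape[0]
    I = sp.identity(n, format="csc")
    def apply(v):
        z = np.zeros(n)
        for rho in rhos:
            z_half = spsolve((H + rho * I).tocsc(), (rho * I - V) @ z + v)
            z = spsolve((V + rho * I).tocsc(), (rho * I - H) @ z_half + v)
        return z
    return LinearOperator((n, n), matvec=apply)

# model problem (2D Laplacian split as H + V) and geometric parameters
m = 16
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
H = sp.kron(sp.identity(m), T)
V = sp.kron(T, sp.identity(m))
A = (H + V).tocsr()
alpha, beta = 0.03, 4.0                       # rough eigenvalue bounds
rhos = [alpha * (beta / alpha) ** (i / 4) for i in range(5)]
M = cycle_adi_preconditioner(H, V, rhos)
b = np.ones(m * m)
x, info = gmres(A, b, M=M, restart=20, maxiter=100)
```

Because the cycle is the same in every outer step, the preconditioner is fixed and plain GMRES suffices; a varying cycle would require FGMRES, as in Table 6.1.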
6.6 ADI Preconditioning in DELFT3D-FLOW
It is obvious that better results are obtained with complete cycle ADI preconditioning than with one (single) step ADI preconditioning, since a complete cycle consists of (k + 1) steps. A comparison between a preconditioning based on one step and a preconditioning based on (k + 1) steps is of course not a fair comparison. Note that one step in any ADI preconditioning scheme requires a sweep with (H + ρ_i I) as well as a sweep with (V + ρ_i I). Full cycle ADI preconditioning, thus (k + 1) steps (in the Clyde model, described in Section 6.8, k equals 7), yields better results than (k + 1) applications of the single step preconditioner, even when the same relaxation parameters are used. At first sight this might be a little surprising, since in the latter case FGMRES has more search vectors and hence one could expect it to be able to find a better approximation. But full cycle ADI is a much more powerful preconditioning technique. The complete cycle ADI preconditioner has been chosen as the preconditioning technique used in DELFT3D-FLOW. As an illustration the ADI-preconditioned FGMRES scheme is given in Table 6.1. A similar variant has been implemented for the ADI-preconditioned GMRES algorithm. Since the complete cycle ADI preconditioner is a fixed preconditioner, GMRES can be used instead of FGMRES at the cost of one extra preconditioning step to obtain the update for the approximate solution. The memory requirements of GMRES are about half those of FGMRES in the case of a fixed preconditioner. The different algorithms and preconditioning techniques as well as the results can be found in [GT97]. The results in Sections 6.7 and 6.8 show that a more robust Krylov subspace method is useful when numerical constraints rather than accuracy constraints hamper the convergence of the iterative solver in the AOI method. The advantages of the robust AOI method come into play when accurate simulations are necessary, requiring 3D computations and/or fine grids, or when the system under consideration is essentially weakly dynamic. The latter is the case for the Clyde model (see Section 6.8), where the possibility to use larger time steps has proven to lead to significant reductions in CPU time.
6.7 The Benqué test case
The Benqué test case is used to see the effect of increasing Courant numbers. We restrict ourselves here to a few details; a complete description is given by Benqué [Ben82]. The grid and the geometry are shown in Fig. 6.1. A rectangular, shallow basin of size 3600 × 7200 m is considered, containing a zig-zag channel. The depth in the basin is set to 6 m, except in the channel where it is set to 13 m. This basin is covered by a grid with mesh widths of ∆x = ∆y = 300 m. The initial condition is ζ = 3 m, u = v = 0 m/s. At the open boundary the water elevation is prescribed: u = 0, ζ = 3 cos(ωt) m, ω = 2π/T, T = 60 hours. The remaining three boundaries are closed. The Chézy coefficient is 65 m^(1/2)/s and the horizontal viscosity is 1 m²/s. Although this example may at first sight seem very unrealistic, it covers the effect of increasing Courant numbers. Instead of reducing the mesh width of the grid, this has been simulated by incrementing the time step. In order to do so, we needed a somewhat unrealistic tide (T = 60 hours). Table 6.2 shows the total computation time (on a Silicon Graphics Indigo 2 workstation) required by DELFT3D-FLOW using the AOI time integration method with the ADI solver and the FGMRES-ADI solver, respectively. These results clearly show that for increasing time steps, and thus increasing pseudo viscosities ν_ξ and ν_η in (6.7), the FGMRES-ADI solver requires less CPU time than the ADI solver. This is due to the increasing difficulty the ADI solver encounters when solving the systems. For ∆t = 200 min, the ADI solver does not even converge to a solution. The FGMRES-ADI solver is more robust and always capable of solving the systems of equations. For smaller time steps the overhead of the FGMRES algorithm and the fixed cycle length of the full cycle ADI preconditioner cause ADI to be more efficient.
6.8 The Clyde river model
The Clyde model described by Steeghs [Ste95] simulates the long-term water quality in the Clyde river. The geometry is shown in Fig. 6.2. The long-term simulation of the salt stratification is important in this model. To this end a brute force approach is used in which the hydrodynamics are computed accurately in time. For this simulation of the hydrodynamics, a computational grid is used which consists of 10 layers of 178 × 30 points. Figure 6.3 shows the grid. The grid spacing is 250 m longitudinally and 200 m laterally. At the seaward boundary the water levels are given and at the upstream boundaries discharges are prescribed. DELFT3D-FLOW computes the velocity fields and the water levels. Simulations have been carried out both with and without tide. Table 6.3 shows the different timings obtained on a Silicon Graphics Indigo 2 workstation. The results in Table 6.3 illustrate the increased robustness of the FGMRES-ADI solver even better. As a result, time steps could be taken four to six times longer without any significant loss of accuracy. Application of our new method to this Clyde model has resulted in reductions of the computation time by factors varying from three to five.
6.9 Conclusion In this chapter we have shown that a robust and efficient solver for the Shallow Water Equations can be constructed with the ADI method as a preconditioner in a GMRES or FGMRES accelerator. This solver is also used as the subdomain solver in the domain decomposition method, which is described in the next chapter.
Table 6.1: The ADI-preconditioned FGMRES algorithm.

Initialisation of FGMRES:
    choose initial guess x_0
    choose dimension m of the subspace
    real (m + 1) × m Hessenberg matrix H̄_m = 0
Arnoldi process:
100 continue
    compute the left preconditioned initial residual r_0 = M_L^{-1} (b - A x_0)
    compute the norm of this residual: β = ||r_0||_2
    define the vector v_1 = r_0 / β
    for j = 1, ..., m
        z_{j,0} = 0
        for i = 0, ..., k do (full cycle ADI preconditioning)
            (H + ρ_i I) z_{j,i+1/2} = (ρ_i I - V) z_{j,i} + v_j
            (V + ρ_i I) z_{j,i+1} = (ρ_i I - H) z_{j,i+1/2} + v_j
        end for
        z_j = z_{j,k+1}
        compute the matrix-vector product p = A z_j
        apply the fixed left preconditioner: w = M_L^{-1} p
        for i = 1, ..., j
            compute the matrix element h_{i,j} = w^T v_i
            orthogonalise: w = w - h_{i,j} v_i
        end for
        compute the norm h_{j+1,j} = ||w||_2
        define the vector v_{j+1} = w / h_{j+1,j}
    end for
Approximate solution:
    define the matrix Z_m = [z_1 ... z_m]
    define the vector s = β e_1 = [β 0 ... 0]^T
    the vector y_m solves the minimisation problem min_y ||H̄_m y - s||_2
    update the approximate solution x_m = x_0 + Z_m y_m
Restart if necessary: set x_0 = x_m and go to 100.
Table 6.2: CPU times of DELFT3D-FLOW AOI(ADI) and AOI(FGMRES-ADI) for the Benqué test case.

Time step ∆t    ADI              FGMRES
1 min           491.5 s          657.3 s
20 min          43.4 s           48.2 s
40 min          25.2 s           26.7 s
60 min          28.8 s           19.15 s
80 min          22.7 s           14.7 s
100 min         18.5 s           12.2 s
200 min         not converged    6.5 s
Table 6.3: CPU times of DELFT3D-FLOW AOI(ADI) and AOI(FGMRES-ADI) for the Clyde model.

Tide        ∆t         ADI              FGMRES
excluded    2.5 min    13 hours         12.0 hours
excluded    10.0 min   not converged    3.5 hours
included    1.0 min    32 hours         30.0 hours
included    6.0 min    not converged    6.0 hours
Figure 6.1: Benqué model: grid and depth.
Figure 6.2: Clyde model: depth.
Figure 6.3: Clyde model: grid.
Chapter 7
Generalised Additive Schwarz Method for the Shallow Water Equations

"Science is a differential equation. Religion is a boundary condition." (Alan Turing)

This chapter describes research by the author on a domain decomposition method in the context of DELFT3D.

7.1 Introduction
The main topic addressed in this chapter is the application of a domain decomposition preconditioner in combination with FGMRES to solve (6.7). The solver described in Chapter 6 is used as the subdomain solver in the domain decomposition method, which is also accelerated by FGMRES. The adopted domain decomposition method is used as an additive preconditioner, so it is inherently parallel. All subdomain corrections are computed simultaneously in a solution step involving a large block diagonal matrix M. We assume that this preconditioning matrix M can be derived from the matrix A. The domain decomposition method in DELFT3D-FLOW and its advantages have been discussed by de Goede et al. [dGTB+96]. The local residual is defined as the projection of the global residual onto the subdomain and the ADI iterative
method was used to solve each of the local systems. Occasionally the ADI iterative method failed to solve these local systems, and for (very) large time steps a lot of outer iterations were needed in order to obtain a smooth solution across the interfaces. The former problem was overcome by replacing the ADI iterative method by the previously described FGMRES algorithm. The latter problem is solved by accelerating the domain decomposition method with FGMRES, in the same way as the ADI method was accelerated by FGMRES.

Figure 7.1: Grid before partitioning.
7.2 Generalised Additive Schwarz Method
The domain decomposition preconditioner used in FGMRES to solve the linear system (6.7) on the entire domain is the Generalised Additive Schwarz Method (GASM). We wish to solve Au = f, where A represents the discretisation of a PDE defined on a domain which is partitioned into nonoverlapping subdomains. Let R_i : Ω → Ω_i denote the (linear) restriction operator that maps onto subdomain i by selecting the components corresponding to this subdomain. The matrix M_i = R_i A R_i^T denotes the principal submatrix of A associated with subdomain Ω_i. The result of applying the GASM can be written as a sum of the extensions of the solutions of independent subdomain problems, which can be solved in parallel:

M^{-1} = Σ_{i=1}^{p} R_i^T M_i^{-1} R_i.    (7.1)
We elaborate on this GASM for the case of two subdomains separated by the interface Γ as shown in Fig. 7.1. Extension to more subdomains is straightforward. A description can be found in [Tan95] and [GTR98a]. At the heart of our GASM lies an extension of the subdomains to (physically) slightly overlapping grids. With a proper definition of the overlap, the restrictions Ri can be defined in such a way that the original discretisation is “distributed” across the subdomain operators Mi . Since the classical five-point star stencil is used, an overlap of two grid lines is sufficient.
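The apply of (7.1) can be sketched directly from the formula. In the sketch below the subdomains are simple index ranges of a small 1D model problem of our own, dense solves stand in for the subdomain solvers, and the overlap is two grid lines, as in the text.

```python
import numpy as np

def gasm_apply(A, index_sets, r):
    """Additive Schwarz correction (7.1): z = sum_i R_i^T M_i^{-1} R_i r,
    with M_i = R_i A R_i^T the principal submatrix for subdomain i."""
    z = np.zeros_like(r)
    for idx in index_sets:
        Mi = A[np.ix_(idx, idx)]                 # restrict A to the subdomain
        z[idx] += np.linalg.solve(Mi, r[idx])    # local solve, extend by zero
    return z

# small 1D model problem with two subdomains overlapping by two grid lines
n = 10
A = (np.diag(2.0 * np.ones(n))
     + np.diag(-np.ones(n - 1), 1)
     + np.diag(-np.ones(n - 1), -1))
subdomains = [np.arange(0, 6), np.arange(4, 10)]
z = gasm_apply(A, subdomains, np.ones(n))
```

The plain version above corresponds to the classical additive Schwarz preconditioner; in the GASM the interface rows of each M_i are further replaced by discretised transmission conditions, as described below.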
Figure 7.2: Grid after partitioning
Figure 7.2 illustrates the extension process. In the discretisation, points in subdomain Ω1 are only connected to points in Ω1 or in Ωl. Similar statements can be made about the points in Ωl, Ωr and Ω2. This leads to the following block structured linear system:

[ A11  A1l   0    0   ] [ζ1]   [f1]
[ Al1  All  Alr   0   ] [ζl] = [fl]
[  0   Arl  Arr  Ar2  ] [ζr]   [fr]
[  0    0   A2r  A22  ] [ζ2]   [f2]    (7.2)
After extension towards overlap, and thus duplication of Ωl and Ωr into Ωl̃ and Ωr̃, we obtain an enhanced system of equations in which we still have to specify the relation between the "overlapping" unknowns. The obvious way is just to state that the values in the duplicated subdomains Ωl̃ and Ωr̃ should be copied from the values in the original subdomains Ωl and Ωr respectively. This is known as the Dirichlet-Dirichlet (DD) coupling. The enhanced system of equations with this DD coupling can be written as follows:

[ A11  A1l   0    0    0    0  ] [ζ1]   [f1]
[ Al1  All  Alr   0    0    0  ] [ζl]   [fl]
[  0    0    I    0   -I    0  ] [ζ̃r] = [ 0]
[  0    I    0   -I    0    0  ] [ζ̃l]   [ 0]
[  0    0    0   Arl  Arr  Ar2 ] [ζr]   [fr]
[  0    0    0    0   A2r  A22 ] [ζ2]   [f2]    (7.3)
Tang [Tan92] has shown that fast convergence can be obtained by choosing a good splitting, instead of increasing the overlap, when a Schwarz enhanced matrix is used. Tan [Tan95] has shown that the spectral properties of the preconditioned operator AM^{-1}, and thus the convergence behaviour of a Krylov subspace method preconditioned by a GASM, are improved by pre-multiplying the enhanced linear system Au = f with a properly chosen nonsingular matrix P of the form:

    [ I  0   0    0   0  0 ]
    [ 0  I   0    0   0  0 ]
P = [ 0  0  Clr  Cll  0  0 ]
    [ 0  0  Crr  Crl  0  0 ]
    [ 0  0   0    0   I  0 ]
    [ 0  0   0    0   0  I ]    (7.4)
This pre-multiplication with P boils down to imposing more general conditions at the subdomain interfaces. Hagstrom et al. [HTJ88] advocate the use of nonlocal transmission conditions. This approach has been introduced in the Schwarz framework by Lions [Lio90]. Tan and Borsboom [TB95] have applied the Generalised Schwarz Coupling to advection-dominated problems. Nataf and Rogier [NR95a, NR95b] have shown that the rate of convergence of the Schwarz algorithm is significantly higher when operators arising from the factorisation of the convection-diffusion operator are used as transmission conditions. Based on these results, Japhet [Jap98] has developed the so-called optimised order 2 (OO2) conditions which result in even faster convergence. The submatrices Clr , Cll , Crr and Crl represent the discretisation of the transmission conditions and can be chosen freely subject to the condition that the matrix
\[
C = \begin{pmatrix} C_{lr} & C_{ll} \\ C_{rr} & C_{rl} \end{pmatrix}
\tag{7.5}
\]
remains nonsingular. This gives rise to the Generalised Additive Schwarz Method, which is thus based on the enhanced system of equations Au = f:

\[
\begin{pmatrix}
A_{11} & A_{1l} & 0 & 0 & 0 & 0 \\
A_{l1} & A_{ll} & A_{lr} & 0 & 0 & 0 \\
0 & C_{ll} & C_{lr} & -C_{ll} & -C_{lr} & 0 \\
0 & C_{rl} & C_{rr} & -C_{rl} & -C_{rr} & 0 \\
0 & 0 & 0 & A_{rl} & A_{rr} & A_{r2} \\
0 & 0 & 0 & 0 & A_{2r} & A_{22}
\end{pmatrix}
\begin{pmatrix} u_1 \\ u_l \\ \tilde u_r \\ \tilde u_l \\ u_r \\ u_2 \end{pmatrix}
=
\begin{pmatrix} f_1 \\ f_l \\ 0 \\ 0 \\ f_r \\ f_2 \end{pmatrix}.
\tag{7.6}
\]
In the application described in this chapter the submatrices Clr , Cll , Crr and Crl are chosen to achieve a clustering of the eigenvalues of the preconditioned operator and have been computed using the (expensive) optimisation algorithm by Tan [Tan95]. The GASM differs from the classical Additive Schwarz Preconditioner introduced by Dryja and Widlund [DW87] in that the transmission conditions at the interfaces, i.e. the boundary conditions for the subdomain problems, can be changed in order to improve the spectral properties of the preconditioned operator. The classical Additive Schwarz Preconditioner is described in Section 3.2.5.
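The block structure of (7.6) can be made concrete in code. The following Python sketch is an illustrative helper, not the thesis's implementation: it assembles the enhanced matrix from given subdomain blocks, assumed square at the interface; choosing Cll = Crr = 0 and Clr = Crl = I recovers the DD coupling (7.3).

```python
import numpy as np

def assemble_enhanced(blocks, C, f):
    """Assemble the GASM enhanced system (7.6) from subdomain blocks.

    `blocks` = (A11, A1l, Al1, All, Alr, Arl, Arr, Ar2, A2r, A22),
    `C` = (Cll, Clr, Crl, Crr), all interface blocks m x m.
    Unknown ordering: (u1, ul, ur~, ul~, ur, u2).  Illustrative sketch."""
    A11, A1l, Al1, All, Alr, Arl, Arr, Ar2, A2r, A22 = blocks
    Cll, Clr, Crl, Crr = C
    n1, m, n2 = A11.shape[0], All.shape[0], A22.shape[0]
    Z = np.zeros
    A = np.block([
        [A11,         A1l,        Z((n1, m)), Z((n1, m)), Z((n1, m)), Z((n1, n2))],
        [Al1,         All,        Alr,        Z((m, m)),  Z((m, m)),  Z((m, n2))],
        [Z((m, n1)),  Cll,        Clr,        -Cll,       -Clr,       Z((m, n2))],
        [Z((m, n1)),  Crl,        Crr,        -Crl,       -Crr,       Z((m, n2))],
        [Z((m, n1)),  Z((m, m)),  Z((m, m)),  Arl,        Arr,        Ar2],
        [Z((n2, n1)), Z((n2, m)), Z((n2, m)), Z((n2, m)), A2r,        A22],
    ])
    f1, fl, fr, f2 = f
    rhs = np.concatenate([f1, fl, np.zeros(m), np.zeros(m), fr, f2])
    return A, rhs
```

With DD coupling the enhanced solution reproduces the solution of the unsplit system, with the duplicated unknowns equal to the originals.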
This enhanced system of equations can be written in terms of the 3 × 3 blocks. Defining the restriction operators R1 and R2 in terms of the index sets corresponding to ζ1, ζl and ζ̃r on the one hand and ζ̃l, ζr and ζ2 on the other hand, the preconditioner can be written as the block diagonal matrix

\[
M = \begin{pmatrix} R_1 A R_1^T & 0 \\ 0 & R_2 A R_2^T \end{pmatrix}.
\tag{7.7}
\]
The discussion of the GASM here is based on the assumption that the grids on the subdomains Ω1 and Ω2 are matching grids, which is trivially satisfied in the application described in this chapter since they are subgrids of one given grid. The aim of the research presented in the subsequent chapters was to extend the applicability of this method to (overlapping) nonmatching grids, when Ω1 and Ω2 are given grids with different mesh sizes. In the case of nonmatching grids, it is necessary to transfer information from one grid to the other grid, e.g. using an interpolation formula. Tan and Borsboom [TB97] have already shown how to apply the GASM on patched subgrids. The domain they use consists of a set of naturally ordered parallelograms, all of which have the same mesh width tangent to the interface. The mesh widths normal to the interface can be different on opposite sides of the interface.
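The application of the preconditioner (7.7) amounts to two independent subdomain solves. A minimal sketch, assuming the enhanced matrix is available as a dense array and each restriction operator R_i is represented by a plain index list (hypothetical names, for illustration only):

```python
import numpy as np

def apply_gasm_preconditioner(A_enh, index_sets, v):
    """Apply M^{-1} for the block diagonal preconditioner (7.7):
    M = diag(R1 A R1^T, R2 A R2^T).  Each index list selects the
    unknowns of one subdomain in the enhanced ordering; the two
    subdomain solves are independent and could run in parallel."""
    z = np.zeros_like(v)
    for idx in index_sets:
        Ai = A_enh[np.ix_(idx, idx)]          # R_i A R_i^T
        z[idx] = np.linalg.solve(Ai, v[idx])  # local subdomain solve
    return z
```

When A itself is block diagonal with respect to the two index sets, the preconditioner is exact; in general it is only an approximation, and FGMRES accelerates the remaining error.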
7.3 Krylov Convergence Acceleration

We use FGMRES to accelerate the convergence of the linear system solver. The GASM described above is a fixed preconditioner if and only if the linear systems in the subdomains are solved to full precision or with a fixed number of iterations of a stationary iterative method. The convergence of FGMRES is mainly governed by the eigenvalue distribution of the preconditioned operator AM^{-1}. In particular, convergence acceleration can be expected for well-separated extreme eigenvalues. Also, the stage of the FGMRES process in which acceleration occurs is related to the convergence of the Ritz values to extreme eigenvalues (see [VdSVdV86] and [VdVV93]). This phenomenon can be made visible by explicitly computing the Ritz values, i.e. the eigenvalues of H_m = V_m^T (AM^{-1}) V_m, in the course of the FGMRES process. The eigenvectors corresponding to the outliers of this spectrum represent the eigenvector components which "uphold" the convergence and which are to be removed from the initial residual. Due to the construction of the GASM considered here, this Ritz spectrum, at least for meshes that are not too fine, typically resembles the spectrum depicted in Fig. 7.4, i.e. a few well-separated outliers and a cluster around 1. For comparison we show in Fig. 7.3 the Ritz spectrum of the domain decomposition preconditioner for the same problem when Dirichlet-Dirichlet coupling is used. This spectrum does not show a clear separation into a cluster of eigenvalues around 1 and some outliers. On the contrary, the eigenvalues are spread out over
the open interval (0, 2) and many of the eigenvalues are close to either 0 or 2. This eigenvalue distribution explains the slow convergence of this domain decomposition method with DD coupling when it is used as a solver, because the spectral radius of the matrix (I − AM^{-1}) is close to 1. In the next section we try to exploit the nice spectral properties of the GASM.

[Figure 7.3: Spectrum of H_m: Dirichlet-Dirichlet coupling (plot of the eigenvalues of the Hessenberg matrix; not reproduced).]
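The Ritz values discussed above are the eigenvalues of the small Hessenberg matrix built during the Arnoldi process. The following sketch recomputes them for a generic preconditioned operator; it is an illustration only (in the actual solver H_m comes for free from the FGMRES recurrence).

```python
import numpy as np

def arnoldi_ritz(apply_op, b, m):
    """Run m Arnoldi steps on the preconditioned operator and return the
    Ritz values, i.e. the eigenvalues of H_m = V_m^T (A M^{-1}) V_m.
    `apply_op(v)` is assumed to return (A M^{-1}) v."""
    n = b.size
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(m):
        w = apply_op(V[:, j])
        for i in range(j + 1):              # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-14:             # invariant subspace reached
            return np.linalg.eigvals(H[:j + 1, :j + 1])
        V[:, j + 1] = w / H[j + 1, j]
    return np.linalg.eigvals(H[:m, :m])
```

Outliers in the returned spectrum identify the eigenvector components that delay convergence.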
7.4 Reuse strategies

The main motivation for using FGMRES instead of GMRES is that the former, in contrast to the latter, accommodates variable preconditioning; any vector z can be put into the search space Zm as long as its image Az is known, so that the correction to the residual can be computed. This property, in combination with the observation that our specific time integration method results in a sequence of systems (6.7) in each time step, has raised the question whether it is possible to reuse previously computed search vectors during the solution of the next systems by FGMRES. Obviously, one advantage of reusing vectors is that it is a lot cheaper than applying the (expensive) GASM, which after all requires the solution of a linear system in each subdomain. Also, when (approximations of) the preconditioned
eigenvectors that uphold the convergence are collected in the search space, accelerated convergence may be achieved from the first newly computed z-vector on.

[Figure 7.4: Spectrum of H_m: locally optimised coupling (plot not reproduced).]

Several strategies to reuse vectors from an already generated subspace have been tested. In practice, we always use Q = 2, i.e. two systems must be solved in each time step. We focus on the solution of the second system in each time step (q = 2), possibly reusing information from the search space Zm built during the solution of the first system (q = 1). We formulate the following reuse strategies:

1. truncation: introduce the first k z-vectors z_i^{(R)} = z_i (i = 1, ..., k).

2. assembling: introduce the best rank-k approximation of span{z_1, ..., z_m}.

3. assembling of preconditioned Ritz vectors: introduce k preconditioned approximate eigenvectors corresponding to outliers: z_i^{(R)} = Z_m y_i (i = 1, ..., k), where y_i is the eigenvector of H_m corresponding to the eigenvalue λ_i, i.e. H_m y_i = λ_i y_i.

The trivial truncation strategy 1 gives an indication of what can be expected from more sophisticated reuse strategies. The case in which all z-vectors are reused enables us to verify whether the Arnoldi process can quickly generate a
reasonably approximate eigenspectrum.

[Figure 7.5: Convergence histories of the preconditioned FGMRES method (plot not reproduced).]

The second strategy requires the computation of the singular value decomposition Z_m = UΣV^T. When the singular values are ordered σ_1 ≥ σ_2 ≥ ... ≥ σ_m ≥ 0, the best rank-k approximation of span{z_1, z_2, ..., z_m} is given by span{u_1, u_2, ..., u_k}. The use of a best rank-k approximation is motivated by the fact that the column space of Z_m contains preconditioned approximate eigenvectors corresponding to the outliers. The hope is that a lower dimensional approximation still contains an approximation of these eigenvectors. The results seem to indicate that this strategy is not entirely capable of filtering out the preconditioned eigenspace. The third reuse strategy relies on the observation of clustered eigenspectra in combination with a (small) number of clearly distinguishable outliers, as is the case in Fig. 7.4. An explicit construction of the preconditioned eigenspace corresponding to the outliers is then possible. Note that the construction is also based on (2.22).
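Strategies 2 and 3 can be sketched in a few lines. The helper names below, and the assumption that the eigenvalue cluster sits around 1 (as observed above), are illustrative choices, not the thesis's code:

```python
import numpy as np

def best_rank_k_basis(Z, k):
    """Strategy 2: with the SVD Z = U Sigma V^T and singular values
    sigma_1 >= ... >= sigma_m >= 0, the best rank-k approximation of
    span{z_1, ..., z_m} is span{u_1, ..., u_k}."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k]

def ritz_vector_basis(Z, H, k, centre=1.0):
    """Strategy 3: assemble preconditioned Ritz vectors z = Z_m y_i for
    the k eigenvalues of H_m farthest from the cluster around `centre`
    (the location of the cluster is an assumption here)."""
    lam, Y = np.linalg.eig(H)
    outliers = np.argsort(-np.abs(lam - centre))[:k]
    return Z @ Y[:, outliers].real
```

The returned vectors are injected into the FGMRES search space before the first preconditioned step of the next solve.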
7.5 Rectangular basin

This test case is concerned with the flow in an 8000 m by 1200 m rectangular basin which is 8 m deep. The uniform grid has one layer in the vertical (σ) direction and contains 160 × 24 grid points in the horizontal directions. The prescribed
boundary conditions are as follows. The north and south boundaries are closed; this leads to Neumann boundary conditions for (6.7). At the east boundary the water elevation is kept constant at ζ = 2 m. At the west boundary the water elevation is prescribed to model the tide, yielding ζ = 2 + sin(ωt) m (ω = 2π/3600). Tests were done with stripwise partitionings of the rectangular domain into 4 and 8 subdomains. The time step was ∆t = 5 min and the simulation period was T = 6 hours.

[Figure 7.6: Convergence histories of the preconditioned FGMRES method (plot not reproduced).]

Table 7.1 shows the number of times the domain decomposition preconditioner needs to be applied to solve the second linear system for the water level in each time step. The search vectors z_j, which are assembled from the first system, are not counted. For example, when 10 z-vectors are reused according to the truncation strategy from the search space constructed during the solution of the first (q = 1) linear system, the solution of the new system (q = 2) requires only 5 domain decomposition preconditioning steps. The dimension of the search space used by FGMRES to solve this second linear system is of course 15. Based on the results in Table 7.1 the assembling strategy of preconditioned Ritz vectors has been chosen for further experiments.

With the Ritz vector assembling strategy the convergence histories for the test problem typically show the following behaviour for the residual norm, scaled by the two-norm ‖f‖_2 of the right-hand side vector. The convergence history, shown in Fig. 7.5, for the first linear system (q = 1) starts at about 0.2 and drops below the adopted threshold of 10^{-9} after 11 iterations. The threshold for the first linear system could be weakened, but a fairly accurate solution is required for good Ritz
Table 7.1: Number of preconditioning steps needed to solve the second linear system when the reuse of vectors in the subspace is done by truncation, assembling (rank k) or assembling of preconditioned Ritz vectors (k outliers) for the rectangular basin partitioned in 4 strips.

  truncation         assembling (rank k)   Ritz vectors (k outliers)
  k    steps         k    steps            k    steps
  2    9 or 10       2    9                2    8
  3    9 or 10       3    8 or 9           3    7 or 8
  4    8 or 9        4    8                4    7
  5    8             5    7                5    6
  6    7 or 8        6    6                6    5
  7    6 or 7        7    6
  8    6             8    6
  9    5 or 6        9    5 or 6
  10   5             10   5 or 6
values and vectors. This requires 11 applications of the domain decomposition preconditioner. Figure 7.4 shows the eigenvalues of the Hessenberg matrix H_m constructed by FGMRES during the solution of this first linear system. The six eigenvalues that are not close to 1 are the outliers which hamper the convergence of FGMRES. The convergence history, shown in Fig. 7.5, for the second linear system starts off with a plateau at about 0.001, dropping sharply to reach the tolerance criterion after 11 iterations as well. The plateau corresponds to the reuse of the six assembled Ritz vectors. This corresponds to the removal of the approximate eigenvectors associated with the outliers from the initial residual, a process which hardly reduces its norm. However, it makes FGMRES converge as if the outliers were not present at all; starting from the first newly computed search vector the residual norm decreases rapidly, at the same rate as in the final stage of the solution of the first system. Because of the reuse, solving the second linear system requires only 5 applications of the domain decomposition preconditioner.

Instead of computing the approximate eigenvectors at each time step from the matrix Z_m constructed during the solution of the first linear system (q = 1), we can also construct the approximate eigenspace only once, from the first linear system arising in the first time step. This space is then reused in the solution process of all other linear systems (q = 1, 2) of subsequent time steps. The convergence histories for the second linear system indicate that it is not necessary to compute the approximate eigenvectors at each time step, since the results with the assembled vectors from the first time step are sufficiently close to those with the assembled vectors from the first (q = 1) solution process of the current time step. This assembling technique can thus significantly reduce the computational complexity of the
Table 7.2: CPU time for the test problem, with and without assembling and with different tolerances for the inner systems.

  Assembling   Tolerance   CPU time
  No           10^{-7}     460 s
  Yes          10^{-7}     264 s
  Yes          10^{-4}     200 s
  Yes          10^{-2}     154 s
method. In addition to this, one can save on floating point operations by lowering the accuracy of the solutions of the subproblems. This is illustrated in Table 7.2, which shows the required CPU time for solving the test problem. The results show that our domain decomposition method can be accelerated by FGMRES in a way similar to the acceleration of ADI by FGMRES. Not only does this enhance the robustness of our domain decomposition method; it also allows the construction of more efficient methods, due to the possibility of keeping and reusing vectors that have already been computed in previous time steps.
7.6 Conclusion

We have developed a Generalised Additive Schwarz Method for use within FGMRES to solve the linear systems arising in the solution of the time-dependent shallow water equations. The preconditioned operator AM^{-1} has a clustered eigenspectrum with only a few outlying eigenvalues. This property, together with the specific time integration method, enables the reuse of search vectors in the FGMRES process for the solution of the subsequent linear systems. Reusing some vectors in the subsequent FGMRES processes makes the method even more efficient and leads to substantial reductions in computation time.
Chapter 8
Lower Dimensional Interpolation in Overlapping Composite Mesh Difference Methods

Yeah ... right enough. Should've made the effort. Mate's funeral likesay, ken. Spud thought that the Conservative Party in Scotland could do with a few Begbies. It's not what the message is, the problem is just communication. Begbie is good at getting the message across.¹
This chapter reports on the research we have done on the extension of our domain decomposition method to the case of nonmatching grids.
¹ Irvine Welsh, Trainspotting, p. 298

8.1 Introduction

The discussion of the Generalised Additive Schwarz Method in Section 7.2 is based on the assumption that the grids on the subdomains Ω1 and Ω2 are matching grids. This assumption is trivially satisfied when the grids in the subdomains
are subgrids of one given grid. Since the PDE is discretised first on the original grid, the subproblems are well defined and the boundary conditions on the interior boundaries can be found by copying the value from the corresponding grid point in the neighbouring subdomain. The domain decomposition method can be used as a preconditioner to construct a fast linear system solver.

In this chapter we drop the assumption that the grids on the subdomains are matching grids and assume we are given several (overlapping) subdomains, each with an independently constructed grid. Hence we end up with nonmatching grids with (possibly) different mesh sizes. In this case, the information transfer from one grid to the other grid is no longer trivial, since there is no global discretisation from which it can be derived. We focus on interpolation formulae and modified discretisation stencils to construct a consistent and second order accurate global discretisation, starting from a second order accurate discretisation in the subdomains. We propose a modified Composite Mesh Difference Method (CMDM) in which a lower dimensional interpolation can be used along the interface of the nonmatching grids. The advantage of this approach is that fewer interpolation points are needed while the same order of global accuracy is preserved. This is important especially for distributed memory implementations on parallel computers, since smaller amounts of data need to be communicated among the overlapping subdomains. A CMDM on two subdomains has been described by Starius [Sta77], while Cai et al. [CMS00] have studied the case of many subdomains. The relation between the accuracy and the order of interpolation in a CMDM has been studied by Chesshire and Henshaw [CH90]. We focus on the 2D Poisson equation and discuss several interface interpolation schemes. We restrict ourselves to the case of two subdomains, i.e. Ω = Ω1 ∪ Ω2 where Ω1 = [0, 1] × [0, 1] and Ω2 = [1, 2] × [0, 1]. The enlarged subdomains we use to solve this problem are Ω′1 = [0, l1] × [0, 1] and Ω′2 = [l2, 2] × [0, 1]. We assume l1 > 1 and l2 < 1. The usual five-point second order finite difference method is used in the two subdomains, with mesh sizes h and k respectively. Our results show that it is possible to obtain global second order accuracy on nonmatching grids with a local coupling using only 4 points at the interface in the discretisation equations. In contrast, the overlapping nonmatching mortar method [CDS99] requires a global mortar projection involving all the mesh points on the interface. The effect of using a mortar projection in a CMDM is studied in the next chapter.
8.2 Composite Mesh Difference Method

We first describe the Composite Mesh Difference Method (CMDM) for solving a general second order elliptic partial differential equation Lu = f in Ω with a Dirichlet boundary condition u = g on ∂Ω. Given a domain Ω consisting of p
nonoverlapping subdomains Ω_i ∩ Ω_j = ∅ for i ≠ j, such that

\[ \bar{\Omega} = \bigcup_{i=1}^{p} \bar{\Omega}_i, \tag{8.1} \]

we enlarge each subdomain Ω_i to include all points in Ω within a distance θ_i > 0. The resulting enlarged subdomain is denoted by Ω′_i:

\[ \Omega_i' = \{\, x \in \Omega \mid \mathrm{dist}(x, \Omega_i) \le \theta_i \,\}. \tag{8.2} \]
On each of the enlarged subdomains Ω′_i, we independently construct grids of size h_i. Due to the extension of the subdomains these grids overlap. The set of grid points on the enlarged domain is also denoted by Ω′_i. We denote by Γ_i = ∂Ω′_i ∩ ∂Ω the intersection of the boundaries ∂Ω′_i and ∂Ω. The boundary ∂Ω′_i is partitioned into Γ_i and its complement Γ_i^c = ∂Ω′_i \ Γ_i. Several assumptions have to be made in order to prove error bounds for the CMDM. We briefly recall the assumptions, for the case of many subdomains, as stated in [CMS00].

Assumption 8.2.1 The truncation error α_i(x) = (L_{h_i} − L) u(x) is of order r_i:

\[ \| \alpha_i(x) \|_\infty \le C_{\alpha_i} h_i^{r_i} \| u \|_{r_i+2,\infty,\Omega_i'}, \tag{8.3} \]

where C_{α_i} is a constant independent of the mesh size h_i and ‖u‖_{k,∞,Ω′_i} denotes the Sobolev norm for the space W^k_∞(Ω′_i).
Assumption 8.2.2 The interpolation operator I_i only uses values from grid points in ∪_{j≠i} Ω̄_j and does not use values from grid points in Ω′_i.

Assumption 8.2.3 The interpolation error β_i(x) = (u − I_i u)(x) is of order s_i:

\[ \| \beta_i(x) \|_\infty \le C_{\beta_i} h_i^{s_i} \| u \|_{s_i,\infty,\Omega_{\Gamma_i^c}}. \tag{8.4} \]

We only need the interpolation operator I_i in an enlarged region Ω_{Γ_i^c} containing Γ_i^c. The interpolation constant σ_i is due to Starius [Sta77].

Definition 8.2.1 The interpolation constant σ_i is the maximum sum of the absolute values of the coefficients in the interpolation formula, i.e. the infinity norm

\[ \sigma_i = \| I_i \|_\infty \tag{8.5} \]

of the matrix I_i representing the interpolation.

If piecewise linear or bilinear interpolation is used, a convex combination of adjacent nodal values is made and the interpolation constant σ_i = 1 is optimal. For quadratic or cubic interpolation we have σ_i = 5/4 in the 1D case and σ_i = 25/16
in the 2D case. We denote by σ = max_i σ_i the largest of all the interpolation constants. Interpolation formulae with smaller interpolation constants can be obtained by subtracting scalar multiples of discretisation equations from the interpolation equations. Using this technique Starius [Sta77] constructed a fourth order interpolation formula with a small interpolation constant for smooth functions satisfying an elliptic equation of the form −(Au_x)_x − (Bu_y)_y + au = f, where A, B > 0 and a ≥ 0. The global discretisation u_h = (u_{h_1}, u_{h_2}, ..., u_{h_p}) on the composite grid is obtained by coupling the local discretisations through the requirement that the solution matches the interpolation of the discrete solution from adjacent grids. The system of equations consists of p subproblems, each having the following form:
\[
\begin{cases}
L_{h_i} u_{h_i} = f_{h_i} & \text{in } \Omega_i', \\
u_{h_i} = g_{h_i} & \text{on } \Gamma_i, \\
u_{h_i} = I_i u_h & \text{on } \partial\Omega_i' \setminus \Gamma_i.
\end{cases} \tag{8.6}
\]
For notational convenience, we define

\[ z_{h_i} = I_i u_h. \tag{8.7} \]
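Definition 8.2.1 above can be checked numerically. The following sketch (illustrative helper names, not from the thesis) computes σ = ‖I‖∞ for 1D Lagrange interpolation; it reproduces σ = 1 for piecewise linear and σ = 5/4 for quadratic interpolation, the values quoted above.

```python
import numpy as np

def lagrange_weights(nodes, x):
    """Lagrange interpolation weights at x for the given 1D nodes."""
    w = np.ones(len(nodes))
    for i, xi in enumerate(nodes):
        for j, xj in enumerate(nodes):
            if j != i:
                w[i] *= (x - xj) / (xi - xj)
    return w

def interpolation_constant(nodes, targets):
    """sigma = max over target points of sum_i |w_i| (Definition 8.2.1)."""
    return max(np.abs(lagrange_weights(nodes, x)).sum() for x in targets)
```

For the tensor-product 2D case the constant is the square of the 1D value, giving (5/4)^2 = 25/16 for biquadratic interpolation, consistent with the text.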
The local problems (8.6) have to satisfy the following assumptions.

Assumption 8.2.4 The local finite difference discretisations (8.6) are stable in the maximum norm, i.e. a constant K_i independent of h_i exists so that

\[ \| u_{h_i} \|_{\infty,\Omega_i'} \le K_i \| f_{h_i} \|_{\infty,\Omega_i'} + \max\{ \| g_{h_i} \|_{\infty,\Gamma_i},\; \| z_{h_i} \|_{\infty,\partial\Omega_i' \setminus \Gamma_i} \}. \tag{8.8} \]

Assumption 8.2.5 The discretisations (8.6) satisfy a strong discrete maximum principle, i.e. the solution u_{h_i} of (8.6) with f_{h_i} = 0 and g_{h_i} = 0, restricted to Ω̄_i, satisfies

\[ \| u_{h_i} \|_{\infty,\bar\Omega_i} \le \rho_i \, \| z_{h_i} \|_{\infty,\partial\Omega_i' \setminus \Gamma_i}. \tag{8.9} \]

Equation (8.9) shows that the contraction factor 0 ≤ ρ_i < 1 measures the error reduction.

Assumption 8.2.6 The product of the interpolation constant and the contraction factor is less than 1:

\[ \tau = \max_i (\rho_i \sigma) < 1. \tag{8.10} \]

Under the above assumptions 8.2.1 – 8.2.6, Cai et al. [CMS00] proved the maximum norm stability of the global discretisation and showed that the following error bound holds.

Theorem 8.2.1 The error in the discrete solution satisfies

\[ \max_i \| e_{h_i} \|_\infty \le \sum_{i=1}^{p} \| e_{h_i} \|_\infty \le \left( 1 + \frac{\sigma}{1-\tau} \right) \left( \sum_{i=1}^{p} K_i \| \alpha_i \|_\infty + \sum_{i=1}^{p} \| \beta_i \|_\infty \right). \tag{8.11} \]
8.3 A Parallel Schwarz Iterative Method

In this section we discuss an iterative method for solving the linear system corresponding to the global discretisation (8.6). The initialisation step computes u_h^{(0)} by solving the p subdomain problems (for i = 1, 2, ..., p) in parallel:

\[
\begin{cases}
L_{h_i} u_{h_i}^{(0)} = f_{h_i} & \text{in } \Omega_i', \\
u_{h_i}^{(0)} = g_{h_i} & \text{on } \Gamma_i, \\
u_{h_i}^{(0)} = 0 & \text{on } \Gamma_i^c.
\end{cases} \tag{8.12}
\]

The method then proceeds for n = 0, 1, ... by repeatedly solving the p subdomain problems (for i = 1, 2, ..., p)

\[
\begin{cases}
L_{h_i} u_{h_i}^{(n+1)} = f_{h_i} & \text{in } \Omega_i', \\
u_{h_i}^{(n+1)} = g_{h_i} & \text{on } \Gamma_i, \\
u_{h_i}^{(n+1)} = I_i u_h^{(n)} & \text{on } \Gamma_i^c,
\end{cases} \tag{8.13}
\]

in parallel for u_h^{(n+1)} until convergence. This is a parallel variant of the Schwarz alternating method. Rather than using the additive Schwarz method as a solver, it is better to use it as a preconditioner in a Krylov subspace method. Since the iteration described above is a contraction mapping, the resulting system of equations can be solved by repeatedly solving the p subproblems (8.6) in parallel, where z_{h_i} is computed from the previous iteration. The convergence rate of this iteration is bounded by the contraction factor τ of the mapping.

Theorem 8.3.1 The iterates {u_h^{(n)}} converge to the exact discrete solution u_h and

\[ d\bigl(u_h^{(n)}, u_h\bigr) \le \tau^n \, d\bigl(u_h^{(0)}, u_h\bigr). \tag{8.14} \]

Here d(w_h, v_h) = max_i { ‖w_{h_i} − v_{h_i}‖_{∞,Ω̄′_i} }.
Convergence proofs of the Schwarz algorithm based on a maximum principle have been given by Miller [Mil65] and by Cai et al. [CMS00]. We remark that when solving (8.6) using iterative methods, the total arithmetic cost is determined by the subdomain mesh sizes, while the communication cost, for implementations on distributed memory machines, depends on the interface interpolation operators.
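The iteration (8.12)-(8.13) can be illustrated with a minimal 1D sketch. All parameters below are hypothetical, and the two grids are chosen to match in the overlap so that the interpolation I_i reduces to injection; with nonmatching grids the two injection lines would become 1D interpolation formulae.

```python
import numpy as np

def solve_subdomain(h, f_interior, u_left, u_right):
    """Solve -u'' = f on one subdomain with Dirichlet end values,
    using the 3-point stencil (dense solve for brevity)."""
    n = len(f_interior)
    A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / h**2
    rhs = f_interior.copy()
    rhs[0] += u_left / h**2
    rhs[-1] += u_right / h**2
    return np.linalg.solve(A, rhs)

# Model problem: -u'' = 2 on (0,1), u(0) = u(1) = 0, exact u = x(1-x).
h = 0.1
x1 = np.arange(0.0, 0.65, h)     # grid on Omega_1' = [0, 0.6]
x2 = np.arange(0.4, 1.05, h)     # grid on Omega_2' = [0.4, 1]
f = lambda x: 2.0 * np.ones_like(x)
u1 = np.zeros_like(x1)
u2 = np.zeros_like(x2)
for it in range(200):            # parallel Schwarz iteration (8.13)
    g1 = u2[2]                   # value at x = 0.6, injected from grid 2
    g2 = u1[4]                   # value at x = 0.4, injected from grid 1
    u1[1:-1] = solve_subdomain(h, f(x1[1:-1]), 0.0, g1)
    u2[1:-1] = solve_subdomain(h, f(x2[1:-1]), g2, 0.0)
    u1[-1] = g1
    u2[0] = g2
```

Both interface values are read from the previous iterate before either solve, which makes the two solves independent, as in the parallel variant above.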
8.4 Subdomain Error Analysis based on the Maximum Principle

An error analysis of an elliptic model problem based on the maximum principle can be found in several textbooks, see e.g. [MM94, TW98]. We adopt the notation
used in [MM94] and denote by p the index pair of a mesh point. The set J_Ω contains the index pairs of the mesh points in the domain Ω. We make the following assumptions.

Assumption 8.4.1 For all p ∈ J_Ω the finite difference operator L_h has the form

\[ L_h u_p = -c_p u_p + \sum_k c_k u_k, \tag{8.15} \]

where the coefficients are positive and the sum over k is taken over mesh points which are neighbours of p.

Assumption 8.4.2 For all p ∈ J_Ω: c_p ≥ ∑_k c_k.

Assumption 8.4.3 The set J_Ω is connected. By definition a point is connected to each of its neighbours occurring in (8.15) with a nonzero coefficient. A set is defined to be connected if, given any two points p and q in J_Ω, there is a sequence of points p = p_0, p_1, ..., p_{m+1} = q, such that each point p_i is connected to p_{i-1} and p_{i+1}, for i = 1, 2, ..., m.

Assumption 8.4.4 At least one of the equations must involve a Dirichlet boundary condition.

The maximum principle can be stated as follows.

Lemma 8.4.1 (Maximum Principle) Suppose that L_h, J_Ω and J_∂Ω satisfy all the assumptions 8.4.1 – 8.4.4 and that a mesh function u_p satisfies L_h u_p ≥ 0 for all p ∈ J_Ω. Then u_p cannot attain a nonnegative maximum at an interior point:

\[ \max_{p \in J_\Omega} u_p \le \max\Bigl\{ \max_{a \in J_{\partial\Omega}} u_a,\; 0 \Bigr\}. \tag{8.16} \]
Based on Lemma 8.4.1, we can prove the following error bound, which is a straightforward extension of Theorem 6.1 in [MM94].

Theorem 8.4.1 Suppose that the region J_Ω is partitioned into two disjoint regions,

\[ J_\Omega = J_1 \cup J_2 \quad \text{with} \quad J_1 \cap J_2 = \emptyset, \tag{8.17} \]

that a nonnegative mesh function Φ_p, defined on J_Ω ∪ J_∂Ω, satisfies

\[ L_h \Phi_p \ge C_1 > 0 \quad \forall p \in J_1, \qquad L_h \Phi_p \ge C_2 > 0 \quad \forall p \in J_2, \tag{8.18} \]

and that the truncation error α_p is bounded:

\[ |\alpha_p| \le T_1 \quad \forall p \in J_1, \qquad |\alpha_p| \le T_2 \quad \forall p \in J_2. \tag{8.19} \]
Then the error in the approximation is bounded by

\[ |e_p| \le \Bigl[ \max_{a \in J_{\partial\Omega}} \Phi_a \Bigr] \max\left\{ \frac{T_1}{C_1}, \frac{T_2}{C_2} \right\} + \max_{a \in J_{\partial\Omega}} |e_a|, \tag{8.20} \]

where the last term originates from the boundary conditions.

Proof. Consider the mesh function K Φ_p + e_p. The constant K ≥ 0 is chosen so that the maximum principle applies. Since L_h e_p = −α_p, we have

\[ L_h (K \Phi_p + e_p) \ge K C_i - \alpha_p \ge 0 \tag{8.21} \]

for i = 1, 2 and for K ≥ max{T_1/C_1, T_2/C_2}. Since Φ is nonnegative, we have that

\[ \max_{p \in J_\Omega} (e_p) \le \max_{p \in J_\Omega} (K \Phi_p + e_p) \le \max_{a \in J_{\partial\Omega}} (K \Phi_a + e_a) \le \Bigl[ \max_{a \in J_{\partial\Omega}} \Phi_a \Bigr] K + \max_{a \in J_{\partial\Omega}} (e_a). \]

Applying Lemma 8.4.1 to the mesh function (K Φ_p − e_p) yields a similar bound for −e_p. Setting K = max{T_1/C_1, T_2/C_2} yields the desired result.
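Lemma 8.4.1 can be illustrated numerically: a discrete harmonic function (L_h u_p = 0, so the hypothesis L_h u_p ≥ 0 holds) attains its maximum on the boundary. A small sketch for the 1D three-point operator, with hypothetical boundary values:

```python
import numpy as np

# 1D discrete Laplace equation: L_h u_p = u_{p-1} - 2 u_p + u_{p+1} = 0,
# which has the form (8.15) with c_p = 2 and neighbour coefficients 1,
# so Assumption 8.4.2 (c_p >= sum_k c_k) holds with equality.
n = 50                                  # interior points
a, b = 0.7, -1.3                        # Dirichlet boundary values
A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
rhs = np.zeros(n)
rhs[0] -= a                             # move boundary values to the rhs
rhs[-1] -= b
u = np.linalg.solve(A, rhs)             # discrete harmonic function
interior_max = u.max()
```

The exact discrete solution here is the linear interpolant between the boundary values, so the interior maximum never exceeds the boundary maximum, as Lemma 8.4.1 predicts.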
8.5 Consistence of Grid Interpolations

The following definition of consistent interpolation provides an important condition on the combinations of discretisation and interpolation formulae.

Definition 8.5.1 Let I_i be the interpolation operator from Ω_{Γ_i^c} to Γ_i^c, and let L be the differential operator to be approximated by a finite difference operator D_i(L_{h_i}, I_i), which depends on the usual finite difference operator L_{h_i} and on I_i. We call the discretisation and interpolation pair consistent on Ω′_{h_i} if

\[ \bigl( L - D_i(L_{h_i}, I_i) \bigr)\, u(x) = O(h_i) \tag{8.22} \]

for all x ∈ Ω′_{h_i}.

Equation (8.22) states that the truncation error of the combined discretisation and interpolation pair should vanish as the mesh size h_i tends to 0. This can easily be verified. First the interpolation formula is substituted into the discretisation formula. The resulting expression is expanded in a Taylor series. Then we subtract L to obtain the truncation error, which should go to 0 as h_i tends to zero.
[Figure 8.1: The standard five-point stencil]
8.6 The Standard Five-point Stencil with Bilinear Interpolation

8.6.1 Inconsistency in Discretisation

The standard five-point stencil is shown in Fig. 8.1. Since both this standard five-point stencil and the bilinear interpolation are second order, the error bound (8.11) shows that the resulting CMDM is also second order. However, we show that this combination does not satisfy the consistent interpolation condition (8.22). We describe the inconsistency present in this approach. Let (0, 0) be the local coordinates of a mesh point in Ω′_{h_1} that is next to the interface Γ_1^c. In order to define the finite difference stencil, assume that the value at (h, 0), on the interface Γ_1^c, has to be obtained from Ω_{h_2} through interpolation. More precisely, the stencil S at (0, 0) has the form

\[ S = -4u(0,0) + u(0,h) + u(0,-h) + u(-h,0) + v, \tag{8.23} \]
where v is computed using a bilinear interpolation for u(h, 0), i.e.

\[
\begin{aligned}
v = {} & (1-\xi)(1-\eta)\, u(h - \xi k, -\eta k) + \xi(1-\eta)\, u(h + (1-\xi)k, -\eta k) \\
& + (1-\xi)\eta\, u(h - \xi k, (1-\eta)k) + \xi\eta\, u(h + (1-\xi)k, (1-\eta)k),
\end{aligned} \tag{8.24}
\]
where k is the mesh size in Ω_{h_2}. Figure 8.3 shows the scaled local coordinates (ξ, η) used in the interpolation on the overlapping meshes. The resulting stencil, obtained after substitution of the interpolation formula in the discretisation formula, is shown in Fig. 8.2. Expanding (8.23) in a Taylor series around (0, 0), we find the inconsistent discretisation for the nodes where the bilinear interpolation is used:

\[ \frac{S}{h^2} - (u_{xx} + u_{yy}) = \frac{\gamma_k^2}{2}\bigl( \xi(1-\xi)\, u_{xx} + \eta(1-\eta)\, u_{yy} \bigr) + O(h), \tag{8.25} \]
[Figure 8.2: The standard five-point stencil with bilinear interpolation]
where γ_k = k/h is the ratio of the mesh sizes. Note that the scheme is consistent only if ξ and η are either 0 or 1, which implies that the two meshes match each other on the interface. Numerical results illustrating the effect of this inconsistent discretisation are given in Sect. 8.10.
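The leading inconsistency term in (8.25) can be verified numerically. The sketch below is an illustration (with the interpolation-cell layout of Fig. 8.3 taken as an assumption); it uses u = x² + y², for which the Taylor remainder vanishes, so the defect S/h² − ∆u equals the predicted term up to rounding.

```python
import numpy as np

u = lambda x, y: x**2 + y**2        # u_xx = u_yy = 2, Laplacian = 4

def stencil_with_bilinear(h, k, xi, eta):
    """Evaluate S/h^2 for the stencil (8.23): the value at (h, 0) is
    replaced by the bilinear interpolation (8.24) from a grid of size k."""
    v = ((1 - xi) * (1 - eta) * u(h - xi * k, -eta * k)
         + xi * (1 - eta) * u(h + (1 - xi) * k, -eta * k)
         + (1 - xi) * eta * u(h - xi * k, (1 - eta) * k)
         + xi * eta * u(h + (1 - xi) * k, (1 - eta) * k))
    S = -4 * u(0, 0) + u(0, h) + u(0, -h) + u(-h, 0) + v
    return S / h**2

h, k, xi, eta = 0.1, 0.15, 0.3, 0.7   # hypothetical mesh parameters
gamma = k / h
defect = stencil_with_bilinear(h, k, xi, eta) - 4.0   # S/h^2 - Laplacian
predicted = gamma**2 * (xi * (1 - xi) * 2 + eta * (1 - eta) * 2) / 2
```

The defect does not shrink when h is refined at fixed γ_k, ξ, η, which is exactly the inconsistency described above.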
8.6.2 Subdomain Error Analysis

We need to define a few sets of grid points for the next theorem. These definitions are illustrated in Fig. 8.3. The region J_{Ω′_1} is partitioned into four disjoint regions J_0, J_1, J_2 and J_3. The set of index pairs of grid points on the boundary where boundary conditions are given is denoted by J_0. We denote by J_1 the set of index pairs of grid points in the interior of the domain Ω′_1 where the standard stencil
\[ S = -4u(0,0) + u(0,h) + u(0,-h) + u(-h,0) + u(h,0) \tag{8.26} \]
is used. The set J_2 contains the index pairs of grid points in the domain Ω′_1, next to, but not on, an internal boundary, where the stencil (8.23) with bilinear interpolation is used. We denote by J_3 the set of index pairs of grid points on the internal boundary ∂Ω′_1 \ Γ_1 where Dirichlet boundary conditions are given, which are obtained by means of bilinear interpolation. The strip of grid points in the mesh on Ω_2 required in the bilinear interpolation is at most 2 grid points wide and is divided into 2 groups: the one to the left, J_L, and the one to the right, J_R. We denote by γ_k = k/h the ratio of the mesh sizes. The comparison function Φ is defined by (8.31). Equation (8.35) defines the constant Ẽ_2.
Theorem 8.6.1 The standard stencil with bilinear interpolation results in a second order scheme on every enlarged subdomain:

\[ |e_p| \le C_{\Phi_a} \max\left\{ C_I,\; \max_{p \in J_3}\bigl( \xi(1-\xi)\, u_{xx} + \eta(1-\eta)\, u_{yy} \bigr) \frac{\gamma_k^2}{2\tilde{E}_2} \right\} h^2 + C_D h^2 + O(h^3) \qquad \forall p \in J_{\Omega_i'}, \tag{8.27} \]

where C_I = (M_{xxxx} + M_{yyyy})/(48 E_1), C_{Φ_a} = max_{a ∈ J_0 ∪ J_L ∪ J_R} Φ_a denotes the maximum of the function Φ, defined by (8.31), over the grid points where a Dirichlet boundary condition is used for the subdomain, and C_D is a nonnegative constant associated with the accuracy of the Dirichlet boundary conditions on the internal boundaries.

[Figure 8.3: A composite mesh with overlapping and nonmatching grids. The scaled local coordinates (ξ, η) are used in the interpolation.]

Proof. The proof consists in showing that the conditions of Theorem 8.4.1 are satisfied. Without loss of generality, we restrict ourselves to showing that the theorem holds for the situation depicted in Fig. 8.3. For all the points p ∈ J_1 where the stencil (8.26) is used, we have that the discrete operator satisfies
L_i^h u = S/h² = (u_xx + u_yy) + (h²/12)(u_xxxx + u_yyyy) + O(h⁴)   ∀ p ∈ J1.   (8.28)

The truncation error α of this stencil is bounded by

|α| ≤ T1 = (h²/12)(M_xxxx + M_yyyy)   ∀ p ∈ J1,   (8.29)
8.6. FIVE-POINT STENCIL WITH BILINEAR INTERPOLATION
where M_xxxx is an upper bound on the absolute value of the derivative, |u_xxxx| ≤ M_xxxx. The truncation error α of the stencil (8.23) with bilinear interpolation (8.24), used for points p ∈ J2, is given in (8.25) and can be bounded by
|α| ≤ T2 = max_{p∈J3} (ξ(1 − ξ)M_xx + η(1 − η)M_yy) γk²/2   ∀ p ∈ J2.   (8.30)
Remark 8.6.2 explains why this bound is chosen, instead of a bound independent of ξ and η. We define the nonnegative function

Φ = E1[(x − x0)² + (y − y0)²]        for x ≤ xc,
Φ = E1[(x − x0)² + (y − y0)²] + E2   for x > xc,   (8.31)
where the constants E1 and E2 are positive and xc is the x-coordinate of grid points in J2. The function Φ is a quadratic and the constant E2 is only added if Φ is evaluated in a point to the right of the grid line containing the points in J2. The mesh function Φ_p is defined as the restriction of the function Φ to the grid points. We show that L_h Φ_p is bounded away from zero. For points in J1, we have

L_h Φ_p = 4E1 > 0   ∀ p ∈ J1,   (8.32)

so that (8.18) is satisfied with C1 = 4E1 > 0. For points in J2, a straightforward calculation yields

L_h Φ_p = 4E1 + γk²(ξ(1 − ξ) + η(1 − η))E1 + E2/h²    if ξ < 1/γk,
L_h Φ_p = 4E1 + γk²(ξ(1 − ξ) + η(1 − η))E1 + ξE2/h²   if ξ ≥ 1/γk.   (8.33)

If ξ ≥ 1/γk, the points in JL are to the left of the line x = xc and the term (1 − ξ)E2/h² is missing. Equation (8.33) shows that L_h Φ_p can be bounded away from zero:

L_h Φ_p ≥ Ẽ2/h² > 0   ∀ p ∈ J2,   (8.34)

where

Ẽ2 = min{1, 1/γk} E2.   (8.35)

Hence (8.18) is satisfied with C2 = Ẽ2/h² > 0. The last term in (8.27) originates from the boundary conditions for the subdomain problem. On the real boundary, i.e. for points in J0, the boundary conditions
are given without error. The Dirichlet boundary conditions given on the artificial boundaries, i.e. for points in JL and JR, are second order accurate, as can be seen from (8.11):

max |e_a| ≤ Σ_{i=1}^p ‖e_i^h‖∞ ≤ C_D h² + O(h³)   ∀ a ∈ JL ∪ JR.   (8.36)

Hence these Dirichlet boundary conditions are at least second order accurate:

max_{a∈J_∂Ω′i} |e_a| ≤ C_D h² + O(h³),   (8.37)

where C_D ≥ 0 is a constant. This concludes the proof.
Remark 8.6.1 The error bound (8.27) is optimised by selecting the constants E1 and E2 in (8.31) so that C_I = max_{p∈J3} (ξ(1 − ξ)M_xx + η(1 − η)M_yy) γk²/(2Ẽ2), and by choosing the point (x0, y0) so that C_Φa is minimal.

Remark 8.6.2 Instead of using the bound on the truncation error given in (8.30), we could use a bound which does not depend on the position of the interface in the other mesh,
|α| ≤ T2 = (γk²/8)(M_xx + M_yy),   (8.38)
since 0 ≤ ξ(1 − ξ) ≤ 1/4. However, if we use this bound, we are not able to explain the ratios observed in Table 8.1, because those results show a dependency on ξ(1 − ξ). The numerical results in Section 8.10 show that for the combination of the standard five-point stencil and bilinear interpolation, the accuracy depends on the size of the overlap. This is due to the fact that the consistency condition (8.22) is not satisfied. In the subsequent sections we construct schemes that satisfy this condition.
8.7 Second Order Scheme on a Modified Stencil

8.7.1 Discretisation Stencil

In order to obtain a consistent discretisation, we construct difference formulae on modified stencils. The idea is to slightly modify the meshes along their boundaries so that the interpolation is needed only in one of the directions, in contrast to the standard bilinear method, which needs information in both the x and y directions. The modified mesh is shown in Fig. 8.4. The difference with the original mesh depicted in Fig. 8.3 is that the mesh width k ≥ h is selected so that a grid line in the other mesh is matched.

Figure 8.4: The mesh width k ≥ h is selected so that a grid line in the other mesh is matched. The mesh on Ω1 has γk = k/h = 1.2, while for the mesh on Ω2 this factor is 2.

In case k ≥ 2h, extra grid points at a distance h can be introduced, to make sure that h ≤ k ≤ 2h. The mesh on Ω1, in the example of Fig. 8.4, has γk = k/h = 1.2, while for the mesh on Ω2 this factor is 2. In this case an extra grid line could be introduced, but it is not necessary.

Figure 8.5: Modified stencil for the second order scheme.

We start off with the construction of a second order accurate difference formula on the modified stencil shown in Fig. 8.5. The mesh width k ≥ h is selected so that a grid line in the other mesh is matched and no interpolation along the x-axis is required. The point u(−2h, 0) is needed to obtain a second order discretisation in the stencil

S = c_{0,0} u(0,0) + c_{0,−1} u(0,−h) + c_{0,1} u(0,h) + c_{−1,0} u(−h,0) + c_{1,0} u(k,0) + c_{−2,0} u(−2h,0),   (8.39)
with

c_{−2,0} = (γk − 1)/(γk + 2),
c_{−1,0} = 2(2 − γk)/(γk + 1),
c_{0,−1} = c_{0,1} = 1,
c_{0,0} = −1 − 3/γk,
c_{1,0} = 6/(γk(γk + 1)(γk + 2)).   (8.40)–(8.46)
The truncation error is given by

S/h² − (u_xx + u_yy) = (h²/12)((3γk − 2)u_xxxx + u_yyyy) + O(h³).   (8.47)

Note that for 1 ≤ γk ≤ 2 this stencil satisfies a discrete maximum principle.
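These formulae can be sanity-checked numerically. The sketch below (an illustration, not the thesis code) assembles the coefficients (8.40)–(8.46) and applies the stencil (8.39) to u = x⁴, for which the Taylor expansion terminates, so the defect against the exact Laplacian equals the leading term of (8.47), (h²/12)(3γk − 2)·u_xxxx = 2(3γk − 2)h², exactly:

```python
def modified_coeffs(gk):
    # Coefficients (8.40)-(8.46) of the six-point stencil (8.39); gk = k/h.
    return {(-2, 0): (gk - 1)/(gk + 2),
            (-1, 0): 2*(2 - gk)/(gk + 1),
            (0, -1): 1.0,
            (0,  0): -1 - 3/gk,
            (0,  1): 1.0,
            (1,  0): 6/(gk*(gk + 1)*(gk + 2))}

def apply_modified(u, x, y, h, k):
    c = modified_coeffs(k/h)
    pts = {(-2, 0): (x - 2*h, y), (-1, 0): (x - h, y),
           (0, -1): (x, y - h), (0, 0): (x, y),
           (0, 1): (x, y + h), (1, 0): (x + k, y)}
    return sum(ci*u(*pts[i]) for i, ci in c.items()) / h**2

u = lambda x, y: x**4          # Laplacian = 12 x^2, u_xxxx = 24
h, k, x0 = 0.1, 0.15, 0.3      # gamma_k = 1.5
defect = apply_modified(u, x0, 0.0, h, k) - 12*x0**2
# defect equals 2*(3*1.5 - 2)*h^2 up to roundoff, cf. (8.47)
```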
8.7.2 Subdomain Error Analysis

We need to define a few sets of grid points for the next theorem. These definitions are illustrated in Fig. 8.4. The region J_Ω is partitioned into four disjoint regions J0, J1, J2 and J3. The set of mesh points on the boundary where boundary conditions are given is denoted by J0. We denote by J1 the set of index pairs of grid points in the interior of the domain Ω′1 where the standard stencil (8.26) is used. The set J2 contains the mesh points where the modified stencil (8.39) is used. We denote by J3 the set of index pairs of grid points on the internal boundary ∂Ω′1 \ Γ1 where Dirichlet boundary conditions are given. The comparison function Φ is defined by (8.50).

Theorem 8.7.1 The subdomain problem using the standard stencil and the modified stencil (8.39) results in a second order scheme for 1 ≤ γk ≤ 2. We have the error bound

|e_p| ≤ C_Φa ((3γk − 2)M_xxxx + M_yyyy) h²/(48E1) + C_D h² + O(h³)   ∀ p ∈ Ω′i,   (8.48)
where C_Φa = max_{a∈J0∪J3} Φ_a denotes the maximum of the function Φ, defined by (8.50), over the grid points where a Dirichlet boundary condition is used for the subdomain, and C_D is a nonnegative constant associated with the accuracy of the Dirichlet boundary conditions on the internal boundaries.
Proof. The proof consists in showing that the conditions for Theorem 8.4.1 are satisfied. The bound T1 on the truncation error α of the stencil (8.26) is given in (8.29), and is valid for all index pairs in J1. From (8.47) we derive the bound T2 on the truncation error α of the modified stencil (8.39), valid for all index pairs in J2:

|α| ≤ T2 = (h²/12)((3γk − 2)M_xxxx + M_yyyy)   ∀ p ∈ J2.   (8.49)
Note that 1 ≤ 3γk − 2 ≤ 4. We define the nonnegative function

Φ = E1[(x − x0)² + (y − y0)²],   (8.50)

where the constant E1 is positive. The mesh function Φ_p is defined as the restriction of the function Φ to the grid points. Equation (8.18) is satisfied with C1 = C2 = 4E1 > 0, since

L_h Φ_p = 4E1 > 0   ∀ p ∈ J1 ∪ J2.   (8.51)
The last term in (8.48) originates from the boundary conditions for the subdomain problem. On the real boundary, i.e. for points in J0, the boundary conditions are given without error. The Dirichlet boundary conditions given on the artificial boundaries, i.e. for points in J3, are second order accurate, as can be seen from (8.11):

max |e_a| ≤ Σ_{i=1}^p ‖e_i^h‖∞ ≤ C_D h² + O(h³)   ∀ a ∈ J3.   (8.52)

Hence these Dirichlet boundary conditions are at least second order accurate:

max_{a∈J_∂Ω′i} |e_a| ≤ C_D h² + O(h³),   (8.53)

where C_D ≥ 0 is a constant. This concludes the proof.
8.8 Modified Stencil with Linear Interpolation

8.8.1 Discretisation Stencil

We now consider the effect of replacing u(k, 0) by an interpolation formula along the y-axis. First of all we show that a consistent approximation exists when linear interpolation is used, i.e.

v = (1 − η)u(k, ηl) + ηu(k, −(1 − η)l)   (8.54)
Figure 8.6: Stencil for the first order scheme with 1D linear interpolation.

is used instead of u(k, 0). We use the standard stencil and seek the coefficients in

S = c_{0,0} u(0,0) + c_{0,−1} u(0,−h) + c_{0,1} u(0,h) + c_{−1,0} u(−h,0) + c_v v   (8.55)

so that a consistent approximation to (u_xx + u_yy) results. Setting

c_{−1,0} = 2/(γk + 1),   (8.56)
c_{0,0} = 2(η(1 − η)γl² − (γk + 1)²)/(γk(γk + 1)),   (8.57)
c_{0,−1} = c_{0,1} = 1 − η(1 − η)γl²/(γk(γk + 1)),   (8.58)
c_v = 2/(γk(γk + 1)),   (8.59)

where γl = l/h, results in

S/h² − (u_xx + u_yy) = O(h).   (8.60)
α=
γ2k + 1 γ2 η(1 η) γ3 η(2η 1)(η 1) uxxx + l uyyy h + O (h2): uxyy + l 3(γk + 1) γk + 1 3γk (γk + 1) (8.61) p
If γl 2 γk (γk + 1) this formula satisfies a discrete maximum principle. For η equal to 0 or 1 a first order discretisation on a modified 5-point stencil is obtained. Note that the difference formula has to be modified to account for the low order interpolation, i.e. the coefficients c0;0 in (8.57) and c0; 1 and c0;1 in (8.58) depend on η and γl .
8.8.2 Subdomain Error Analysis

Assumption 8.8.1 The boundary values for the subdomain problem are at least second order accurate.

This assumption is one we do not want to make. It should follow from (8.11), as in (8.37). The problem is that this scheme does not satisfy Assumption 8.2.1, because the truncation error is O(h) on the boundary, as can be seen from (8.61). However, the numerical results in Section 8.10 lead us to believe that Assumption 8.8.1 holds. We also believe that (8.11) can be proved even if Assumption 8.2.1 is relaxed in the sense that the truncation error of the discretisation equations that use the interpolation along the boundary is of lower order.

We need to define a few sets of grid points for the next theorem. These definitions are illustrated in Fig. 8.4. The region J_Ω is partitioned into four disjoint regions J0, J1, J2 and J3. The set of mesh points on the boundary where boundary conditions are given is denoted by J0. We denote by J1 the set of index pairs of grid points in the interior of the domain Ω′1 where the standard stencil (8.26) is used. The set J2 contains the mesh points where the modified stencil (8.55) is used. We denote by J3 the set of index pairs of grid points on the internal boundary ∂Ω′1 \ Γ1 where Dirichlet boundary conditions are given. The comparison function Φ is defined by (8.64).

Theorem 8.8.1 Under Assumption 8.8.1, the modified stencil with linear interpolation results in a second order scheme on every enlarged subdomain:
|e_p| ≤ C_Φa (M_xxxx + M_yyyy)/(48E1) h² + C_D h² + O(h³)   ∀ p ∈ Ω′i,   (8.62)
where C_Φa = max_{a∈J0∪J3} Φ_a denotes the maximum of the function Φ, defined by (8.64), over the grid points where a Dirichlet boundary condition is used for the subdomain, and C_D ≥ 0 is a constant associated with the accuracy of the Dirichlet boundary conditions on the internal boundaries.

Proof. The proof consists in showing that the conditions for Theorem 8.4.1 are satisfied. The bound T1 on the truncation error α of the stencil (8.26) is given in (8.29), and is valid for all index pairs in J1. From (8.61) we derive the bound T2 on the truncation error α of the modified stencil (8.55), valid for all index pairs in J2:
|α| ≤ T2 = (C_xxx M_xxx + C_xyy M_xyy + C_yyy M_yyy) h   ∀ p ∈ J2,   (8.63)
where the constants Cxxx , Cxyy and Cyyy are bounds on the coefficients of the derivatives occurring in (8.61).
We define the nonnegative function

Φ = E1[(x − x0)² + (y − y0)²]        for x ≤ xc,
Φ = E1[(x − x0)² + (y − y0)²] + E2   for x > xc,   (8.64)
where the constants E1 and E2 are positive and xc is the x-coordinate of grid points in J2. The function Φ is a quadratic and the constant E2 is only added if Φ is evaluated in a point to the right of the grid line containing the points in J2. The mesh function Φ_p is defined as the restriction of the function Φ to the grid points. We show that L_h Φ_p is bounded away from zero. For points in J1, we have
L_h Φ_p = 4E1 > 0   ∀ p ∈ J1,   (8.65)

so that (8.18) is satisfied with C1 = 4E1 > 0. For points in J2, a straightforward calculation yields

L_h Φ_p = 4E1 + 2E2/(γk(γk + 1)h²) > 2E2/(γk(γk + 1)h²) > 0   ∀ p ∈ J2,   (8.66)

so that (8.18) is satisfied with C2 = 2E2/(γk(γk + 1)h²) > 0. With these results the error bound (8.20) results in

|e_p| ≤ C_Φa max{ C_T1/(48E1) h², C_T2 γk(γk + 1)/(2E2) h³ } + C_D h² + O(h³),   (8.67)
where C_T1 = M_xxxx + M_yyyy and C_T2 = C_xxx M_xxx + C_xyy M_xyy + C_yyy M_yyy are the constants arising in the bounds (8.29) and (8.63) on the truncation error. The term C_D h² in (8.67) originates from the boundary values for the subdomain problem. On the real boundary, i.e. for points in J0, the boundary values are given without error. Under Assumption 8.8.1 the boundary values for points in J3 are at least second order accurate. This concludes the proof.

We repeat that this theorem is based on Assumption 8.8.1. The problem is that this scheme does not satisfy Assumption 8.2.1, because the truncation error is O(h) on the boundary, so we cannot prove that the Dirichlet boundary conditions for the interior subdomain boundaries are second order accurate.
8.9 Modified Stencil with Cubic Interpolation

As shown in Section 8.7, the interpolation along the x-axis can be avoided by using a modified stencil. We show that cubic interpolation along the y-axis results in a second order scheme. This is equivalent to constructing a second order accurate
Figure 8.7: Stencil for the scheme with only 3 points for the 1D interpolation.

difference formula on the modified stencil depicted in Fig. 8.8. The next theorem states that at least 4 points are required on the line where the interpolation is performed.

Theorem 8.9.1 It is not possible to obtain a second order accurate discretisation, on the stencil depicted in Fig. 8.7, of (u_xx + u_yy) at the centre point (0, 0) unless at least 4 points are used along the line x = k or one of the points on the line x = k has a zero y-coordinate.

Proof. Consider the stencil depicted in Fig. 8.7 with only 3 points along the line x = k. The requirements that the coefficients of all terms of up to order 3 vanish, except the coefficients of u_xx and u_yy, which should be equal and nonzero, in the Taylor expansion of the difference formula about (0, 0), yield an overdetermined system of equations which does not have a solution.

The next theorem shows that the combination of a 1D cubic interpolation with the discretisation (8.39) on the modified stencil can be used to construct a second order discretisation for the points on the interface.

Theorem 8.9.2 The only second order accurate discretisation, on the stencil depicted in Fig. 8.8, of (u_xx + u_yy) at the centre point (0, 0), using only 4 points along the line x = k with none of these points having a zero y-coordinate, is the second order scheme (8.39) on the modified stencil, with cubic interpolation along the line x = k for the point (k, 0).

Proof. We seek the coefficients in the stencil

S = c_{0,0} u(0,0) + c_{0,−1} u(0,−h) + c_{0,1} u(0,h) + c_{−1,0} u(−h,0) + c_{−2,0} u(−2h,0) + c_{1,−1} u(k, (1+η)l) + c_{1,0} u(k, ηl) + c_{1,1} u(k, −(1−η)l) + c_{1,2} u(k, −(2−η)l).   (8.68)
Setting

c_{−2,0} = (γk − 1)/(γk + 2),   (8.69)
c_{−1,0} = 2(2 − γk)/(γk + 1),   (8.70)
c_{0,0} = −1 − 3/γk,   (8.71)
c_{0,−1} = c_{0,1} = 1,   (8.72)
c_{1,−1} = −η(1 − η)(2 − η)/(γk(γk + 1)(γk + 2)),   (8.73)
c_{1,0} = 3(1 − η)(2 − η)(η + 1)/(γk(γk + 1)(γk + 2)),   (8.74)
c_{1,1} = 3(2 − η)η(η + 1)/(γk(γk + 1)(γk + 2)),   (8.75)
c_{1,2} = −η(1 − η)(η + 1)/(γk(γk + 1)(γk + 2)),   (8.76)
results in a second order accurate discretisation:

S/h² − (u_xx + u_yy) = O(h²).   (8.77)
The truncation error for this scheme is

α = ( (3γk − 2)/12 · u_xxxx − (2 − η)(1 − η)η(η + 1)γl⁴/(4γk(γk + 1)(γk + 2)) · u_yyyy ) h² + O(h³).   (8.78)
Remark 8.9.1 It is clear that the coefficients c_{1,−1}, c_{1,0}, c_{1,1} and c_{1,2} are equal to the product of the coefficient c_{1,0} in (8.39) and the cubic Lagrange interpolation polynomials.
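This observation is easy to verify numerically (an illustrative sketch, with γk = 1.5 and η = 0.3 chosen arbitrarily): multiplying c_{1,0} of (8.39) by the cubic Lagrange basis polynomials, evaluated at y = 0 for the nodes (1+η)l, ηl, −(1−η)l and −(2−η)l, reproduces the coefficients (8.73)–(8.76):

```python
def lagrange_at_zero(nodes):
    # Lagrange basis polynomials for the given nodes, evaluated at 0.
    w = []
    for i, ni in enumerate(nodes):
        p = 1.0
        for j, nj in enumerate(nodes):
            if j != i:
                p *= (0.0 - nj)/(ni - nj)
        w.append(p)
    return w

gk, eta = 1.5, 0.3
den    = gk*(gk + 1)*(gk + 2)
c10_39 = 6/den                                     # c_{1,0} of (8.39)
nodes  = [1 + eta, eta, -(1 - eta), -(2 - eta)]    # y-coordinates in units of l
coeffs = [c10_39*w for w in lagrange_at_zero(nodes)]

expected = [-eta*(1 - eta)*(2 - eta)/den,          # (8.73)
            3*(1 - eta)*(2 - eta)*(eta + 1)/den,   # (8.74)
            3*(2 - eta)*eta*(eta + 1)/den,         # (8.75)
            -eta*(1 - eta)*(eta + 1)/den]          # (8.76)
```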
8.10 Numerical results

8.10.1 Smooth function

This test case is taken from [CMS00] and is concerned with the solution of

∇²u = f   on Ω = Ω1 ∪ Ω2,   (8.79)
Figure 8.8: Stencil for the second order scheme with 1D cubic interpolation.

where Ω1 = [0, 1] × [0, 1] and Ω2 = [1, 2] × [0, 1]. The r.h.s. f and the Dirichlet boundary conditions g are chosen so that the exact solution is

u_s(x, y) = (sin(πx) + sin(πx/2)) sin(πy).   (8.80)
The overlapping subdomains are Ω′1 = [0, 1.4] × [0, 1] with h1 = 0.2·2^−l and Ω′2 = [0.75, 2] × [0, 1] with h2 = 0.25·2^−l. Figure 8.9 shows the solution u_s on Ω′1 and Ω′2. The grids on these overlapping subdomains are shown in Fig. 8.10.
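The values of ξ reported in the tables below are the relative position of the interface x = 1.4 of Ω′1 within the mesh on Ω′2, i.e. the fractional part of 1.4/h2. A small sketch (illustrative; the function name is ours) reproduces them:

```python
def interface_xi(l):
    # Fractional position of the interface x = 1.4 in the mesh on Omega'_2,
    # whose mesh size is h2 = 0.25 * 2**(-l); rounded to suppress roundoff.
    h2 = 0.25 * 2**(-l)
    t = 1.4 / h2
    return round(t - int(t), 10)

xis = [interface_xi(l) for l in range(7)]
# xis reproduces the xi column of Table 8.1: 0.6, 0.2, 0.4, 0.8, 0.6, 0.2, 0.4
```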
Standard stencil with 2D interpolation

In Tables 8.1 and 8.2 we list the L∞-norm of the error on the overlapping subdomains Ω′1 and Ω′2 for the scheme based on the standard stencil (8.23) and 2D interpolation for the points on the interface. The results in Table 8.1 were obtained with bilinear interpolation for the points on the interface and those in Table 8.2 with bicubic interpolation. For a second order scheme, the ratio between two successive error norms should be 4 when the mesh size is halved. The ratios in columns 4 and 6 of Table 8.2 are very close to 4, illustrating that the standard stencil with bicubic interpolation results in a second order scheme. Bicubic interpolation is very expensive since it requires 16 points, and the interpolation constant is quite large. The ratios in columns 4 and 6 of Table 8.1, for bilinear interpolation, alternate between 4.11 and 3.89 and between 4.18 and 3.83. This is due to the presence of the inconsistency, as shown by (8.25), which results in a dependency of the error on ξ(1 − ξ), i.e. the relative position of the interface in the other mesh. For this test case the dominant term in the error bound (8.27) is e ≈ (ξ(1 − ξ)c1 + c2) h², where c1 and c2 are constants independent of ξ and h. With this expression, we can estimate the ratio γe between two successive error norms. When the mesh is refined
Table 8.1: Norm of the error in the overlapping domains Ω′1 and Ω′2 for the smooth function u_s using the standard stencil with bilinear interpolation (2D bilinear).

l   ξ     ‖e_Ω′1‖∞   γe     ‖e_Ω′2‖∞   γe
0   0.6   5.53e-2    3.94   6.34e-2    4.82
1   0.2   1.40e-2    3.88   1.31e-2    3.80
2   0.4   3.62e-3    4.11   3.47e-3    4.19
3   0.8   8.79e-4    3.89   8.26e-4    3.82
4   0.6   2.26e-4    4.11   2.16e-4    4.18
5   0.2   5.50e-5    3.89   5.17e-5    3.83
6   0.4   1.41e-5           1.35e-5
Table 8.2: Norm of the error in the overlapping domains Ω′1 and Ω′2 for the smooth function u_s using the standard stencil with bicubic interpolation (2D bicubic).

l   ξ     ‖e_Ω′1‖∞   γe       ‖e_Ω′2‖∞   γe
0   0.6   5.20e-2    3.8850   4.14e-2    3.4810
1   0.2   1.34e-2    4.0164   1.19e-2    3.9605
2   0.4   3.33e-3    3.9986   3.01e-3    4.0072
3   0.8   8.34e-4    4.0010   7.50e-4    4.0012
4   0.6   2.08e-4    4.0003   1.88e-4    4.0004
5   0.2   5.21e-5    4.0000   4.69e-5    3.9998
6   0.4   1.30e-5             1.17e-5
Figure 8.9: The solution u_s on Ω′1 and Ω′2.

by halving the mesh size, i.e. h_{i+1} = h_i/2, we have

γe = ‖e^{hi}_Ω′‖∞ / ‖e^{hi+1}_Ω′‖∞ = (c1(ξi(1 − ξi) + γc) hi²) / (c1(ξi+1(1 − ξi+1) + γc) hi+1²) = ((ξi(1 − ξi) + γc) / (ξi+1(1 − ξi+1) + γc)) · 4,   (8.81)

where γc = c2/c1. For Ω′1 we find γc = 2.7491, which results in ratios γe of 4.11 and 3.89, while for Ω′2 we have γc = 1.6178, resulting in ratios γe of 4.18 and 3.83. This explains the ratios in columns 4 and 6 in Table 8.1. Note that the same values are observed for ξ = 0.2 and ξ = 0.8 on the one hand and for ξ = 0.4 and ξ = 0.6 on the other hand. The accuracy of the scheme depends on the relative position of the interface in the other mesh. Apart from this phenomenon the scheme is second order, since fitting a power of the mesh size, ‖e_Ω′1‖∞ ≈ κh^λ, yields λ = 1.9929. The second order accuracy can also be seen when the mesh is refined twice, i.e. the mesh size is divided by 4; in this case the factor ξ(1 − ξ) does not change and we get ratios of 15.9973 (for ‖e_Ω′1‖∞) and of 15.9966 (for ‖e_Ω′2‖∞) between two successive errors, which is very close to the theoretical value of 16.
Modified stencil with 1D interpolation

In Table 8.3 we show the results for the modified stencil (8.55) with 1D linear interpolation, and in Table 8.4 the results for the modified stencil (8.68) with 1D
Figure 8.10: The grids on the overlapping subdomains Ω′1 = [0, 1.4] × [0, 1] with h1 = 0.2 (points marked ) and Ω′2 = [0.75, 2] × [0, 1] with h2 = 0.25 (points marked +).
Table 8.3: Norm of the error in the overlapping domains Ω′1 and Ω′2 for the smooth function u_s using the modified stencil with 1D linear interpolation.

l   ‖e_Ω′1‖∞   γe       ‖e_Ω′2‖∞   γe
0   5.29e-2    3.9170   5.16e-2    4.2481
1   1.35e-2    3.9373   1.21e-2    5.9039
2   3.43e-3    4.0851   2.06e-3    3.5436
3   8.39e-4    4.0043   5.80e-4    3.5521
4   2.10e-4    4.0239   1.63e-4    3.7673
5   5.21e-5    3.9907   4.34e-5    3.8331
6   1.31e-5             1.13e-5
Table 8.4: Norm of the error in the overlapping domains Ω′1 and Ω′2 for the smooth function u_s using the modified stencil with 1D cubic interpolation.

l   ‖e_Ω′1‖∞   γe       ‖e_Ω′2‖∞   γe
0   4.98e-2    3.7168   8.41e-2    5.5828
1   1.34e-2    4.0523   1.51e-2    4.0997
2   3.31e-3    3.9687   3.67e-3    4.4717
3   8.34e-4    4.0017   8.22e-4    4.2099
4   2.08e-4    3.9999   1.95e-4    4.0868
5   5.21e-5    4.0004   4.78e-5    4.0393
6   1.30e-5             1.18e-5
cubic interpolation. The advantage of these schemes is that fewer points are involved in the interpolation, since 1D cubic interpolation requires 4 points, and 1D linear interpolation only 2. It is clear that the scheme based on the modified stencil with 1D cubic interpolation is second order and that it is as accurate as the classical approach with expensive bicubic interpolation. The results for the modified stencil (8.55) with 1D linear interpolation are comparable to those of the other schemes involving cubic interpolation, but it is not clear whether this is a second order scheme or not. The results for ‖e_Ω′1‖∞ and the corresponding ratios are fine, while those for ‖e_Ω′2‖∞ are good in the sense that the same accuracy is attained, but the ratios are not very close to 4.
Effect of overlap

In order to study the effect of the overlap, we fix the mesh sizes to be h1 = 1/320 and h2 = 1/256 (corresponding to l = 6) and vary the overlap according to δ1 = 2·2^m h1 (ranging from 0.625% to 80%) and δ2 = 2^m h2 (ranging from 0.390625% to 50%) for integer values of m from 0 to 7. In Tables 8.5 and 8.6 we list the L∞-norm of the error in the nonoverlapping subdomains Ω1 and Ω2 for the different schemes. The results for the standard stencil (8.23) with bilinear interpolation are given in columns 2 and 3, and for bicubic interpolation in columns 4 and 5 of Table 8.5. In columns 2 and 3 of Table 8.6 we show the results for the modified stencil (8.55) with 1D linear interpolation, and in columns 4 and 5 for the modified stencil (8.68) with 1D cubic interpolation. From the results it is clear that the global accuracy of the first method increases as the overlap increases, thus necessitating substantial overlap, while the other methods reach the attainable accuracy even with minimal overlap. The dependency of the accuracy on the amount of overlap for the first method can be explained by the presence of the inconsistency (8.25) and by the fact that the error bound (8.11) depends on the contraction factor, which in turn depends on the amount of overlap. More specifically, the factor 1 + 1/(σ − τ) in the error bound (8.11) is smaller when the overlap is larger, since τ = max_i(ρ_i σ) < 1 is smaller. In Table 8.7 we list the number of additive Schwarz iterations required to satisfy the convergence criterion ‖r_n‖2 ≤ 10^−10 ‖r_0‖2. As expected, the number of additive Schwarz iterations decreases as the overlap increases. Equation (8.9) shows that the contraction factor ρ_i measures the error reduction, and since increasing the overlap results in a decreasing contraction factor ρ_i, faster convergence is obtained when the overlap is large.

When GMRES is used to accelerate the convergence, only 3 iterations are required, unless m = 0, in which case an extra iteration is necessary.
Table 8.5: The effect of overlap on the norm of the error in the nonoverlapping domains Ω1 and Ω2 for the smooth function u_s using the standard stencil with 2D interpolation.

         2D bilinear            2D bicubic
m    ‖e_Ω1‖∞   ‖e_Ω2‖∞    ‖e_Ω1‖∞   ‖e_Ω2‖∞
0    4.00e-4   4.00e-4    1.31e-5   6.48e-6
1    1.89e-4   1.90e-4    1.31e-5   6.48e-6
2    7.99e-5   7.93e-5    1.31e-5   6.47e-6
3    3.12e-5   3.19e-5    1.31e-5   6.46e-6
4    9.97e-6   1.04e-5    1.30e-5   6.48e-6
5    1.24e-5   7.47e-6    1.30e-5   6.69e-6
6    1.41e-5   6.82e-6    1.30e-5   7.48e-6
7    1.33e-5   6.53e-6    1.31e-5   9.16e-6
Table 8.6: The effect of overlap on the norm of the error in the nonoverlapping domains Ω1 and Ω2 for the smooth function u_s using the modified stencils with 1D interpolation.

         1D linear              1D cubic
m    ‖e_Ω1‖∞   ‖e_Ω2‖∞    ‖e_Ω1‖∞   ‖e_Ω2‖∞
0    1.37e-5   7.95e-6    1.31e-5   6.49e-6
1    1.29e-5   6.39e-6    1.31e-5   6.49e-6
2    1.34e-5   6.79e-6    1.31e-5   6.48e-6
3    1.30e-5   6.42e-6    1.31e-5   6.48e-6
4    1.31e-5   6.33e-6    1.30e-5   6.51e-6
5    1.30e-5   6.30e-6    1.30e-5   6.73e-6
6    1.31e-5   7.29e-6    1.30e-5   7.52e-6
7    1.31e-5   9.22e-6    1.31e-5   9.19e-6
Table 8.7: The effect of overlap on the convergence rate of the Schwarz method for the smooth function u_s.

m              0     1     2     3     4    5    6    7
1D linear      391   302   161   96    53   30   16   9
1D cubic       389   300   161   96    53   30   16   9
2D bilinear    846   436   224   116   60   32   17   10
2D bicubic     846   436   224   116   60   32   17   10
Figure 8.11: The solution u_p on Ω′1.
Figure 8.12: The solution u_p on Ω′2.
8.10.2 Function with a peak

This test case differs from the test case in Section 8.10.1 in that the solution has a sharp peak in the centre of the domain. It is also concerned with the solution of ∇²u = f on Ω = Ω1 ∪ Ω2, where Ω1 = [0, 1] × [0, 1] and Ω2 = [1, 2] × [0, 1]. The r.h.s. f and the Dirichlet boundary conditions g are chosen so that the exact solution is

u_p(x, y) = e^{−100((x−1)² + (y−0.5)²)} sin(2πx) sin(2πy).   (8.82)

The overlapping subdomains are Ω′1 = [0, 1.4] × [0, 1] with h1 = 0.2·2^−l and Ω′2 = [0.75, 2] × [0, 1] with h2 = 0.25·2^−l. The solution u_p on Ω′1 and Ω′2 is shown in Fig. 8.11 and Fig. 8.12 respectively.

Standard stencil with 2D interpolation

In Tables 8.8 and 8.9 we list the L∞-norm of the error on the overlapping subdomains Ω′1 and Ω′2 for the scheme based on the standard stencil (8.23) and 2D interpolation for the points on the interface. The results in Table 8.8 were obtained with bilinear interpolation for the points on the interface and those in Table 8.9 with bicubic interpolation. The results for bilinear interpolation are as good as for bicubic interpolation, showing that in this case the error is mainly due to the discretisation error and that the interpolation error is much smaller.

Modified stencil with 1D interpolation

In Table 8.10 we show the results for the modified stencil (8.55) with 1D linear interpolation. The results for the modified stencil (8.68) with 1D cubic interpolation are shown in Table 8.11. These results also show that the scheme based on the modified stencil with 1D cubic interpolation is second order and that it is as accurate as the classical approach with expensive bicubic interpolation. The results for the scheme with 1D linear interpolation are comparable to those of the other schemes involving cubic interpolation, showing that the error is dominated by the discretisation error. From these results we cannot judge the quality of the 1D linear interpolation.

Effect of overlap

In Tables 8.12 and 8.13 we list the L∞-norm of the error in the nonoverlapping subdomains Ω1 and Ω2 for the different schemes. The results for the standard stencil (8.23) with bilinear interpolation are given in columns 2 and 3 of Table 8.12 and those for bicubic interpolation in columns 4 and 5 of the same table. In columns 2 and 3 of Table 8.13 we show the results for the modified stencil (8.55)
Table 8.8: Norm of the error in the overlapping domains Ω′1 and Ω′2 for the function with a peak u_p using the standard stencil with bilinear interpolation (2D bilinear).

l   ‖e_Ω′1‖∞   γe        ‖e_Ω′2‖∞   γe
0   3.81e-2    11.3849   1.36e-2    0.3458
1   3.34e-3    0.37876   3.94e-2    2.73382
2   8.83e-3    4.35111   1.44e-2    4.46339
3   2.03e-3    4.0472    3.23e-3    4.10649
4   5.01e-4    3.92651   7.86e-4    3.94534
5   1.28e-4    4.00461   1.99e-4    4.00469
6   3.19e-5              4.97e-5
Table 8.9: Norm of the error in the overlapping domains Ω′1 and Ω′2 for the function with a peak u_p using the standard stencil with bicubic interpolation (2D bicubic).

l   ‖e_Ω′1‖∞   γe        ‖e_Ω′2‖∞   γe
0   3.81e-2    11.3797   1.73e-2    0.4415
1   3.35e-3    0.37894   3.93e-2    2.72781
2   8.83e-3    4.35111   1.44e-2    4.46346
3   2.03e-3    4.0472    3.23e-3    4.10643
4   5.01e-4    3.92651   7.86e-4    3.94533
5   1.28e-4    4.00461   1.99e-4    4.0047
6   3.19e-5              4.97e-5
Table 8.10: Norm of the error in the overlapping domains Ω′1 and Ω′2 for the function with a peak u_p using the modified stencil with 1D linear interpolation.

l   ‖e_Ω′1‖∞   γe       ‖e_Ω′2‖∞   γe
0   3.79e-2    12.61    7.00e-4    0.01802
1   3.00e-3    0.340    3.89e-2    2.69511
2   8.83e-3    4.351    1.44e-2    4.46252
3   2.03e-3    4.047    3.23e-3    4.10771
4   5.01e-4    3.9265   7.87e-4    3.94718
5   1.28e-4    4.0046   1.99e-4    4.00575
6   3.19e-5             4.97e-5
Table 8.11: Norm of the error in the overlapping domains Ω′1 and Ω′2 for the function with a peak u_p using the modified stencil with 1D cubic interpolation.

l   ‖e_Ω′1‖∞   γe        ‖e_Ω′2‖∞   γe
0   3.80e-2    12.65     7.32e-4    0.0188
1   3.01e-3    0.34045   3.89e-2    2.669
2   8.83e-3    4.35111   1.46e-2    4.494
3   2.03e-3    4.04715   3.24e-3    4.123
4   5.01e-4    3.92647   7.87e-4    3.951
5   1.28e-4    4.00459   1.99e-4    4.0056
6   3.19e-5              4.97e-5
Table 8.12: The effect of overlap on the norm of the error in the nonoverlapping domains Ω1 and Ω2 for the function with a peak u_p using the standard stencil with 2D interpolation.

         2D bilinear            2D bicubic
m    ‖e_Ω1‖∞   ‖e_Ω2‖∞    ‖e_Ω1‖∞   ‖e_Ω2‖∞
0    3.92e-5   4.63e-5    2.64e-5   5.58e-5
1    3.52e-5   5.25e-5    2.65e-5   5.57e-5
2    4.75e-5   4.48e-5    2.69e-5   5.53e-5
3    3.91e-5   5.17e-5    2.84e-5   5.42e-5
4    3.38e-5   5.29e-5    3.12e-5   5.19e-5
5    3.18e-5   5.00e-5    3.19e-5   4.97e-5
6    3.19e-5   4.97e-5    3.19e-5   4.97e-5
7    3.19e-5   4.97e-5    3.19e-5   4.97e-5
Table 8.13: The effect of overlap on the norm of the error in the nonoverlapping domains Ω1 and Ω2 for the function with a peak u_p using the modified stencils with 1D interpolation.

         1D linear              1D cubic
m    ‖e_Ω1‖∞   ‖e_Ω2‖∞    ‖e_Ω1‖∞   ‖e_Ω2‖∞
0    3.27e-5   5.55e-5    2.63e-5   5.58e-5
1    2.52e-5   5.90e-5    2.66e-5   5.55e-5
2    3.03e-5   5.94e-5    2.68e-5   5.50e-5
3    2.80e-5   5.70e-5    2.87e-5   5.34e-5
4    3.06e-5   5.07e-5    3.14e-5   5.13e-5
5    3.19e-5   4.92e-5    3.19e-5   4.97e-5
6    3.19e-5   4.97e-5    3.19e-5   4.97e-5
7    3.19e-5   4.97e-5    3.19e-5   4.97e-5
Table 8.14: The effect of overlap on the convergence rate of the Schwarz method for the function with a peak u_p.

m              0     1     2     3    4    5    6   7
1D linear      183   142   76    46   25   14   7   4
1D cubic       183   142   76    45   25   14   7   4
2D bilinear    396   205   106   55   28   15   8   4
2D bicubic     396   205   106   55   28   15   8   4
with 1D linear interpolation, and in columns 4 and 5 of this table we list the results for the modified stencil (8.68) with 1D cubic interpolation. For a consistent scheme some overlap is required to achieve the accuracy, but increasing the size of the overlap does not change the accuracy. The dependency of the accuracy on the amount of overlap for the inconsistent scheme with bilinear interpolation is explained in Section 8.10.1. In this case the influence of the size of the overlap is small, since the discretisation error is dominant when compared to the interpolation error. We list the number of additive Schwarz iterations required to satisfy the convergence criterion ‖r_n‖2 ≤ 10^−10 ‖r_0‖2 in Table 8.14. As expected, the number of Schwarz iterations is reduced when the size of the overlap is increased.
8.10.3 Quadratic Function

Once more we consider the solution of ∇²u = f on Ω = Ω1 ∪ Ω2, where Ω1 = [0,1] × [0,1] and Ω2 = [1,2] × [0,1]. In this test case the r.h.s. f and the Dirichlet boundary conditions g are chosen so that the exact solution is the quadratic function

u_q(x, y) = x².   (8.83)

The overlapping subdomains are Ω1′ = [0,1.4] × [0,1] with h1 = 0.2·2⁻ˡ and Ω2′ = [0.75,2] × [0,1] with h2 = 0.25·2⁻ˡ. The solution u_q on the overlapping grids of Ω1′ and Ω2′ is shown in Fig. 8.13. This test case is a worst case scenario for the influence of the interpolation error, since the discretisation error of the standard five-point stencil is zero: all derivatives of order 3 and higher vanish. A fully consistent scheme, such as the standard five-point stencil with bicubic interpolation or the modified stencil (8.68) with 1D cubic interpolation, computes the exact solution up to machine precision for this test case on any grid. Hence any error observed in the numerical results is caused by the interpolation. The numerical results are given in Section 9.5 because of the striking resemblance between the results obtained with bilinear interpolation and those obtained with the mortar projection used as an interpolation technique.
Figure 8.13: The solution u_q on Ω1′ and Ω2′.
8.11 Conclusion

We have studied several interface interpolation schemes for finite difference methods on overlapping nonmatching grids. A scheme based on the combination of 1D cubic interpolation and a six-point stencil is proposed, which produces a consistent and globally second order method. We have also shown numerically that a minimal overlap is required to achieve the accuracy, and that a larger overlap reduces the number of Schwarz iterations but does not change the accuracy. The method is cheaper than the interpolation method required by the theory of [CH90] and the mortar based method proposed in [CDS99].
Chapter 9
Mortar Projection in Overlapping Composite Mesh Difference Methods

Logic will get you from A to B, imagination will take you everywhere.¹
We study experimentally the effect of the mortar projection in an overlapping composite mesh difference method for two-dimensional elliptic problems.
9.1 Introduction

In [CDS99], an overlapping mortar element method was proposed. This method has several desirable properties: the discretisation is consistent, the accuracy is of optimal order, and the error is independent of the size of the overlap as well as of the ratio of the mesh sizes. However, a major disadvantage of the method is that it needs weights in the bilinear form. The artificially introduced piecewise constant weights make the scheme consistent, but at the same time make it impossible to use fast solvers for the subdomain problems. On the other hand, the composite mesh difference method (CMDM) [Sta77, CMS00, GC99] does not need any weights, and its accuracy is also of optimal order if it is used with higher order interface interpolations. For example, the 2D bicubic or the modified 1D cubic interface interpolation [GC99] is needed if one uses P1 or Q1 finite elements for

¹ Albert Einstein
Figure 9.1: Uniform grid and P1 finite element mesh.
the interior of the subdomains. If the computationally more efficient low order interpolation is used on the interfaces, it may lead to a locally inconsistent discretisation, resulting in an error that depends on the size of the overlap. The goal of this chapter is to take the mortar approach (as given in Section 9.3), drop the weights, and compare its results to the non-mortar methods of Chapter 8. Of course, in an ideal scheme, which is yet to be discovered, the accuracy should be of optimal order and the error should be independent of the size of the overlap and of the ratio of mesh sizes. To be able to use fast solvers for the subdomain problems, it is also desirable not to have weights in the discretisation on the overlapping parts of the subdomains.
9.2 2D Linear Finite Elements

Some of the results given here for 2D linear finite elements are summarised in Table 5.1 of the book by Hirsch [Hir95].
9.2.1 Triangular P1 Linear Finite Element

We need to define triangles on our uniform grid in order to be able to use the P1 finite element. The triangles are obtained by introducing the downward diagonal, i.e. the diagonal from the upper left corner to the lower right corner, of every square in the mesh. The resulting triangular mesh on the uniform grid is shown in Fig. 9.1. In every triangular element e we have 3 linear basis functions

ϕ_i^(e)(x, y) = a_i x + b_i y + c_i.   (9.1)
It is straightforward to verify that if the coefficients are chosen as

a_i = (y_2 − y_3) / (2A),   (9.2)
b_i = (x_3 − x_2) / (2A),   (9.3)
c_i = (x_2 y_3 − x_3 y_2) / (2A),   (9.4)

with the indices permuted cyclically for i = 2, 3, where A is the area of the triangle,

A = (1/2) det [ x_1  y_1  1
                x_2  y_2  1
                x_3  y_3  1 ],   (9.6)

these functions have the property that

ϕ_i^(e)(x_j, y_j) = δ_{i,j} = { 1 for i = j,  0 for i ≠ j }.   (9.7)
Here δ_{i,j} denotes the Kronecker delta, which is equal to 1 if and only if i = j, and 0 otherwise. Using these basis functions we can compute the element stiffness matrix K^(e), defined by

K_{i,j}^(e) = ∫_{Ω_e} ∇ϕ_i^(e) · ∇ϕ_j^(e) dS,   (9.8)

and the element mass matrix M^(e), defined by

M_{i,j}^(e) = ∫_{Ω_e} ϕ_i^(e) ϕ_j^(e) dS.   (9.9)
The resulting linear system is obtained by assembling the element stiffness matrices and the element mass matrices. In order to find the difference formula that corresponds to the P1 finite element discretisation on the uniform mesh, we consider the triangle with vertices (0,0), (1,0) and (0,1) in the parameter space (ξ, η). The basis functions are

ϕ_1^(e) = 1 − ξ − η,   (9.10)
ϕ_2^(e) = ξ,   (9.11)
ϕ_3^(e) = η.   (9.12)
For this standard element, the P1 element stiffness matrix is

K_{P1}^(e) = (1/2) [  2  −1  −1
                     −1   1   0
                     −1   0   1 ]   (9.13)
Figure 9.2: Uniform grid and Q1 finite element mesh.
and the P1 element mass matrix is

M_{P1}^(e) = (1/24) [ 2  1  1
                      1  2  1
                      1  1  2 ].   (9.14)
The difference formula that corresponds to the P1 finite element discretisation on the uniform mesh is found by assembling the contributions from the 6 elements which share a given node. This results in

(1/h²) [ 0   1  0
         1  −4  1
         0   1  0 ] u_{i,j} = (u_xx + u_yy) + (u_xxxx + u_yyyy) h²/12 + O(h³).   (9.15)

The corresponding stencil for the right-hand side is

R_{P1} = (1/12) [ 1  1  0
                  1  6  1
                  0  1  1 ] f_{i,j}.   (9.16)

In case "lumping" is used, the r.h.s. becomes

R_lumping = f_{i,j}.   (9.17)
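The element matrices above can be checked numerically. The helper below is a hypothetical sketch (not thesis code): it builds K^(e) for an arbitrary triangle from the gradient coefficients a_i, b_i of this section and uses the exact P1 mass matrix formula; on the reference triangle it reproduces (9.13) and (9.14).

```python
import numpy as np

def p1_element_matrices(verts):
    """P1 stiffness and mass matrix of a triangle with the given vertices."""
    (x1, y1), (x2, y2), (x3, y3) = verts
    A = 0.5 * abs(np.linalg.det([[x1, y1, 1.0], [x2, y2, 1.0], [x3, y3, 1.0]]))
    # Gradient coefficients of phi_i = a_i x + b_i y + c_i (indices cyclic in i)
    a = np.array([y2 - y3, y3 - y1, y1 - y2]) / (2.0 * A)
    b = np.array([x3 - x2, x1 - x3, x2 - x1]) / (2.0 * A)
    K = A * (np.outer(a, a) + np.outer(b, b))     # gradients are constant
    M = A / 12.0 * (np.ones((3, 3)) + np.eye(3))  # exact P1 mass matrix
    return K, M

K, M = p1_element_matrices([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
```

On the reference triangle (area 1/2) the returned K and M coincide with the matrices (9.13) and (9.14).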
9.2.2 Quadrilateral Q1 Bilinear Finite Element

In order to find the difference formula that corresponds to the Q1 finite element discretisation on the uniform mesh, we consider the square with vertices (0,0), (1,0), (0,1) and (1,1) in the parameter space (ξ, η). The basis functions are

ϕ_1^(e) = (1 − ξ)(1 − η),   (9.18)
ϕ_2^(e) = ξ(1 − η),   (9.19)
ϕ_3^(e) = (1 − ξ)η,   (9.20)
ϕ_4^(e) = ξη.   (9.21)
For this standard element, the Q1 element stiffness matrix is

K_{Q1}^(e) = (1/6) [  4  −1  −1  −2
                     −1   4  −2  −1
                     −1  −2   4  −1
                     −2  −1  −1   4 ]   (9.22)

and the Q1 element mass matrix is

M_{Q1}^(e) = (1/36) [ 4  2  2  1
                      2  4  1  2
                      2  1  4  2
                      1  2  2  4 ].   (9.23)
The difference formula that corresponds to the Q1 finite element discretisation on the uniform mesh is found by assembling the contributions from the 4 elements which share a given node. This results in

(1/(3h²)) [ 1   1  1
            1  −8  1
            1   1  1 ] u_{i,j} = (u_xx + u_yy) + (u_xxxx + 4u_xxyy + u_yyyy) h²/12 + O(h³).   (9.24)

The corresponding stencil for the right-hand side is

R_{Q1} = (1/36) [ 1   4  1
                  4  16  4
                  1   4  1 ] f_{i,j}.   (9.25)

From a finite difference point of view, there is no reason to use (9.24): a fourth order accurate finite difference discretisation can be constructed on these nine points, while this Q1 discretisation is only second order accurate and its discretisation error is even larger than that of the P1 discretisation stencil (9.15), which uses only five points.
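These matrices can likewise be reproduced by quadrature. The sketch below (illustrative, not thesis code) integrates the bilinear basis functions (9.18)-(9.21) with a 2x2 Gauss rule, which is exact for the polynomial integrands involved, and recovers (9.22) and (9.23).

```python
import numpy as np

def q1_element_matrices():
    """Q1 stiffness and mass matrix on the reference square, nodes ordered
    as in (9.18)-(9.21), via 2x2 Gauss quadrature (exact here)."""
    gp = [0.5 - 0.5 / np.sqrt(3.0), 0.5 + 0.5 / np.sqrt(3.0)]
    K = np.zeros((4, 4))
    M = np.zeros((4, 4))
    for x in gp:
        for e in gp:                                   # weight 1/4 per point
            p = np.array([(1-x)*(1-e), x*(1-e), (1-x)*e, x*e])
            dx = np.array([-(1-e), (1-e), -e, e])      # d/d(xi)
            de = np.array([-(1-x), -x, (1-x), x])      # d/d(eta)
            K += 0.25 * (np.outer(dx, dx) + np.outer(de, de))
            M += 0.25 * np.outer(p, p)
    return K, M

K, M = q1_element_matrices()
```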
Figure 9.3: A mortar projection interface test function.
9.3 Overlapping Nonmatching Grid Mortar Element Method

In this section we briefly describe the overlapping nonmatching grid mortar method. A two-subdomain version was given in [CDS99] and a many-subdomain version was given by Maday et al.² The notation used here is introduced in Section 8.2. Let Ω = Ω1 ∪ Ω2 be the union of two open polygonal subdomains which do not overlap each other. The enlarged subdomain Ωi′ is defined by (8.2). The function space for P1 or Q1 finite elements on a uniform grid with mesh size h_i is denoted by V_{h_i}. We denote the interface by γ_i = Γ_i^c = ∂Ωi′ \ Γ_i, and the trace space V_{h_i}(γ_j) is the restriction of V_{h_i} to γ_j. The mortar projection π1 maps the space V_{h2}(γ1) into V_{h1}(γ1):

π1 : ϕ ∈ V_{h2}(γ1) ↦ π1ϕ ∈ V_{h1}(γ1) : ∫_{γ1} (ϕ − π1ϕ) ψ ds = 0  ∀ψ ∈ W̃_{h1}(γ1).   (9.26)
The interface test function space W̃_{h1}(γ1) denotes the space of continuous piecewise linear functions that are constant in the first and last intervals, see [BMP94, CDS99]. In Fig. 9.3 we show an example of an interface test function ψ ∈ W̃_{h1}(γ1). The 3 types of basis functions we use to set up the resulting tridiagonal system for the mortar projection are shown in Fig. 9.4. The function ϕ is called the master function of the mortar projection and the result π1ϕ is called the slave function. The grid points used to represent the master and slave functions are referred to as master nodes and slave nodes. Similarly we can define π2. This projection is used in the definition of the solution space

V_h = { (u1, u2) | u1 ∈ V_{h1}, u2 ∈ V_{h2}, u1|_{γ1} = π1(u2|_{γ1}), u2|_{γ2} = π2(u1|_{γ2}) }.   (9.27)

With the space V_h the variational form can be defined as:

Find u = (u1, u2) ∈ V_h such that a_h(u, v) = f_h(v)  ∀v = (v1, v2) ∈ V_h,   (9.28)

² Presentation at the 12th International Conference on Domain Decomposition Methods.
Figure 9.4: The 3 types of basis functions for the interface test function space.

where the weighted bilinear form is defined as

a_h(u, v) = ∫_{Ω1′\Ω2′} ∇u1 · ∇v1 dx + (1/2) ∫_{Ω1′∩Ω2′} ∇u1 · ∇v1 dx
          + ∫_{Ω2′\Ω1′} ∇u2 · ∇v2 dx + (1/2) ∫_{Ω1′∩Ω2′} ∇u2 · ∇v2 dx   (9.29)

and the right-hand side is given by

f_h(v) = ∫_{Ω1′\Ω2′} f v1 dx + (1/2) ∫_{Ω1′∩Ω2′} f v1 dx + (1/2) ∫_{Ω1′∩Ω2′} f v2 dx + ∫_{Ω2′\Ω1′} f v2 dx.   (9.30)
The theory of Cai et al. [CDS99] shows that the H¹ norm of the error is of order h. Their numerical results confirm this and further show that the L∞ norm and the L2 norm of the error are both of order h².
9.4 Mortar Projection in CMDM

We now study a new scheme which takes the mortar approach (as given in Section 9.3) and drops the weights 1/2 in the bilinear form (9.29). In every subdomain we set up a finite element discretisation with the classical bilinear form

a_{h_i}(u_i, v_i) = ∫_{Ωi′} ∇u_i · ∇v_i dx   (9.31)

and use the mortar projection (9.26) to compute the Dirichlet conditions along the interfaces ∂Ωi′ \ Γ_i. Hence we have p local problems of the form (8.6). The mortar projection is a second order accurate interpolation and can be used in a CMDM as described in Chapter 8.

9.4.1 Maximum Principle

In Fig. 9.5 we illustrate that the mortar projection does not, in general, satisfy the maximum principle.
Figure 9.5: The mortar projection does not satisfy the maximum principle.

Theorem 9.4.1 The mortar projection does not satisfy the maximum principle, i.e. there exists a function ϕ that satisfies

‖πϕ‖∞ > ‖ϕ‖∞.   (9.32)

Proof. We construct an example, shown in Fig. 9.5, that satisfies (9.32). The master function is obtained by sampling the function sin(πx) at the grid points x_i^(m) = i h_m for i = 0, 1, …, 5, where h_m⁻¹ = 5. The slave nodes are x_i^(s) = i h_s for i = 0, 1, …, 4, where h_s⁻¹ = 4. The slave function is set to 0 at the grid points x_0^(s) and x_4^(s), and the values at x_i^(s) for i = 1, 2, 3 are determined from (9.26). We see that the slave function is larger than the master function at x_2^(s) = 0.5. □

Remark 9.4.1 The interpolation constant σ can be larger than 1 in the bound
‖πϕ‖∞ ≤ σ‖ϕ‖∞.   (9.33)

We may need a large overlap to make the contraction factor ρ small enough in order to have τ < 1.
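The counterexample can be reproduced directly. The code below is an illustrative sketch (the names are ours, not the thesis implementation): it assembles the tridiagonal mortar system (9.26) for the master function sin(πx) with h_m⁻¹ = 5 and h_s⁻¹ = 4, integrating with Simpson's rule on each sub-interval of the union mesh, which is exact for the piecewise-quadratic products that occur.

```python
import numpy as np

def plin(nodes, vals):
    """Continuous piecewise-linear function with the given nodal values."""
    return lambda x: np.interp(x, nodes, vals)

def l2dot(f, g, breaks):
    """Integrate f*g with Simpson's rule per sub-interval (exact here)."""
    s = 0.0
    for a, b in zip(breaks[:-1], breaks[1:]):
        m = 0.5 * (a + b)
        s += (b - a) / 6.0 * (f(a) * g(a) + 4.0 * f(m) * g(m) + f(b) * g(b))
    return s

# Master: piecewise-linear sampling of sin(pi*x) on 6 nodes (h_m = 1/5)
xm = np.linspace(0.0, 1.0, 6)
master = plin(xm, np.sin(np.pi * xm))
# Slave mesh: 5 nodes (h_s = 1/4); end values fixed to 0, 3 interior unknowns
xs = np.linspace(0.0, 1.0, 5)
breaks = np.unique(np.concatenate([xm, xs]))
hats = [plin(xs, np.eye(5)[j]) for j in (1, 2, 3)]
# Test space W~: hats at interior nodes, constant in the first/last interval
psi = [plin(xs, [1, 1, 0, 0, 0]),
       plin(xs, [0, 0, 1, 0, 0]),
       plin(xs, [0, 0, 0, 1, 1])]
A = np.array([[l2dot(h, p, breaks) for h in hats] for p in psi])
b = np.array([l2dot(master, p, breaks) for p in psi])
slave = np.zeros(5)
slave[1:4] = np.linalg.solve(A, b)    # mortar condition (9.26)
```

The computed slave value at x = 0.5 is about 1.016, above the master maximum sin(0.4π) ≈ 0.951, confirming ‖πϕ‖∞ > ‖ϕ‖∞.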
9.4.2 Inconsistent Discretisation

Both the P1 finite element discretisation (9.15) and the Q1 finite element discretisation (9.24) on a uniform mesh can be considered as finite difference stencils for
Figure 9.6: The mortar master function ϕ1 on the interface γ1 is constructed from the P1 finite element representation on Ω2.

which the local truncation error is second order. In a finite difference method the r.h.s. is given by (9.17); this corresponds to the use of lumping in a finite element method. All the assumptions for a CMDM are satisfied and the error bound (8.11) shows that the resulting scheme is second order. Since the values for the Dirichlet boundary conditions on the interior subdomain boundaries, obtained by the mortar projection, are only O(h²) accurate, the discretisations which use these values are inconsistent: the discretisation error contains the interpolation error divided by h². This leaves a constant term in the error expansion of the combined discretisation/interpolation pair, which does not tend to zero as the mesh size h tends to zero. Consequently this scheme does not satisfy the consistent interpolation condition (8.22) and we expect the global accuracy to depend on the size of the overlap.
9.4.3 Computation of the Master Side

The interpolation from the master to the slave side of the mortar on the interface is only one aspect of the interpolation problem. In the case of overlapping nonmatching grids we also need to compute the master side of the mortar, which requires evaluating the P1 or Q1 finite element function. This boils down to linear interpolation. In Fig. 9.6 we illustrate the interpolation required for the computation of the master side of the mortar in the case of P1 elements. Note that the intersection of the interface Γ and the diagonal in every square leads to an additional master node, for which a linear interpolation along this diagonal is required. The other master nodes are obtained by means of a linear interpolation in the direction normal to the interface. The interpolation required for the computation of the master side of the mortar in the case of Q1 elements is illustrated in Fig. 9.7. In this case linear interpolation is done only in the direction normal to the interface. Based on our experience with bilinear interpolation (see Theorem 8.6.1) we
Figure 9.7: The mortar master function ϕ1 on the interface γ1 is constructed from the Q1 finite element representation on Ω2 .
can estimate the effect of using linear interpolation in the direction normal to the interface, which is required for the computation of the mortar master side for P1 and Q1 finite elements. Suppose the interface is at x = x_Γ, between the grid lines at x_i and x_{i+1}. The coefficients for the linear interpolation in the direction normal to the interface are ξ = (x_Γ − x_i)/(x_{i+1} − x_i) and (1 − ξ). We expect this interpolation to give rise to a term ξ(1 − ξ)u_xx in the bound on the error in the extended subdomain, just as in the case of the standard P1 stencil with bilinear interpolation. The numerical results in Section 9.5 clearly show the influence of the term ξ(1 − ξ)u_xx in the error bound.
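The size of this term is easy to check for u = x², whose linear interpolation error between two grid lines is exactly ξ(1 − ξ)h²u_xx/2; the snippet below uses illustrative values for x_i, h and ξ.

```python
# Linear interpolation of u(x) = x^2 at the interface x_gamma = x_i + xi*h
x_i, h, xi = 0.3, 0.1, 0.4            # illustrative values, not thesis data
x_gamma = x_i + xi * h
interp = (1 - xi) * x_i**2 + xi * (x_i + h)**2
err = interp - x_gamma**2             # equals xi*(1-xi)*h^2 * u_xx/2, u_xx = 2
```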
9.4.4 Size of the Overlap

A final point is the dependency on the overlap. We have already pointed out that a large overlap may be required since the mortar projection does not satisfy the maximum principle. However, this does not imply that the error on the nonoverlapping subdomains Ω1 and Ω2 depends on the size of the overlap. The standard stencils with bicubic interpolation and our modified stencil with 1D cubic interpolation also require some overlap in order to ensure that τ < 1, because the interpolation constants are larger than 1; yet the numerical results show no dependency on the amount of overlap, since those schemes are fully consistent. For the mortar projection, in contrast, the error does depend on the size of the overlap, and this is due to the inconsistency mentioned above. In Table 9.3 we show numerical results illustrating the effect of the size of the overlap. These results also confirm the well known fact that increasing the size of the overlap results in faster convergence of the additive Schwarz method.
Table 9.1: Effect of inconsistent discretisation: results for the P1 stencil with bilinear interpolation.

              bilinear interpolation
l    ξ1     ‖e_Ω1′‖∞    γe      ‖e_Ω2′‖∞    γe
0    0.6    1.65e-2     –       1.02e-2     –
1    0.2    2.97e-3     5.57    2.98e-3     3.42
2    0.4    9.58e-4     3.10    1.55e-4     19.1
3    0.8    1.60e-4     6.00    2.59e-5     6.00
4    0.6    5.98e-5     2.67    9.70e-6     2.67
5    0.2    9.97e-6     6.00    1.62e-6     6.00
6    0.4    3.74e-6     2.67    6.06e-7     2.67
9.5 Numerical Results

9.5.1 Quadratic Function

Our test case concerns the solution of ∇²u = f on Ω = Ω1 ∪ Ω2, where Ω1 = [0,1] × [0,1] and Ω2 = [1,2] × [0,1]. The r.h.s. f and the Dirichlet boundary conditions g are chosen so that the exact solution is u(x, y) = x². The overlapping subdomains are Ω1′ = [0,1.4] × [0,1] with h1 = 0.2·2⁻ˡ and Ω2′ = [0.75,2] × [0,1] with h2 = 0.25·2⁻ˡ.

Inconsistent Schemes

In Table 9.1 we list the L∞ norms of the error, ‖e_Ω1′‖∞ and ‖e_Ω2′‖∞, on the overlapping domains Ω1′ and Ω2′ for the standard five-point P1 stencil with bilinear interpolation. The results for the standard five-point P1 stencil with the mortar projection are given in Table 9.2. Both combinations satisfy all the assumptions for a CMDM, so the error bound (8.11) shows that these methods are second order. For a second order scheme, the ratio between two successive error norms should be 4 when the mesh size is halved. The discussion here is based on the bound (8.27) on the error in every extended subdomain Ωi′ for the standard P1 stencil with bilinear interpolation. The presence of the inconsistency results in a dependency of the error on ξ(1 − ξ), i.e. the relative position of the interface in the other mesh. For this test case the dominant term in the error bound is (ξ(1 − ξ)c1 + c2) h², where c1 and c2 are constants independent of ξ and h. With this expression we can estimate the ratio γe between two successive error norms. When the mesh is refined by halving the mesh size,
Table 9.2: Effect of inconsistent discretisation: results for the P1 stencil with mortar projection.

              mortar projection
l    ξ1     ‖e_Ω1′‖∞    γe      ‖e_Ω2′‖∞    γe
0    0.6    2.95e-2     –       1.64e-2     –
1    0.2    5.02e-3     5.88    5.02e-3     3.26
2    0.4    1.85e-3     2.71    1.57e-4     32.0
3    0.8    3.11e-4     5.96    2.59e-5     6.04
4    0.6    1.17e-4     2.66    9.71e-6     2.67
5    0.2    1.95e-5     5.99    1.62e-6     6.00
6    0.4    7.32e-6     2.66    6.06e-7     2.67
i.e. h_{i+1} = h_i/2, we have

γe = ‖e_Ω′(h_i)‖∞ / ‖e_Ω′(h_{i+1})‖∞
   = (c1 ξ_i(1 − ξ_i) + c2) h_i² / ((c1 ξ_{i+1}(1 − ξ_{i+1}) + c2) h_{i+1}²)
   = 4 (ξ_i(1 − ξ_i) + γc) / (ξ_{i+1}(1 − ξ_{i+1}) + γc),   (9.34)

where γc = c2/c1. The worst case scenario is γc = 0, which results in values of 6.00 and 2.67 for γe, since in this test case the term ξ(1 − ξ) alternates between 0.24 and 0.16. For the function u(x, y) = x² we have γc ≈ 0. The numerical results in Tables 9.1 and 9.2 show ratios γe equal to 6.00 and 2.67, illustrating the effect of the inconsistency due to linear interpolation in the x-direction. Apart from this phenomenon both schemes are second order, since fitting a power of the mesh size, ‖e_Ω1′‖∞ ≈ κh^λ, yields λ ≈ 2. The second order accuracy can also be seen when the mesh is refined twice, i.e. the mesh size is divided by 4: in this case ξ(1 − ξ) does not change and we get ratios between two successive error norms which are very close to the theoretical value of 16.
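The ratios predicted by (9.34) can be evaluated directly; the snippet below uses the interface positions ξ_l from Table 9.1 and the worst case γc = 0.

```python
# Predicted error-norm ratios gamma_e from (9.34) with gamma_c = 0
xi = [0.6, 0.2, 0.4, 0.8, 0.6, 0.2, 0.4]   # xi_l for l = 0..6 (Table 9.1)
gamma_e = [4.0 * a * (1 - a) / (b * (1 - b)) for a, b in zip(xi, xi[1:])]
```

This reproduces the alternating values 6.00 and 2.67 seen in the tables.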
Second Order Consistent Scheme

A fully consistent scheme, such as the standard P1 stencil with bicubic interpolation or the modified stencil (8.68) with 1D cubic interpolation, computes the exact solution up to machine precision for this test case on any grid.

Effect of Overlap

In order to see the effect of the overlap, we fix the mesh sizes to be h1⁻¹ = 320 and h2⁻¹ = 256 and vary the overlap according to δ1 = 2·2^m h1 and δ2 = 2^m h2 for the values of m listed in Table 9.3. This table shows the number of additive Schwarz iterations required to satisfy the convergence criterion ‖r_n‖₂ ≤ 10⁻¹⁰ ‖r₀‖₂ and
Table 9.3: Effect of overlap on the convergence rate of the Schwarz method and on the accuracy for the standard P1 stencil with mortar projection. The same results are obtained with bilinear interpolation.

m    n_solver    n_prec    ‖e_Ω1‖∞    ‖e_Ω2‖∞
0    587         35        1.00e-4    9.99e-5
1    305         26        4.42e-5    4.47e-5
2    159         19        1.74e-5    1.59e-5
3    83          14        6.01e-6    5.07e-6
4    44          10        4.81e-6    3.41e-6
5    24          8         1.77e-6    8.49e-7
6    13          6         1.31e-6    2.77e-7
7    8           5         2.51e-7    1.04e-8

the L∞ norm of the error in the nonoverlapping subdomains Ω1 and Ω2. First we list the number of iterations the method needs when it is used as a solver, i.e. in a Richardson iteration; in this case the convergence rate is bounded by τ. We also list the number of iterations the method needs when it is used as a right preconditioner for GMRES. As expected, the number of additive Schwarz iterations decreases in both cases as the overlap increases. These results clearly show the advantage of using a Krylov subspace method to accelerate the convergence of the iterative solver. It is also clear that the global accuracy of these two methods increases as the overlap increases, thus necessitating substantial overlap. The sensitivity to the size of the overlap is quite high, since the error decreases by 3 orders of magnitude when the overlap is increased from m = 0 to m = 7. This is highly undesirable: with a consistent scheme, the error would be independent of the size of the overlap.
9.6 Conclusion

We studied the effect of using a mortar projection as the interface interpolation in a composite mesh difference method for overlapping nonmatching grid problems. In this case the results are comparable to using bilinear interpolation for the Dirichlet boundary conditions on the interfaces. This is due to the fact that a linear interpolation in the direction normal to the interface is used to define the values on the master side of the interface. This results in a dependency of the error on the relative position of the interface nodes in the other mesh. Also, due to the inconsistency, the global accuracy depends on the size of the overlap.
Chapter 10
Overlapping Nonmatching Grids Method based on Finite Difference Coupling

I'm disappointed too, but keep in mind transmogrification is a new technology.¹
In this chapter, a finite difference method on overlapping nonmatching grids for a two-dimensional Poisson problem is studied.
10.1 Introduction

Instead of using an interpolation formula to transfer information from one grid to the other, i.e. to compute Dirichlet boundary conditions on the interior boundary, we propose a coupling technique based on a finite difference discretisation. The idea is to discretise the given PDE on the modified stencil consisting of the point on the interior boundary and its neighbours in the other mesh. We focus only on the error issues.
10.2 Finite Difference Coupling

In Section 8.6 we have already pointed out that the scheme based on the combination of the standard five-point stencil with bilinear interpolation does not satisfy

¹ Bill Watterson, Something under the bed is drooling (A Calvin and Hobbes Collection).
Figure 10.1: Stencil for interpolation and nonmatching grid.

the consistent interpolation condition (8.22), and (8.25) shows the inconsistency present in the discretisation for the nodes where the bilinear interpolation is used. Consider the stencil and the nonmatching grid shown in Fig. 10.1. Instead of using bilinear interpolation to compute v1 from u1, u2, u3 and u4, which may result in an inconsistent discretisation, we discretise the partial differential equation on the stencil formed by u1, u2, u3, u4 and v1. We seek the coefficients γ0, γ1, γ2, γ3 and γ4 in

L(α, β) = γ0 v1 + γ1 u4 + γ2 u3 + γ3 u1 + γ4 u2   (10.1)
        = γ0 u(0, 0) + γ1 u((1−α)h, (1−β)h) + γ2 u(−αh, (1−β)h)
          + γ3 u(−αh, −βh) + γ4 u((1−α)h, −βh)   (10.2)
so that a consistent approximation to (u_xx + u_yy)h²/2 at v1 = u(0, 0) is obtained. This can be done using the Taylor expansion of u(x, y) about the origin: u(x, y) = u + u_x x + u_y y + u_xx x²/2 + u_xy xy + u_yy y²/2 + O(h³). We assume |x| ≤ h and |y| ≤ h, so that the remainder term can be bounded by Ch³. The requirements that the coefficients of u, u_x, u_y and u_xy vanish, together with the requirements that the coefficients of u_xx h²/2 and u_yy h²/2 equal 1 in the Taylor expansion of (10.2), yield an overdetermined system Cg = c:

[ 1      1             1           1       1         ]          [ 0 ]
[ 0    (1−β)         (1−β)        −β      −β         ] [ γ0 ]   [ 0 ]
[ 0    (1−α)          −α          −α     (1−α)       ] [ γ1 ]   [ 0 ]
[ 0    (1−β)²        (1−β)²        β²      β²        ] [ γ2 ] = [ 1 ]   (10.3)
[ 0    (1−α)²          α²          α²    (1−α)²      ] [ γ3 ]   [ 1 ]
[ 0    (1−α)(1−β)   −α(1−β)       αβ    −(1−α)β      ] [ γ4 ]   [ 0 ]

This overdetermined system Cg = c can only have solutions if the determinant of the extended matrix C̃ = (C | c) is zero:

det C̃ = det (C | c) = (β − α)(α + β − 1).   (10.4)
Hence a solution can only exist when α = β or α + β = 1, i.e. when the point v1 lies on one of the diagonals of the square formed by u1, u2, u3 and u4. The solution is

g = ( −((1−α)/α + 2 + α/(1−α)),  α/(1−α),  1,  (1−α)/α,  1 )ᵀ   (10.5)

when α = β, and when α + β = 1 it is

g = ( −(2 + (1−α)/α + α/(1−α)),  1,  (1−α)/α,  1,  α/(1−α) )ᵀ.   (10.6)

The truncation error is determined by substitution of this solution in (10.2):

L(α, β) = h² (u_xx + u_yy) / 2 + C_α h³ (u_xxx + u_yyy) / 6 + O(h⁴),   (10.7)

where C_α = 1 − 2α in case α = β and C_α = 2α − 1 when α + β = 1. Hence an O(h) approximation to u_xx + u_yy can be obtained; an O(h²) approximation can only be obtained when α = β = 1/2:

L(α, β) = h² (u_xx + u_yy) / 2 + h⁴ (u_xxxx + 6u_xxyy + u_yyyy) / 96 + O(h⁶).   (10.8)

In summary, a consistent discretisation exists only if v1 is in the centre or on one of the diagonals of the square formed by u1, u2, u3 and u4. The truncation error is O(h²) when v1 is in the centre and O(h) when v1 is on one of the diagonals.
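By construction the coefficients (10.5) annihilate the terms u, u_x, u_y and u_xy and normalise u_xx h²/2 and u_yy h²/2, so applying the stencil to any quadratic must return h²(u_xx + u_yy)/2 exactly. A sketch for the diagonal case α = β (illustrative names, not thesis code):

```python
import numpy as np

def coupling_coeffs(alpha):
    """Coefficients (10.5) for the diagonal case alpha = beta."""
    r = alpha / (1.0 - alpha)
    return np.array([-(2.0 + r + 1.0 / r), r, 1.0, 1.0 / r, 1.0])

def apply_L(u, alpha, h):
    """L(alpha, alpha) from (10.1): gamma_0..gamma_4 act on v1, u4, u3, u1, u2."""
    g = coupling_coeffs(alpha)
    pts = [(0.0, 0.0),                          # v1
           ((1 - alpha) * h, (1 - alpha) * h),  # u4
           (-alpha * h, (1 - alpha) * h),       # u3
           (-alpha * h, -alpha * h),            # u1
           ((1 - alpha) * h, -alpha * h)]       # u2
    return sum(gi * u(x, y) for gi, (x, y) in zip(g, pts))

u = lambda x, y: x**2 + 3 * y**2 + 2 * x * y - x + 5   # u_xx + u_yy = 8
val = apply_L(u, alpha=0.3, h=0.1)                     # expect h^2 * 8/2 = 0.04
```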
10.3 Error Analysis

We prove second order accuracy in the L∞ norm for the standard five-point stencil with the coupling (10.8) on the nonmatching grids shown in Fig. 10.1. The domain Ω = Ω0 ∪ Ω1 consists of two subdomains Ω0 = (−1, 0) × (0, 1) and Ω1 = (−h1, 1) × (0, 1). The mesh sizes satisfy h0 = 2h1 and we define h = h0. We denote by J_Ω the set of the index pairs of grid points in the domain Ω.

Theorem 10.3.1 The standard five-point stencil with the coupling (10.8) results in a second order scheme on Ω:

|e_p| ≤ C (M_xxxx + 6M_xxyy + M_yyyy) h² + O(h⁴)  ∀p ∈ J_Ω,   (10.9)

where the constant C = C_T C_Φa is the product of C_T, the largest of the constants in the bounds (8.29) and (10.12) on the truncation error, and the constant C_Φa = max_{a ∈ J_∂Ω} Φ_a, which denotes the maximum of the function Φ, defined by (10.10), over the grid points where a Dirichlet boundary condition is given.
Proof. The proof consists in showing that the conditions of Theorem 8.4.1 are satisfied. The standard five-point stencil satisfies the assumptions for the maximum principle. In case (10.8) is used to obtain an equation for v1, the assumptions (8.4.1) and (8.4.2) for the maximum principle are satisfied. The comparison function is chosen as

Φ(x, y) = (1/4) ((x − x0)² + (y − y0)²),   (10.10)

resulting in

L_h Φ_p = 1 for all p ∈ J_Ω.   (10.11)

Hence we can set C1 = C2 = 1 in (8.18). The truncation error of the standard five-point stencil (8.26) is bounded by (8.29). The truncation error of the coupling equation (10.8) is bounded by

T2 = h² (M_xxxx + 6M_xxyy + M_yyyy) / 96.   (10.12)

Since the Dirichlet boundary conditions are given without error, we have that

max_{a ∈ J_∂Ω} |e_a| = 0.   (10.13)

The desired results follow from (8.20). □

Remark 10.3.1 The scalars x0 and y0 are chosen to minimise the maximum value of the function Φ(x, y) on the boundary ∂Ω. If the point v1 is not in the centre of the square formed by u1, u2, u3 and u4, we have to use (10.7) to obtain an equation from which v1 can be determined. In this case we still have second order accuracy, but a different comparison function must be defined in Ω_l. This is analogous to the classic result that second order accuracy is obtained with a second order discretisation of the partial differential equation and only a first order discretisation of the boundary conditions.
10.4 Numerical Results

The test cases are concerned with the solution of

∇²u = f on Ω and u = g on ∂Ω.   (10.14)

The domain Ω = Ω0 ∪ Ω1 consists of two subdomains Ω0 = (−1, 0) × (0, 1) and Ω1 = (−h1, 1) × (0, 1). The coordinates of the grid points are (x_i, y_j), where x_i = x_ref + ih and y_j = y_ref + jh, for i = 0, 1, …, n0 − 1 for Ω0 and i = 0, 1, …, n1 for Ω1
Table 10.1: Norm of the error in the domains Ω0 and Ω1 for the function u1.

Results for h0 = 2h1:
n0    n1     L∞ in Ω0    ratio    L∞ in Ω1    ratio
6     11     1.73e-3              1.60e-3
11    21     4.93e-4     3.52     4.39e-4     3.65
21    41     1.29e-4     3.83     1.14e-4     3.87
41    81     3.28e-5     3.92     2.87e-5     3.95
81    161    8.28e-6     3.96     7.22e-6     3.98

n0    n1     L2 in Ω0    ratio    L2 in Ω1    ratio
6     11     6.62e-4              5.13e-4
11    21     1.94e-4     3.41     1.37e-4     3.73
21    41     5.20e-5     3.73     3.51e-5     3.91
41    81     1.34e-5     3.87     8.84e-6     3.97
81    161    3.41e-6     3.94     2.22e-6     3.99

Reference results for h0 = h1:
n0     n1     L∞ in Ω0    L2 in Ω0    L∞ in Ω1    L2 in Ω1
6      6      2.88e-3     1.09e-3     2.88e-3     1.17e-3
11     11     7.37e-4     2.81e-4     7.37e-4     3.00e-4
21     21     1.85e-4     7.12e-5     1.85e-4     7.42e-5
41     41     4.64e-5     1.79e-5     4.64e-5     1.83e-5
81     81     1.16e-5     4.50e-6     1.16e-5     4.55e-6
161    161    2.90e-6     1.13e-6     2.90e-6     1.13e-6
and j = 0, 1, …, n − 1. The reference point for Ω0 is (−1, 0) and for Ω1 it is (−h1, 0). The grid sizes are h0 = 1/(n0 − 1) and h1 = 1/(n1 − 1). The interface Γ is defined by x = −h1/2. The right-hand side f and the boundary conditions g are chosen so that the exact solution is u1 resp. u2 in the test cases, where

u1(x, y) = exp(−x² − y²)   (10.15)

and

u2(x, y) = exp(αx) sin(βy),   (10.16)
where α = 2 and β = 8π. By definition the error is e_{i,j} = u(x_i, y_j) − u_{i,j}, where u(x_i, y_j) is the exact solution and u_{i,j} is the computed approximation. In Tables 10.1–10.3 we list both the L∞ norm and the L2 norm of the error, defined by

L∞(e) = max_{i,j} |e_{i,j}|   (10.17)
Table 10.2: Norm of the error in the domains Ω0 and Ω1 for the function u2.

Results for h0 = 2h1:
n0    n1     L∞ in Ω0    ratio    L∞ in Ω1    ratio
6     11     7.24                 4.22
11    21     4.87e-1     14.9     7.19e-1     5.87
21    41     1.03e-1     4.74     1.74e-1     4.14
41    81     2.49e-2     4.12     4.54e-2     3.83
81    161    6.56e-3     3.80     1.13e-2     4.01

n0    n1     L2 in Ω0    ratio    L2 in Ω1    ratio
6     11     3.20                 1.66
11    21     2.04e-1     15.7     3.03e-1     5.48
21    41     4.36e-2     4.68     7.49e-2     4.05
41    81     1.07e-2     4.06     1.88e-2     3.98
81    161    2.70e-3     3.98     4.73e-3     3.98

Reference results for h0 = h1:
n0     n1     L∞ in Ω0    L2 in Ω0    L∞ in Ω1    L2 in Ω1
6      6      18.1        6.69        47.2        21.4
11     11     7.17e-1     2.64e-1     3.37        1.41
21     21     1.37e-1     5.03e-2     7.19e-1     3.02e-1
41     41     3.21e-2     1.18e-2     1.74e-1     7.43e-2
81     81     8.32e-3     2.90e-3     4.54e-2     1.87e-2
161    161    2.07e-3     7.24e-4     1.13e-2     4.70e-3

and
L2(e) = sqrt( (1/n²) Σ_{i=0}^{n−1} Σ_{j=0}^{n−1} e_{i,j}² ).   (10.18)
The results for ∇²u1 = f1 on a nonmatching grid are given in Table 10.1. We also give the results for the same problem on matching grids, which allows us to verify the accuracy of the results. In Table 10.2 we give the results for ∇²u2 = f2. The ratios in the fourth and sixth columns approach 4 as the mesh widths are divided by 2, showing that the method is second order accurate. To emphasise the importance of consistent interpolation, we give in Table 10.3 the results for ∇²u2 = f2 when bilinear interpolation is used. In this case (8.22) is not satisfied. The results are for the same problem but solved on shifted grids. Recall that we denote by Ω_l and Ω_r the small subdomains containing the grid points next to the interface, on the left resp. right side, as defined in Section 7.2. The duplications of these subdomains are denoted by Ω_l̃ and Ω_r̃. The reason for using shifted grids is that an interpolation is required for every point in Ω_r̃, while for grids as in Fig. 10.1 only half of the points in Ω_r̃ require an interpolation
10.5. CONCLUSION
141
Table 10.3: Norm of the error in the domains Ω0 and Ω1 for the function u2 , when bilinear interpolation is used. n0 7 12 22 42 82
n1 12 22 42 82 162
L∞ in Ω0 9.64 8.47e-1 1.43e-1 3.16e-2 7.10e-3
L∞ in Ω1 7.65 9.95e-1 2.20e-1 4.97e-2 1.19e-2
L2 in Ω0 4.32 2.62e-1 4.90e-2 1.12e-2 2.72e-3
L2 in Ω1 2.77 3.67e-1 8.22e-2 1.97e-2 4.86e-3
since the other points match some point in Ωr . The coordinates of the grid points are now given by (xi ; y j ) where xi = xref + (i 12 )h and y j = yref + ( j 12 )h for i = 0; 1; : : : ; (n 1) and j = 0; 1; : : : ; (n 1). The reference point for Ω0 is ( 1; 0) and for Ω1 it is (0; 0). The grid sizes h0 = 1=(n0 2) and h1 = 1=(n1 2) are the same as in the previous case. Since an inconsistent interpolation is used, the error is larger.
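The ratio test used in these tables is easy to reproduce. The following sketch (Python/NumPy) solves an illustrative Poisson model problem — a single matching grid with exact solution sin(πx) sin(πy), not one of the thesis testcases — on two grids, computes the discrete norms (10.17)–(10.18), and checks that halving the mesh width reduces the error by a factor of about 4:

```python
import numpy as np

# Error norms (10.17)-(10.18) and the ratio test: for a second order scheme the
# norms drop by a factor of about 4 when the mesh width is halved.  Illustrative
# model problem (not the thesis testcase): -Laplace(u) = f on the unit square
# with exact solution u = sin(pi x) sin(pi y), standard 5-point stencil.
def solve_poisson(n):
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)
    X, Y = np.meshgrid(x[1:-1], x[1:-1], indexing="ij")
    u_exact = np.sin(np.pi * X) * np.sin(np.pi * Y)
    f = 2.0 * np.pi**2 * u_exact
    # 1D second-difference matrix; 2D Laplacian via Kronecker products
    T = (2.0 * np.eye(n - 1) - np.eye(n - 1, k=1) - np.eye(n - 1, k=-1)) / h**2
    A = np.kron(T, np.eye(n - 1)) + np.kron(np.eye(n - 1), T)
    u = np.linalg.solve(A, f.ravel()).reshape(n - 1, n - 1)
    e = u - u_exact
    # (10.17) and (10.18) on the interior points
    return np.max(np.abs(e)), np.sqrt(np.sum(e**2) / (n - 1) ** 2)

linf_h, l2_h = solve_poisson(16)
linf_h2, l2_h2 = solve_poisson(32)
print(linf_h / linf_h2, l2_h / l2_h2)   # both ratios close to 4
```

The printed ratios play the role of the "ratio" columns in Tables 10.1–10.3.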
10.5 Conclusion

We studied an overlapping nonmatching grids finite difference method for a two-dimensional Poisson problem and proposed a finite difference coupling. This technique is an alternative to using an interpolation formula for the transfer of information from one grid to the other. The new approach has very limited applicability, since a consistent discretisation exists only if the boundary point for which a coupling equation is sought lies in the centre or on one of the diagonals of the square formed by the neighbouring grid points in the other mesh. We proved second order global accuracy of the discretisation scheme.
Chapter 11
Conclusions and Suggestions for Future Research

Renton shrugged. --- At least we’ll be prepared, whaeivir the fuck it is. If they gave oot qualifications in bereavement, ah’d be a fuckin Ph.D. by now.¹
11.1 Conclusion

The contribution of this thesis is discussed in detail in Section 1.2, where an overview is given. In the context of the application considered in this thesis, the first goal, improving the convergence of the iterative solver, has clearly been reached. Our solver for the Shallow Water Equations, based on FGMRES and the Generalised Additive Schwarz Method, performs very well. The numerical methods introduced in this thesis for this application have been integrated in the software developed by Waterloopkundig Laboratorium Delft Hydraulics (Delft, The Netherlands) for the solution of the Shallow Water Equations in real-life applications. The research on Ritz and Harmonic Ritz values in Chapter 4, and on nested Krylov methods in Chapter 5, resulted in new insights, e.g. relations in Simpler GMRES and a better selection of the starting vector in nested Krylov methods. However, these new insights can only improve the performance of the algorithms when the iteration stagnates.

¹ Irvine Welsh, Trainspotting, p. 299
For the problem addressed in the second part of the thesis, the extension of the method to nonmatching grids, we have, at least for 2D elliptic problems, presented a satisfactory solution. Our lower dimensional interpolation formula and modified discretisation stencil result in a consistent and second order accurate global discretisation. We have also investigated alternative techniques. The results in Chapter 9 show that using the mortar projection in an overlapping composite mesh difference method yields a performance similar to bilinear interpolation. However, since the mortar projection is more difficult to implement, it is ruled out as an interpolation technique. The finite difference discretisation coupling technique in Chapter 10 is unacceptable because of its limited applicability.
11.2 Suggestions for Future Research

Since there is no best overall Krylov subspace method, future research in this area should concentrate on preconditioning strategies. Several successful techniques, such as domain decomposition and multilevel methods, sparse approximate inverses and incomplete factorisations, are already available. In the ideal scheme for the solution of partial differential equations on nonmatching grids, which is yet to be discovered, the accuracy should be of optimal order and the error should be independent of the size of the overlap and of the ratio of mesh sizes. To be able to use fast solvers for the subdomain problems, it is also desirable not to have weights in the discretisation on the overlapping parts of the subdomains. A lot of research in this area remains to be done. We mention a few research topics. It would be interesting to extend our lower dimensional interpolation and modified stencils technique to grids that do not have parallel grid lines and to grids built with elliptic grid generation. We believe that for 3D applications this technique will prove to be very advantageous, since 3D cubic interpolation requires 64 points, while 2D cubic interpolation requires only 16 points. The question remains, of course, whether we can modify the discretisation stencil so that 1D interpolation is sufficient for a 3D problem. Our experiments in Chapter 9 reveal the effect of using the mortar projection along the interface as an interpolation technique in a composite mesh difference method. We suggest another experiment to gain more insight into the mortar element method: keep the weights in the bilinear form and replace the mortar projection with bilinear interpolation.
Appendix A
Ritz Values and FOM Residual Polynomial

I think sex is better than logic, but I can’t prove it.
A.1 Proof of Lemma 4.2.1

Proof. We define u_1 = e_1 and u_j = H_m u_{j−1} = H_m^{j−1} u_1 for j = 2, 3, ..., m. Since H_m is upper Hessenberg, u_j has nonzero entries only in its first j positions; consequently e_m^H u_j = 0 for j < m. Using (2.16) we have

A V_m e_1 = (V_m H_m + h_{m+1,m} v_{m+1} e_m^H) e_1 = V_m H_m e_1 = V_m u_2,   (A.1)

A^j V_m e_1 = A V_m u_j = V_m H_m u_j = V_m H_m^j e_1 = V_m u_{j+1}   for j < m,   (A.2)

and

A^m V_m e_1 = A V_m u_m = (V_m H_m + h_{m+1,m} v_{m+1} e_m^H) u_m
            = V_m H_m u_m + h_{m+1,m} v_{m+1} e_m^H u_m
            = V_m H_m^m e_1 + h_{m+1,m} v_{m+1} e_m^H u_m.   (A.3)

A straightforward computation shows that h_{m+1,m} e_m^H u_m = ζ_m and we can conclude that

χ_m(A) V_m e_1 = V_m χ_m(H_m) e_1 + γ_m ζ_m v_{m+1}.   (A.4)

This completes the proof since r_0 = V_m e_1.
The last nonzero entry in u_{j+1} is in position j + 1 and equals

ζ_j = h_{j+1,j} ⋯ h_{3,2} h_{2,1} ≠ 0.   (A.5)

Hence from (A.2) we can conclude that

A^j V_m e_1 = V_m u_{j+1} = ζ_j v_{j+1} + v̂_{j+1},   (A.6)

with v̂_{j+1} ∈ span{v_1, v_2, ..., v_j} for j = 1, 2, ..., m.
A.2 Proof of Lemma 4.2.2

Proof. Let the polynomial χ_j(λ) = γ̃_j λ^j + ⋯ + γ̃_1 λ + γ̃_0 have strict degree j < m. We know from (A.6) that χ_j(A) r_0 has a nonzero component γ̃_j ζ_j v_{j+1} along v_{j+1} ∈ K_m(A, r_0) and thus cannot be orthogonal to K_m(A, r_0). Since ψ_m(λ) = det(λI − H_m) = γ_m λ^m + ⋯ + γ_1 λ + γ_0 is the characteristic polynomial of H_m, we have by the Cayley–Hamilton theorem that ψ_m(H_m) = 0. Setting the polynomial χ_m(A) = α ψ_m(A) in (4.6), we deduce that α ψ_m(A) r_0 = α γ_m ζ_m v_{m+1} is orthogonal to K_m(A, r_0). Any polynomial ϕ_m(λ) of degree m that is not a scalar multiple of ψ_m(λ) can be written as ϕ_m(λ) = α ψ_m(λ) + χ_j(λ) with χ_j(λ) a nonzero polynomial of degree j < m. We have that

ϕ_m(A) r_0 = α ψ_m(A) r_0 + χ_j(A) r_0 = α γ_m ζ_m v_{m+1} + χ_j(A) r_0,   (A.7)

which is not orthogonal to K_m(A, r_0).

A.3 Proof of Theorem 4.2.1
Proof. Since the FOM residual polynomial ϕ̃_m^FOM(λ) = γ_m λ^m + ⋯ + γ_1 λ + γ_0 has degree m, (4.6) yields an expression for r_m^FOM ∈ K_{m+1}(A, r_0):

r_m^FOM = ϕ̃_m^FOM(A) r_0 = V_m ϕ̃_m^FOM(H_m) e_1 + γ_m ζ_m v_{m+1}.   (A.8)

Thus ϕ̃_m^FOM(H_m) = 0 is necessary to have r_m^FOM ⊥ K_m(A, r_0). By Lemma 4.2.2 we know that the FOM residual polynomial ϕ̃_m^FOM(λ) = α ψ_m(λ) has to be a scalar multiple of the characteristic polynomial of H_m in order to eliminate all the components of the residual in K_m(A, r_0). If H_m is nonsingular then ψ_m(0) = (−1)^m det H_m ≠ 0 and α can be determined from ϕ̃_m^FOM(0) = α ψ_m(0) = 1. We obtain the following expression for the residual polynomial:

ϕ̃_m^FOM(λ) = ψ_m(λ) / ψ_m(0).   (A.9)

This completes the proof.
The requirement that H_m be nonsingular is related to possible breakdown of FOM. The existence of H_m^{−1} is a necessary and sufficient condition for the existence and uniqueness of x_m^FOM, and as a consequence also for the existence and uniqueness of the corresponding residual r_m^FOM, as can be seen from (4.2). The FOM residual polynomial ϕ̃_m^FOM(λ) is uniquely defined by the normalisation ϕ̃_m^FOM(0) = 1 and by the fact that the m Ritz values ϑ_i^(m) are its zeroes:

ϕ̃_m^FOM(λ) = \prod_{i=1}^{m} (1 − λ / ϑ_i^(m)).   (A.10)
Appendix B
Convection–Diffusion Problem

Mathematicians are like Frenchmen: whenever you say something to them, they translate it into their own language, and at once it is something entirely different.¹
In this appendix we briefly outline the theory underlying the finite element approximation of the convection–diffusion equation. A more elaborate treatment of this subject can be found in Chapter 8 of the book by Quarteroni and Valli [QV94]. The description given here is a short version of the introductions by Elman et al. [ESW96] and Fisher et al. [FRSW99].
B.1 Formulation and Discretisation

The convection–diffusion problem can be formulated as follows: given a divergence-free (∇ · ~w = 0) convective velocity field ~w, find a scalar variable u (the transported quantity) satisfying the convection–diffusion equation

−ν ∇²u + ~w · ∇u = f   in Ω,   (B.1)

with a Dirichlet boundary condition u(x) = g(x) on ∂Ω. The weak formulation of (B.1) is defined in terms of the Sobolev space X = H_0^1(Ω), the set of functions with derivatives in L²(Ω) which are zero on ∂Ω. The solution u satisfies

a(u, v) = (f, v)   ∀v ∈ X,   (B.2)

¹ Goethe
where the bilinear form a(u, v) is defined as

a(u, v) = ν (∇u, ∇v) + (~w · ∇u, v)   (B.3)

and (u, v) denotes the usual L²(Ω) inner product

(u, v) = (u, v)_{L²(Ω)} = ∫_Ω u(x) v(x) dx.   (B.4)

Since the domain Ω is bounded and the velocity field ~w is divergence-free, the bilinear form a(u, v) is bounded (or continuous),

|a(u, v)| ≤ C_w ||∇u|| ||∇v||,   (B.5)

and coercive over X,

a(u, u) = ν ||∇u||²   ∀u ∈ X.   (B.6)

The continuity constant C_w is given by

C_w = ν + C_Ω ||~w||_{L∞(Ω)},   (B.7)

where C_Ω is the Poincaré constant associated with the domain Ω. Existence and uniqueness of the solution to (B.1) follows from the Lax–Milgram theorem. A finite dimensional subspace X_h ⊂ X with mesh parameter h is selected to obtain a discrete system by enforcing (B.2) over X_h. We now look for the function u_h with u_h = g_h on ∂Ω which solves the set of equations

a(u_h, v) = (f_h, v)   ∀v ∈ X_h,   (B.8)
where f_h is the L²(Ω) projection of f onto X_h and g_h is the interpolant of the boundary data g. The function u_h is uniquely defined since a conforming discretisation is being used. For the problem with homogeneous Dirichlet boundary conditions, i.e. g = 0, (B.5) and (B.6) imply the following a priori estimate:

||∇(u − u_h)|| ≤ (C_w / ν) inf_{v ∈ X_h} ||∇(u − v)||.   (B.9)
The finite element approximation is of optimal order as the mesh parameter h → 0. The stability depends on the ratio C_w/ν. If the mesh Peclet number

Pe = h ||~w||₂ / (2ν)   (B.10)

is larger than 1, oscillatory solutions are observed, e.g. if the mesh is unable to resolve boundary layers. When the convection dominates, the discrete solution inherits the instability from (B.2).
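The onset of oscillations for Pe > 1 can already be seen in one dimension. The sketch below (Python/NumPy; a hypothetical 1D analogue with arbitrarily chosen parameters, not the thesis testcase) discretises −νu″ + wu′ = 0 on (0, 1) with central differences: for Pe > 1 the discrete solution oscillates, while adding streamline diffusion — an extra diffusion δw², cf. (B.13) — restores a monotone solution:

```python
import numpy as np

# -nu*u'' + w*u' = 0 on (0,1), u(0)=0, u(1)=1, central differences on n cells.
# Central differencing oscillates when Pe = h*w/(2*nu) > 1; streamline
# diffusion adds delta*w^2 to the diffusion coefficient and damps this.
def solve_1d(nu, w, n, delta=0.0):
    h = 1.0 / n
    nu_eff = nu + delta * w * w            # streamline-diffusion contribution
    lo = -nu_eff / h**2 - w / (2 * h)      # coefficient of u_{i-1}
    up = -nu_eff / h**2 + w / (2 * h)      # coefficient of u_{i+1}
    A = (2 * nu_eff / h**2) * np.eye(n - 1) \
        + lo * np.eye(n - 1, k=-1) + up * np.eye(n - 1, k=1)
    rhs = np.zeros(n - 1)
    rhs[-1] = -up                          # from the boundary value u(1) = 1
    return np.linalg.solve(A, rhs)

nu, w, n = 0.01, 1.0, 16                   # Pe = (1/16)/(2*0.01) = 3.125 > 1
u_central = solve_1d(nu, w, n)             # oscillatory
u_sd = solve_1d(nu, w, n, delta=0.03)      # effective Pe < 1: monotone
oscillates = lambda u: bool(np.any(np.diff(u) < -1e-12))
print(oscillates(u_central), oscillates(u_sd))   # True False
```

The value delta = 0.03 is an arbitrary stabilisation amount chosen so that the effective mesh Peclet number drops below 1; the exact solution is a monotone boundary layer, so any sign change in the discrete increments signals the instability.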
Instead of adaptively refining the mesh until the boundary layers are captured, one may choose to stabilise the discrete system using some form of upwinding discretisation. In a finite element method this is achieved by means of the Petrov–Galerkin framework: a “shifted” nonconforming test space is used,

a(u_h, v + δ ~w · ∇v) = (f_h, v + δ ~w · ∇v)   ∀v ∈ X_h,   (B.11)

where δ is the upwinding parameter. We use a linear approximation space (Q1 finite elements) and, with standard element-wise evaluation of the nonconforming term, this results in the so-called streamline-diffusion method:

b(u_h, v) = a(u_h, v) + (~w · ∇u_h, δ ~w · ∇v) = (f_h, v + δ ~w · ∇v)   ∀v ∈ X_h.   (B.12)

Due to the additional coercivity in the flow direction,

b(u, u) = ν ||∇u||² + δ ||~w · ∇u||²   ∀u ∈ X_h,   (B.13)
this formulation has better stability properties. This stabilised formulation is also consistent in the sense that the exact solution of the PDE (B.1) satisfies (B.11). The solution of (B.11) satisfies the “best possible” error estimate (for any degree of polynomial approximation) for

δ = α h / ||~w||,   (B.14)

where α is a parameter. If the discretised problem is diffusion-dominated, i.e. the mesh Peclet number Pe ≪ 1 is small, the best choice is the standard Galerkin approach, corresponding to α = 0. The choice of α is guided by the following two considerations. Firstly, it is very easy to over-stabilise, resulting in smooth but inaccurate solutions, and secondly, the performance of iterative solvers applied to (B.12) will be influenced by the choice of the parameter α. For a constant velocity field ~w, Fisher et al. [FRSW99] have determined an optimal value

α = (1/2) (1 − 1/Pe),   (B.15)

which minimises the contraction rate of iterative solvers applied to (B.12). Their analysis is based on Fourier analysis.
B.2 Testcase

A 2D convection–diffusion problem with Dirichlet boundary conditions is solved on a uniform square grid with mesh width h = 1/32. We prescribe u = 1 on the right half of the bottom boundary and on the right-hand wall, and u = 0 on the remainder of the boundary. The constant velocity field ~w = (1/√2, 1/√2) has a 45° inclination with the grid lines. The diffusion parameter was set to ν = 10⁻², resulting in a mesh Peclet number of Pe = 1.5625, which shows that the convection is dominant. We have used the optimal value for the parameter, α = 0.18, in the discretisation equations. The solution is shown in Fig. B.1.

Figure B.1: Solution of the convection–diffusion problem described in Sec. B.2.

The iterative GMRES solver converged in n = 62 iterations, satisfying the convergence criterion

||r_n|| / ||r_1|| ≤ 10⁻¹²,   (B.16)

where r_i = b − A x_i is the residual corresponding to the approximate solution x_i the iterative solver produced in iteration i. The relative accuracy of the solution obtained with the iterative GMRES solver,

||x_GMRES − x_LU|| / ||x_LU|| = 5.5 · 10⁻¹²,   (B.17)

is of the same order of magnitude as the threshold in the convergence criterion. We denote by x_GMRES the solution obtained with GMRES and by x_LU the solution obtained by a direct method using LU factorisation, which can be considered the exact solution up to machine precision.
B.3 Software

We have used the MATLAB software package written by Elman et al. for solving the convection–diffusion problem as described here. This software is available from ftp://ftp.ma.man.ac.uk/pub/narep/convdiff.tar.
Figure B.2: GMRES convergence history for the solution of the convection– diffusion problem described in Sec. B.2.
Bibliography

[Arn51]
W. E. Arnoldi. The principle of minimized iterations in the solution of the matrix eigenvalue problem. Quart. Appl. Math., 9(1):17–29, 1951.
[AV91]
O. Axelsson and P. S. Vassilevski. A black box Generalized Conjugate Gradient solver with inner iterations and variable-step preconditioning. SIAM J. Matrix Anal. Applics., 12(4):625–644, 1991.
[BBC+ 94]
R. Barrett, M. Berry, T. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. Van der Vorst. Templates for the solution of linear systems: building blocks for iterative methods. SIAM, 1994.
[BEK98]
P. Bjørstad, M. Espedal, and D. Keyes, editors. Domain Decomposition Methods in Sciences and Engineering: 9th International Conference. Domain Decomposition Press, 1998.
[Ben82]
Benqué et al. New method for tidal current computation. Journal of the Waterway, Port, Coastal and Ocean Division, ASCE, 108:396–417, 1982.
[BMP94]
C. Bernardi, Y. Maday, and A. Patera. A new nonconforming approach to domain decomposition: The mortar element method. In H. Brezis and J.-L. Lions, editors, Collège de France Seminar, pages 13–51. Pitman, 1994.
[BPWX91]
J. H. Bramble, J. E. Pasciak, J. Wang, and J. Xu. Convergence estimates for multigrid algorithms without regularity assumptions. Math. Comp., 57(195):23–45, 1991.
[BPX90]
J. H. Bramble, J. E. Pasciak, and J. Xu. Parallel multilevel preconditioners. Math. Comp., 55:1–22, 1990. 155
[Bra72]
A. Brandt. Multi-level adaptive techniques (MLAT) for fast numerical solution to boundary value problems. In Proc. 3rd Internat. Conf. on Numerical Methods in Fluid Mechanics, pages 82–89. Springer, 1972.
[Bra77]
A. Brandt. Multi-level adaptive solutions to boundary value problems. Numer. Math., 31:333–390, 1977.
[Bra93]
J. H. Bramble. Multigrid Methods. Longman Scientific & Technical, 1993. Pitman Research Notes in Mathematics Series #294.
[Bri87]
W. Briggs. A Multigrid Tutorial. SIAM, 1987.
[Bro91]
P. N. Brown. A Theoretical Comparison of the Arnoldi and GMRES algorithms. SIAM J. Sci. Stat. Comput., 12(1):58–78, 1991.
[Bru95]
A. M. Bruaset. A survey of preconditioned iterative methods. Pitman Research Notes in Mathematics Series. Longman Scientific & Technical, 1995.
[Cai93]
X.-C. Cai. An optimal two-level overlapping domain decomposition method for elliptic problems in two and three dimensions. SIAM J. Sci. Comp., 14:239–247, January 1993.
[CDS99]
X.-C. Cai, M. Dryja, and M. V. Sarkis. Overlapping nonmatching grid mortar element methods for elliptic problems. SIAM J. Numer. Anal., 36:581–606, 1999.
[CG96]
J. Cullum and A. Greenbaum. Relations between Galerkin and Norm-minimising iterative methods for solving linear systems. SIAM J. Matrix Anal. Applics., 17(2):223–246, 1996.
[CGPW89]
T. F. Chan, R. Glowinski, J. Périaux, and O. B. Widlund, editors. Proc. Second Int. Conf. on Domain Decomposition Meths. SIAM, 1989.
[CGPW90]
T. F. Chan, R. Glowinski, J. Périaux, and O. B. Widlund, editors. Proc. Third Int. Conf. on Domain Decomposition Meths. SIAM, 1990.
[CH90]
G. Chesshire and W. D. Henshaw. Composite Overlapping Meshes for the Solution of Partial Differential Equations. J. Comp. Phys., 90:1–64, 1990.
[CKKP00]
T. Chan, T. Kako, H. Kawarada, and O. Pironneau, editors. Domain Decomposition Methods in Sciences and Engineering: 12th International Conference. Domain Decomposition Press, 2000.
[CM94]
T. F. Chan and T. P. Mathew. Domain decomposition algorithms. In Acta Numerica 94, pages 61–143. Cambridge University Press, 1994.
[CMdP84]
R. C. Y. Chin, T. A. Manteuffel, and J. de Pillis. ADI as a preconditioning for solving the convection–diffusion equation. SIAM J. Sci. Stat. Comput., 5(2):281–299, 1984.
[CMS00]
X.-C. Cai, T. P. Mathew, and M. V. Sarkis. Maximum norm analysis of overlapping nonmatching grid discretizations of elliptic equations. SIAM J. Numer. Anal., 2000. To appear.
[CS97]
A. Chapman and Y. Saad. Deflated and Augmented Krylov Subspace Techniques. Numerical Linear Algebra with Applications, 4(1):43–66, 1997.
[dG93]
E. D. de Goede. Een AOI methode voor TRISULA. Technical Report VR595.93/Z642, Delft Hydraulics, September 1993.
[dG95]
E. D. de Goede. TRISULA summary. Technical Report VR1108.95/Z961, Delft Hydraulics, 1995.
[dGTB+ 96]
E. D. de Goede, K. H. Tan, M. J. A. Borsboom, I. Elshoff, and G. S. Stelling. Domain decomposition for TRISULA and DELWAQ. Technical Report Z921/X142, Delft Hydraulics, 1996.
[dS96a]
E. de Sturler. Nested Krylov Methods based on GCR. Journal of Computational and Applied Mathematics, 67:15–41, 1996.
[dS96b]
E. de Sturler. Truncation Strategies for Optimal Krylov Subspace Methods. Technical Report TR-96-38, Swiss Center for Scientific Computing, 1996.
[DW87]
M. Dryja and O. B. Widlund. An additive variant of the Schwarz alternating method for the case of many subregions. Technical Report 339, also Ultracomputer Note 131, Courant Institute, New York University, 1987.
[DW91]
M. Dryja and O. B. Widlund. Multilevel additive methods for elliptic finite element problems. In W. Hackbusch, editor, Parallel Algorithms for Partial Differential Equations, Proceedings of the Sixth GAMM-Seminar, Kiel, January 19–21, 1990, pages 58–69. Vieweg & Son, 1991.
[EBP94]
J. Erhel, K. Burrage, and B. Pohl. Restarted GMRES preconditioned by deflation. Technical Report 94-04, Seminar für Angewandte Mathematik, Eidgenössische Technische Hochschule, 1994.
[EES83]
S. C. Eisenstat, H. C. Elman, and M. H. Schultz. Variational iterative methods for nonsymmetric systems of linear equations. SIAM J. Numer. Anal., 20(2):345–361, 1983.
[ESW96]
H. C. Elman, D. J. Silvester, and A. J. Wathen. Iterative Methods for Problems in Computational Fluid Dynamics. Technical Report 96/19, Oxford University Computing Laboratory, 1996.
[FGN92]
R. W. Freund, G. H. Golub, and N. M. Nachtigal. Iterative Solution of Linear Systems, pages 57–100. Acta Numerica 1992. Cambridge University Press, 1992.
[Fis96]
B. Fischer. Polynomial Based Iteration Methods for Symmetric Linear Systems. John Wiley & Sons, Ltd., 1996.
[FM84]
V. Faber and T. Manteuffel. Necessary and sufficient conditions for the existence of a conjugate gradient method. SIAM J. Numer. Anal., 21(2):352–362, 1984.
[FN91]
R. W. Freund and N. M. Nachtigal. QMR: A quasi-minimal residual algorithm for non-hermitian linear systems. Numer. Math., 60:315– 339, 1991.
[Fre92]
R. W. Freund. Quasi-kernel polynomials and their use in nonHermitian matrix iterations. Journal of Computational and Applied Mathematics, 43:135–158, 1992.
[Fre93]
R. W. Freund. A Transpose-free Quasi-minimal Residual Algorithm for non-hermitian Linear Systems. SIAM J. Sci. Comput., 14(2):470–482, 1993.
[FRSW99]
B. Fischer, A. Ramage, D. J. Silvester, and A. J. Wathen. On parameter choice and iterative convergence for stabilised discretisations of advection-diffusion problems. Computer Methods in Applied Mechanics and Engineering, 179:185–202, 1999.
[GC99]
S. Goossens and X.-C. Cai. Lower Dimensional Interpolation in Overlapping Composite Mesh Difference Methods. In C-H. Lai, P. Bjørstad, M. Cross, and O. Widlund, editors, Domain Decomposition Methods in Sciences and Engineering: 11th International Conference, pages 248–255. Domain Decomposition Press, 1999.
[GC00]
S. Goossens and X.-C. Cai. Mortar Projection in Overlapping Composite Mesh Difference Methods. In T. Chan, T. Kako, H. Kawarada, and O. Pironneau, editors, Domain Decomposition Methods in Sciences and Engineering: 12th International Conference. Domain Decomposition Press, 2000. To appear.
[GCR98]
S. Goossens, X.-C. Cai, and D. Roose. Overlapping Nonmatching Grids Method: some Preliminary Studies. In J. Mandel, Ch. Farhat, and X-C. Cai, editors, Domain Decomposition Methods 10, Contemporary Mathematics, pages 254–261. AMS, 1998.
[GGMP88]
R. Glowinski, G. H. Golub, G. A. Meurant, and J. Périaux, editors. Proc. First Int. Conf. on Domain Decomposition Meths. SIAM, 1988.
[GKM+ 91]
R. Glowinski, Yu. A. Kuznetsov, G. A. Meurant, J. Périaux, and O. B. Widlund, editors. Proc. Fourth Int. Conf. on Domain Decomposition Meths. SIAM, 1991.
[Goo98]
S. Goossens. Extending Nested Krylov Subspace Methods. Fifth Copper Mountain Conference on Iterative Methods, March 30 – April 3, 1998.
[GPS96]
A. Greenbaum, V. Pták, and Z. Strakoš. Any nonincreasing convergence curve is possible for GMRES. SIAM J. Matrix Anal. Applics., 17(3):465–469, 1996.
[GPSW96]
R. Glowinski, J. Périaux, Z.-C. Shi, and O. B. Widlund, editors. Proc. Eighth Int. Conf. on Domain Decomposition Meths. Wiley and Sons, 1996.
[GR99]
S. Goossens and D. Roose. Ritz and Harmonic Ritz Values and the Convergence of FOM and GMRES. Numerical Linear Algebra with Applications, 6(4):281–293, 1999.
[Gre97]
A. Greenbaum. Iterative Methods for Solving Linear Systems, volume 17 of Frontiers in Applied Mathematics. SIAM, 1997.
[Gri94]
M. Griebel. Multilevelmethoden als Iterationsverfahren über Erzeugendensystemen. Teubner Skripten zur Numerik. B.G. Teubner, 1994.
[GT97]
S. Goossens and K. Tan. Krylov-Schwarz-Krylov methods for DELFT3D-FLOW. Technical Report Z2041, Delft Hydraulics, 1997.
[GTR98a]
S. Goossens, K. Tan, and D. Roose. An efficient FGMRES solver for the Shallow Water Equations based on Domain Decomposition. In P. Bjørstad, M. Espedal, and D. Keyes, editors, Domain Decomposition Methods in Sciences and Engineering: 9th International Conference, pages 350–358. Domain Decomposition Press, 1998.
[GTR98b]
S. Goossens, K. Tan, and D. Roose. A Krylov-Schwarz iterative solver for the Shallow Water Equations. Physics and Chemistry of the Earth, 23(5/6):485–490, 1998.
[Gut93]
M. Gutknecht. Variants of Bi-CGStab for matrices with complex spectrum. SIAM J. Sci. Comput., 14(5):1020–1033, 1993.
[GVL92]
G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, second edition, 1992.
[Hac85]
W. Hackbusch. Multigrid Methods and Applications. Springer, 1985.
[Hir95]
Ch. Hirsch. Numerical Computation of Internal and External Flows, Volume 1: Fundamentals of Numerical Discretization. Wiley Series in Numerical Methods in Engineering. John Wiley & Sons, 1995.
[HS52]
M.R. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems. J. Res. Nat. Bur. Stand., 49:406–436, 1952.
[HTJ88]
T. Hagstrom, R. P. Tewarson, and A. Jazcilevich. Numerical experiments on a Domain Decomposition algorithm for nonlinear elliptic boundary value problems. Appl. Math. Lett., 1(3):299–302, 1988.
[HV97]
M. Holst and S. Vandewalle. Schwarz Methods: to symmetrize or not to symmetrize. SIAM J. Numer. Anal., 34(2):699–722, 1997.
[Jap98]
C. Japhet. Optimized Krylov-Ventcell method. Application to convection-diffusion problems. In P. Bjørstad, M. Espedal, and D. Keyes, editors, Domain Decomposition Methods in Sciences and Engineering: 9th International Conference, pages 382–389. Domain Decomposition Press, 1998.
[Jou94a]
W. Joubert. On the Convergence Behavior of the Restarted GMRES Algorithm for Solving Nonsymmetric Linear Systems. Numerical Linear Algebra with Applications, 1(5):427–447, 1994.
[Jou94b]
W. Joubert. A robust GMRES-based adaptive polynomial preconditioning algorithm for nonsymmetric linear systems. SIAM J. Sci. Comput., 15(2):427–439, 1994.
[KCM+ 92]
D. E. Keyes, T. F. Chan, G. A. Meurant, J. S. Scroggs, and R. G. Voigt, editors. Proc. Fifth Int. Conf. on Domain Decomposition Meths. SIAM, 1992.
[KX95]
D. E. Keyes and J. Xu, editors. Proc. Seventh Int. Conf. on Domain Decomposition Meths. Number 180 in Contemporary Mathematics. AMS, 1995.
[Lan50]
C. Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J. Res. Nat. Bur. Stand., 45:255–282, 1950.
[Lan52]
C. Lanczos. Solutions of linear equations by minimized iterations. J. Res. Nat. Bur. Stand., 49:33–53, 1952.
[LBCW99]
C-H. Lai, P. Bjørstad, M. Cross, and O. Widlund, editors. Domain Decomposition Methods in Sciences and Engineering: 11th International Conference. Domain Decomposition Press, 1999.
[Lio78]
P. L. Lions. Interprétation stochastique de la méthode alternée de Schwarz. C. R. Acad. Sci. Paris, 268:325–328, 1978.
[Lio88]
P. L. Lions. On the Schwarz alternating method. I. In R. Glowinski, G. H. Golub, G. A. Meurant, and J. Périaux, editors, First International Symposium on Domain Decomposition Methods for Partial Differential Equations, pages 1–42. SIAM, 1988.
[Lio89]
P. L. Lions. On the Schwarz alternating method. II. In T. F. Chan, R. Glowinski, J. Périaux, and O. B. Widlund, editors, Domain Decomposition Methods, pages 47–70. SIAM, 1989.
[Lio90]
P. L. Lions. On the Schwarz alternating method. III: a variant for nonoverlapping subdomains. In T. F. Chan, R. Glowinski, J. Périaux, and O. B. Widlund, editors, Third International Symposium on Domain Decomposition Methods for Partial Differential Equations, pages 202–223. SIAM, 1990.
[Man93]
J. Mandel. Balancing domain decomposition. Comm. Numer. Meth. Engrg., 9:233–241, 1993.
[McC87]
S. F. McCormick, editor. Multigrid Methods. SIAM, 1987. Frontiers in Appl. Math. Vol. 6.
[McC89]
S. F. McCormick. Multilevel Adaptive Methods for Partial Differential Equations. SIAM, 1989.
[MFC98]
J. Mandel, Ch. Farhat, and X-C. Cai, editors. Domain Decomposition Methods 10. Number 218 in Contemporary Mathematics. AMS, 1998.
[Mil65]
K. Miller. Numerical analogs to the Schwarz Alternating Procedure. Numer. Math., 7:91–103, 1965.
[MM94]
K. W. Morton and D. F. Mayers. Numerical Solution of Partial Differential Equations. Cambridge University Press, 1994.
[MO94]
T. A. Manteuffel and J. S. Otto. On the Roots of the Orthogonal Polynomials and Residual Polynomials Associated with a Conjugate Gradient Method. Numerical Linear Algebra with Applications, 1(5):449–475, 1994.
[Mor95]
R. B. Morgan. A restarted GMRES method Augmented with Eigenvectors. SIAM J. Matrix Anal. Applics., 16(4):1154–1171, 1995.
[Mor97]
R. B. Morgan. Implicitly restarted GMRES and Arnoldi Methods for nonsymmetric systems of equations. submitted, 1997.
[MR86]
S. F. McCormick and J. Ruge. Unigrid for multigrid simulation. Math. Comp., 41:43–62, 1986.
[MS96]
T. A. Manteuffel and G. Starke. On hybrid iterative methods for nonsymmetric systems of linear equations. Numer. Math., 73:489– 506, 1996.
[MZ98]
R. B. Morgan and M. Zeng. Harmonic projection methods for large nonsymmetric eigenvalue problems. Numerical Linear Algebra with Applications, 5(1):33–55, 1998.
[NR95a]
F. Nataf and F. Rogier. Factorization of the convection-diffusion operator and the Schwarz algorithm. Mathematical Models and Methods in Applied Sciences, 5(1):67–93, 1995.
[NR95b]
F. Nataf and F. Rogier. Outflow Boundary Conditions and Domain Decomposition method. In D. E. Keyes and J. Xu, editors, Domain Decomposition Methods in Scientific and Engineering Computing, number 180 in Contemporary Mathematics, pages 289–293. AMS, 1995.
[NRT92a]
N. M. Nachtigal, S. C. Reddy, and L. N. Trefethen. How fast are nonsymmetric matrix iterations? SIAM J. Matrix Anal. Applics., 13(3):778–795, 1992.
[NRT92b]
N. M. Nachtigal, L. Reichel, and L. N. Trefethen. A hybrid GMRES algorithm for nonsymmetric linear systems. SIAM J. Matrix Anal. Applics., 13(3):796–825, 1992.
[Osw94]
P. Oswald. Multilevel Finite Element Approximation, Theory and Applications. Teubner Skripten zur Numerik. B.G. Teubner, 1994.
[Par80]
B. N. Parlett. The Symmetric Eigenvalue Problem. Prentice-Hall, 1980.
[PPdV95]
C. C. Paige, B. N. Parlett, and H. A. Van der Vorst. Approximate Solutions and Eigenvalue Bounds from Krylov Subspaces. Numerical Linear Algebra with Applications, 2(2):115–134, 1995.
[PR55]
D. Peaceman and H. Rachford. The numerical solution of elliptic and parabolic differential equations. Journal of SIAM, 3:28–41, 1955.
[PS82]
C. C. Paige and M. A. Saunders. LSQR: An algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Software, 8:43–71, 1982.
[QPKW94]
A. Quarteroni, J. Périaux, Yu. A. Kuznetsov, and O. B. Widlund, editors. Proc. Sixth Int. Conf. on Domain Decomposition Meths. Number 157 in Contemporary Mathematics. AMS, 1994.
[QV94]
A. Quarteroni and A. Valli. Numerical Approximation of Partial Differential Equations. Springer, 1994.
[QV99]
A. Quarteroni and A. Valli. Domain Decomposition Methods for Partial Differential Equations. Oxford Science Publication, 1999.
[Rüd93]

U. Rüde. Mathematical and Computational Techniques for Multilevel Adaptive Methods. SIAM, 1993.
[Saa83]
Y. Saad. Projection methods for solving large sparse eigenvalue problems. In B. Kågström and A. Ruhe, editors, Matrix Pencils, Lecture Notes in Mathematics 973, pages 121–144. Springer-Verlag, 1983.
[Saa92]
Y. Saad. Numerical Methods for Large Eigenvalue Problems. Manchester University Press, 1992.
[Saa93]
Y. Saad. A flexible inner-outer preconditioned GMRES algorithm. SIAM J. Sci. Stat. Comput., 14(2):461–469, 1993.
[Saa96]
Y. Saad. Iterative Methods for Sparse Linear Systems. PWS Publishing Company, 1996.
[Saa97]
Y. Saad. Analysis of augmented Krylov Subspace Methods. SIAM J. Matrix Anal. Applics., 18(2):435–449, 1997.
[Saa98]
Y. Saad. Preconditioned Krylov Subspace Methods. In G. Winter Althaus and E. Spedicato, editors, Algorithms for Large Scale Linear Algebraic Systems: Applications in Science and Engineering, volume 508 of NATO ASI Series C: Mathematical and Physical Sciences, pages 131–149. Kluwer Academic Publishers, 1998.
[SBG96]
B. F. Smith, P. E. Bjørstad, and W. D. Gropp. Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations. Cambridge University Press, 1996.
[Sch90]
H. A. Schwarz. Gesammelte Mathematische Abhandlungen, volume 2, pages 133–143. Springer, 1890. First published in Vierteljahrsschrift der Naturforschenden Gesellschaft in Zürich, volume 15, 1870, pp. 272–286.
[Son89]
P. Sonneveld. CGS, a fast Lanczos-type solver for nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 10(1):36–52, 1989.
[SS86]
Y. Saad and M. H. Schultz. GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 7(3):856–869, 1986.
[SS91]
P. E. Saylor and D. C. Smolarski. Implementation of an Adaptive Algorithm for Richardson’s Method. Linear Algebra and its Applications, 154–156:615–646, 1991.
[Sta77]
G. Starius. Composite Mesh Difference methods for elliptic boundary value problems. Numer. Math., 28:243–258, 1977.
[Sta94]
G. Starke. Alternating direction preconditioning for nonsymmetric systems of linear equations. SIAM J. Sci. Comput., 15(2):369–384, 1994.
[Ste95]
H. Steeghs. Long-term simulation Clyde model (z 976.95). Technical report, Delft Hydraulics, June 1995.
[SVdV96]
G. L. G. Sleijpen and H. A. Van der Vorst. A Jacobi-Davidson Iteration Method for Linear eigenvalue Problems. SIAM J. Matrix Anal. Applics., 17(2):401–425, 1996.
[SVdVF94]
G. L. G. Sleijpen, H. A. Van der Vorst, and D. R. Fokkema. BiCGStab(l) and other hybrid Bi-CG methods. Numerical Algorithms, 7:75–109, 1994.
[Tan92]
W. P. Tang. Generalized Schwarz Splittings. SIAM J. Sci. Stat. Comput., 13(2):573–595, 1992.
[Tan95]
K. H. Tan. Local Coupling in Domain Decomposition. PhD thesis, Universiteit Utrecht, 1995.
[TB95]
K. H. Tan and M. J. A. Borsboom. On Generalized Schwarz Coupling applied to advection-dominated problems. In D. E. Keyes and J. Xu, editors, Domain Decomposition Methods in Scientific and
Engineering Computing, number 180 in Contemporary Mathematics, pages 125–130. AMS, 1995.
[TB97]
K. H. Tan and M. J. A. Borsboom. Domain Decomposition with patched subgrids. In R. Glowinski, J. Periaux, Z-C. Shi, and O. Widlund, editors, Domain Decomposition Methods in Sciences and Engineering, pages 117–124. John Wiley & Sons Ltd., 1997.
[TT96]
K. C. Toh and L. N. Trefethen. Calculation of Pseudospectra by the Arnoldi Iteration. SIAM J. Sci. Comput., 17(1):1–15, 1996.
[TW98]
A. Tveito and R. Winther. Introduction to Partial Differential Equations: A Computational Approach. Number 29 in Texts in Applied Mathematics. Springer-Verlag, 1998.
[VdSVdV86]
A. Van der Sluis and H. A. Van der Vorst. The rate of convergence of Conjugate Gradients. Numer. Math., 48:543–560, 1986.
[VdV92]
H. A. Van der Vorst. Bi-CGStab: a fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 13(2):631–644, 1992.
[VdVS98]
H. A. Van der Vorst and G. L. G. Sleijpen. Iterative Bi-CG type methods and implementation aspects. In G. Winter Althaus and E. Spedicato, editors, Algorithms for Large Scale Linear Algebraic Systems: Applications in Science and Engineering, volume 508 of NATO ASI Series C: Mathematical and Physical Sciences, pages 217–253. Kluwer Academic Publishers, 1998.
[VdVV93]
H. A. Van der Vorst and C. Vuik. The superlinear convergence behaviour of GMRES. Journal of Computational and Applied Mathematics, 48:327–341, 1993.
[VdVV94]
H. A. Van der Vorst and C. Vuik. GMRESR: a family of nested GMRES methods. Numerical Linear Algebra with Applications, 1:369–386, 1994.
[Vui95]
C. Vuik. New insights in GMRES-like methods with variable preconditioners. Journal of Computational and Applied Mathematics, 61:189–204, 1995.
[Wal88]
H. F. Walker. Implementation of the GMRES method using Householder transformations. SIAM J. Sci. Stat. Comput., 9(1):152–163, January 1988.
[WAS98]
G. Winter Althaus and E. Spedicato, editors. Algorithms for Large Scale Linear Algebraic Systems: Applications in Science and Engineering, volume 508 of NATO ASI Series C: Mathematical and Physical Sciences. Kluwer Academic Publishers, 1998.
[Wes92]
P. Wesseling. An Introduction To Multigrid Methods. Wiley, 1992.
[WZ94]
H. F. Walker and L. Zhou. A Simpler GMRES. Numerical Linear Algebra with Applications, 1(6):571–581, 1994.
[Xu92]
J. Xu. Iterative methods by space decomposition and subspace correction. SIAM Review, 34(4):581–613, 1992.
[Yse86a]
H. Yserentant. Hierarchical bases give conjugate gradient type methods a multigrid speed of convergence. Appl. Math. Comp., 19:347–358, 1986.
[Yse86b]
H. Yserentant. On the multi-level splitting of finite element spaces. Numer. Math., 49:379–412, 1986.
[Yse86c]
H. Yserentant. On the multi-level splitting of finite element spaces for indefinite elliptic boundary value problems. SIAM J. Numer. Anal., 23:581–595, 1986.
[Yse90]
H. Yserentant. Two preconditioners based on the multi-level splitting of finite element spaces. Numer. Math., 58(2):163–184, 1990.
Index

Additive Schwarz Method, 91
Advection, 63
Alternating Direction Implicit, 64
Alternating Operator Implicit Method, 63
Arnoldi Eigenvalue Estimates, see Ritz Values
Arnoldi’s Method, see FOM
Benqué Testcase, 68
BiCG, 12
BiCGStab, 13
BiCGStab(l), 13
BiCGStab2, 13
Bilinear Form, 149
  bounded, 150
  coercive, 150
  continuous, see Bilinear Form, bounded
Bilinear Interpolation, 95
Boundary Conditions, 78
  Dirichlet, 64
  Neumann, 64
BPX, 31
CGNE, 8
CGNR, 8
CGS, 12
Chézy Coefficient, 68
Clyde River Model, 69
Composite Mesh Difference Method, 88
Conforming Discretisation, 150
Conjugate Gradient, 6
Consistent Interpolation, 93
Continuity Equation, 63
Contraction Factor, 90
Convection–Diffusion Equation, 149
Convergence Criterion, 66
Coriolis Force, 63
Courant Number, 68
Delft3D, 61
Delft3D-FLOW, 63, 75
Diffusion, 63
Dirichlet Boundary Condition, 149
Discretisation, 64
  Finite Difference, 64
  Upwinding, 150
Distributed Memory, 88
Domain Decomposition, 17
Domain Decomposition Method, 85
Error Bound, 90
FFOM, 10
FGMRES, 10, 66, 68, 70, 85
Field of Values, 39
Finite Elements
  A Priori Estimate, 150
  Lumping, 124
  P1 Element, 122
  Q1 Element, 124, 151
  Stability, 150
Finite Termination Property, 6
Fixed Point Iterations, 63
FOM, 8
  Residual Polynomial, 37, 147
Fourier Analysis, 151
Free Surface, 63
Gauss–Seidel
  Red-Black, 63
GCR, 11
Generalised Additive Schwarz Method, 76
Global Accuracy, 88
GMRES, 8
  Residual Polynomial, 40
  Stagnation, 40
Harmonic Ritz Values, 39
Hierarchical Basis Method, 31
Householder Transformations, 10
Hydrostatic Assumption, 62
Interior Eigenvalues, 39
Interpolation, 88
Interpolation Constant, 89
Interpolation Error, 89
Invariant Subspace, 40
Kronecker Delta, 123
Krylov Iterative Method, 64, 77
  BiCG, see BiCG
  BiCGStab, see BiCGStab
  BiCGStab(l), see BiCGStab(l)
  BiCGStab2, see BiCGStab2
  CGNE, see CGNE
  CGNR, see CGNR
  CGS, see CGS
  FFOM, see FFOM
  FGMRES, see FGMRES
  FOM, see FOM
  GCR, see GCR
  GMRES, see GMRES
  MINRES, see MINRES
  QMR, see QMR
  SYMMLQ, see SYMMLQ
  TFQMR, see TFQMR
Krylov Subspaces, 5
Laplacian, 18
Lax–Milgram Theorem, 150
Linear System
  Nonsymmetric, 64
  Pentadiagonal, 64
  Tridiagonal, 66
Lucky Breakdown, 55
Maximum Norm Stability, 90
Maximum Principle, xxv, 90, 92
Mesh Péclet Number, 150
MINRES, 7
Modified Gram–Schmidt Orthogonalisation, 10
Momentum Equations, 63
Mortar Projection, 88
Multigrid
  V-cycle, 32
  W-cycle, 32
Multilevel Algorithms, 29
Multilevel Nodal Basis, 31
Nonlinear, 62
Nonmatching Grids, 88
Outlier, 15
Parallel Computer, 88
Partial Differential Equation
  Elliptic, 64
  Parabolic, 64
Partial Differential Equations, 62
Petrov–Galerkin, 150
Poincaré Constant, 150
Poisson’s Equation, 88
Postsmoothing, 32
Preconditioner, 64
Presmoothing, 32
Pressure, 62
QMR, 12
Rayleigh–Ritz, 39
Residual Norm, 66
Ritz Values, 37, 79, 83, 147
Shallow Water Equations, 62, 85
Simpler GMRES, 38
Sobolev Norm, 89
Sobolev Space, 89, 149
Spectral Radius, 65, 77
Streamline-Diffusion Method, 151
Superlinear Convergence, 7
SYMMLQ, 7
TFQMR, 13
Time Step, 64
Transmission Conditions, 78
True Breakdown, 55
Truncation Error, 89, 96, 100, 102, 106
Upwind, see Discretisation, Upwinding
Velocity, 63
Viscosity, 63
Wachspress Relaxation Parameters, 65
Water Elevation, 63
Waves, 62
Weak Formulation, 149