S
KATHOLIEKE UNIVERSITEIT LEUVEN FACULTEIT INGENIEURSWETENSCHAPPEN DEPARTEMENT COMPUTERWETENSCHAPPEN AFDELING INFORMATICA Celestijnenlaan 200 A — B-3001 Leuven
Static and Dynamic Verification of Indirect Data Sharing in Component-based Applications
Promotoren : Prof. Dr. ir. F. PIESSENS Prof. Dr. ir. P. VERBAETEN
Proefschrift voorgedragen tot het behalen van het doctoraat in de ingenieurswetenschappen door Lieven DESMET
Januari 2007
S
KATHOLIEKE UNIVERSITEIT LEUVEN FACULTEIT INGENIEURSWETENSCHAPPEN DEPARTEMENT COMPUTERWETENSCHAPPEN AFDELING INFORMATICA Celestijnenlaan 200 A — B-3001 Leuven
Static and Dynamic Verification of Indirect Data Sharing in Component-based Applications
Jury : Prof. Dr. Prof. Dr. Prof. Dr. Prof. Dr. Prof. Dr. Prof. Dr. Prof. Dr.
ir. W. Sansen, voorzitter ir. F. Piessens, promotor ir. P. Verbaeten, promotor ir. E. Steegmans ir. W. Joosen T. Holvoet D. Gurov (KTH Royal Institute of Technology)
U.D.C. 681.3∗D1, 681.3∗D2, 681.3∗D46
Januari 2007
Proefschrift voorgedragen tot het behalen van het doctoraat in de ingenieurswetenschappen door Lieven DESMET
c °Katholieke Universiteit Leuven – Faculteit Ingenieurswetenschappen Arenbergkasteel, B-3001 Heverlee (Belgium) Alle rechten voorbehouden. Niets uit deze uitgave mag worden vermenigvuldigd en/of openbaar gemaakt worden door middel van druk, fotocopie, microfilm, elektronisch of op welke andere wijze ook zonder voorafgaande schriftelijke toestemming van de uitgever. All rights reserved. No part of the publication may be reproduced in any form by print, photoprint, microfilm or any other means without written permission from the publisher. D/2007/7515/3 ISBN 978–90–5682–769–4
Abstract Modern software systems evolve towards modularly composed applications, in which existing software components are reused in new software compositions. Current component-based software systems often promote simple syntactical interfaces to make the wiring of components simple. In practice however, these looselycoupled components sometimes tend to have semantical dependencies, for example by passing data to each other on a shared data repository as is the case in datacentered applications. Typically, these hidden semantical dependencies are not checked at compile-time and lead to run-time errors in current software systems. The main contribution of this thesis is an approach to reduce the number of runtime errors due to broken data dependencies in data-centered, component-based applications. We have built up extensive hands-on experience with such applications in two application domains: networking software and web applications. This experience shows that errors caused by broken data dependencies are important in practice. Our approach is based on the formal verification of a composition property, the no broken data dependencies property. This formal verification is achieved in two steps. In a first step, we statically verify the desired composition property in deterministic software compositions. To do so, each component of the composition is extended with a component contract, stating in a problem-specific contract language what the shared data interactions of the component are. These contracts are then verified in two steps. First, the compliance of the component’s implementation with its contract is checked. Next, the different component contracts are used to verify the composition property in a given composition. In a second step, we extend the verification of the desired composition property towards reactive software systems by combining the static verification approach with run-time checking. Reactive systems are characterised by their nonterminating behaviour and perpetual interactions with their environment, as is for example the case in graphical user interfaces. In addition to the approach for deterministic compositions, the expected, reactive protocol between the client and the server is expressed in a labelled state machine. Next, we apply the earlier proposed solution to verify statically whether or not the desired property is vioi
ii lated in a given composition, while assuming that the actual interaction protocol complies with the expressed state machine. Finally, we use the run-time checking capabilities of a Web Application Firewall to guarantee that the incoming requests of a user’s session adheres to the verified labelled state machine and thus that the no broken data dependencies composition property also holds in the given composition. Finally, we validated the formal verification of the no broken data dependencies composition property in both a deterministic and a reactive software composition. The static verification approach is successfully applied to the medium-sized open-source webmail application GatorMail, in which more than 1350 shared data interactions are present. In addition, the combination of static verification and run-time checking by means of a Web Application Firewall is validated in the Duke’s BookStore application, a reactive e-commerce application. In both case studies, a limited annotation and verification overhead was measured, and they both illustrated that the presented solution is scalable to larger, real-life software applications thanks to the modular specification and verification process.
Voorwoord – Preface In onze huidige maatschappij zijn computersystemen en het Internet alomtegenwoordig. Tevens hangen we in ons dagdagelijkse leven steeds sterker af van de goede werking van deze infrastructuur en hun diensten, zowel vanuit een individueel als vanuit een sociaal en economisch perspectief. Het aanbieden van betrouwbare en veilige diensten is dan ook essentieel, maar lang niet evident. Deze doctoraatsthesis onderzoekt daarom in deze context hoe de betrouwbaarheid en veiligheid verhoogd kan worden van componentgebaseerde software toepassingen, waarin indirect data wordt gedeeld. Velen hebben direct of indirect bijgedragen tot het uitvoeren van dit doctoraatsonderzoek, en ik zou dan ook graag iedereen bedanken die op professioneel of persoonlijk vlak bijgedragen heeft tot dit werk. Daarnaast wil ik nog een aantal mensen extra in de schijnwerper plaatsen. Ik zou in de eerste plaats mijn promotoren, professor Frank Piessens en professor Pierre Verbaeten, van harte willen bedanken. Zij hebben de realisatie van dit werk mogelijk gemaakt en mij op deze tocht met raad en daad bijgestaan. Speciale dank gaat hierbij uit naar Frank die instond voor de dagdagelijkse begeleiding. Steeds opnieuw maakte hij tijd voor mij vrij en ik zal mij dan ook nog lang de vele inspirerende discussies en kritische reflecties herinneren. In combinatie met zijn inspirerend enthousiasme en gedrevenheid, zijn talent om mensen te motiveren en zijn gefundeerde onderzoeksvisie maakt dat van hem dan ook een promotor uit de duizend. Naast mijn promotoren wil ik ook de andere leden van de begeleidingscommissie, professor Eric Steegmans en professor Wouter Joosen, van harte danken voor het kritisch nalezen van deze thesistekst. Ik wil Wouter ook bedanken om samen met mijn promotoren mijn onderzoek te helpen kaderen in een bredere onderzoeksvisie. Samen hebben ze binnen DistriNet gezorgd voor een brede waaier aan opportuniteiten en mij tevens voldoende vrijheid aangereikt om mijn eigen onderzoekspiste verder te kunnen uitdiepen. I would also like to thank professor Tom Holvoet and professor Dilian Gurov (from the KTH Royal Institute of Technology in Stockholm) for accepting to be members of the jury, and professor Willy Sansen for chairing the jury. A special thanks to Dilian for travelling all the way to Leuven. iii
iv Ik zou eveneens graag mijn collega’s van de DistriNet onderzoekgroep willen bedanken voor de interessante discussies en de fijne samenwerking. Ik denk hierbij in het bijzonder aan de leden van de security taskforce en de networking taskforce. Hen allemaal opsommen is onbegonnen werk, maar toch een expliciet dankjewel aan Tine Verhanneman, Yves Younan, Bart De Win, Riccardo Scandariato, Bart Jacobs, Jan Smans, Dries Vanoverberghe, Koen Yskout en Thomas Heyman van de security taskforce; en Nico Janssens, Sam Michiels en Tom Mahieu van de networking taskforce. Verder wil ik ook enkele collega’s buiten deze twee taskforces expliciet bedanken waaronder Andr´e Mari¨en, Adriaan Moors, Eddy Truyen, Bert Lagaisse en Marko van Dooren. Verder wil ik ook de vele vrienden buiten de onderzoeksgroep bedanken. Jullie zorgden ervoor dat de computer op tijd en stond werd ingeruild voor de nodige ontspanning. Gaande van een partijtje squash of een toertje met de mountainbike tot een avondje stevig doorzakken, jullie waren altijd wel van de partij. Ik wil ook mijn ouders uitgebreid bedanken. Zij hebben mijn studies mogelijk gemaakt en zo de fundamenten gelegd voor mijn doctoraat. Gedurende die ganse periode hebben ze mij blijvend gesteund en gemotiveerd. Ook een woord van dank aan mijn zus Greet, mijn schoonbroer Pieter en mijn schoonouders. Tot slot wil ik mijn vrouw Kathleen bedanken. Al zeven jaar lang ben je mijn steun en toeverlaat, en laat je dag in dag uit de zon schijnen in mijn leven. Gedurende mijn ganse doctoraat stond je aan mij zijde. Je gaf mij telkens opnieuw een duwtje in de rug wanneer ik het nodig had, ook al was het afwerken van deze thesis ook voor jou soms een hele beproeving. Van harte bedankt! Lieven Desmet Januari 2007
Contents 1 Introduction 1.1 Background . . . . . . . 1.2 Problem Statement . . . 1.3 Main Contributions . . . 1.4 Overview of this Thesis
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
2 Indirect Data Sharing in Loosely-coupled Component 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Indirect Data Sharing In Data-centered Applications . . 2.3 Case Study 1: Component-based Web Applications . . . 2.3.1 Web Applications . . . . . . . . . . . . . . . . . . 2.3.2 A Servlet-based E-commerce Application . . . . 2.4 Case Study 2: Component-based Protocol Stacks . . . . 2.4.1 Protocol Stacks . . . . . . . . . . . . . . . . . . . 2.4.2 DiPS+ Rapid Prototyping Infrastructure . . . . 2.4.3 A Simplified DiPS+ IPv4 Layer . . . . . . . . . . 2.4.4 Development of the DiPS+ Radius Layer . . . . 2.5 Goal and Scope of this Work . . . . . . . . . . . . . . . 2.6 Related Work . . . . . . . . . . . . . . . . . . . . . . . . 2.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . .
. . . .
1 2 4 5 6
Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7 7 9 10 10 12 15 15 16 17 21 25 26 29
. . . .
. . . .
. . . .
. . . .
. . . .
3 Dependency Analysis of the GatorMail Webmail Application 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 GatorMail . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2.1 The Struts Framework . . . . . . . . . . . . . . . . . . . . 3.2.2 Composition Example . . . . . . . . . . . . . . . . . . . . 3.3 Dependency Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 3.3.1 Exploring Dependencies . . . . . . . . . . . . . . . . . . . 3.3.2 Heuristic Identification of Dependencies . . . . . . . . . . 3.3.3 Abstract Application Model . . . . . . . . . . . . . . . . . 3.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
. . . . . . . . .
31 32 32 33 34 39 39 43 48 48
vi
CONTENTS
3.5
3.4.1 Overview of Dependencies . . . . . . . . . . . . . . . . . . . 3.4.2 Key Characteristics . . . . . . . . . . . . . . . . . . . . . . Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
48 53 56
4 Static Verification of Indirect Data Sharing in Loosely-coupled Component Systems 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.1 Component Contracts and Static Verification . . . . . . . . 4.3 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3.1 Simplified Application Model . . . . . . . . . . . . . . . . . 4.3.2 Composition Example . . . . . . . . . . . . . . . . . . . . . 4.3.3 Desired Composition Properties . . . . . . . . . . . . . . . . 4.4 Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.4.1 No Dangling Forwards Property . . . . . . . . . . . . . . . 4.4.2 No Broken Data Dependencies Property . . . . . . . . . . . 4.4.3 Unsoundness with ESC/Java2 . . . . . . . . . . . . . . . . . 4.5 Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.5.1 Verifying Struts Applications: an Example . . . . . . . . . 4.5.2 Results of the GatorMail Experiment . . . . . . . . . . . . 4.5.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.6 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
57 58 59 59 59 60 61 62 62 63 64 67 68 68 70 71 73 74
5 Bridging the Gap Between Web Application Firewalls and Web Applications 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2.1 Web Vulnerabilities and Web Application Firewalls (WAFs) 5.3 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4 Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.5 Prototype Implementation . . . . . . . . . . . . . . . . . . . . . . . 5.5.1 Server-side Specification and Verification . . . . . . . . . . . 5.5.2 Application-specific Protocol Verification . . . . . . . . . . 5.5.3 Run-time Protocol Enforcement . . . . . . . . . . . . . . . 5.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.6.1 Results of the BookStore Experiment . . . . . . . . . . . . 5.6.2 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.7 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
75 76 77 77 78 81 84 84 86 88 88 88 89 91 92
CONTENTS 6 Conclusion 6.1 Summary . . . . . . . . . . . 6.1.1 Main Contributions . 6.1.2 Validation . . . . . . . 6.2 Future Work . . . . . . . . . 6.3 Developer-centric Verification 6.4 Concluding Thoughts . . . .
vii
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
93 93 93 94 96 97 99
viii
CONTENTS
List of Figures 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 2.10 2.11
Functional dataflow dependencies . . . . . . . . . . . . . . . . A small e-commerce web application . . . . . . . . . . . . . . Additional constraints on the data flow dependencies . . . . . OSI reference model versus the TCP/IP protocol suite . . . . A protocol stack in DiPS . . . . . . . . . . . . . . . . . . . . Shared repositories in DiPS+ . . . . . . . . . . . . . . . . . . A simplified IPv4 protocol layer . . . . . . . . . . . . . . . . . Dataflow dependencies in the IPv4 protocol layer . . . . . . . The IPSec protocol layer . . . . . . . . . . . . . . . . . . . . . Design of RADIUS authentication and accounting in DiPS+. Data dependencies in the DiPS+ Radius layer . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
10 13 14 16 17 18 18 20 21 22 24
3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 3.10 3.11 3.12
Request processing in Struts . . . . . . . . . . . . . . . . Composition example in Struts . . . . . . . . . . . . . . Internal dataflow dependencies in Struts . . . . . . . . . Interaction between client and server . . . . . . . . . . . Protocol transitions for /preferences.do . . . . . . . . . Data sharing between client and server . . . . . . . . . . Structure of preferences.jsp using Struts tiles . . . . . . Abstract application model . . . . . . . . . . . . . . . . Composition Processing /modifyMessage.do . . . . . . . Crosscuttingness of dependencies in GatorMail . . . . . Composition processing /saveAddresses.do . . . . . . . . Visualisation of the dependencies for /saveAddresses.do
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
. . . . . . . . . . . .
34 35 41 42 42 43 46 49 53 54 54 55
4.1 4.2 4.3 4.4
The simplified application model . . . . . . Composition example: scheduling a meeting Overview of the solution . . . . . . . . . . . /createFolder.do composition in GatorMail
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
61 61 62 68
5.1
Web Application Firewall infrastructure . . . . . . . . . . . . . . .
78
ix
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
x
LIST OF FIGURES 5.2 5.3 5.4
Solution overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . Client/server interaction protocol . . . . . . . . . . . . . . . . . . . Class diagram of the run-time enforcement engine . . . . . . . . .
82 83 88
Listings 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 4.10 4.11 4.12 4.13 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8
Struts configuration file: struts-config.xml . . . . . . . . . . . . . . 35 Implementation of an ActionForm: PreferencesForm.java . . . . . . 36 Implementation of an Action: PreferencesAction.java . . . . . . . . 37 Implementation of a JSP view: preferences.jsp . . . . . . . . . . . 38 Conditional repository interaction . . . . . . . . . . . . . . . . . . 41 Use of static methods: ActionsUtil.java . . . . . . . . . . . . . . . 45 Use of the JSTL Expression Language: loginForm.jsp . . . . . . . 46 Extract from the navigation tile: navBar.jsp . . . . . . . . . . . . . 47 Framework-specific contract of AddMeetingAction . . . . . . . . . 63 ENBF notation of the framework-specific contract language . . . . 63 Contract for declarative forwarding (AddMeetingAction.spec) . . . 64 Contract for indirect data sharing (EmailNotificationAction.spec) . 65 JML contract for indirect data sharing (AddMeetingAction.spec) . 65 JML contract of the shared data repository (Request.spec) . . . . 66 Composition-specific check method to be verified by ESC/Java2 . . 66 Frame condition of EmailNotificationAction . . . . . . . . . . . . . 67 Declarative forwarding in Struts . . . . . . . . . . . . . . . . . . . 69 JML specification of ActionMapping . . . . . . . . . . . . . . . . . 69 Declarative forward specification of FolderManageAction . . . . . . 69 JML contract of FolderAction . . . . . . . . . . . . . . . . . . . . . 71 Struts-specific contract of FolderAction . . . . . . . . . . . . . . . 71 Example of a defensive read/write operation in BookDetailsServlet 79 Problem-specific specification of ShowCartServlet . . . . . . . . . . 82 ENBF notation of the problem-specific contract language . . . . . 82 EBNF notation of the client-server protocol . . . . . . . . . . . . . 83 Contract for shared session repository interactions (ShowCartServlet.spec) 84 JML contract of the session repository (HttpSession.spec) . . . . . 85 Protocol-simulating check method to be verified by ESC/Java2 . . 87 Component-specific specification of the repository (HttpSession.spec for ShowCartServlet) . . . . . . . . . . . . . . . . . . . . . . . . . . 90
xi
xii
LISTINGS
List of Tables 3.1 3.2
Overview of control flow transitions and shared data interactions in the internal viewpoint . . . . . . . . . . . . . . . . . . . . . . . . . Overview of control flow transitions and data passing in the external viewpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
51 52
4.1 4.2 4.3
Indirect data dependencies in /createFolder.do . . . . . . . . . . . JML notation overhead in GatorMail . . . . . . . . . . . . . . . . . Verification performance . . . . . . . . . . . . . . . . . . . . . . . .
69 70 72
5.1
Interactions with the shared session repository in the BookStore application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Verification performance . . . . . . . . . . . . . . . . . . . . . . . .
80 89
5.2
xiii
xiv
LIST OF TABLES
Chapter 1
Introduction Developing a large software systems completely from scratch is a very time- and resource-consuming activity, which makes custom-made software less cost-effective for a lot of companies and which results in a high time-to-market as well. On the other hand, buying customisable standard software sometimes reduces the flexibility needed in a particular setup or lowers the competitive advantage on competitors who can buy the same software. Component-based software development combines several advantages of both worlds. Components are independent building blocks possibly provided by different vendors. New software solutions are built by combining commercial off-the-shelf components and custom-made components together in new compositions [115]. In [116], Szyperski et al. define software components as a unit of composition with contractually specified interfaces and explicit context dependencies only. In practice however, software developers and software vendors tend to have a less rigid interpretation of software components and component-based software development. Even in cases where software components also specify their semantical behaviour [14] (as opposed to just their syntactical programming interface), these contracts are often incomplete or expressed informally. In addition, context dependencies are often neglected. Software systems also become more and more mission critical. In software systems, such as computer networks [73], application servers [113] and avionics systems [106], software failures can cause a considerable loss and damage as business processes strongly depend on the availability and the correct functioning of software systems. The explosion of the Ariane 5 in 1996 for instance is probably the most well-known and costliest software failure ever and is caused by a reuse error due to incomplete specification [70]. In addition, many modern software systems have an increasing demand for security. More and more companies for example incorporate e-commerce in their business model to increase their revenues, but at the same time, their web applica1
2
Introduction
tions tend to be error-prone, and these bugs are a welcome target for attackers due to their high accessibility and possible profit gain. Similarly, NIST’s national vulnerability database clearly shows an increasing number of vulnerabilities located in the application layer [88]. Quality characteristics of a software system such as reliability and security can be expressed in terms of software quality attributes or metrics [12]. Several standards and models such as ISO/IEC 17799 [2], the Common Criteria [91] and the Capability Maturity Model (CMM) [3] exist, but they often achieve a more qualitative rather than a quantitative measurement of the software quality. Furthermore, in component-based software systems it is quite difficult to guarantee that certain quality characteristics are achieved by composing the individual building blocks, each with their own characteristics. This is especially the case in software systems composing commercial off-the-shelf components [27], and in runtime reconfigurable or multi-stakeholder distributed systems, for which in-depth quality measurement or integration testing is almost unfeasible.
1.1
Background
Within the DistriNet research group, we worked on different techniques to achieve certain security and reliability related quality characteristics in modularly composed software systems. I contributed in three main research tracks, in different research domains. Firstly, we focussed on achieving graceful degradation behaviour in overload situations in component-based software applications. To do so, the concurrency behaviour of an application was separated from the functional building blocks by introducing dedicated concurrency components. In this way, the concurrency model could easily be adapted without the need to modify the existing functional components within a given composition. Two adaptations of the concurrency model were investigated in the context of graceful degradation in component-based protocol stacks. In applications with blocking behaviour, a multi-threaded concurrency model was combined with a selfhealing scheduling algorithm in order to achieve optimal performance results in cases of overload [26, 82]. Similarly, to reflect the business policy, the concurrency model was customised to handle important requests in a prioritised way during overload situations [83, 84]. This first track was research continued upon my MSc thesis [26] and was joint work with Sam Michiels as part of the SCAN project [5]. Secondly, we focussed on providing additional support to perform threat analyses on web applications. To do so, a generic architecture for web applications was defined, integrating a set of web technologies that are often combined in current web applications such as web services and databases [28]. Based on this architecture, a technology-specific threat analysis was performed on the web service technology using the STRIDE threat enumeration methodology [54] as
1.1 Background
3
reported in [29]. Next, the CORAS methodology [42] was used to perform risk analysis and to visualise threats, risks and treatments typically involved in web services-based web applications. The abstract risk analysis model was then used as input artefact to automate risk management and treatment selection in a particular web services-based web application [47]. This second track was joint work with several colleagues of the DistriNet Security Taskforce and was part of the DeSecA project [86]. Other partners in this project performed threat analyses on web application containers [49], security tokens [19], directory services [17] and databases [13]. Thirdly, we focussed on achieving reliability and security relevant properties in software compositions by applying formal verification. To deal with different types of bugs in Java-like programs, we explored the combination of program annotations and formal verification techniques to verify the absence of these bugs in a given software system. In our approach, components are first annotated with a problem-specific component contract. Next, the component contracts of a composition are translated to a general purpose specification formalism, that is then fed to an automatic prover. Various researchers in DistriNet have applied this approach to three types of bugs. (1) The author of this thesis verified that data dependencies are not broken in software compositions with a shared data repository. The problem-specific contract annotations specify the indirect data sharing interactions between the components and the shared data repository. In addition, static verification is used to verify that dependencies are not broken in deterministic software compositions [38]. Moreover, the static verification was combined with run-time checking in reactive software systems [37]. (2) In [63], Jacobs proposes a programming model for concurrent programming in Java-like languages, including the design of a set of problem-specific program annotations and automated verification of compliance. This program model ensures the absence of data races and deadlocks in Java-like programs. (3) In [108], Smans proposes formal component contracts for stack inspection-based sandboxing, and they verify that at run time the component implementations will not throw security exceptions in C#. For each of these systems, a prototype verifier was built, and some experience in verifying medium sized software systems was gained. The case studies include a chat server, and an open-source web mail application. This third track was part of the SoBeNet project [101]. The approach of formally achieving reliability and security relevant composition properties also strongly relates to other research topics within the DistriNet research group. For instance, with the NeCoMan middleware platform, Janssens provides support for safe, distributed-service adaptations in run-time reconfigurable software systems. The main focus of his research is to achieve a customised, safe adaptation process, i.e. the distributed software system remains in a consistent state during the run-time reconfiguration [66]. This safe adaptation process
4
Introduction
is complementary to the verification process of this research track. The latter one verifies that among others no dependencies are broken in a given composition, whereas the NeCoMan platform guarantees that the run-time transition from an old (verified) composition to a new (verified) composition is performed in a safe way. This thesis will present in more detail my contributions within this third research track, i.e. the formal verification of the absence of composition problems related to indirect data sharing.
1.2
Problem Statement
Data-centered software applications consist of a set of components and a shared data repository [107]. Each component can interact with the shared repository to store data on the repository, or to retrieve data from the repository. By using the data repository, the different components of an application can indirectly share data without actually interacting with each other. This loose coupling adds extra flexibility to software systems and is widely used in various applications and frameworks. A data-centered application is correctly composed if, at run time, each component is able to retrieve the data from the repository that it expects to find. Thus, the correct functioning of a component depends on the run-time state of the shared repository during its execution. Or rephrased, in applications with a shared data repository, implicit semantical dependencies exist between components that share a common data item on the shared repository. Typically, these dependencies are hidden within a software system. Because of this, it is hard for a software composer to reuse existing components in new compositions without breaking any of the hidden data dependencies between the components [32, 34]. Especially in run-time reconfigurable systems where reconfigurations can easily break dependencies, a decent dependency management is crucial to achieve a reliable software system [31, 65]. Moreover, the complexity of shared data dependencies in real-life applications may not be underestimated as was illustrated in the in-depth dependency analysis of the GatorMail webmail application [36]. In this medium-sized software system, already more than 1350 interactions with the shared data repository were identified without any form of documentation. Breaking data dependencies in data-centered applications is a relevant composition problem and typically leads to run-time errors. This results in less reliable and secure software systems. Hence, the main goal of this thesis is to reduce the number of run-time errors due to broken data dependencies in data-centered applications. This thesis will present solutions to detect and prevent such composition problems at compile time or at composition time instead of at run time.
1.3 Main Contributions
1.3
5
Main Contributions
At this moment, the software engineering research community lacks experience reports on the typical composition problems in nowadays software technologies and on the complexity of dependency management in large-scale software systems. In this thesis, we develop good insights in the dependency management of component-based software systems with a shared data repository. We are particularly focussed on dependencies that occur when composing different components that share a common data item on the shared repository. In addition, we perform an in-depth dependency analysis of the medium-sized webmail application GatorMail to investigate the complexity of managing dependencies in such data-centered applications. Based on the extensive hands-on experiences, we define the no broken data dependencies composition property for applications with a shared data repository. Achieving this property in data-centered applications eliminates a large number of composition problems, and the typical errors they cause at run time, such as NullPointerExceptions and ClassCastExceptions. The goal of this thesis is then to reduce the number of run-time errors by formally verifying that a given composition does not violate the no broken data dependencies composition property. We present two solutions to formally verify this property, in respectively deterministic and reactive software compositions. Deterministic software compositions are software systems in which the control flow through the application is independent from the environment. In contrast, reactive systems are characterised by their non-terminating behaviour and perpetual interactions with their environment, as is for example the case in graphical user interfaces. The formal verification of the software composition property is based on the use of component contracts. First, the component contracts specify in a problemspecific language the interactions between the component and the shared repository. Next, static verification is used to verify the composition property in deterministic software systems, relying on the different component contracts. In reactive software compositions, this static verification is combined with run-time checking of the system’s interactions with the environment. The formal verification of the no broken data dependencies composition property is validated in both a deterministic and a reactive software system. Two existing, medium-sized applications are chosen for the validation experiments. These applications are representative to measure the overhead and test the applicability of the solutions in larger, real-life applications.
6
1.4
Introduction
Overview of this Thesis
The thesis is structured as follows. Chapter 2 starts with introducing indirect data sharing in loosely-coupled applications with a shared data repository. Next, two case studies from different application domains are presented, illustrating the typical use of a shared data repository. The first case study investigates componentbased web applications built upon the Java Servlet technology, which is part of the J2EE web tier specification. The second case study explores component-based protocol stacks in DiPS+, an in-house developed protocol stack framework. Based on extensive hands-on experiences in both case studies, the precise goals of this thesis are determined and related with existing research. Chapter 3 illustrates in more detail the complexity of hidden data dependencies in loosely-coupled component systems by investigating the servlet-based webmail application GatorMail, written by the University of Florida. Analysis of the different components of GatorMail and their interactions with the shared repository reveals that even in this medium-sized open-source application more than 1350 interactions occur with the shared data repository without being specified or documented. Extending the GatorMail application or reusing some of its components without breaking one of the hidden data dependencies is really hard, and oversight of the composer can easily lead to run-time errors. In chapter 4, parts of a component’s contract are made formal and automated tool support is used to verify some level of semantical compatibility at composition time. In particular, a given composition is formally verified to ensure that no data dependencies are broken between components that share a common data item on the shared repository. To do so, a component’s contract specifies its interactions with the shared data repository in a problem-specific specification language. Static verification then checks whether the implementation of a component adheres to its contract, and whether in a given composition dependencies between components, that share a common data item, are not broken. Finally, the presented approach is validated on the GatorMail webmail application. Whereas the static verification approach of chapter 4 is limited to deterministic systems, chapter 5 combines static and dynamic verification of the indirect data sharing dependencies to cope with indeterministic, reactive systems. A wellknown example of such a reactive system is a web application, in which a web user typically can choose the next processing step by clicking the link of his choice. In such reactive web applications, the strict request flow enforcement of a Web Application Firewall (WAF) at run time is combined with static verification of the WAF policy in a given software composition. By doing so, the combination of static and dynamic checking ensures a software composer that no shared data dependencies are broken in the reactive system. To conclude, the presented approach is validated in the Duke’s BookStore e-commerce application. Finally, chapter 6 summarises the contributions of this thesis and discusses some opportunities for future research.
Chapter 2
Indirect Data Sharing in Loosely-coupled Component Systems 2.1
Introduction
Modern software systems evolve towards modularly composed services, in which existing software components are reused within new compositions [115]. The different services are interconnected into distributed software systems, using serviceoriented software architectures [74]. Both component-orientation as well as serviceorientation promote a shift towards highly decoupled and reusable software units, which can be implemented independently from each other. Applications are then instantiated by composing the components and connectors of the system in a particular software composition. Recent software systems even enable dynamic reconfiguration of the running application [45, 46]. Hereby, the software system can adapt the current composition towards a new composition by adding, removing and replacing loosely-coupled software units at run time [8, 67]. In this way, highly flexible software systems are achieved, both at composition-time and at run time. Also, software systems become more and more mission critical. In mission critical systems, such as computer networks [73], application servers [113] and avionics systems [106], software failures can cause a considerable loss and damage as business processes strongly depend on the availability and the correct functioning of software systems. Because of the stringent dependability requirements inherent in such critical systems, the composition-time and run-time flexibility should be prevented from jeopardising among others the correct functioning of the system. 7
8
Indirect Data Sharing in Loosely-coupled Component Systems
Building manageable, large-scale software systems introduces several development constraints and invariants, such as architectural constraints and data typing. Specifying and enforcing compatibility requirements, constraints and invariants has already proven to significantly enhance the safety and reliability of a software system (e.g. syntactic interface compatibility [14], type systems [87, 6] and ADLs [59, 89]). By composing the individual components, implicit assumptions are made by the software composer about the usage and interaction of the components in a particular composition instance. In particular, during the composition of the components into a working application, composition-specific dependencies can arise between the different components. Different types of composition-time dependencies can be identified. In eventbased systems for example, the implicit invocations in the control flow are clearly specific for a particular composition and impose dependencies between the different publishers and subscribers of that composition. The control flow of an application is the description how the computational control moves around through a running instance of the application or, in other words, which executing method invokes another method and passes hereby the control to the latter one. Similarly, indirect data sharing in data-centered repositories imposes at composition-time dataflow dependencies between the software components. The dataflow describes how data moves between different components. Dataflow may follow the control flow (e.g. in passing arguments through methods), but dataflow and control flow can also have separate paths within an application. In our opinion, keeping such composition-time dependencies implicit within a software system (as is often done in state of the art systems today), increases the risk for software failure due to broken dependencies, especially in a dynamically reconfigurable system. Therefore, in this thesis, we investigate the dependency problems caused by indirect data sharing in data-centered applications in order to reduce the risk for run-time software failures. The rest of this chapter is structured as follows. Firstly, in section 2.2 the concept of indirect data sharing is further explained. Next, two case studies are presented in sections 2.3 and 2.4, in order to demonstrate the importance of dataflow dependencies and the risk for software failure due to violated constraints or broken dependencies. Based on hands-on experiences in both case studies, the precise goals of this thesis are defined in section 2.5. Finally, section 2.6 discusses related research and section 2.7 summarises the contributions in this chapter. This chapter is partially based on the following publications: • Lieven Desmet, Frank Piessens, Wouter Joosen, and Pierre Verbaeten. Infrastructural support for data dependencies in data-centered software systems. In Proceedings of the Third AOSD Workshop on Aspects, Components, and Patterns for Infrastructure Software, pages 79–80, Lancaster, U.K., March 2004 [32]
2.2 Indirect Data Sharing In Data-centered Applications
9
• Lieven Desmet, Frank Piessens, Wouter Joosen, and Pierre Verbaeten. Improving software reliability in data-centered software systems by enforcing composition time constraints. In Proceedings of Third Workshop on Architecting Dependable Systems (WADS2004), pages 32–36, Edinburgh, Scotland, May 2004 [35] • Sam Michiels, Lieven Desmet, and Pierre Verbaeten. A DiPS+ Case Study: A Self-healing RADIUS Server. Report CW 378, Department of Computer Science, K.U.Leuven, Leuven, Belgium, February 2004 [84] • Sam Michiels, Lieven Desmet, Wouter Joosen, and Pierre Verbaeten. The DiPS+ software architecture for self-healing protocol stacks. In Proceedings of the 4th Working IEEE/IFIP Conference on Software Architecture (WICSA-4), pages 233–242, Oslo, Norway, June 2004 [83]
2.2
Indirect Data Sharing In Data-centered Applications
In the repository architectural style [107], a system consists of a central data structure (representing the state of the system) and a set of separate components interacting with the central data store. This architectural style is quite commonly used in several component models and APIs such as JavaServlet containers [68], Pluggable Authentication Modules framework (PAM) [105] and JavaSpaces in Sun’s Jini [44, 69]. Sometimes, further refinements are introduced to this style. However, in practice, most of the refinements or additional constraints are only modelled implicitly and are not explicitly checked. A data-centered application is a collection of separate components and a shared repository. The application is correctly composed if, among others, every required data item of a component is provided by another component. In figure 2.1, different dataflows within a data-centered application are explicitly shown, while the actual component interactions (i.e. the control flow) are abstracted. According to Shaw and Garlan, two major subcategories of repositories exist, depending on the control flow within the application. If the central data structure is the main trigger for selecting processes to execute, the repository is called a blackboard. Otherwise the repository can be a traditional database [107]. Indirect data sharing strongly contributes to achieve loosely-coupled components, which can easily be developed by different parties 1 . Nevertheless, in our opinion, the very loose nature of such software compositions can also easily lead to 1 In this thesis, we use the term loosely-coupled to reflect the loose coupling of components at the syntactical level [14], irrespectively of the possible tight binding between them on a semantical level, as will be illustrated later on in this chapter.
10
Indirect Data Sharing in Loosely-coupled Component Systems
Figure 2.1: Functional dataflow dependencies reliability and correctness problems. In the next sections, two case studies in different domains are presented to illustrate the use of a shared data repository as the glue between loosely-coupled components. Both case studies investigate some reoccurring problems with indirect data sharing in data-centered applications, based on extensive hands-on experiences within the presented technologies.
2.3
Case Study 1: Component-based Web Applications
In this section the typical use of a shared data repository in a servlet-based ecommerce application is analysed. The case study is structured in two parts. Firstly, subsection 2.3.1 introduces web applications and the web technologies used in this case study. Next, the functionality of the e-commerce application is examined in subsection 2.3.2 with special attention to the interactions with the shared data repository.
2.3.1
Web Applications
Web applications are server-side applications that are invoked by thin web clients (browsers), typically using the HyperText Transport Protocol (HTTP). A user can navigate through a web application by clicking links or URLs in his browser, and he is also able to supply input parameters by completing web forms. A URL maps to a server-resident program that is executed with the user’s supplied input parameters. The result of the program execution (often expressed in the HyperText Markup Language (HTML)) is then sent back to the browser where it is rendered for further user interaction. HTTP is a stateless, application-level request/response protocol and has been in use on the World Wide Web since 1990 [41]. Since the protocol is stateless,
2.3 Case Study 1: Component-based Web Applications
11
each request is processed independently, without any knowledge of previous requests. To enable the typical user’s session concept in web applications, the web application needs to add session management on top of the stateless HTTP layer. Different techniques exist to embed web requests within a user session such as the use of cookies, URL rewriting or hidden form fields [100]. Nowadays, most web applications use an underlying framework or web technology to facilitate the development and the deployment of the web application. Widespread technologies such as PHP, ASP.NET and JSP/Servlets incorporate among others the management of user sessions. Next to tracking to which user session a web request belongs, these technologies also provide server-side state for each user session. While processing a web request, server-side web components can store non-persistent, user-specific data (e.g. a shopping cart in an e-commercesite) in a data container bound to the user session. Other web components can then retrieve this data while processing future requests in the same user session. Java Servlets The Java Servlet technology is part of the J2EE specification and provides mechanisms for extending the functionality of a Web server and for accessing existing business system [68, 56]. Java Servlets are functional units of the web tier. A J2EE web application is typically a collection of Java Servlets and is deployed in a servlet-based web container such as Tomcat, JBoss or WebSphere. The web container offers infrastructural support for using servlets. Extra-functional properties such as load-balancing and security are added to the web container rather than to the servlets themselves. The core functionality of the container is to handle incoming web requests and to use (chains of) servlets for processing the requests. The container casts incoming HTTP requests into an object-oriented form (HTTPServletRequest) and checks to see if there is a servlet registered for processing that request. If there is a match, the request is given to the servlet. The web deployment descriptor (web.xml) of a web container contains the deployment information of the web application, including extra-functional properties and a list of servlets with their corresponding URL mapping. To cope with more complex server-side processing, a chain of servlets can be used to process the incoming requests. Hereby, the different servlets form a pipeand-filter composition and the servlets process the request sequentially: after the first servlet has processed the request, it is passed to the next servlet for further processing until the end of the pipe is reached. An advantage of such a composition is that reoccurring functionality can be encapsulated in a separate servlet that can be reused in different chains. Within a web application, servlets are loosely-coupled with each other and support for dispatching between servlets is provided by the web container. The servlets can communicate anonymously by means of a shared data repository. In
12
Indirect Data Sharing in Loosely-coupled Component Systems
fact, five instances of shared repositories are provided to the servlet, each with a different access scope: a data repository associated with the dynamic web page (1), with the web request (2), with the user session (3), the web context (4) and the application (5). Hence, servlet-based applications are data-centered compositions [107], and the application composer must pay special attention to the dataflow dependencies. JavaServer Pages The JavaServer Pages (JSP) technology is also part of the J2EE specification and is built upon Java Servlets. JSP enables the separation of content and presentation in developing dynamic websites. In JSP, Java code can be embedded into the markup language similar to other technologies such as ASP and PHP. Next to embedding plain Java code, also JSP Tags can be used to encapsulate simple programming logic (such as iterators and boolean tests) and provide the website designer an abstract interface to the model of the application. JSP files are also deployed in a servlet-based web container and are compiled into Java Servlets the first time they are requested within the application. This implies that JavaServer Pages inherit the strengths of Java Servlets, while providing a better separation between logic and markup. In general, JSP files are used to develop the user interface (or view) of a web application. They are loosely coupled, and can communicate anonymously with other JSP files or servlets using shared data repositories.
2.3.2
A Servlet-based E-commerce Application
In this subsection a canonical e-commerce site is analysed in order to illustrate the typical use of the shared data repositories in a servlet-based web application. Figure 2.2 illustrates a subset of the simple e-commerce application. Three different services are identified within the application: adding a product item to the personal shopping basket, the payment of the shopping order and searching through the website. Each box represents a functional task implemented as a servlet, and the services are pipe-and-filter compositions of several independent tasks. Adding a product item to the shopping basket starts by retrieving the necessary item information from the data back-end (retrieve item information). Next, the item is added to the shopping basket (process add to basket), and depending on the success or failure of adding the product to the basket, the new basket (display new basket) or an error page (display error page) is displayed to the end user. Similarly, the payment service is decomposed into a first servlet constructing an order out of the shopping basket entries (prepare basket order) and a second servlet (process order payment), taking care of the payment (e.g. the submission of credit card information). Next, the order is logically processed at the server,
2.3 Case Study 1: Component-based Web Applications
13
Figure 2.2: A small e-commerce web application e.g. by updating the stock and scheduling the transport (process basket order). Finally, the payment result is displayed to the end user (display payment result). Finally, to process search requests, a third-party servlet retrieves pages conform the given search pattern (retrieve search results for pattern). Next, the search results are displayed (display search results). More information about this canonical e-commerce application and its implementation can be found in [47]. 2.3.2.1
Indirect Data Sharing
Although the servlets are implemented as independent components, dataflow dependencies exist between these servlets. The dataflow dependencies are depicted in figure 2.2 by dotted lines. For each website user, a personal shopping basket is saved at server-side, in the session scope of the shared repository. The personal basket is created the first time a shopping item is added to the basket. It is used by three different servlets, entitled in the figure as process add to basket, display new basket and prepare basket order. Furthermore, in order to prevent race conditions, the composed application needs also extra synchronisation support for the shopping basket (figure 2.3(a)), since this data item is stored on the session repository and is thus available to all servlets processing requests of the same web user. A simple synchronisation requirement might be: no other servlet may update the shared item basket between the write interaction of the process add to basket and the read interaction of the display new basket.
14
Indirect Data Sharing in Loosely-coupled Component Systems
Within the payment service, two dataflow dependencies are present within the scope of the request. Between the first and third servlet, the order information (constructed from the current shopping basket) is shared for further processing. The second and fourth servlet share the result of the payment transaction. Due to the separate development of the servlets (e.g. the payment component is typically a third-party component), a conflict in the naming of the shared data item exists in this particular composition. This results not only in retrieving unintended data from the repository, but also downcasting this data item to the expected data type will lead to a run-time exception. Therefore, also extra support is needed to prevent the name and type clash (figure 2.3(b)). Two other dataflow dependencies are depicted in figure 2.2. In the first service, the item object encapsulates the necessary item data for further processing by the process add to basket servlet. For the search command, a result object represents the output of the search query and is used for displaying the results to the user. In both dataflow dependencies, the data items are shared on a per-request base (i.e. in the request scope of the shared repository).
(a) synchronisation
(c) logical separation
(b) name and type clash
(d) untrusted component
Figure 2.3: Additional constraints on the data flow dependencies Moreover, two additional, more general constraints may be expressed on the dataflows of this example. Firstly, although there is only one shared repository (with five scope levels), the three services can be grouped into two separate parts, each requiring a (logically) separate shared repository (figure 2.3(c)). In this way, the system ensures that there is no dataflow interference possible between the
2.4 Case Study 2: Component-based Protocol Stacks
15
two separate parts. This feature is particularly useful in case only a part of the application is actually examined. Secondly, since the third-party search servlet can’t be trusted completely, all unexpected access from the third-party search servlet to the shared repository could be prevented in every possible scenario (figure 2.3(d)). 2.3.2.2
Client/server interactions
A web application is triggered by the client who sends a web request for a certain URL to which the server-side web application will reply. The interactions between the client and the web server are typically nondeterministic, since the web user can decide which links to follow next within the web application. As a consequence, the protocol between a web client and a given web application is very open. This nondeterministic sequence of web requests makes analysis of the dataflow dependencies in the session, context or application scope much harder. In contrast to dataflow dependencies in the request scope, the protocol history determines here whether or not a dataflow dependency is satisfied. In fact, satisfying the intended dataflow dependencies in such a reactive system imposes extra constraints on the client/server interaction protocol. In an e-commerce application for example, it makes little or no sense to contact the payment service before items are added to the shopping basket. In particular, retrieving a non-existing shopping basket from the shared session repository leads in this application to a run-time error for the prepare basket order. Thus, to satisfy this particular dataflow dependency, the client/server protocol must be constrained to only allow the pay command after items are added to the shopping basket.
2.4
Case Study 2: Component-based Protocol Stacks
This section examines interactions with a shared data repository in the componentbased protocol stack technology DiPS+. After a short introduction to protocol stacks in subsection 2.4.1, the component framework DiPS+ is summarised in subsection 2.4.2. Next, the functionality and data dependencies in two layers are analysed: subsection 2.4.3 examines the IPv4 layer and subsection 2.4.4 covers the development process of the DiPS+ RADIUS authentication and accounting layer.
2.4.1
Protocol Stacks
Computer networks became omnipresent in the last 15 years. Nowadays the majority of electronic devices are interconnected, and some of them even have different communication media available. To enable network communication between devices, a small piece of system software (i.e. the protocol stack) resides on each device. Computer networks (and accordingly also protocol stacks) are typically
16
Indirect Data Sharing in Loosely-coupled Component Systems
organised in a layered way to reduce their design complexity. Each layer or protocol is assigned one or more responsibilities of the network communication and the different layers are stacked together into a protocol stack. Each layer provides services to higher layers, while abstracting lower layers and implementation details [117]. The Open Systems Interconnection (OSI) reference model started as a standardisation effort in 1982 and is considered as the primary architectural model for inter-computer communications [24]. The conceptual model is composed of seven layers, each with their particular function (figure 2.4(a)). The more wide-spread TCP/IP model [57, 58] originates from ARPANET in 1970 as a more pragmatic approach to networking. This model consists of four layers and is the de facto standard on the Internet (figure 2.4(b)).
(a) OSI reference model
(b) TCP/IP protocol suite
Figure 2.4: OSI reference model versus the TCP/IP protocol suite Recent advances in network technology tend to stretch the dynamics of computer networks. For example, modern protocol stacks have to cope with the very dynamic nature of networks such as active networks [118] and ad-hoc networks [51]. Their typical requirements ask for both composition-time and run-time flexibility in the protocol stack. In the next subsection, we will investigate DiPS+, a reconfigurable, component-based protocol stack framework. Some other examples of configurable, modular protocol stack solutions are Click [73], Cactus [127] and SEDA [126].
2.4.2
DiPS+ Rapid Prototyping Infrastructure
DiPS+ [39, 81] is a rapid prototyping infrastructure for building reconfigurable, component-based system software such as protocol stacks, file systems and device drivers. The main goal of DiPS+ is to support reuse and adaptation at composition time [60] as well as at run time [67, 82, 83]. The underlying software architecture of
2.4 Case Study 2: Component-based Protocol Stacks
17
DiPS+ applications is a combination of a pipe-and-filter composition with shared data repositories [85]. A DiPS+ protocol stack consists of several protocol layers, each with an upgoing, a down-going and sometimes a forwarding path. A path is a pipe-and-filter composition of loosely coupled functional units, coupled by means of connectors (figure 2.5).
Figure 2.5: A protocol stack in DiPS Within DiPS+, network packets are modelled as first class entities, transported along the paths. In DiPS+ packets are processed sequentially by the different components on the path: after a component completes its processing of the packet, the packet is transported to the next component on the path. Nevertheless, concurrent processing in DiPS+ exists since it is allowed to have more than one packet to be processed simultaneously. This concurrency is obviously important if the protocol stack is composed of components with blocking operations. To enable collaborative data sharing between functional units, a shared data repository is attached to each network packet, as illustrated in figure 2.6. This data repository is used to attach meta information to the packet, which can be used by components further along the path. For example within a protocol layer, a HeaderParser parses the raw data header of the packet, and adds the extracted header information to the packet’s shared data repository. This meta information can be used by downstream components in the path to customise the processing of the packet.
2.4.3
A Simplified DiPS+ IPv4 Layer
IPv4 is the network layer protocol of the TCP/IP protocol suite [57]. This protocol offers a best-effort, packet-switched communication service between two endpoints.
18
Indirect Data Sharing in Loosely-coupled Component Systems
Figure 2.6: Shared repositories in DiPS+ The endpoints may reside in different LANs, and the IPv4 protocol handles both the addressing of these endpoints, as well the routing of IP packets in between. 2.4.3.1
Implementation of the DiPS+ IPv4 Layer
Figure 2.7 shows a simplified DiPS+ version of the IPv4 protocol. As in most network protocol layers, three flows can be identified: an upgoing flow representing the protocol’s activity on incoming packets arriving from the network, a downgoing flow for the packets coming from the application that should be put on the network, and a forwarding flow for packets coming from the network that should result in sending out a new packet on the network.
Figure 2.7: A simplified IPv4 protocol layer The upgoing path consists of the IPv4HeaderParser and InRouter component. The IPv4HeaderParser parses the binary IPv4 header from the packet and puts the parsed data (in this example simplified to the IPv4 source and destination address) on the shared repository of the packet for further usage by the other components. The InRouter dispatching component has two output ports. Based on the locality of the destination address, the packet is further propagated on the upgoing path for processing by the upper layer or selected for the forwarding path.
2.4 Case Study 2: Component-based Protocol Stacks
19
In the downgoing path the OutRouter component is followed by the IPv4HeaderConstructor component. The OutRouter component decides, based on the destination address of the packet, which network interface and corresponding destination address should be used by the datalink layer for sending out the packet. The IPv4HeaderConstructor creates a binary IPv4 header, based on the header fields information attached to the shared repository. The IPv4Forwarder component is the only component within the forwarding path. This component encapsulates the binary data from the incoming packet in a new packet, and copies both the source and destination address between the data repositories of the incoming and the outgoing packet. After forwarding the newly constructed packet to the OutRouter component of the downgoing path, the incoming packet is discarded. By doing so, other meta information (e.g. attached by components in the datalink layer) is discarded. 2.4.3.2
Indirect Data Sharing
Similar to the servlet-based e-commerce site, several data dependencies can be identified between the different components. In DiPS+, these dependencies are solely on a per-packet basis, since each packet has its own shared data repository. The interactions with the shared data repository are depicted in figure 2.8. The source and destination address parsed by the IPv4HeaderParser component are the basis for the first two dataflow dependencies. The data item src addr is provided by the IPv4HeaderParser component and is used by the IPv4Forwarder for specifying the source address of the newly constructed packet. Similarly, the InRouter and IPv4Forwarder component use the data item dst addr, provided by the IPv4HeaderParser ; the InRouter for the locality check and the IPv4Forwarder for the destination address of the new packet. In the downgoing path, similar dataflow dependencies are found. Both the IPv4Forwarder component as the upperlayer provide data items src addr and dst addr. The IPv4HeaderConstructor uses the data items src addr and dst addr to create the IPv4 header. The OutRouter component selects the outgoing interface and datalink destination based on the dst addr data item. Two remaining data dependencies exist between the OutRouter component and the underlying layer. The data item iface out is the network interface selected by the OutRouter component for sending out the packet; the data item datalink dest is the corresponding datalink layer destination address. In addition, two extra concerns arise within this DiPS+ case study. Firstly, notice that the above presented data dependencies occur in two separate data repositories. Up to the IPv4Forwarder the shared repository of the incoming packet is used. Once the forward packet is generated in the IPv4Forwarder component, the data interactions happen on the data repository of the newly created packet. This is also the reason why in figure 2.8 the dependencies for data items src addr and dst addr are split up in respectively dependencies 1 and 3, and 2
20
Indirect Data Sharing in Loosely-coupled Component Systems
Figure 2.8: Dataflow dependencies in the IPv4 protocol layer and 4. Although the same key is used to identify these data items, the data items reside on different repositories. Secondly, the following scenario illustrates that although a component’s shared data interaction may be fixed, the data dependencies with other components in a composition are composition-specific. In dynamic network environments, the protocol stack must be an open software system, prepared for composition-time or run-time adaptation. Adapting the protocol stack by changing the composition should however not break the data dependencies present within the system. In the case of our IPv4 layer for example, it is a realistic scenario that the current composition has to be adapted towards other standards such as MobileIP [95], IPv6 [64] or IPSec [122], even at run time. In figure 2.9, a DiPS+ composition of the IPSec protocol is shown2 . The grey boxes represent existing components from the IPv4 layer; the white boxes contain newly introduced functionality. What is important in the context of this case study, is that the fact that several components are reused in different compositions and may share data with other components than the ones of the original composition. Hence, for each composition, the composer needs to bear in mind all the interactions with the shared repository, and needs to ensure that each component can read the data it needs from the repository. To cope with varying environmental requirements, DiPS+ can handle both composition-time and run-time (re)configuration. S¸ora et al. provide automatically building support of client-customised DiPS+ protocol stacks based on highlevel functional properties of the individual components [109]. 2A
full discussion of the IPSec protocol and its implementation in DiPS+ can be found in [122].
2.4 Case Study 2: Component-based Protocol Stacks
21
Figure 2.9: The IPSec protocol layer In case of run-time reconfiguration, the NeCoMan middleware platform on top of DiPS+/CuPS provides safe, distributed-service adaptations by invoking primitive composition operations in term of creating, removing, linking, and unlinking components [66, 67]. Both approaches provide extra flexibility to DiPS+, but they do no take into account the indirect data sharing dependencies between components at this moment. Hence, reconfiguration of components can lead to run-time errors due to broken data dependencies.
2.4.4
Development of the DiPS+ Radius Layer
RADIUS (Remote Authentication Dial-in User Service) is a client-server security service running on top of the UDP transport protocol [103, 102]. It controls authentication (verifying user name and password), accounting (logging relevant user activities), and authorisation (access-control) in a distributed multi-user environment. For example, upon an ADSL network login, the dial-up server delegates login information to the RADIUS server of the Internet Service Provider (ISP) to authenticate the user, to authorise access to the ISP system and to provide the client the necessary network configuration (such as the local IP address to use on the Internet). As part of accounting services, RADIUS logs the start and stop of user sessions in a database, typically for billing or statistical purposes.
22
Indirect Data Sharing in Loosely-coupled Component Systems
2.4.4.1
Implementation of the DiPS+ RADIUS Layer
Figure 2.10 shows the high-level DiPS+ design of RADIUS authentication (left side), and accounting (on the right). Following the RADIUS specification, the accounting layer accepts packets via UDP port 1812, the accounting layer is connected via UDP port 1813. The parsing components (RadiusHeaderParser and RadiusAttributeParser ) interpret the header and the list of attributes of an incoming RADIUS packet. The header and attribute constructors (RadiusHeaderConstructor and RadiusAttributeConstructor ) create the RADIUS header and the list of attributes of an outgoing RADIUS response packet. The NASChecker verifies the originator of the request and drops the packet in case the Network Access Server (NAS) is unknown. The parsing and constructing components, as well as the NASChecker, are generic and are reused in both the authentication and the accounting layer.
Figure 2.10: Design of RADIUS authentication and accounting in DiPS+. The authentication layer consists of four extra components specifically for authentication. The Authenticator component looks up the user’s password and the AccessRequestAuthenticatorChecker compares this password with the hashed password that is encapsulated in the authentication request. Finally, a positive or negative authentication response is prepared by the AccessAcceptReplyPreparator or the AccessRejectReplyPreparator. The former looks up user-specific configuration information (e.g. the IP address) to respond to the client. The accounting layer consists of three accounting-specific components. The AccountingRequestAuthenticatorChecker verifies the integrity of an incoming accounting request. The Accounter logs the information in the accounting request to the database and the AccountingReplyPreparator creates a response packet and attaches the necessary meta-information. 2.4.4.2
Indirect Data Sharing
Figure 2.11(a) illustrates the usage of the shared repository in the DiPS+ RADIUS accounting layer. For each incoming request, twelve data items are read and
2.4 Case Study 2: Component-based Protocol Stacks
23
written, resulting in 31 interactions with the shared data repository. Data items 1 to 9 are stored on the repository of the incoming request; items 10 to 12 are stored on the repository of the newly created response packet. For each data item, the according data type is listed below figure 2.11(a). 2.4.4.3
Composition Problems During the RADIUS Development
The DiPS+ RADIUS authentication and accounting layers were developed at the DistriNet research group by my colleague Sam Michiels and myself. To coordinate the design and implementation, we had regular whiteboard meetings and we used CVS [43] as version control system to share our code. The total development of DiPS+ RADIUS took about two months of part-time work. Several components were developed independently, and since no shared documentation was created of the interactions with the shared repository (except for some rough whiteboard drawings), the composition of the different components into the two layers did not go smoothly. Figure 2.11(b) shows a snapshot of our RADIUS accounting implementation on June 30th, 2003, right before one of the first composition attempts was made. At that moment, the components implemented their basic functionality, most of the interactions with the shared repository were already coded and no compilation errors were found. Nevertheless, three different composition problems can be identified within this CVS snapshot. Firstly, a mismatch occurs between the data items written to the repository by the underlying layer (data item 1’ and 2’), and the items read by the AccountingReplyPreparator (data item 1 and 2). The first items have identifiers udp.senderIP and udp.senderPort, while the latter ones are identified by udp.sourceIP and udp.sourcePort. This mismatch is clearly caused by the independent development of both components and the limited documentation. Secondly, a data type mismatch occurs for the radius.authenticator (data item 7). The component AccountingRequestAuthenticatorchecker expects to read out a byte[], whereas the AccountingReplyPreparator expects a byte value. Further investigation reveals that a logical mistake causes the type mismatch: instead of reading the radius.authenticator (data item 7), the AccountingReplyPreparator needs to read the radius.identifier (data item 6), which is indeed of the expected type byte. Finally, a typographical error explains the use of the identifier radius.requestAthenticator (data item 10’) instead of radius.requestAuthenticator (data item 10). However, besides the typographical error again a logical mistake occurs in this read interaction: further investigation reveals that AccountingReplyPreparator should read the radius.authenticator of the incoming packet instead of the radius.requestAthenticator. Notice that while all three problems reside in the implementation of the CVS snapshot, no compilation error was found. In other words, all these problems pass
24
Indirect Data Sharing in Loosely-coupled Component Systems
(a) Radius data dependencies (final composition) 1 2 3 4 5 6
shared data item udp.sourceIP udp.sourcePort radius.nasSecret radius.radiusPacket radius.code radius.identifier 1’ 2’ 10’
data type InetAddress Integer byte[] byte[] byte byte
7 8 9 10 11 12
shared data item radius.authenticator radius.attributeList radius.replyAttributeList radius.requestAuthenticator udp.destinationIP udp.destinationPort
shared data item udp.senderIP udp.senderPort radius.requestAthenticator
data type byte[] AttributeList AttributeList byte[] InetAddress Integer
data type InetAddress Integer byte[]
(b) Radius data dependencies (CVS snapshot)
Figure 2.11: Data dependencies in the DiPS+ Radius layer
2.5 Goal and Scope of this Work
25
the compilation phase without any problem and lead to run-time errors, such as a ClassCastException or a NullPointerException. Furthermore, it is worth to mention that none of the shared data interactions was documented for the components in the DiPS+ RADIUS layer, as is the case for all layers developed in DiPS+. In case some insight is needed in the shared data interactions of a component, typically a simple regular expression on the source code is applied by the DiPS+ developers in an ad-hoc manner.
2.5
Goal and Scope of this Work
The high-level goal of this thesis is to reduce the risk of run-time problems in data-centered applications caused by broken data dependencies. Obviously, this restricts the application scope to software systems with the following characteristics: • A loosely-coupled component system, in which components are reused in different compositions. • An application with indirect data sharing through a shared data repository. • Both deterministic as well as reactive software systems are within the scope of this thesis. To keep focus, only a subset of the problems identified in the two case studies is handled in this thesis. In particular, we define the following desired composition property for indirect data sharing in data-centered applications. The goal of this thesis is then to eliminate certain types of run-time errors (such as a NullPointerException or a ClassCastException) by giving a formal guarantee that the following desired composition property is not violated in a given composition. No broken data dependencies: A shared data item is only read after being written. For each shared data read interaction, the shared data item that is already written on the repository is of the type expected by the read operation. In this composition property, we are no longer concerned about the dataflow as a connection between writers and readers of a particular data item, but instead we define the correctness of the data sharing in term of what is expected by a component whenever it tries to read a data item from the shared repository. In addition, we also investigate the control flow of an application, since the control flow determines which components are executed in which order, and consequently also which shared data interactions occur in which order. Projected on the DiPS+ case study, guaranteeing the desired composition property implies that all the identified dataflows dependencies in the IPv4 and RADIUS
26
Indirect Data Sharing in Loosely-coupled Component Systems
layer are satisfied. In contrast, the three problems identified in the CVS snapshot of the RADIUS layer (i.e. the identifier mismatch, the typological error and the type mismatch) all violate the desired composition property and hence the static detection of all three problems is in the scope of this thesis. For the servlet-based case study, the basic dataflows in the composition are satisfied by guaranteeing the desired composition property. This also includes the dataflows in the shared repository on the session scope. Because of the indeterministic or reactive nature of the client-server interactions in web applications, this implies that the composition property also needs to hold in reactive systems in which the sequence of interactions with the shared repository is not statically known. However, the extra dataflow constraints identified in the servlet-based case study (such as the logical separation and the handling of untrusted components) are not further investigated in this thesis and are considered out of scope. Reducing certain types of run-time errors by formally verifying that a given composition does not violate the desired composition property certainly improves the reliability of the software composition, but in order to be really useful to the software composer, the following extra-functional criteria are important: Reasonable overhead In order to be applicable, the introduced overhead for the software developer and software composer must be minimal, both in terms of additional workload and verification time. In addition, the solution must be easy to understand for mainstream developers and software composers. The less overhead and complexity the solution introduces, the more likely that the proposed solution will actually be adopted. Applicable to real-life applications The proposed solution may not be limited to toy examples, but the proposed solution must also be more generally applicable to larger, real-life applications. This includes among others that the proposed solution is scalable to larger software projects and that the solution is not limited to a specific choice of program language or software framework. Instead a rather technology neutral solution is preferred, which then can be easily adopted in different software platforms.
2.6
Related Work
The identified problems with indirect data sharing in data-centered applications relate to different research domains. Particularly interesting are the architecturalbased approaches to describe and analyse dependencies between architectural components. Software Architecture. Since more than a decade, software architectures are used to abstract reasoning about software systems from source code level towards
2.6 Related Work
27
coarse-grained architectural elements and their interconnections. By using software architectures and reasoning at the architectural level of software systems, developers try to reduce production costs by capturing commonalities between closely related product families, and try to guarantee the requirements and characteristics by embedding early design decisions within the overall structure and by introducing several constraints on the elements and their interconnections [96, 107, 12, 80]. Architectural Styles. Architectural styles try to abstract reoccurring patterns of components, connectors and behavioural interactions within different software architectures. Architectural styles highlight a specific aspect of a software architecture, and try to capture the advantages or main characteristics for the software system ensured by using a particular architectural style, as well as the constraints introduced by the style. In [107], Shaw and Garlan proposed a taxonomy of different architectural styles. Top-level classes within this classification are dataflow systems, call-andreturn systems, independent components, virtual machines, data-centered systems (or repositories). The problem of composition-time dependencies, stated in this chapter, mainly exists in architectural styles with implicit coupling between components. This implicit coupling between components can occur both within the dataflow (in data-centered systems, as presented in this chapter) as well as in the control flow (as is the case in event-based systems). Architecture Description Languages. To support architecture-based reasoning, (semi-)formal modelling notations and analysis techniques are required. Several Architecture Description Languages (ADLs) are proposed for architectural specification, ranging from semi-formal diagrams with boxes and lines to formal notations. The ADLs are bundled with tools for analysis and development, both for modelling architectures for a specific application domain as for modelling generalpurpose architectures. Although these ADLs strongly vary in the abstractions and analysis capabilities they provide, most ADLs explicitly provide abstractions for components, connectors and their behavioural interactions, as well as tool support for analysis and architecture-based development [18, 80]. xADL is one particular Architecture Description Language, aimed to be an open ADL [22, 23]. This ADL is an XML-based specification language, with a minimal set of architectural entities (such as components and connectors), specified by means of an XML schema. This minimal list of entities is easily extendable with custom-made XML schemas. A range of extensions schema’s are already developed in different research groups to fit the ADL to particular application domains, system requirements and analysis purposes. Dependency Analysis. The formal description of a software architecture is used for precisely specifying software systems and for analysing system properties.
28
Indirect Data Sharing in Loosely-coupled Component Systems
Several architecture analysis techniques have already been developed to detect problems such as deadlock and component mismatch [59, 89, 7, 111]. A first class of approaches proposes dependency models for specifying and analysing inter-component dependencies. In [123], Vieira et al. provide mechanisms for (1) describing dependencies of an individual component by using declarative XML descriptions and (2) analysing the influence of those dependencies in compositions. In order to analyse dependencies between components, Li proposes a matrix-based approach in [78]. With this approach, also indirect dependencies (such as the transitivity of dependencies) can be captured. Although both approaches propose a rough classification of dependencies, no explicit support for composition-time dependencies such as the dataflow dependencies in data-centered applications is provided. A second class of papers recognises the importance of modelling implicit, composition-time dependencies. Magee and Kramer state in [79] that components may require external services, without knowing the service provider during specification, implementation or testing. Since this context independence permits reuse of components during construction and run-time replacement, the authors discuss some general approaches to dynamic architecture description. In their declarative ADL Darwin for distributed software architectures, they model implicit event invocation with an underlying π-calculus. In [111], Stafford and Wolf recognise implicit dataflow as a dependency problem. The authors state that dependency analysis can answer the question with which other components a particular component is communicating, if the component is communicating with a shared repository. Dellarocas proposes a handbook for guiding the integration of software components [25]. The author argues that current languages and tools have several shortcomings in providing integration support. In most cases a discontinuity exists between the architectural model and the actual implementation model, making outcomes of architectural analysis meaningless. Also, the often undocumented assumptions about the components on the one hand, and the structure and interactions of the target component environment on the other hand impose a hard burden upon reuse. Furthermore, in giving an overview of the dependencies that should be included in the handbook, the author identifies (among others) flow and sharing dependencies in the taxonomy of software interconnections. The dataflow dependencies in data-centered software systems match both dependency classes of the presented taxonomy. ArchJava [6] offers a unique binding between architectural description and actual implementation. Architectural styles and constraints could be expressed within language constructs of the actual program. Byte-code compiled with the ArchJava compiler is guaranteed to comply with the expressed architectural constraints, as such preserving the intended correctness. Although ArchJava is able to enforce architectural constraints independently for each component instance within a composition, all architectural constraints
2.7 Conclusion
29
are hard-coded within the components. Currently no support exists in ArchJava to express and enforce composition-specific dependencies and constraints, such as the dataflow dependencies with data-centered repositories. Providing such compositional support would make software components much more reusable in different contexts and compositions. Web Frameworks. Also other efforts in the web development community, such as the Struts framework [112], aim at providing extra support for building reliable and reconfigurable servlet-based applications. To our knowledge however, no project currently offers support to manage the dataflow dependencies in datacentered web applications. Struts provides a declarative web form description, including the declarative composition of the form processing actions. These actions are servlet-alike components, chained together to perform the server-side processing, quite similar to the chaining of servlets into services as presented in our first case study. Since this framework only provides an extra abstraction layer and useful tool support upon the JavaServlet technology, the desired composition property formulated in this chapter also holds for the shared data repository in Struts, and therefore formally guaranteeing this property is complementary to the Struts framework, as will be further investigated in chapters 3 and 4.
2.7
Conclusion
In this chapter, one particular enabler for loosely-coupled component systems is investigated: the indirect data sharing in data-centered applications. By means of case studies in two different application domain (i.e. component-based web applications and component-based protocol stacks), reoccurring problems with indirect data sharing are identified, based on extensive hands-on experiences within the presented component technologies. Inspired by some common problems in the presented case studies, a desired composition property is formulated, which data-centered software compositions should not violate. The goal of this thesis is then to eliminate certain types of run-time errors by giving a formal guarantee that the desired composition property is not violated in a given composition. The rest of this thesis is structured as follows. To illustrate the importance of a good dependency management in data-centered applications, an in-depth dependency analysis of the medium-size webmail application GatorMail is presented in chapter 3. Next, static verification is used to formally guarantee the desired composition property in both deterministic (chapter 4) and reactive software systems (chapter 5). Finally, chapter 6 summarises the main contributions of this thesis and identifies some challenging tracks for future research.
30
Indirect Data Sharing in Loosely-coupled Component Systems
Chapter 3
Dependency Analysis of the GatorMail Webmail Application The software engineering research community lacks experience reports on the typical composition problems in nowadays software technologies and on the complexity of dependency management in large-scale software systems. Creating a better understanding of dependencies in software compositions is the first step to come to a better management of dependencies and to achieve more reliable software compositions. In this chapter, we illustrate the complexity of inter-component dependencies in loosely-coupled software systems by exploring the dependencies in an existing component-based webmail application, GatorMail. We identify four types of dependencies in the GatorMail webmail application, resulting in more that 2000 data interactions and control flow transitions. This chapter contains the following technical report [36]: Lieven Desmet, Frank Piessens, Wouter Joosen, and Pierre Verbaeten. Dependency Analysis of the Gatormail Webmail Application. Report CW 427, Department of Computer Science, K.U.Leuven, Leuven, Belgium, September 2005. Next to some minor editorial changes made to the original report, some background information is moved to section 2.3 and the appendices of the original report (containing 147 pages of more detailed analysis results) are not repeated in this thesis. 31
32
3.1
Dependency Analysis of the GatorMail Webmail Application
Introduction
Nowadays, software systems are evolving towards modular composed applications, in which existing, loosely-coupled software components are reused in new compositions. These loosely-coupled components have typically very few dependencies on other components at the syntactical level, in order to better promote reuse in different compositions and applications. In practice however, reusable software components tend to have quite often a set of dependencies on other components in software systems at the semantical level. In most cases, these dependencies are even not well documented, and remain completely implicit in the software system. In order to achieve a reliable functioning of the software system, the software composer needs to take care of all those inter-component dependencies within the software system and make sure they are satisfied. In this chapter, we illustrate the complexity of inter-component dependencies in component-based software systems by exploring the dependencies in an existing component-based webmail application, GatorMail. Both explicit and implicit dependencies are taken into account in this case study. Creating a better understanding of dependencies in software compositions is the first step to come to a better management of dependencies and to achieve more reliable software compositions. The remainder of this chapter is structured as follows. Section 3.2 introduces the webmail application GatorMail and briefly explains the underlying Struts Framework. In section 3.3, the dependencies within the GatorMail application are explored. Firstly, four different types of dependencies are identified within the case study. Secondly, for each type, a heuristic approach is presented to reveal all dependencies based upon the source code and configuration files of the webmail application. Finally, an abstract model of the webmail application is constructed with respect to the identified dependencies. The results of the dependency exploration are discussed in section 3.4. Section 3.5 summarizes this heuristic dependency study of the GatorMail webmail application. More detailed results of the dependency study are included in appendix A to F of the original report [36].
3.2
GatorMail
GatorMail [40] is an open-source webmail application built on the Struts framework [112], using the Java Servlet and JavaServer Pages technologies. It was originally developed to meet the needs of the University of Florida [1]. The project has 4 active developers (Drake Emko, Sandy McArthur, Todd Williams, Stephen L. Ulmer), and is housed on the SourceForge development platform [110]. As in most webmail systems, the system’s functionality can be divided in five
3.2 GatorMail
33
subsystems: authentication, folder actions, message actions, addressbook actions and settings. • The authentication subsystem allows users to log in and to log off of the webmail service using password credentials. • The folder subsystem consist of actions for creating, removing and modifying mail folders as well as actions for listing messages in a mail folder. Folder modifications include renaming and changing the subscription status of the folder. • The message subsystem enables retrieving and displaying messages (including attachments) as raw messages, or in a standard or printer-friendly layout. Messages can be deleted, moved or copied to another mail folder. Besides message housekeeping, also new outgoing messages, replies to previous messages or message forwards can be composed and sent out. • The addressbook subsystem allows users to add, remove and modify mail contacts in their addressbook. When composing a new message, the appropriate contact(s) can be selected from the addressbook. • The settings subsystem enables the setting and modifying of user preferences. Typical preferences are account information (such as the mail signature and the preferred from and reply-to header fields) and user interface settings. The GatorMail software project consists of about 14.350 lines of Java code, combined with about 6.100 lines of JSP formatting. This results in 36 different Struts action elements and 29 JSP views. These actions and views are frequently reused in 52 different compositions, each representing the processing logic for one URL of the web application. Subsection 3.2.1 gives a brief introduction to the Struts Framework. Subsection 3.2.2 illustrates how components and views are composed in GatorMail using this technology.
3.2.1
The Struts Framework
Apache Struts is an open-source application framework on top of the Java Servlets and JavaServer Pages technologies. Struts encourages developers to use the JavaServer Pages Model 2 architecture, which is a variation on the Model-View-Controller design pattern for building web applications. In a Struts application (illustrated in figure 3.1), incoming HTTP requests are encapsulated in HTTPServletRequest objects and dispatched to the Struts’ ActionServlet. This ActionServlet is the Controller of the Struts application. All input parameters are encapsulated in an ActionForm data container and additional input validation checks can be performed. Next, according to the requested URL,
34
Dependency Analysis of the GatorMail Webmail Application
an appropriate action is selected and the HTTPServletRequest (Req in figure 3.1) and ActionForm are given to this action for further processing. An action interacts with the Model and fetches the necessary data for the View. After processing the request, an ActionForward object(AF in figure 3.1) is returned to the ActionServlet, indicating which action or view has to be processed next. This recursion continues until a JSP view is reached and output is sent back to the web browser.
Req
Action
ActionServlet AF HTTP request JSP Req Web browser Web application
Figure 3.1: Request processing in Struts Instead of using several servlets for the different functional blocks, Struts deploys only one standard servlet, the ActionServlet, in combination with several actions. Actions resemble to Java Servlets in the way that they both process a HTTPServletRequest and are able to use the associated shared data repositories. New actions can be easily implemented by extending the org.apache.struts.action.Action class. Actions return an ActionForward object to the ActionServlet after processing the request. In order to achieve reusable actions, an extra forward indirection is used in Struts. Actions use logical names to identify forwards, whereas the Struts configuration file (which is specific for each configuration) specifies the declarative mapping between logical forwards and actual forwards. In this way, the logical names are mapped to actual forwards at run time using the ActionMapping class. ActionForms are created by extending the org.apache.struts.action.ActionForm class. ActionForms are data containers and are populated with the input parameters of the given request, such as the input fields of a web form. Before they are given to an action, they are validated. Instead of coding ActionForms for each web form, also a DynaActionForm can be used. This is a generic ActionForm that can be configured declaratively in the Struts configuration file.
3.2.2
Composition Example
In order to clarify the different technologies used in GatorMail, a small composition example from the webmail application is now explained in more detail.
3.2 GatorMail
35
In GatorMail, each user can configure some webmail preferences, such as his default name and reply-to address, his mail signature and his threshold for junkmail. Saving these preferences is done by filling in a preferences web form (preferences.jsp) and submitting the fields to the /preferences.do URL. The server-side processing composition is shown in figure 3.2. Firstly, the ActionServlet dispatches the request to the PreferencesAction. Secondly, the PreferencesAction return either an input or success forward, both resulting in forwarding the request to the JSP view preferences.jsp. Finally, the output of the JSP page is sent back to the user. In figure 3.2, the rounded rectangles represent the Struts actions and JSP views. The black arrows between the rectangles express the control flow in the composition. When a control flow transition is the result of a forward, the transition is labelled with the logical forward name (or forward names in case several forwards result in the same transition). control flow
/preferences PreferencesAction
url: /preferences.do
input/ success
Struts Action
/preferences.jsp
JSP View
Figure 3.2: Composition example in Struts Listing 3.1 shows the according Struts configuration with three sections: the form beans, the global forwards and the action mappings. Listing 3.1: Struts configuration file: struts-config.xml 1 2 3 4 5 6 7 8 9 10 11 12
<struts−config>
36 13 14 15 16 17 18 19 20 21 22
Dependency Analysis of the GatorMail Webmail Application
In the form beans section, one ActionForm is defined, namely the preferencesForm. This ActionForm is a data container (conform the JavaBean specification) with appropriate getters and setters for the fields of the preferences web form (username, replyTo, signature, junkThreshold, . . . ). Next to getters and setters, also a reset and a validate method are implemented to clear the web form and to validate the input parameters on submission. A shrunk implementation of this PreferencesForm is shown in listing 3.2. The Struts configuration describes two types of forwards: global forwards and local forwards. Global forwards can be used by all actions and are listed in the global forwards section. Local forwards are action-specific and are part of the action description. These local forwards either introduce a new forward or override the global forward for this action only. The action mappings section lists the different actions of the application. Each action is described by means of the action implementation, a name and a path on which the action has to be deployed. Also an ActionForm can be requested, together with the scope of the data repository where the ActionForm can be retrieved. The validate parameter indicates whether or not the ActionForm should be validated, and in case the input fields for the ActionForm are missing or the validation fails, the logical forward described by the input attribute is automatically followed. Listings 3.3 and 3.4 illustrate with simplified code fragments how the PreferencesAction and preferences.jsp view are implemented. Listing 3.2: Implementation of an ActionForm: PreferencesForm.java 1 2 3 4 5 6 7 8 9
public class PreferencesForm extends ActionForm { private String replyTo; private String signature; // reset and validate method public void reset(final ActionMapping actionMapping, final HttpServletRequest request) { super.reset(actionMapping, request); setReplyTo(null); setSignature(null);
3.2 GatorMail 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
37
} public ActionErrors validate(final ActionMapping actionMapping, final HttpServletRequest request) { ActionErrors errors = super.validate(actionMapping, request); if ( errors == null) { errors = new ActionErrors(); } final String replyTo = getReplyTo(); if (replyTo != null && replyTo.length() > 0) { try { new InternetAddress(replyTo).validate(); } catch (AddressException ae) { errors .add("replyTo", new ActionError("preferences.replyTo.invalid", ae.getMessage()) ); } } return errors; } // getters and setters public String getReplyTo() { return replyTo; } public void setReplyTo(final String replyTo) { this.replyTo = replyTo; } public String getSignature() { return signature; } public void setSignature(final String signature) { this.signature = signature; } }
Listing 3.3: Implementation of an Action: PreferencesAction.java 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
public final class PreferencesAction extends Action { public final ActionForward execute(final ActionMapping mapping, final ActionForm form, final HttpServletRequest request, final HttpServletResponse response) throws Exception { ActionsUtil.checkSession(request ); final HttpSession session = request.getSession (); final PreferencesForm prefsForm = (PreferencesForm)form; final User user = Util.getUser(session ); final Properties prefs = (PreferencesProvider)getServlet (). getServletContext(). getAttribute(Constants.PREFERENCES PROVIDER).getPreferences(user, session); // Update preferences from the form bean. final String replyTo = prefsForm.getReplyTo(); if (replyTo == null && prefs.getProperty("compose.replyTo") != null) { prefs .remove("compose.replyTo"); } else if (replyTo != null && !replyTo.equals( prefs .getProperty("compose.replyTo"))) { prefs .setProperty("compose.replyTo", replyTo); } final String signature = prefsForm.getSignature(); if (signature == null && prefs.getProperty("compose.signature") != null) { prefs .remove("compose.signature");
38 22 23 24 25 26 27 28
Dependency Analysis of the GatorMail Webmail Application } else if (signature != null && ! signature .equals( prefs .getProperty("compose.signature"))) { prefs .setProperty("compose.signature", signature); } return mapping.findForward("success"); }
}
Listing 3.4: Implementation of a JSP view: preferences.jsp 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37
<%@page contentType="text/html" import="java.util.List, edu.ufl.osg.webmail.util.Util, edu.ufl.osg.webmail.Constants, java.util.ArrayList"%> <%@taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c"%> <%@taglib uri="/tags/struts-html" prefix="html"%> <%@taglib uri="/tags/struts-bean" prefix="bean"%> <%@taglib uri="/tags/webmail" prefix="wm"%>
3.3
Dependency Analysis
In this section, firstly the different types of dependencies within GatorMail are explored. Secondly, for each type of dependencies a heuristic approach is presented to reveal these dependencies within the case study. Finally, an abstract model of the webmail application is constructed with respect to the identified dependencies. The results of this dependency exploration will be discussed in section 3.4.
3.3.1
Exploring Dependencies
In exploring different types of dependencies within the GatorMail application, two viewpoints are used: an internal and an external viewpoint. In the internal viewpoint, all dependencies between server-side components are taken into account. The external viewpoint studies the interaction between the user and the application for determining dependencies between the client and the server application. Both viewpoints are now further discussed. 3.3.1.1
Internal Viewpoint
The internal viewpoint explores dependencies within the different server-side compositions for processing the different requests. Hereby, control flow transitions and interactions with a shared repository are distinguished as illustrated in figure 3.2. Control Flow Transitions The control flow of an application is the description how the computational control moves around through a running instance of the application or, in other words, which executing method invokes another method and passes hereby the control to the latter one. Within the server-side processing, each URL consist of a composition of components. The control flow within such a composition is the way control flows passes through the composition, from one action or JSP view to another. In figure 3.2, the control flow is indicated with black arrows between actions or views. The ActionServlet gives control to the PreferencesAction, and after processing
40
Dependency Analysis of the GatorMail Webmail Application
the PreferencesAction forwards control (by means of the forward label input or success) to the preferences.jsp view. In reality, the ActionServlet also manages the control flow during the forwards since each action or view returns control to the ActionServlet after processing the request. In this chapter we abstract the control flow to the dispatching between the different actions and JSP views without the management of the ActionServlet in between. This is a realistic simplification, since the implementation of the ActionServlet does also directly pass the control to the next action or view without any other statement execution. Shared Data Interactions The dataflow describes how data moves through a collection of computations. Dataflow may follow the control flow (e.g. in passing arguments through methods), but dataflow and control flow can have separate paths within an application, as is the case in indirect data sharing. In the GatorMail webmail application, the arguments passed through method invocations are fixed for all components. Servlets (and consequently also JSP pages) have a HTTPServletRequest and HTTPServletResponse parameter in their process method. Struts actions additionally receive an ActionForm and an ActionMapping parameter. What is more interesting in exploring dataflow dependencies between components, are the shared objects that are attached to the HTTPServletRequest. As already said earlier, actions and JSP views can communicate anonymously by means of five shared data repositories. In figure 3.3 three shared repositories are shown for the given composition. In the request scope for example, the PreferencesAction does have both read and write interactions with the repository for the requestStartTime object, whereas later on in the composition the preferences.jsp page does only read the requestStartTime object while processing the request. For each request, the shared data repository on the request scope is newly created (and thus empty) at the start of the request. The repositories on the session and context scope remain for the lifetime of respectively the user session and the application context. Every user has his own shared repository for the session scope, while all users share a common shared repository for the context scope. Not all repository interactions shown in figure 3.3 have to occur in processing a request. The actual set of interactions for a particular request may be a subset of those shown in the figure. If interaction statements for example occur in a conditional block (e.g. if-then-else structure), their execution may among others depend on user’s input parameters, the state of the request or the state of the application. Listing 3.5 illustrates that the PreferencesAction does write the requestStartTime object to the repository only if the object does not exist already.
3.3 Dependency Analysis
41
Figure 3.3: Internal dataflow dependencies in Struts Listing 3.5: Conditional repository interaction if (request.getAttribute("requestStartTime") == null) { request. setAttribute("requestStartTime", new Long(System.currentTimeMillis())); }
3.3.1.2
External Viewpoint
In the external viewpoint, the interaction between the user and the application are explored (figure 3.4). Dependencies between the client and server are split up in the sequence of control flow transitions and data passing between client and server. Control Flow Transitions Similar to other client-server applications, a protocol exists between the user and the web application, expressing which messages in which order can be exchanged (figure 3.4). More specifically for web applications, the order in which URLs can be requested to the server may be constrained: adding an attachment is only meaningful after constructing a new message, and retrieving sensitive information is only allowed after successfully passing the authentication page. In a normal usage of a web application, the responses of the web server contain the possible URLs that can be requested next. In figure 3.5 for example all views
42
Dependency Analysis of the GatorMail Webmail Application
Figure 3.4: Interaction between client and server that have pointers to the /preferences.do URL are showed in rounded rectangles on the left side of the figure. The rectangles on the right side hold all URLs (of the web application) that can be reached by following pointers from the preferences.jsp view (which is part of the /preferences.do composition). Users however can also request bookmarked URLs or choose a URL by entering the URL manually in their browser. So, extra entry points to certain URLs of the application might also be allowed. In this dependency analysis bookmarked links are not taken into account.
Figure 3.5: Protocol transitions for /preferences.do Data passing In the interaction between user and application, also dataflows exist. In requesting
3.3 Dependency Analysis
43
a URL from the application, the browser can send extra input parameters together with the request (figure 3.6). These input parameters could be input fields from a web form, hidden fields in a web page or parameters attached to the URL. For example, when sending the preferences form to the application in order to save the user preferences, eight input parameters are attached to the request. Data is also flowing from the application to the user in the responses, but this dataflow is not considered within this analysis.
Figure 3.6: Data sharing between client and server
3.3.2
Heuristic Identification of Dependencies
To analyse the dependencies, a heuristic approach for finding the different transitions and data interactions in the GatorMail webmail application is described in this subsection. To do so, existing configuration files and tools for matching regular expressions are used. This heuristic approach does not give guarantees about the completeness of the dependencies, but the approach already reveals most of the existing dependencies within GatorMail. Similar or even better results can also be achieved by using more advanced techniques such as an appropriate meta-model representing the software system [120]. The presented heuristic approach lists both control flow transitions and data interactions in the internal and external viewpoint. 3.3.2.1
Internal Viewpoint
Control Flow Transitions The control flow dependencies in the internal viewpoint can easily be identified thanks to the explicit forwarding concept in Struts. In a first step, the different logical forwards are listed that can be returned by each action. In a second step, these logical forwards are for each action mapped to actual forwards. Both steps are now further described. Listing the logical forwards for each action is done by applying a simple search pattern on the Java source code of the actions. Since each ActionForwards is cre-
44
Dependency Analysis of the GatorMail Webmail Application
ated by calling the findForward method on the ActionMapping parameter, searching for the findForward pattern does return the logical forwards of a particular action. $ grep ’findForward’ edu/ufl/osg/webmail/actions/PreferencesAction.java return mapping.findForward("success");
In order to map the logical forwards to actual forwards, the global and local forwards are inspected in the Struts configuration file. In case of the PreferencesAction, two local forwards (input and success) are listed, both associated with the /preferences.jsp URL.
In this example, also a mismatch can be noticed between the described local forwards in the configuration file, and the actual used logical forward names in the implementation: the input forward described in the configuration is never returned from within the actual implementation. Since the heuristic pattern search can not guarantee completeness, the union between both sets is used in this dependency analysis as possible control flow transitions starting from the inspected action. Shared Data Interactions Since server-side interactions with the shared data repositories are not specified within the GatorMail application, searching the dataflow dependencies between the actions requires much more effort. Main idea here is again to use search patterns on the Java source code of the different actions to identify their interactions with the shared repositories. Servlets and actions can interact with shared repositories using the getAttribute and setAttribute methods on the request, session or scope object. The regular expression used for finding these interactions is [gs]etAttribute. $ grep ’[gs]etAttribute’ edu/ufl/osg/webmail/actions/PreferencesAction.java final PreferencesProvider pp = (PreferencesProvider)getServlet(). getServletContext().getAttribute(Constants.PREFERENCES PROVIDER); request. setAttribute("X-Image-Url", imageUrl); request. setAttribute("X-Image-Url", prefs.getProperty("compose.X-Image-Url"));
However, using this naive approach, some catches did occur. The actions in the GatorMail application for example use singletons and static methods for some checks and recurring activities, such as checking the user’s session. Listing 3.6
3.3 Dependency Analysis
45
shows the ActionsUtil class with its static method checkSession. This method checks the user’s session by interacting with the shared repository (lines 6-8) and throwing exceptions if needed. To do so, the request object is used as input parameter of the method. To also list these repository interactions, the search pattern is extended with methods known to interact with shared repositories. The regular expression (tailored to the GatorMail application) then looks like: getAttribute|setAttribute |checkSession|getFolder|getUser|getAddressList |getAttachList|getMailStore|removeAttachList|generateComposeKey|getMailSession
Listing 3.6: Use of static methods: ActionsUtil.java 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
final class ActionsUtil { public static void checkSession(final HttpServletRequest request) throws SessionExpiredException, InvalidSessionException, NoSuchProviderException { final HttpSession session = request.getSession (); // Cheeze hack to let us track how long a request took. if (request.getAttribute("requestStartTime") == null) { request. setAttribute("requestStartTime", new Long(System.currentTimeMillis()) ); } // check if the user was working on or trying to send a message // when his/her session timed out if ( session .isNew()) { throw new SessionExpiredException("New session created"); } if (Boolean.TRUE.equals(session.getAttribute(Constants.LOGGING IN))) { throw new InvalidSessionException("User trying" + "to login in concurrently."); } } }
Also in identifying the interactions between JSP views and the shared repositories, the naive search pattern turned out to be insufficient. Two noteworthy problems are now briefly discussed: the use of Struts Tiles and the use of the JSTL Expression Language (EL). Struts Tiles enable web designers to use some kind of layout manager in constructing JSP views, similar to layout managers in GUIs for stand-alone applications. A layout designs a view in an abstract way, after which the different parts (or tiles) can be filled in to create a concrete instance of the view. The preferences.jsp view for example is an instance of the defaultLayout, and is built up as shown in figure 3.7. One can also think of tiles as the server-side alternative of using frames in a web page.
46
Dependency Analysis of the GatorMail Webmail Application
To identify the repository interactions of a JSP view using Struts Tiles, the interactions of the layout and the included tiles need to be analysed as well.1
Figure 3.7: Structure of preferences.jsp using Struts tiles Another difficulty in identifying repository interactions is the use of the JSTL Expression Language (EL). The EL provides web designers a simple interface to access web application data, such as data in the shared repositories. EL expressions are surrounded by delimiters ${ and } and can also contain operators. EL expressions are run-time evaluated and are mostly used to output shared data and to construct test conditions. In listing 3.7, the focus of the login web form depends on the outcome of two tests (lines 4-5), expressed in EL: the existence of the loginForm object at the request scope and the username property of this object. EL expressions can be easily found with the search pattern ${.*}. Interpreting these ELs correctly however requires much more effort since there is no information available to indicate with which shared repository the JSP view is interacting. The data objects in EL are only at run time searched in the different shared repositories in a cascading way. In this analysis, EL expressions are processed manually. Listing 3.7: Use of the JSTL Expression Language: loginForm.jsp 1 2 3 4 5 6 7 8 9 10
<% String loginFocus = "username"; %>
<% loginFocus = "password"; %> 1 Actually the same behaviour also occurs when JSP views or servlets include other pages with the include statement or tag.
3.3 Dependency Analysis 11 12 13 14 15
47
Username:
Password:
Login
3.3.2.2
External Viewpoint
Control Flow Transitions Identifying control flow transitions in the external viewpoint is based on finding the set of hyperlinks that are created within each JSP view. To do so, two methods for defining hyperlinks in GatorMail are inspected. Firstly, hyperlinks can be constructed by directly outputting the anchor HTMLtag. The according search pattern to do so is a href. In applying this search pattern, only hyperlinks pointing to the GatorMail application are considered. Secondly, hyperlinks can be created by using the struts-html tag library. Three tags are important to identify the hyperlinks, resulting in the search pattern html:link|html: rewrite|html:form. The
tag represents a HTML anchor with a hyperlink, with as forward attribute the logical forward within Struts (line 9 of listing 3.8). The writes out a hyperlink without inserting the anchor tag (line 5 of listing 3.8). The action attribute of the tag indicates to which URL the web form has to be sent in order to be processed (line 10 of listing 3.4). Listing 3.8: Extract from the navigation tile: navBar.jsp 1 2 3 4 5 6 7 8 9 10 11 12
<%@page contentType="text/html"%> <%@taglib uri="/tags/struts-html" prefix="html"%>
Data passing The client-server data passing in the external viewpoint can easily be identified thanks to the explicit ActionForm concept in Struts. In the Struts configuration file (listing 3.1), the name attribute of an action defines the ActionForm that is constructed and validated before executing the given action.
48
Dependency Analysis of the GatorMail Webmail Application
In addition to the ActionForms, input parameters can also be read by using the getParameter method of the HTTPServletRequest. Identifying these dataflows is as simple as searching for the getParameter pattern.
3.3.3
Abstract Application Model
With the identified types of dependencies in mind, an abstract model of the webmail application can be constructed. This abstract model captures all the relevant information about the application structure and the control flow and dataflow dependencies, while abstracting programming details. This model allows to store all identified dependencies from section 3.3.2 for further analysis or dependency management. The abstract application model is shown in figure 3.8 and is now further explained. A Composition represents the server-side logic for processing a URL and is a chain of CompositionItems. These CompositionItems are pointers to components of the application, either a Struts action or a JSP view. A composition can also have an ActionForm for encapsulating the input parameters of the request. A component also specifies its interactions with the shared data repositories by means of SharedDataItems. A link between the Component and InputData lists the input parameters that are read directly from the request (instead of using an ActionForm). In addition, also the internal structure of a JSP view is included in the abstract model. A tiled JSP view (JSPComponent) is constructed by selecting a Layout, and overriding some of the predefined Tiles in the Layout. Layouts and Tiles specify their interactions with the shared data repositories by means of SharedDataItems. Similarly, they express external control transitions (i.e. the URL hyperlinks in the view) by means of UrlForwards.
3.4
Results
This section briefly discusses the results of the dependency analysis conducted on the GatorMail webmail application. Firstly, an overview of the identified dependencies is given. Secondly, some properties of these dependencies are discussed.
3.4.1
Overview of Dependencies
3.4.1.1
Internal Viewpoint
Table 3.1 shows an overview of the control flow transitions and shared data interactions in the internal viewpoint. 36 Struts actions and 29 JSP views were reused in 52 compositions, each representing the processing logic for one URL of the web application. Some components are even reused several times within one composition. The composition processing the /modifyMessage.do request for example consists of 11 components of which 6 are unique (figure 3.9).
3.4 Results
49
Figure 3.8: Abstract application model
# comp. log (fys)
# items
# reads
# writes
# read/writes
# items
# reads
# writes
# read/writes
# items
# reads
# writes
# read/writes
# shared data interactions
# control flow transitions
context
6 (6) 1 (1) 1 (1) 2 (2) 3 (3) 11 (8) 2 (2) 2 (2) 5 (5) 5 (5) 5 (5) 4 (4) 3 (3) 6 (6) 5 (5) 11 (6) 5 (5) 5 (5) 5 (5) 5 (5) 4 (4) 3 (3) 4 (4) 4 (4) 2 (2) 1 (1) 2 (2) 2 (2)
4 2 0 9 9 14 6 6 8 8 8 8 7 17 12 12 12 12 12 12 17 10 17 17 9 4 5 2
7 1 0 5 6 20 5 5 11 11 11 10 6 12 10 20 10 10 10 10 8 6 8 8 2 1 5 2
0 0 0 6 6 21 4 4 9 9 9 8 5 10 10 18 9 9 9 9 9 8 9 9 3 2 2 0
2 1 0 2 3 8 1 1 3 3 3 2 2 10 4 8 4 4 4 4 10 3 10 10 6 1 1 1
7 3 3 4 4 4 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 3 3 4 3
6 1 3 2 3 10 2 2 5 5 5 4 3 6 5 10 5 5 5 5 4 3 4 4 2 1 2 2
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 2 0 3 5 14 2 2 6 6 6 4 4 6 7 14 7 7 7 7 5 5 5 5 2 2 3 2
1 0 0 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 2 1 2 2 2 0 1 1
2 0 0 2 2 6 1 1 2 2 2 2 1 4 3 6 3 3 3 3 4 2 4 4 2 0 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
24 5 3 20 25 79 15 15 36 36 36 30 21 48 39 76 38 38 38 38 40 27 40 40 17 7 14 8
5 0 0 1 2 11 1 1 4 4 4 3 4 5 4 12 5 5 5 5 3 2 3 3 1 0 1 1
Dependency Analysis of the GatorMail Webmail Application
/login.do /checkCookies.do /logout.do /folder.do /emptyTrash.do /modifyFolder.do /folderManage.do /folderManageModify.do /createFolder.do /renameFolder.do /changeSubscribed.do /deleteFolder.do /performDeleteFolder.do /deleteMessage.do /deleteMessages.do /modifyMessage.do /moveMessage.do /copyMessage.do /moveMessages.do /copyMessages.do /message.do /rawMessage.do /nextMessage.do /prevMessage.do /printerFriendly.do /attachment.do /addressbook.do /addAddress.do
session
50
request
# control flow transitions
# shared data interactions
# read/writes
# writes
# reads
# items
context
# read/writes
# writes
# reads
8 6 15 24 4 2 4 4 4 9 9 15 11 1 2 1 1 1 1 2 3 0 5 5
# items
# reads
6 6 21 22 4 2 4 3 3 11 11 11 11 1 2 1 1 1 1 2 3 0 9 9
# read/writes
# items
5 (5) 3 (3) 6 (6) 11 (7) 2 (2) 2 (2) 2 (2) 2 (2) 2 (2) 4 (4) 4 (4) 8 (7) 6 (6) 1 (1) 1 (1) 2 (2) 1 (1) 1 (1) 1 (1) 2 (2) 2 (2) 1 (1) 2 (2) 2 (2)
# writes
# comp. log (fys) /saveAddress.do /deleteAddress.do /selectAddresses.do /saveAddresses.do /errorCopy.do /errorCopyToSent.do /errorCopyToTrash.do /compose.do /composeResume.do /forward.do /reply.do /modifyCompose.do /send.do /errorBasic.do /errorUncaught.do /errorLogout.do /noInbox.do /about.do /help.do /feedback.do /preferences.do /CSS.do /failMessage.do /failMessageList.do
session
3.4 Results
request
3 3 4 5 0 8 1 2 0 0 29 4 3 2 4 3 0 6 1 1 0 0 21 3 13 12 4 8 0 7 2 5 0 0 60 5 23 23 4 13 0 15 2 9 0 0 107 10 0 1 3 2 0 2 1 1 0 0 10 1 0 1 3 2 0 2 1 1 0 0 8 1 0 1 3 2 0 2 1 1 0 0 10 1 0 1 4 2 0 4 1 2 0 0 13 1 0 1 4 2 0 3 1 1 0 0 11 1 8 3 5 4 0 7 1 4 0 0 35 3 8 3 5 4 0 7 1 4 0 0 35 3 7 5 5 8 1 12 1 6 0 0 54 8 7 4 5 6 1 8 1 5 0 0 42 5 0 0 1 1 0 0 1 1 0 0 3 0 0 0 1 1 0 0 1 1 0 0 4 0 0 0 3 4 0 0 1 1 0 0 6 1 0 0 1 1 0 0 1 1 0 0 3 0 0 0 1 1 0 0 1 1 0 0 3 0 0 0 1 1 0 0 1 1 0 0 3 0 0 1 3 2 0 2 1 1 0 0 8 1 1 1 3 2 0 2 1 2 0 0 11 2 0 0 0 0 0 0 0 0 0 0 0 0 6 2 4 2 0 3 1 2 0 0 20 1 6 2 4 2 0 3 1 2 0 0 20 1 Total control flow interactions and shared data interactions: 1516 Table 3.1: Overview of control flow transitions and shared data interactions in the internal viewpoint
51
# URL pointers
# referring URLs
3 39 21 38 4 4 1 0 0 0 21 6 6 37 2 7 0 0 0 0 0 0 0 42 37 0
# input parameters
3 0 1 0 1 1 4 2 2 2 3 3 2 0 2 4 0 0 8 9 9 0 0 0 9 1
# ActionForms
1 0 1 0 1 1 1 1 1 1 1 1 1 0 1 1 0 0 1 1 1 0 0 0 1 0
# URL pointers
# referring URLs
/login.do /logout.do /emptyTrash.do /folderManage.do /createFolder.do /changeSubscribed.do /performDeleteFolder.do /deleteMessages.do /moveMessage.do /moveMessages.do /message.do /nextMessage.do /printerFriendly.do /addressbook.do /saveAddress.do /selectAddresses.do /errorCopy.do /errorCopyToTrash.do /composeResume.do /reply.do /send.do /errorUncaught.do /noInbox.do /help.do /preferences.do /failMessage.do
# input parameters
Dependency Analysis of the GatorMail Webmail Application
# ActionForms
52
12 /checkCookies.do 0 0 0 0 0 /folder.do 1 5 48 13 13 /modifyFolder.do 1 3 21 13 12 /folderManageModify.do 1 4 7 13 15 /renameFolder.do 1 1 4 15 15 /deleteFolder.do 1 4 7 13 12 /deleteMessage.do 1 1 0 18 13 /modifyMessage.do 1 2 6 13 13 /copyMessage.do 1 2 0 13 13 /copyMessages.do 1 2 0 13 19 /rawMessage.do 1 3 6 13 19 /prevMessage.do 1 3 6 19 4 /attachment.do 1 2 0 0 12 /addAddress.do 0 0 3 11 13 /deleteAddress.do 1 2 3 12 20 /saveAddresses.do 1 5 2 20 2 /errorCopyToSent.do 0 0 0 2 2 /compose.do 1 8 38 11 11 /forward.do 1 9 0 14 14 /modifyCompose.do 1 10 7 14 14 /errorBasic.do 0 0 0 3 3 /errorLogout.do 0 0 0 3 4 /about.do 0 0 37 3 5 /feedback.do 0 0 38 4 10 /CSS.do 0 0 48 0 13 /failMessageList.do 0 1 0 13 Total control flow transitions and data passings: 682 Table 3.2: Overview of control flow transitions and data passing in the external viewpoint
1369 shared data interactions are identified in GatorMail. For each composition and scope (request, session and context) the number of data items and the number of read, write and read/write interactions are summarised in table 3.1. In addition, 147 control flow transitions were identified within the 52 compositions. 3.4.1.2
External Viewpoint
In the external viewpoint 133 client-server data passings are identified, next to 549 control flow transitions (table 3.2). The client-server data passings are a combination of the data fields encapsulated in ActionForms and input parameters directly read by the components. The counted control flow transitions are the URL pointers that are generated by the different views of a composition. In addition also the number of compositions pointing to the given composition are listed in table 3.2.
3.4 Results
53
Figure 3.9: Composition Processing /modifyMessage.do
3.4.2
Key Characteristics
In this subsection some properties of the identified dependencies are briefly discussed: the crosscuttingness, the number of dependencies, the ease of identification and the complementarity of the dependencies. Crosscuttingness. The interactions with the shared repositories crosscut the implementation of the Struts actions and JSP views in the GatorMail webmail application. Figure 3.10 shows the result of the Aspect Browser [50], run on the edu.ufl.osg.webmail.actions package. Each column represents the implementation of an action, and the marked code lines visualise the interactions with the shared repositories. Similarly, the control flow pointers in the external viewpoint also crosscut the implementation of the JSP views. The external client-server data passing and internal control flow transitions on the other hand, have much more cohesion and lower coupling to the implementation, since they are both described in the Struts configuration file. Number of Dependencies. The number of dependencies for the GatorMail webmail application is quite high. With 1369 shared data interactions and 147 control flow transitions in the internal view, and 133 client-server data passings and 549 control flow transitions in the external view, one can say that the dependency management for a simple application as GatorMail is already quite complex. The /saveAddresses.do composition for example contains 11 components, of which 7 are unique (figure 3.11). Next to 10 control flow transitions, also 107 interactions with the shared repository can be identified. Hence, modifying or extending such a composition without breaking any of the existing dependencies
54
Dependency Analysis of the GatorMail Webmail Application
Figure 3.10: Crosscuttingness of dependencies in GatorMail is quite hard without a proper dependency management. Figure 3.12 for example illustrates this complexity by visualising the dependencies in the /saveAddresses.do composition.
Figure 3.11: Composition processing /saveAddresses.do
Ease of Identification. The ease of identification of the different dependencies strongly depends on their explicitness and crosscuttingness. The more explicitly they are described within the application, the easier dependencies can be identified and taken into account. The explicit concept of global and local forwards for
3.4 Results
55
Figure 3.12: Visualisation of the dependencies for /saveAddresses.do declaratively specifying the internal control flow is a good example of how the explicitly described control flow transitions can easily be identified. Similarly the external client-server data passings are explicitly described by the concept of ActionForms. The internal shared data interactions and external control flow transitions however remain completely implicit within the application. Since they also strongly crosscut the implementation, identifying the correct dependencies is not obvious. In these cases achieving a proper dependency management in case of software evolution is a challenging task for software developers. Complementarity. In this analysis, four types of dependencies are identified: internal dataflows, internal control flows, external data flows and external control flows. Although each type has its merit on its own, the different types also seem to be very complementary in achieving a better understanding of the dependencies. Combining the internal and external dependencies results in a global dependency view, both for control flow and dataflow. A single control flow graph can be constructed, combining the internal and external control flow transitions. Similarly, the internal and external dataflow can be coupled to express the impact of input parameters on the internal data flow. Also the combination of data flow and control flow can be useful. Since for each request, the shared data repositories are transported along the control flow
56
Dependency Analysis of the GatorMail Webmail Application
transitions, it is necessary to take the control flow into account in managing the data flow dependencies. If one wants to check for example that any read interaction on a shared object only occurs after a write operation for that object is completed, a combination of control flow and dataflow is necessary. For objects on the request scope, combining internal dataflow and control flow is sufficient, but for objects on the session and context scope, external control flow must also be taken into account.
3.5
Conclusion
The dependency analysis conducted in this chapter identified four types of dependencies in the GatorMail webmail application, resulting in more than 2000 data interactions and control flow transitions. Most of these dependencies exist implicitly within the application, and are difficult to find due to their crosscuttingness. Others are explicitly described in configuration files, which makes them much easier to find. Although the presented, heuristic approaches to identify these dependencies do not give guarantees about completeness, the conducted analysis reveals already most of the existing dependencies within GatorMail and also reveals the complexity of identifying them. The dependency analysis also shows that managing dependencies within a relatively simple application as GatorMail is not that obvious. Since most of the dependencies remain implicit in current software development, and due to their multitude, modifying or extending such applications without breaking any of the existing dependencies is quite hard to achieve without a proper dependency management. Identifying dependencies is only a first step towards a better dependency management is software development. Automatic reasoning and verification of the identified dependencies could help in building more reliable software systems and better support software evolution, without sacrificing the benefits of reusable, loosely-coupled software components.
Chapter 4
Static Verification of Indirect Data Sharing in Loosely-coupled Component Systems To maintain loose coupling and facilitate dynamic composition, components in a pipe-and-filter architecture have a very limited syntactic interface and often communicate indirectly by means of a shared data repository. This severely limits the possibilities for compile time compatibility checking. Even static type checking is made largely irrelevant due to the very general types given in the interfaces. The combination of pipe-and-filter and a shared data repository is widely used, and in this chapter we study this problem in the context of the Struts framework. We propose simple, but formally specified, behavioural contracts for components in such frameworks and show that automated formal verification of certain semantical compatibility properties is feasible. In particular, our verification guarantees that indirect data sharing through the shared data repository is performed consistently. This chapter contains the following paper [38]: Lieven Desmet, Frank Piessens, Wouter Joosen, and Pierre Verbaeten. Static Verification of Indirect Data Sharing in Loosely-coupled Component Systems. In Software Composition, volume 2006/4089 of Lecture Notes in Computer Science, pages 34–49. Springer Berlin / Heidelberg, 2006. Only some minor editorial changes have been made to the original paper and parts of the background information are moved to section 2.3. 57
58
4.1
Static Verification of Indirect Data Sharing in Loosely-coupled Component Systems
Introduction
Current component systems often promote loosely-coupled components to enhance component reuse. The pipe-and-filter style [107] for example is a very popular architectural style for constructing flow-oriented component frameworks. It is often combined with the repository style [107] to support anonymous communication between components. Current state-of-the-art web component frameworks such as Java Servlets [68] or the popular Struts framework [112] are examples of such frameworks. The main advantage of this kind of architecture is that it makes “wiring” of components at the syntactical level very simple: components are independent entities and interact with the shared data repository through a generic untyped interface. The corresponding drawback is that semantical compatibility checks are absolutely minimal: even compile-time or composition-time type checking is circumvented. For instance, in Java retrievals from the repository are done under the Object type, and the retrieved object is then downcasted to the expected type at run time. This potentially leads to exceptions at run time and is in nowadays Java-like languages a relevant problem. This in turn significantly hinders independent extensibility of applications built in such frameworks, and reuse of components in new compositions. It is for instance up to the composer to make sure that all data that a given component expects to find on the repository is guaranteed to be present in the constructed composition. Oversights of the composer can lead to run-time errors. To enhance component reuse and third-party composability, a precise documentation of the semantical behaviour of the components is essential. By making parts of the component contract formal, automated tool support for verifying some level of semantical compatibility at composition time becomes feasible. As a consequence, certain types of bugs can be detected at compile time or at composition time instead of at run time. In this chapter we propose formal component contracts written in JML, the Java Modeling Language [119], that specify part of the behaviour of components in the Struts framework and we show that static verification with state-of-theart verifiers for JML is feasible. Our contracts specify for instance what data a component expects on the repository, and what data the component puts onto the repository. Verification checks whether (1) implementations of components honour their contract, and whether (2) compositions always respect the contracts of their constituents. Our approach has been validated on GatorMail [40], an open-source, Struts-based webmail application. While we have worked out our contracts for the case of Struts, the same idea is applicable to any framework based on the pipe-and-filter and repository architectural styles. The rest of this chapter is structured as follows. Section 4.2 provides some background information on component contracts and static verification. Next, the
4.2 Background
59
problem statement is elaborated in section 4.3 and solutions for verifying two composition properties are proposed in section 4.4. Section 4.5 validates the proposed solutions in the open-source webmail application GatorMail. In section 4.6, the presented work is related to existing research and, finally, section 4.7 summarises the contributions of this chapter.
4.2 4.2.1
Background Component Contracts and Static Verification
Component contracts have already often been proposed before for various purposes [114]. For components written in Java, The Java Modeling Language (JML) [119] is a popular formal contract specification language. In this chapter, JML notation is used to specify pre- and post-conditions as well as frame conditions for methods that process HTTP requests. Frame conditions specify what part of the state a method is allowed to modify. One of the main advantages of JML is the large amount of tool support that is available [16]. Tools are available for run-time contract checking, test generation, static verification and inference of specifications. Of particular interest to us are tools for static verification of JML contracts. A variety of verification tools is available that make different trade-offs in verification power and need for user interaction. In the experiments reported on in this chapter, we used the ESC/Java2 verifier [72]. The main advantage of this verifier is that it requires no user interaction. On the downside, the verifier is far from complete, and has some known sources of unsoundness [77, 20]. In section 4.4.3, we explain how this impacts verification of our proposed contracts.
4.3
Problem Statement
Although the declarative forwarding mechanism and indirect data sharing in Struts highly facilitate the composition of a web application from a syntactical point of view, they also introduce hidden complexities for the software composer. In order to achieve correctly functioning compositions, the software composer needs to bear in mind all the hidden data interactions through the shared data repository, and anticipate all possible forwards of the actions. This hidden complexity should not be underestimated. We investigated GatorMail [40], an open-source webmail application of the University of Florida, built upon the Struts framework. In this web application (consisting of about 20.000 lines of code), we identified 36 Struts actions and 29 JSP views, reused in 52 request processing flows [36]. The FolderAction for instance was reused in more than 20 processing flows. All the flows contributed to 147 declarative control flow transitions in the webmail application, and to 1369 data repository interactions.
60
Static Verification of Indirect Data Sharing in Loosely-coupled Component Systems
The control flow transitions were specified in the composition configuration by means of global and local forwards, but none of the data interactions with the shared repository were documented. It should be clear that under these circumstances it is not obvious how to reuse existing components or to contribute to an open-source project such as GatorMail, without breaking any of the existing, hidden data dependencies between actions, or without leaving some dangling control flow transitions1 , unless of course, a full source code study is undertaken to identify the declarative forwards and the data repository interactions. To focus on the essence of the problem, we now define a simplified version of the Struts application model. This simplified version mainly takes the declarative forwarding mechanism and the indirect data sharing into account. The presented application model is then used to define some desired composition properties at the end of the section. The problem we address in this chapter is how we can verify these properties statically. The simplified application model is sufficiently generic to reflect the common characteristics of many pipe-and-filter applications with a shared data repository. Hence, the proposed solution of section 4.4 is generally applicable to this kind of applications. In section 4.5, the simplified model is further specialised towards the Struts application framework in order to apply our solution to real, existing Struts applications.
4.3.1
Simplified Application Model
In the simplified application model (shown in figure 4.1), an application is still a composition of actions. All actions implement an execute method taking two parameters: a Request and a Form. A Request is a first class entity representing the request that is being processed and the request provides access to the shared data repository (setDataItem, getDataItem and removeDataItem) associated with the request. The Form encapsulates the request parameters provided by the client for processing the request. The execute method of an Action returns a string, logically indicating which control flow transition should be taken. A Configuration object encapsulates the local and global forwards of a composition and maps the strings to corresponding actions. The RequestProcessor then repeatedly executes an action for a given request and based on the return value it selects an appropriate succeeding action from the Configuration. JSP views are reduced to normal actions in the simplified application model, but they do not produce a forward. 1 With a dangling control flow transition, we mean that at run time the action returns a logical forward, but that no mapping can be found to an actual forward in the list of local or global forwards of the running configuration.
4.3 Problem Statement
61
Figure 4.1: The simplified application model
4.3.2
Composition Example
To illustrate the simplified application model, a basic composition example is now introduced. The composition is part of an online calendar system and allows a user to schedule a meeting with several participants at a given time slot and location. The composition consists of four actions and is shown in figure 4.2. The rounded boxes represent actions and the solid arrows indicate control flow transitions.
Figure 4.2: Composition example: scheduling a meeting The first action to be executed in scheduling a meeting is the AddMeetingAction. This action tries to schedule the requested meeting. On success, the request is processed by an EmailNotificationAction which sends a notification to the participants of the meeting. Afterwards, the scheduled meeting is shown to the web user (AddedMeetingView ). On failure, the AddMeetingFailedView lists the different conflicts which make the scheduling impossible. The labels on the control flow transitions represent the return values of the different actions. The AddMeetingAction can either return “success” or “fail”, indicating whether or not the scheduling was successful. The EmailNotificationAction only returns “success”, whereas views do not produce a forward. The interactions with the shared data repository are indicated by dashed lines.
62
Static Verification of Indirect Data Sharing in Loosely-coupled Component Systems
The AddMeetingAction stores the meeting information (containing the participants, time slot and location) on the shared repository. In case the meeting cannot be scheduled, a list of conflicts is saved as well. All other actions retrieve the meeting information from the shared repository. In addition, the AddMeetingFailedView also reads the list of conflicts.
4.3.3
Desired Composition Properties
Based on the simplified application model, a number of desired composition properties can be defined in loosely-coupled compositions with a declarative control flow and indirect data sharing. Some examples are: No dangling forwards: Every logical forward in the composition is mapped to an actual forward in the configuration. No broken data dependencies: A shared data item is only read after being written. For each shared data read interaction, the shared data item that is already written on the repository is of the type expected by the read operation. In the next section, solutions are proposed to statically verify these composition properties in the simplified application model.
4.4
Solution
Figure 4.3 shows an overview of the solution. In order to statically verify the composition properties of the previous section, each action is extended with an appropriate action contract. These contracts are then verified in two phases. Firstly, the compliance of the action implementation with the action contract is checked. Secondly, the composition properties are checked based on the different action contracts.
Figure 4.3: Overview of the solution
4.4 Solution
63
The action contracts are expressed in a framework-specific contract language. Listing 4.1 for example, shows such a framework-specific contract of AddMeetingAction. These framework-specific contracts are then translated into JML contracts in order to verify them with existing verifiers. For the rest of the chapter we have chosen to show the translated JML contracts since JML is a fairly well-known contract language. Listing 4.2 shows the EBNF notation of the framework-specific contract language. The translation into JML contracts shown is straightforward. Listing 4.1: Framework-specific contract of AddMeetingAction //spec: forwards {”success”,” fail ”}; //spec: writes {Meeting meeting}; //spec: on forward == ”fail” also writes {Vector conflicts };
Listing 4.2: ENBF notation of the framework-specific contract language CONTRACT := SPECLINE∗ SPECLINE := SPECTAG ( FORWARDS | WRITES | READS | CONDITIONAL) SPECTAG := ’//spec: ’ FORWARDS := ’forwards ’ STRINGSET ’;’ WRITES := ’writes ’ OBJECTSET ’;’ READS := ’reads ’ OBJECTSET ’;’ CONDITIONAL := ’on forward == ’ STRING ’ also writes ’ OBJECTSET ’;’ STRINGSET := ’{’ (STRING,)∗ STRING ’}’ STRING := ’"’ IDENTIFIER ’"’ OBJECTSET := ’{’ (OBJECT,)∗ OBJECT ’}’ OBJECT := TYPE NAME TYPE := IDENTIFIER NAME := IDENTIFIER
This section only highlights key points of the solution. The full action contracts of the composition example (in the framework-specific contract language and in JML) and the appropriate translation tool can be found on the chapter’s accompanying website [33].
4.4.1
No Dangling Forwards Property
4.4.1.1
Action Contracts for the No Dangling Forwards Property.
In order to verify the no dangling forwards property, the action contract needs to include sufficient information about the possible declarative forwards (i.e. the different return values). This can simply be done in a JML specification by restricting the return values of an action as part of the action’s post-condition. In Listing 4.3 for example, two possible return values are declared in the action’s contract: the strings “success” and “fail”.
64
Static Verification of Indirect Data Sharing in Loosely-coupled Component Systems Listing 4.3: Contract for declarative forwarding (AddMeetingAction.spec)
public class AddMeetingAction extends Action { //@ also //@ ensures \result == ”success” || \ result == ”fail”; public String execute(Request request, Form form); }
Note that for this simple component also exceptions could have been used instead of return values to indicate the success of the scheduling. Nevertheless, return values are used in the simplified application model to reflect the control flow transition mechanism in Struts model and to also allow a richer set of forwards than only the success/fail pair. 4.4.1.2
Static Checking of the No Dangling Forwards Property.
To check the compliance of the action’s contract with its implementation, a very pragmatic approach such as applying a simple search pattern on the Java source could be used. If however the source code is not that straightforward anymore (e.g. if programming constants are used, or if the return value is constructed in several statements), a static checker tool such as ESC/Java2 can be used to verify the compliance with the ensures clause. Verifying the no dangling forwards property itself is trivial and can be done by using a simple algorithm that verifies that for each possible return value of the action either a corresponding local forward or global forward exists in the composition-specific configuration. In practice, the declarative forwarding property is not individually verified, but is verified in combination with the no broken data dependencies property as will be explained in section 4.4.2.2.
4.4.2
No Broken Data Dependencies Property
4.4.2.1
Action Contracts for the No Broken Data Dependencies Property.
The action contracts have to specify the interactions between actions and the shared data repository. These interactions can be expressed in terms of the preand post-state of the repository by using the getDataItem method of the Request. Because methods used in specifications may not have side-effects, the getDataItem method is declared pure, i.e. the method will not modify the program state. A more precise definition of purity can be found in [75]. For read interactions, the action’s contract indicates that the action requires that a non-null data item of the specified type can be read from the shared repository, as is shown in Listing 4.4. In JML, instanceof expressions of the form E instanceof T yield true when the value of E is not null and the corresponding cast would succeed [76].
4.4 Solution
65
Listing 4.4: Contract for indirect data sharing (EmailNotificationAction.spec) public class EmailNotificationAction extends Action { //@ also //@ requires request != null ; //@ requires request .getDataItem(”meeting”) instanceof Meeting; //@ ensures \result == ”success”; public String execute(Request request, Form form); }
For write interactions, the ensures pragma states which data items on the shared repository will be non-null and of the specified type after method execution. In Listing 4.5 for example, the JML contract of the execute method of AddMeetingAction states that the shared data item meeting will be a non-null Meeting object. Since write interaction may also depend on certain conditions (e.g. if a write interaction occurs in an if-then-else structure), this must also be reflected in the action’s contract. In Listing 4.5 an implication expression (==>) is used to express that the data item conflicts is only written in case the return value equals “fail”. Listing 4.5: JML contract for indirect data sharing (AddMeetingAction.spec) public class AddMeetingAction extends Action { //@ also //@ requires request != null ; //@ ensures request.getDataItem(”meeting”) instanceof Meeting; //@ ensures \result == ”fail” ==> request.getDataItem(”conflicts”) // instanceof Vector; //@ ensures \result == ”success” || \ result == ”fail”; public String execute(Request request, Form form); }
4.4.2.2
Static Checking of the No Broken Data Dependencies Property.
To verify the no broken data dependencies property, ESC/Java2 is used to verify both the compliance of the implementation of the execute method with the contract, and the composition property itself. To check the compliance of the action, a specification of the shared repository is introduced, as listed in 4.6. Hereby, explicit JML pragmas and a ghost variable are introduced for each shared data item, since the current version of the ESC/Java2 tool does not support reasoning about hashtable indirections.
66
Static Verification of Indirect Data Sharing in Loosely-coupled Component Systems Listing 4.6: JML contract of the shared data repository (Request.spec)
public class Request { //@ public ghost Object meeting; //@ public ghost Object conflicts ; //@ requires isKey(name); //@ ensures name == ”meeting” ==> this.meeting == value; //@ ensures name == ”conflicts” ==> this.conflicts == value; public void setDataItem(String name, Object value); //@ requires isKey(name); //@ ensures name == ”meeting” ==> \result == this.meeting; //@ ensures name == ”conflicts” ==> \result == this.conflicts; public /∗@ pure @∗/ Object getDataItem(String name); //@ ensures \result <==> key == ”meeting” || key == ”conflicts”; public /∗@ pure @∗/ boolean isKey(String key); }
To verify the no dangling forwards property and the no broken data dependencies property, a composition-specific check method is automatically generated and then verified by ESC/Java2. The verification assumes that actions are stateless or that at least the action contracts do not depend on the action state. In that case, the automated generation of the check method is straightforward. The check method (shown in Listing 4.7) firstly initializes the different actions used in the composition. Secondly, based on the local and global forwards of the composition configuration, a complete control flow graph is statically constructed, similar to what would happen at run time by repeatedly using the RequestProcessor. The unreachable pragmas are able to detect violations to the no dangling forwards property, since they are only reachable if an action returns a value that does not match any of its local or global forwards. The no broken data dependencies property is implicitly verified. Since, for every method call in the method body, ESC/Java2 checks that the preconditions are fulfilled, each data item read must be preceded by a data item write in the execution path and comply with the type requirements in order to satisfy the JML contract of the read interaction. Listing 4.7: Composition-specific check method to be verified by ESC/Java2 //@ requires request != null ; public void check addMeeting(Request request, Form form){ AddMeetingAction addMeetingAction = new AddMeetingAction(); EmailNotificationAction emailNotificationAction = new EmailNotificationAction(); AddedMeetingView addedMeetingView = new AddedMeetingView(); FailedAddedMeetingView failedAddedMeetingView = new FailedAddedMeetingView();
4.4 Solution
67
String forward1 = addMeetingAction.execute(request,form); if (forward1.equals("success")){ String forward2 = emailNotificationAction.execute(request,form); if (forward2.equals("success")){ addedMeetingView.execute(request,form); } else { //@ unreachable; } } else if (forward1.equals("fail")){ failedAddedMeetingView.execute(request,form); } else { //@ unreachable; } }
4.4.3
Unsoundness with ESC/Java2
ESC/Java2 has a number of known sources of unsoundness [77, 20]. One of these sources also impacts the soundness of our approach, namely ESC/Java2’s default handling of framing. As defined in JML, ESC/Java2 has a default for missing modifies clauses (i.e. modifies \everything) to unhide unexpected changes to variables caused by calling a routine, but logic to reason about routine bodies that contain these modifies clauses has not yet been implemented in ESC/Java2 [20]. As a result, methods without explicit modifies clauses can be verified since the default frame condition includes everything. However in calling such methods, the current implementation of ESC/Java2 does not take this default frame condition into account resulting in an unsound verification. In our case this means that an intermediate action can break the dependencies between one action writing shared data and another action retrieving that data, without ESCJava/2 being able to detect that violation. To counter this unsoundness, each action annotation is extended with a frame condition, explicitly stating which data items on the shared repository are changed (as illustrated in listing 4.8). Also the methods in the Request to store and retrieve data from the repository need to have explicit frame conditions. By doing so, ESCJava/2 is able to detect unspecified write interaction with the repository. In addition, other methods interacting with the repository (such as library methods) also require an explicit modifies clause and their contracts need to be verified as well. Since the current JML notations do not support modifies pragmas in terms of pure methods or hashtable values, the inserted pragmas in the actions are quite verbose. For example, the frame condition of the EmailNotificationAction states that the state of the repository is not changed by executing its action (listing 4.8). Note that the isKey method used in this frame condition is state independent. The contract of the isKey method in listing 4.6 lists which keys are allowed in the repository, irrespectively of the state of the repository.
68
Static Verification of Indirect Data Sharing in Loosely-coupled Component Systems Listing 4.8: Frame condition of EmailNotificationAction
//@ ensures (\forall String s; request .isKey(s) ==> // \old( request .getDataItem(s)) == request.getDataItem(s));
In the examples of this chapter the modifies pragmas are omitted, but the full annotation with frame conditions can be found at [33].
4.5
Validation
In this section, we validate the solutions of section 4.4 in the open-source webmail application GatorMail. Firstly, we introduce some slight refinements to the presented solution in order to be applicable to real Struts web applications. Secondly, we investigate the JML annotation overhead of the presented approach and the performance of the ESC/Java2 verification tool while verifying the GatorMail web application. Finally, we discuss our validation results.
4.5.1
Verifying Struts Applications: an Example
To illustrate the verification of Struts applications, a small composition example extracted from the GatorMail application is used. In GatorMail, the web URL /createFolder.do is mapped to the composition of figure 4.4 and allows a web user to create a new IMAP mailfolder. The composition consists of three Struts actions and two JSP views. Four control flow transitions occur in the composition: all three actions can return a “success” forward, and in addition the CreateFolderAction can return a “fail” forward. The interactions of the composition with the shared data repository are listed in table 4.1.
Figure 4.4: /createFolder.do composition in GatorMail
4.5.1.1
Verifying the Declarative Forwarding.
In the Struts framework, the execute method of an action returns an ActionForward object instead of a string. This ActionForward does not only encapsulate the declarative forward, but also contains the composition-specific forward path associated with the declarative forward. To do so, the Struts application framework
4.5 Validation
69
Table 4.1: Indirect data dependencies in /createFolder.do Folder folder: FolderManageAction (write) folderManage.jsp (read) FolderManageModifyAction (write) folderManageModify.jsp (read) ResultBean result: CreateFolderAction (write) List quotaList: FolderManageAction (write) folderManage.jsp (read) FolderManageModifyAction (write) folderManageModify.jsp (read)
String requestStartTime: CreateFolderAction (read/write) FolderManageAction (write) folderManage.jsp (read) FolderManageModifyAction (write) FolderManageModifyAction (write) String isSubscribed: FolderManageModifyAction (write) folderManageModify.jsp (read) List folderBeanList: FolderManageAction (write) folderManage.jsp (read)
loads the local and global forwards of the composition into the ActionMapping object at run time, and the returned ActionForward is then constructed by calling the findforward method on the ActionMapping parameter (Listing 4.9). Listing 4.9: Declarative forwarding in Struts public class FolderManageAction extends Action { public ActionForward execute(ActionMapping mapping, ActionForm form, HttpServletRequest request, HttpServletResponse response) throws Exception { // ... return mapping.findForward("success"); } }
To be able to express the local forward string in the JML contracts of the actions, extra specification is introduced for ActionMapping (Listing 4.10). The specification states that the declarative forward used as parameter of the findForward method is equal to the name property of the returned ActionForward. By doing so, the declarative forwards can be expressed in term of the name property of the returned result (Listing 4.11). Listing 4.10: JML specification of ActionMapping public class ActionMapping extends ActionConfig { //@ requires name != null; //@ ensures \result != null ; //@ ensures \result.getName() == name; public ActionForward findForward(String name); }
Listing 4.11: Declarative forward specification of FolderManageAction //@ ensures \result.getName() == ”success”;
70
Static Verification of Indirect Data Sharing in Loosely-coupled Component Systems
4.5.1.2
Verifying Indirect Data Sharing.
Since indirect data sharing via a shared repository in Struts is identical to the simplified application model, the solution of section 4.4 can be applied to Struts applications without any modification.
4.5.2
Results of the GatorMail Experiment
To validate the applicability of our approach, we annotated 12 actions and 8 views of the GatorMail webmail application. With these annotations we were able to verify the declarative forwarding and indirect data sharing properties in 17 composition flows (i.e. one third of all flows in GatorMail). We used this subset of the application to investigate the annotation overhead and the performance of the verification. Only the results are reported in this subsection. The full annotations and a short description of how to verify both the implementation conformance and the composition properties can be found at [33]. The website also includes more information on refactorings of GatorMail needed to adhere to the Liskov principle and additional specification hints required by the verifier. 4.5.2.1
JML Annotation Overhead.
As a quantification of the annotation overhead, a JML line count is performed on the annotated actions. As shown in table 4.2, at most 15 lines of JML annotation are used in an action contract to express the control flow transitions and the shared repository interactions. The JML contract of FolderAction for example, consists of 12 annotation lines, illustrated in Listing 4.12. But this quite verbose JML contract is actually generated from a more concise, Struts-specific contract specified in Listing 4.13. Table 4.2: JML notation overhead in GatorMail
Action ChangeSubscribedAction2 CheckSessionAction CreateFolderAction DeleteFolderAction DeleteMessagesAction FolderAction
# JML lines 14 7 10 10 10 12
Action FolderManageAction FolderManageModifyAction ModifyFolderAction2 MoveCopyAction PerformDeleteFolderAction2 RenameFolderAction
# JML lines 10 11 15 11 15 9
The Struts-specific contracts are at most 4 lines of annotations, and they are much easier to write by a Struts developer and to read by a software composer. The Struts-specific contracts of the GatorMail case and a tool for converting them 2 These actions extend the LookupDispatchAction, and have alternative substitutes of the execute method. Thus, it’s obvious that these actions have a higher JML line count, since several methods are annotated.
4.5 Validation
71
into verifiable JML annotations (including the frame conditions) can be found at [33]. Listing 4.12: JML contract of FolderAction //@ //@ //@ //@ //@ //@ //@ // //@ //@ //@ //@
also requires request != null ; requires mapping != null; ensures \result != null ; ensures \result.getName() == ”success” || \result.getName() == ”inbox”; ensures request.requestStartTime instanceof Long; ensures \result.getName() == ”success” ==> request .folderBeanList instanceof List ; ensures \result.getName() == ”success” ==> request.folder instanceof Folder; ensures \result.getName() == ”success” ==> request.messages instanceof List; ensures \result.getName() == ”success” ==> request.quotaList instanceof List; requires form instanceof FolderForm;
Listing 4.13: Struts-specific contract of FolderAction //struts : forwards {”success”,”inbox”}; //struts : writes {Long requestStartTime}; //struts : on forward == ”success” also writes {List folderBeanList, Folder folder , // List messages, List quotaList};
4.5.2.2
Verification Performance with ESC/Java2.
To evaluate the performance of the verification process, the verification time and memory usage is measured for verifying the implementation compliance and the composition properties. The performance tests were run on a Pentium M 1.4 with 512MB RAM, running Debian Linux, while using Java 1.4.2 09, ESC/Java2 2.0a9 and Simplify 1.5.4. Table 4.3 shows the performance results of verifying a subset of GatorMail. Both verification steps can be done in a reasonable amount of time (less than 15 seconds per verification) and limited memory resources (not exceeding 25MB). If also the frame conditions are checked, the verification takes up to 700 seconds, but since most bugs are already found without checking the frame conditions, this type of verification has to be run less regularly. In addition, since the verification is done modularly (i.e. action per action), the verification complexity is linear and the verification process is scalable to larger software projects as well, under the assumption that the complexity of the individual actions remains equivalent.
4.5.3
Discussion
One of the problems that we were confronted with was ESC/Java2’s poor support to reason about hashtable indirections. Since the dynamics of loosely-coupled
72
Static Verification of Indirect Data Sharing in Loosely-coupled Component Systems
Table 4.3: Verification performance Action ChangeSubscribedAction CheckSessionAction CreateFolderAction DeleteFolderAction DeleteMessagesAction FolderAction FolderManageAction FolderManageModifyAction ModifyFolderAction MoveCopyAction PerformDeleteFolderAction RenameFolderAction Composition flow /folder.do /folderManage.do /folderManageModify.do /createFolder.do /renameFolder.do /changeSubscribed.do /deleteFolder.do /performDeleteFolder.do /modifyFolder.do /deleteMessages.do /moveMessage.do /copyMessage.do /moveMessages.do /copyMessages.do /errorCopy.do /errorCopyToSent.do /errorCopyTrash.do
Verification time (with frame condition) 1.960 s (13.151 s) 0.252 s (2.241 s) 0.951 s (5.106 s) 0.978 s (61.193 s) 4.607 s (24.542 s) 14.18 s (711.654 s) 1.407 s (10.475 s) 2.126 s (205.791 s) 0.831 s (1.699 s) 4.334 s (20.957 s) 1.390 s (5.833 s) 0.844 s (4.993 s) Verification time 0.853 s 0.506 s 0.555 s 1.639 s 1.741 s 1.733 s 1.145 s 2.497 s 7.638 s 1.819 s 2.468 s 1.960 s 2.338 s 1.936 s 0.435 s 0.725 s 0.446 s
Memory usage 16 13 15 17 20 24 16 16 14 19 16 15
MB MB MB MB MB MB MB MB MB MB MB MB
Memory usage 14 MB 15 MB 15 MB 17 MB 17 MB 18 MB 18 MB 19 MB 22 MB 23 MB 24 MB 25 MB 17 MB 19 MB 20 MB 20 MB 18 MB
4.6 Related Work
73
component systems such as Struts strongly rely on hashtable indirections in the implementation, we were forced to circumvent this lack of support by introducing very verbose specifications or statically constructing the complete control flow graph. Additionally, ESC/Java2 is far from complete, for instance reasoning about loops is fairly weak. Also, known sources of unsoundness, related to framing and re-entrancy need to be avoided. Again, this made specifications more verbose than they could be. This is however a temporary problem and future versions of the tool are expected to improve in the different domains. Another issue that we encountered in verifying GatorMail was the violation of the Liskov substitution principle. The DeleteMessagesAction for example extends the FolderAction, while having a stronger precondition regarding the expected data items on the shared repository for the execute method. Since verification tools rely on the Liskov substitution principle, we had to slightly refactor GatorMail in order to comply with the Design by Contract concept. While the GatorMail case study shows that annotation overhead and verification performance are fine, it can not give us data about the usefulness of our approach for detecting bugs early. Since GatorMail is a mature application, bugs due to broken dependencies have been ironed out already. Therefore, it would be interesting to apply our approach to less mature software systems or to study a development process that incorporates our approach in future research.
4.6
Related Work
To the best of our knowledge, this is the first proposal for automatic verification of indirect data sharing in Java-based component frameworks. However, our approach is strongly inspired by ongoing research in several research domains. In software architecture research, several Architecture Description Languages (such as Wright, Darwin and Rapide) are proposed to support architecture-based reasoning, ranging from semi-formal diagrams with boxes and lines to formal notations [80]. Architecture analysis techniques have already been developed to detect problems such as deadlock and component mismatch [59, 7]. Comparable approaches (such as CL [61] and Piccola [4]) are proposed in the domain of coordination and software composition. CL, for example, is a composition language for predictable assembly from certifiable components. In CL, the run-time behaviour of an assembly of components can be predicted from known properties of components and their patterns of interaction [61]. The use of JML or related languages such as Spec# [11] for verifying component properties is a very active research domain. For example, Smans et al. [108] specify and verify code access security properties; Jacobs et al. [62] verify datarace-freeness in concurrent programs, and Pavlova et al. [94] focus on security properties of applets. Other applications of JML are surveyed in [16].
74
4.7
Static Verification of Indirect Data Sharing in Loosely-coupled Component Systems
Conclusion
This chapter has focussed on two desirable composition properties in pipe-andfilter and repository based component systems. We proposed framework-specific component contracts to specify a component’s possible forwards and its interactions with the shared repository and translated them into JML annotations. The contracts are sufficiently simple to have an acceptable annotation overhead and a very reasonable automatic verification time. Although, as discussed in section 4.5.3, there are still some drawbacks with the current state of the verification tool, the conducted experiments show that using existing contract annotation languages and verification tools in order to achieve more robust compositions looks promising.
Acknowledgements The authors would like to thank Bart Jacobs, Adriaan Moors and Jan Smans for their useful comments and insights while proofreading this chapter.
Chapter 5
Bridging the Gap Between Web Application Firewalls and Web Applications Web applications are the Achilles heel of our current ICT infrastructure. NIST’s national vulnerability database clearly shows that the percentage of vulnerabilities located in the application layer increases steadily. Web Application Firewalls (WAFs) play an important role in preventing exploitation of vulnerabilities in web applications. However, WAFs are very pragmatic and ad hoc, and it is very hard to state precisely what security guarantees they offer. The main contribution of this chapter is that it shows how, through a combination of static and dynamic verification, WAFs can formally guarantee the absence of certain kinds of erroneous behaviour in web applications. We have done a prototype implementation of our approach building on an existing static verification tool for Java, and we have applied our approach to a medium-sized J2EE based web application. This chapter contains the following paper [37]: Lieven Desmet, Frank Piessens, Wouter Joosen, and Pierre Verbaeten. Bridging the Gap Between Web Application Firewalls and Web Applications. In Proceedings of the 2006 ACM Workshop on Formal Method in Security Engineering (FMSE 2006), pages 67–77, November 2006. Only some minor editorial changes have been made to the original paper and parts of the background information are moved to section 2.3. 75
76 Bridging the Gap Between Web Application Firewalls and Web Applications
5.1
Introduction
Nowadays web applications are wide-spread and more and more companies incorporate e-commerce in their business model to increase their revenues. But web applications tend to be error-prone, and these bugs are a welcome target for attackers due to their high accessibility and possible profit gain. Therefore, the number of security incidents with web applications is rapidly increasing [21, 88]. A wide range of countermeasures exists and more recently Web Application Firewalls (WAFs) are added to the network infrastructure to counter the shortcomings of traditional network firewalls. WAFs may among others prevent broken access control vulnerabilities. For example, WAFs can enforce a strict request flow to prevent vulnerabilities exploited by forceful browsing attacks. One of the problems with using WAFs for the strict request flow enforcement is the fact that their configuration tend to be quite independent of implementationspecific application details. In that way, they can protect applications against quite general attacks, but there is no direct relationship to the implementationlevel bugs that actually reside in the application they want to protect. Thus, there is no guarantee that an enforced WAF policy on incoming requests protects against attacks that exploit application-specific implementation bugs. The main contribution of this chapter is that it shows how, through a combination of static and dynamic verification, WAFs can formally guarantee the absence of certain kinds of erroneous behaviour in web applications. We have done a prototype implementation of our approach building on an existing static verification tool for Java, and we have applied our approach to a medium-sized J2EE based web application. In particular, we guarantee that if the combination of a web application and a WAF policy passes our verification process, no client/server interaction will break the data dependencies on the shared session state between server-side components. The research presented in this chapter is based upon previous work [38], in which formal contracts specify interactions with a shared data repository and static verification is used to guarantee that no data dependencies are broken within a given software composition. In the previous work, we investigated applications with a deterministic, sequential program execution, whereas in this chapter we model reactive, indeterministic program execution, combine static and dynamic verification, and use semantically richer component contracts. The rest of this chapter is structured as follows. Section 5.2 provides some background information on web applications, web vulnerabilities and Web Application Firewalls. Next, the problem statement is elaborated in section 5.3 and our solution to guarantee that no client/server interaction leads to unintended repository interactions is proposed in section 5.4. Section 5.5 applies the proposed solution to a small e-commerce site and section 5.6 discusses some identified problems with ESC/Java2, the verification tool that we build on. In section 5.7, the presented work is related to existing research and, finally, section 5.8 summarises
5.2 Background
77
the contributions of this chapter.
5.2 5.2.1
Background Web Vulnerabilities and Web Application Firewalls (WAFs)
Existing network security fails to effectively protect web applications against attackers [71]. Network firewalls such as stateful packet filters typically operate on the network or transport layer (e.g. granting access to a complete web application by allowing TCP port 80 traffic), whereas web applications are typically attacked on the application layer. For example, attackers exploit among others design flaws in the application logic and known weaknesses in the HTTP protocol, the browser or the web server technology. A network firewall only addresses network access control in order to control whether or not a web server can be reached, irrespective of the kind of web requests and associated data that is sent to the server. The Open Web Application Security Project (OWASP) documented the ten most critical web applications vulnerabilities in their OWASP Top Ten [93]. In this chapter, we mainly focus on broken access control vulnerabilities, in particular on vulnerabilities leading to forceful browsing [97]. Forceful browsing is the act of directly accessing web pages (URLs) without consideration for their context within an application session. Bypassing intended application flow can lead to unauthorised access to resources or unexpected application behaviour [125]. To counter web application vulnerabilities, Web Application Firewalls (WAFs) operate on the application layer (OSI layer-7) and analyse web requests between a browser and the web server [104, 10]. Often, WAFs are placed inline between the browser and server (as displayed in figure 5.1), and enforce real-time access control, based on application-level information such as the requested URL, the supplied credentials and input parameters and the user session’s history. A WAF can either use a positive or negative security model as basis for access decisions. In case of a positive security model, access control is based upon known positive behaviour; in case of a negative security model, access is denied to requests that reflect known dangerous traffic. A positive security model can be configured manually by the administrator or can be built automatically by observing legitimate network traffic. In the remainder of this chapter, we will focus on WAFs with a positive security model that implement criterion 4.6 of the ”Web Application Firewall Evaluation Criteria” [124], i.e. the strict request flow enforcement. This criterion refers to the technique where a WAF monitors individual user sessions and keeps track of the links already followed and of the links that can be followed at any given time [124].
78 Bridging the Gap Between Web Application Firewalls and Web Applications
Figure 5.1: Web Application Firewall infrastructure
5.3
Problem Statement
Without strict request flow enforcement, forceful browsing attacks can compromise the correct functioning of a web application in various ways. Depending on the application, these attacks can among others circumvent access control or input validation, can corrupt server-side state or can bring the server in an inconsistent state and thus result in unexpected behaviour. More generally, the outcome of tampering with the client/server protocol can break the application logic or trigger server-side bugs. WAFs with strict request flow enforcement are accepted as an effective countermeasure for forceful browsing attacks. However, since WAFs are mainly configured in a heuristic way, either manually or by observing legitimate network traffic, their configuration tend to be quite independent of implementation-specific application details. Therefore, no formal guarantees can be given about the effectiveness of applying a WAF policy to protect against certain types of implementation bugs. In section 5.4, we propose our solution to formally bridge the gap between the WAF enforcement policy and the web application. By combining static verification and dynamic enforcement, we are able to guarantee that the enforcement engine used protects the application against certain types of application-specific implementation bugs. Although we restrict our focus in this chapter to errors that can occur on the non-persistent, server-side session state, we strongly believe that our approach is also applicable to other types of implementation bugs or infringements of the application logic. In the next paragraph, a simple servlet-based web application illustrates the kind of errors that can occur on the server-side session state due to forceful browsing. We will then retake this application in sections 5.4 and 5.5 to clarify parts of our solution. Although we mainly use servlets in this chapter as an illustration, the problem is as well valid in other web technologies such as ASP.NET or PHP, and our solution could be applied there as well.
5.3 Problem Statement
79
The Duke’s BookStore web application. The Duke’s BookStore web application is an exemplary Java Servlet application that is bundled together with the J2EE 1.4 Tutorial [9]. This small e-commerce application consists of about 3500 lines of code, and implements the basic functionality of a web shop by using Java Servlets. The core application logic is supplied by 6 servlets and 1 filter: BookStoreServlet The BookStore servlet returns the main web page for the Duke’s Bookstore. From this start page, links are provided to browse the book catalog, or jump to the book details of a particular book (e.g. a book in promotion). BookDetailsServlet The BookDetail servlet returns information about any book that is available from the bookstore. Links are provided to the user to either add the book to the shopping cart, or to look further into the book catalog. CashierServlet The Cashier servlet asks for the user’s name and credit card number so that the user can buy the books in his shopping cart. Payment information is sent to the Receipt servlet. CatalogServlet The Catalog servlet displays the book catalog, and provides the possibility to add books to the user’s shopping cart or to buy the books in the shopping cart by redirecting to the cashier servlet. ReceiptServlet The Receipt servlet processes the order by updating the book database inventory. Afterwards the servlet invalidates the user session. ShowCartServlet The ShowCart servlet returns information about the books in the user’s shopping cart. OrderFilter The Order filter provides server-side logging of shopping orders, whenever the ReceiptServlet is called. These components interact with the shared session repository as listed in table 5.1. The interactions are specified by a type (e.g. ResourceBundle), a string identifier (e.g. messages) and the type of interaction. The interaction types used in table 5.1 are more fine-grained that just simple read and write operations. The type def. read/write stands for a defensive read/write operation as shown in listing 5.1, i.e. the application can handle a null pointer as result of the read operation, and in that case the servlet stores a not null object of the expected type to the shared session repository. The label cond. means that the operation possibly occurs, depending on an unspecified condition such as run-time state of the book database inventory. Listing 5.1: Example of a defensive read/write operation in BookDetailsServlet Currency c = (Currency) session.getAttribute("currency"); if (c == null) { c = new Currency();
80 Bridging the Gap Between Web Application Firewalls and Web Applications
BookDetailsServlet: ResourceBundle messages (read) Currency currency (cond. def. read/write) BookStoreServlet : ResourceBundle messages (def. read/write) ReceiptServlet: ResourceBundle messages (read) ShoppingCart cart (def. read/write) OrderFilter: ShoppingCart cart (read) Currency currency (read)
CashierServlet: ResourceBundle messages (read) ShoppingCart cart (def. read/write) Currency currency (def. read/write) CatalogServlet: ResourceBundle messages (read) ShoppingCart cart (def. read/write) Currency currency (def. read/write) ShowCartServlet: ResourceBundle messages (read) ShoppingCart cart (def. read/write) Currency currency (cond. def. read/write)
Table 5.1: Interactions with the shared session repository in the BookStore application session . setAttribute("currency", c); }
Session repository interactions are typically not specified in a Servlet-based application and neither they are in this J2EE tutorial application. Thus, the implicit assumptions of the developer on how a servlet or filter should be used with respect to its interactions with the shared session repository are not shipped together with the source code. This makes correct deployment or software evolution very hard without reanalysing the complete source code. Even in this small e-commerce application, the interactions with the shared session repository impose restrictions on the allowed client/server interaction protocol. If for example a user session starts with any URL path other than the /bookstore starting point of the application (which is a typical forceful browsing attack), the execution of any servlet ends up with a NullPointerException: every servlet retrieves the messages data item from the shared repository and assumes in its execution that the retrieved ResourceBundle is not null. Another NullPointerException occurs in this small application if the OrderFilter (applied on the ReceiptServlet) is called in a user’s session before the cart and currency data items are stored to the shared repository. The problem does however not occur if the OrderFilter is not applied to the ReceiptServlet, which may indicate that this error was introduced to the application due to evolution. The impact of a NullPointerException during execution depends on the particular application. Possible consequences include the execution of unexpected application logic, information leakage due to bad error handling, broken data integrity by storing null strings to the database back-end, skipping of clean-up code (such as the code that closes database connections) which in turn may lead to a Denial-of-Service, and many more. In the remainder of this chapter we assume that the occurrence of a NullPointerException due to data repository interactions in a web application negatively affects the security of the application and thus should
5.4 Solution
81
be prevented from happening. More precisely, we define the desired application property as follows: No broken data dependencies in the user’s session shared data repository: No client request causes a data item to be read from the server-side, shared session repository before it is actually written. For each shared data read interaction, the shared data item that is already written to the shared session repository is of the type expected by the read operation.
5.4
Solution
In this section we propose our solution: we specify a component’s interactions with the shared session repository and use static and dynamic verification to guarantee that no client/server interaction leads to violation of the the desired application property. Figure 5.2 depicts an overview of our solution. At the left side of the figure the different artefacts of our application are listed. Next to the implementation and the deployment information, also the WAF strict flow enforcement policy and the run-time web traffic are used as input for our verification process. The verification process consists of three steps. Firstly, the interactions with the shared session repository are explicitly specified in component contracts, and static verification is used to verify that the component implementations obey to the contract specification. Secondly, static verification ensures that any client/server interaction protocol that complies with the WAF enforcement policy actually satisfies the component’s preconditions on the shared session repository. Finally, runtime policy enforcement is used to guarantee that only web requests that obey the WAF enforcement policy are allowed to be processed by the web application. This is exactly the functionality that WAFs with strict request flow enforcement offer. By combining these three verification steps, our solution ensures the desired application property. Our solution is sound (if the assumptions underlying our pragmatic framing hold for the application being protected, see subsection 5.6.2). Our solution is however not complete: to avoid undecidability issues, the static verification is necessarily conservative. Server-side specification and verification In order to specify a component’s interactions with the shared session repository, each web component is extended with an appropriate component contract. The contract is expressed in a problem-specific contract language, which is easy to understand for application developers. Listing 5.2 for example, shows such a problem-specific contract of the ShowCartServlet, which is a straightforward mapping of the interactions specified in table 5.1. The EBNF notation of the
82 Bridging the Gap Between Web Application Firewalls and Web Applications
Figure 5.2: Solution overview Listing 5.2: Problem-specific specification of ShowCartServlet //spec: reads {ResourceBundle messages, Nullable<ShoppingCart> cart, // Nullable currency} from session; //spec: writes {cart == null => ShoppingCart cart} on session; //spec: possible writes {currency == null => Currency currency} on session;
problem-specific contract is shown in listing 5.3. Next, static verification is used to verify that a component’s implementation obeys its contract, i.e. that only the read and write interactions happen that are specified in the contract. Listing 5.3: ENBF notation of the problem-specific contract language CONTRACT := SPECLINE∗ SPECLINE := SPECTAG ( READS | WRITES | CONDITIONALWRITES) SPECTAG := ’//spec: ’ READS := ’reads ’ READOBJECTSET ’ from session;’ READOBJECTSET := ’{’ (READOBJECT,)∗ READOBJECT ’}’ READOBJECT := ( ’Nullable<’ TYPE ’>’ NAME | TYPE NAME ) WRITES := ’writes ’ WRITEOBJECTSET ’ on session;’ WRITEOBJECTSET := ’{’ (WRITEOBJECT,)∗ WRITEOBJECT ’}’ WRITEOBJECT := (CONDITION ’==>’ TYPE NAME | TYPE NAME) CONDITION := NAME ( ’== null’ | ’!= null’ ) CONDITIONALWRITES := ’possibly’ WRITES TYPE := IDENTIFIER NAME := IDENTIFIER
5.4 Solution
83
Figure 5.3: Client/server interaction protocol Listing 5.4: EBNF notation of the client-server protocol PROTOCOL := /bookstore + SERVLET A ∗ + RECEIPT RECEIPT := ( SERVLET B + SERVLET ∗ + orderfilter + /bookreceipt ) | nil SERVLET := SERVLET A | SERVLET B SERVLET A := /bookstore | /bookdetails | /bookshowcart | /banner | nil SERVLET B := /bookcatalog | /bookcashier
Application-specific protocol verification As the first input artefact for the application-specific protocol verification, the strict enforcement policy of the WAF is required. This representation of the intended client/server interaction protocol can be expressed in various ways such as a regular expression, an EBNF notation or a labelled state transition system. For example, listing 5.4 and figure 5.3 are two different representations of the intended protocol for the Duke’s BookStore application. Note that in web applications the protocol can be interrupted at any time, e.g. if a web user stops surfing to the given web application or closes the browser. This is indicated by nil in the EBNF notation, and with grey transitions in the labelled state machine. In order to statically verify that any client/server interaction (that conforms to the intended protocol) does not violate the desired application property, the intended protocol is verified in combination with the component’s contracts in a given deployment. In a J2EE web application for example, the web deployment descriptor contains among others the mapping between URLs and servlets, as well the servlets on which filters are applied. Run-time protocol enforcement Finally, the verified client/server protocol needs to be enforced at run-time. This is done by loading the protocol specification into a supporting WAF.
84 Bridging the Gap Between Web Application Firewalls and Web Applications
5.5
Prototype Implementation
In this section, we describe our prototype implementation and discuss how it can be used to secure the Duke’s BookStore application.
5.5.1
Server-side Specification and Verification
In order to use existing verifiers to check if the implementation of a component adheres to its contract, the problem-specific contracts are translated into the Java Modeling Language (JML) [119], which is a popular formal contract specification language for components written in Java. The JML contract in listing 5.5 expresses interactions between actions and the shared data repository in terms of pre- and post-state of the repository. Listing 5.5: Contract for shared session repository interactions (ShowCartServlet.spec) package servlets; public class ShowCartServlet extends HttpServlet { //@ also //@ requires request != null ; //@ requires response != null ; //@ requires request . session .messages instanceof ResourceBundle; //@ requires request . session . cart instanceof ShoppingCart // || request . session . cart == null; //@ requires request . session .currency instanceof Currency // || request . session .currency == null; //@ ensures request.session . cart instanceof ShoppingCart; //@ ensures request.session .currency instanceof Currency // || request . session .currency == null; //@ ensures \old(request.session .currency) instanceof Currency ==> // \old( request . session .currency) == request.session.currency; //@ modifies request. session . cart ; //@ modifies request. session .currency; public void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException; }
For read interactions, the component’s contract indicates that the component requires that a non-null data item of the specified type can be read from the shared repository. For write interactions, the ensures pragma states which data items on the shared repository will be non-null and of the specified type after method execution. In Listing 5.5 for example, the JML contract of the doGet method of the ShowCartServlet states that among others the shared data item messages will be a non-null ResourceBundle object before execution and after execution that the
5.5 Prototype Implementation
85
data item cart is ensured to be a non-null ShoppingCart. In addition, the modifies clause expresses the frame condition, i.e. what part of the session state a method is allowed to modify. Finally, notice the use of the also keyword. The ShowCartServlet extends the HttpServlet, and by doing so it inherits the public method specification of the doGet method. To refine the specification of an overridden method (e.g. by weakening preconditions or by strengthening postconditions), the specification in JML starts with the also keyword, which combines the specifications of the supertype and the subtype. Similarly, the also keyword can also be used in regular specification to combine different specification blocks into a nested specification. More information about the desugaring of also combinations in JML can be found in [99, 20]. Since the doGet method of the HttpServlet does not provide common behaviour for all the inheriting servlets, the supertype method is annotated with the strongest possible precondition, i.e. the requires false pragma. In this way, all inheriting servlets are able to weaken this precondition conform the Liskov principle. Note that this design decision prohibits polymorphic use. This was not an issue in the prototype implementation, since we did not encounter polymorphic use of the HttpServlet objects in the different servlet implementations or the applicationspecific check method. In our prototype, we chose to use the ESC/Java2 verifier [72]. To check the compliance of the component implementation with ESC/Java2, the specification of the shared repository is generated (listing 5.6). Hereby, explicit JML pragmas provide a mapping between a ghost field and the state of a specific data item in the hashtable since the current version of the ESC/Java2 tool does not support reasoning about hashtable indirections. This mapping allows us to express the state of the data repository in a component’s contract in terms of the object fields rather than hashtable indirections, and allows us to still reason about this state without losing the verification power of ESC/Java2. Listing 5.6: JML contract of the session repository (HttpSession.spec) package javax.servlet.http; public interface HttpSession { //@ public ghost Object cart ; //@ public ghost Object currency; //@ public ghost Object messages; //@ //@ //@ //@ //@ //@ //@
requires name == ”cart”; ensures this. cart == value; modifies this . cart ; also requires name == ”currency”; ensures this.currency == value; modifies this .currency;
86 Bridging the Gap Between Web Application Firewalls and Web Applications //@ also //@ requires name == ”messages”; //@ ensures this.messages == value; //@ modifies this .messages; public void setAttribute(String name, Object value); //@ requires name == ”messages”; //@ ensures \result == this.messages; //@ also //@ requires name == ”currency”; //@ ensures \result == this.currency; //@ also //@ requires name == ”cart”; //@ ensures \result == this.cart; public /∗@ pure @∗/ Object getAttribute(String name); //@ requires false ; public void removeAttribute(String name); }
In verifying the compliance, we used a pragmatic framing approach instead of verifying the JML modifies clauses with ESC/Java2, as will be explained in subsection 5.6.2. Hereby, we verify the specified frame conditions only with respect to state changes in the shared repository, and ignore state changes in other parts of the application. This pragmatic framing is sufficient for our approach since we are only interested in the component’s interactions with the shared repository.
5.5.2
Application-specific Protocol Verification
To statically verify that any client/server interaction does not violate the desired application property, a server-side protocol check is automatically generated from the protocol specification. This protocol check simulates the intended protocol in a server-side method body, in which every web interaction is translated into a method call to the appropriate request processing component (if needed preceded by one or more filters). In addition, reactive or indeterministic behaviour is translated by applying the java.util.Random class, if-then-else branches, switch-cases and while-loops. The protocol-simulating check method for the Duke’s BookStore application is listed in listing 5.7. The application-specific protocol verification is then reduced to statically verifying the implementation of the check method with ESC/Java2. Compliance to a component’s assumption on the shared session state is verified implicitly since ESC/Java2 checks that the preconditions are fulfilled for each method that is called.
5.5 Prototype Implementation
Listing 5.7: Protocol-simulating check method to be verified by ESC/Java2 //@ requires request != null ; //@ requires request . session .messages == null && request.session.cart == null // && request.session.currency == null; public void protocolCheck(HttpServletRequest request, HttpServletResponse response){ try { Random random = new Random(); bookstore.doGet(request,response); while(random.nextBoolean()){ int randomInt = random.nextInt(); switch(randomInt){ case 0: showcart.doGet(request,response); break; case 1: banner.doGet(request,response); break; case 2: bookstore.doGet(request,response); break; case 3: bookdetail.doGet(request,response); break; default: break; } } if (random.nextBoolean()){ switch(random.nextInt()){ case 0: cashier .doGet(request,response); break; default: catalog.doGet(request,response); break; } while(random.nextBoolean()){ switch(random.nextInt()){ case 0: showcart.doGet(request,response); break; case 1: catalog .doGet(request,response); break; case 2: cashier .doGet(request,response); break; case 3: bookstore.doGet(request,response); break; case 4: bookdetail.doGet(request,response); break; case 5: banner.doGet(request,response); break; default: break; } } orderFilter . doFilter(request,response,null); receipt .doPost(request,response); } } catch(Exception e) { e.printStackTrace(); } }
87
88 Bridging the Gap Between Web Application Firewalls and Web Applications
Figure 5.4: Class diagram of the run-time enforcement engine
5.5.3
Run-time Protocol Enforcement
Since the static verification step requires that the protocol at run time adheres to the intended client/server protocol, run-time enforcement is needed to ensure that only requests conform the intended protocol are processed by the application. As a proof of concept, we embedded a lightweight run-time enforcement engine in our web application container by installing a custom J2EE Filter. Before a servlet is invoked by means of the service(ServletRequest request, ServletResponse response) method in a J2EE web application, a chain of deployed filters is always applied to the request. At deployment time, our enforcement engine is loaded with an object-oriented instantiation of the labelled state transition system (figure 5.4). For each user session the current state is stored, and for each incoming web request, the enforcement engine verifies that the transition is allowed and the current state is updated before the request is dispatched to the servlet. In case of a protocol violation, a pluggable strategy is consulted, defining the action that should be taken ranging from blocking access to the originator’s IP or invalidating the user’s session to just logging the access violation.
5.6 5.6.1
Discussion Results of the BookStore Experiment
Annotation overhead As a quantification of annotation overhead, a specification line count is performed on the annotated components. At most 4 lines of specification are used to express the interactions with the shared session repository. In addition, thanks to the pragmatic framing (see subsection 5.6.2) almost no library calls need to be instrumented to verify the different components.
5.6 Discussion Component BookDetailsServlet BookStoreServlet ReceiptServlet ShowCartServlet
89 Verif. time 67.285 s 16.943 s 5.433 s 216.212 s
Code lines 74 61 65 157
Component OrderFilter CashierServlet CatalogServlet
Verif. time 31.632 s 62.742 s 225.866 s
Code lines 61 60 123
Table 5.2: Verification performance Static verification performance To evaluate the performance of the static verification process, the verification time is measured. The performance tests were run on a Pentium Mobile (1.4GHz) with 512MB RAM, running Debian Linux, while using Java 1.4.2 09, ESC/Java2 2.0a9 and Simplify 1.5.4. Table 5.2 shows the performance results of verifying the implementation compliance. The verification of the protocol-simulating method succeeded smoothly in about 11 seconds. Run-time enforcement overhead To estimate the overhead of the run-time flow enforcement, we ran the following experiment on the BookStore application with and without our enforcement filter. We sequentially simulated 1000 different visitors, in which each user’s protocol consisted of 6 web requests and 2 % of the visitors applied forceful browsing. In this experiment, we measured a run-time overhead of 1.3 %. The BookStore application was deployed on the Sun Java System Application Server Platform Edition 8.2.
5.6.2
Limitations
Sequential processing In our solution, we assume that the web requests in a user’s session are processed sequentially on the web server, in the same order as they pass the WAF. In practice however web servers tend to process incoming requests in a multithreaded way (e.g. by using thread pools). At this moment our solution is only valid if our assumption on sequential execution holds, and therefore, supporting concurrent request processing is an important topic for future research. Pragmatic framing In order to reduce the verification complexity and the overhead of instrumenting all library calls, we use a pragmatic framing approach to verify if a component’s implementation obeys to its contract. Instead of letting ESC/Java2 verify the
90 Bridging the Gap Between Web Application Firewalls and Web Applications modifies clauses, we use a component-specific specification of the session repository, in which we constrain the allowed write operations to the actual write interactions that the component claims to have in its modifies clauses. Listing 5.8 is an example of such a component-specific annotation to use with the ShowCartServlet: the precondition of the setAttribute method states that only write operations are allowed for the cart and currency data item. In contrast to the complete specification of the session repository (listing 5.6), the messages data item may not be modified by the ShowCartServlet. Listing 5.8: Component-specific specification of the repository (HttpSession.spec for ShowCartServlet) package javax.servlet.http; public interface HttpSession { //@ public ghost Object cart ; //@ public ghost Object currency; //@ public ghost Object messages; //@ requires name == ”cart”; //@ ensures this. cart == value; //@ modifies this . cart ; //@ also //@ requires name == ”currency”; //@ ensures this.currency == value; //@ modifies this .currency; public void setAttribute(String name, Object value); //@ requires name == ”messages”; //@ ensures \result == this.messages; //@ also //@ requires name == ”currency”; //@ ensures \result == this.currency; //@ also //@ requires name == ”cart”; //@ ensures \result == this.cart; public /∗@ pure @∗/ Object getAttribute(String name); //@ requires false ; public void removeAttribute(String name); }
In case the component’s implementation triggers an unspecified state change in the shared data repository, the verification of the component with ESC/Java2 will detect this contract violation (even without checking the component’s modifies clauses), since the state change will also violate the precondition of the component-
5.7 Related Work
91
specific setAttribute annotation of the shared repository. Of course, our pragmatic framing approach only guarantees correct framing regarding to state changes on the shared session repository, since other state changes in the application are neglected on purpose. Although such a pragmatic framing is not applicable for general verification purposes, this framing approach is sufficient for our verification process since we are only interested in the component’s interactions with the shared repository. In fact, the pragmatic framing approach can also be seen as an advantage rather than a limitation of our approach. Thanks to the pragmatic framing, we are able to verify partially specified components, i.e. we only specify the parts of the contract that we are actually interested in (namely the interactions with the shared repository). In case we apply traditional framing, we also have to specify every state change that occurs in the application by executing a component’s method, as well as to annotate every called library method with its appropriate frame condition. For the application-specific protocol verification, we let ESC/Java2 rely on the original modifies clauses of the different components, in combination with the full specification of the shared session repository from listing 5.6.
5.7
Related Work
Several implementation-centric security countermeasures for web applications have already been proposed [98, 52, 15, 90, 53], but most of them focus on injection attacks (SQL injection, command injection, XSS, . . . ) and use tainting or data flow analysis. Our solution targets another set of implementation bugs, namely bugs due to broken data dependencies on the shared, server-side state and to do so we rely on the verification of component contracts. We combine in our solution static and dynamic verification to reduce the runtime enforcement overhead. The idea of combining static and dynamic verification is not new, and is for instance already adopted by Yao-Wen Huang et al. in securing web application against web vulnerabilities caused by insecure information flow, such as SQL injection, XSS and command injection [55]. Their approach uses a lattice-based static analysis algorithm for verifying information flow based on type systems and type state. Sections of code considered vulnerable are automatically instrumented with run-time guards. In contrast, our approach aims to reduce runtime errors due to composition problems. In addition, our approach is based on program annotations and the verification of component preconditions. In [92], Jeff Offutt et al. generate bypass tests which check if an online web application is vulnerable to forceful browsing or parameter tampering attacks. The bypass tests are blackbox tests using data that circumvents client-side checks. They define three levels of bypass testing: value level, parameter level and control flow level bypass testing. At this moment, our approach only counters the latter
92 Bridging the Gap Between Web Application Firewalls and Web Applications one, but in future work we want to investigate how well our approach is suited to counter the other two levels as well. The bypass tests are able to detect bugs in the application, but they work implementation-agnostic and do not give formal guarantees about the absence of bugs. Firewall configuration analysis is proposed to manage complex network infrastructures (such as networks with multiple network firewalls and network intrusion detection systems) [121, 48]. Their approaches aim to achieve efficiency and consistency between the different network-layer security devices, whereas our approach focusses on the application-layer consistency between the WAF and the web application. The use of JML or related languages such as Spec# [11] for verifying component properties is a very active research domain. For example, Smans et al. [108] specify and verify code access security properties; Jacobs et al. [62] verify absence of data races and Pavlova et al. [94] focus on security properties of applets. Other applications of JML are surveyed in [16].
5.8
Conclusion
This chapter has focussed on bridging the gap between WAFs which enforce strict request flow, and some of the implementation-specific bugs that such firewalls try to protect. We showed that through a combination of static and dynamic verification, WAFs can formally guarantee the absence of certain kinds of erroneous behaviour in web applications. In particular, we did guarantee that if the combination of a web application and a WAF policy passes our verification process, no client/server interaction will break the data dependencies on the shared session state between server-side components. Although there are still some limitations with our proposed solution (as discussed in subsection 5.6.2), the conducted experiment shows that using existing verification tools to improve web application security looks promising.
Acknowledgements The authors would like to thank Wolfram Schulte (from Microsoft Research), Bart Jacobs, Adriaan Moors and Jan Smans (from the Katholieke Universiteit Leuven) for their useful comments and insight in some interesting discussions on this research.
Chapter 6
Conclusion 6.1
Summary
In this section we summarise the specific contributions presented in this thesis and recapitulate the results of the experiments that validate the usefulness of these contributions.
6.1.1
Main Contributions
We started this thesis by developing good insights in the dependency management of component-based software systems with a shared data repository. In particular, we investigated dependencies that occur due to composing different components that share a common data item on the shared data repository. We identified some typical composition problems in such data-centered applications by exploring application development in two different application domains: component-based protocol stacks and servlet-based web applications. In addition, to investigate the complexity of the dependency management in larger data-centered applications, we also performed an in-depth dependency analysis of GatorMail, a medium-sized webmail application. After the identification of some common composition problems in data-centered applications, we defined the no broken data dependencies composition property for component-based software applications with a shared data repository. Achieving this property in a given composition eliminates a large number of the identified composition problems in data-centered applications. The main goal of this thesis is then to reduce the number of run-time errors by formally verifying that a given composition does not violate the no broken data dependencies composition property. The formal verification of the property is achieved in two steps. In a first 93
94
Conclusion
step, we statically verify the desired composition property in deterministic software compositions. To do so, each component of the composition is extended with a component contract, stating in a problem-specific contract language what the shared data interactions of the component are. These contracts are verified in two steps. First, the compliance of the component’s implementation with its contract is checked. Next, the composition property is verified within the given composition relying on the different component contracts. In a second step, we extend the verification of the desired composition property towards reactive software systems by combining the static verification approach with run-time checking. Once again, each component of the composition is extended with a component contract, stating in a problem-specific contract language what the shared data interactions of the component are. In addition, the expected, reactive protocol between the client and the server is expressed in a labelled state machine. Next, we statically verify with the earlier proposed solution whether or not the desired property is violated in a given composition, while assuming that the actual interaction protocol adheres to the expressed state machine. Finally, we use run-time checking of the client-server protocol to guarantee that the protocol adheres to the verified labelled state machine and thus that the no broken data dependencies composition property also holds in the given composition.
6.1.2
Validation
We validated the formal verification of the no broken data dependencies composition property in both a deterministic and a reactive software composition, i.e. the GatorMail and BookStore case studies. We were particularly focussed on achieving the two extra-functional criteria formulated in chapter 2: reasonable overhead and applicability to real-life applications. The GatorMail Case Study. To validate the verification process for deterministic compositions, we applied the solution proposed in chapter 4 to the mediumsized, webmail application GatorMail. For this case study, we automatically translated the problem-specific component contracts into more general and widespread JML contracts, in order to verify them with the existing verification tool ESC/Java2. Next, to verify the composition property in a given composition, a composition-specific check method is automatically generated simulating the control flow transitions within the composition. We showed in this first case study that, thanks to the modular verification process, our solution scales quite easily to larger software applications. In addition, with some slight refinements, the presented approach is applicable to real-life web applications using Java Servlets or Struts. Furthermore, we measured in this case study a maximum annotation overhead of 4 lines of component specification and
6.1 Summary
95
a maximum per-component verification time of 15 seconds.
The BookStore Case Study. We validated the verification process for reactive applications (chapter 5) by applying our approach to the Duke’s BookStore application, taken from the J2EE tutorial. Although this application is much smaller than the GatorMail webmail application, this application is much more appealing to validate because of the more interesting data interactions with the shared session repository. Similar to the GatorMail case study, we translated the problem-specific contracts into JML annotations and verified them with ESC/Java2. Next, we used the labelled state machine as input to automatically build the composition-specific check method and verified this method statically with ESC/Java2. Finally, we built by means of a J2EE filter a primitive Web Application Firewall which drops web traffic that does not comply with the labelled state machine. In this second case study, we measured a small annotation overhead of at most 4 lines of specification. Due to the longer method bodies however, the modular verification time grew up to 4 minutes for two of the components, which is still feasible but already indicates the well-known limitations of current verification tools. In addition, we measured in our experiments a quite reasonable run-time overhead of 1.3 % for the run-time request flow enforcement.
Both case studies showed that improving the reliability and security of web application by using existing verification tools looks promising. We achieved a reasonable overhead both in terms of workload and verification time. Furthermore, we strongly believe that with improved tool support, the software developer and composer can be further shielded from the underlying technical details, as will be further explained in section 6.3. In the two case studies, we also illustrated that the proposed solution scales to larger, real-life software applications thanks to the modular specification and verification. In addition, the practical case studies also revealed some opportunities to improve the current generation of verification tools. For example, modern applications and software frameworks tend to use a lot of reflection and indirections both in the control flow and the dataflow. In the control flow for instance, the declarative forwarding concept used in the Struts case study and event-based communication mechanisms are quite often applied in nowadays frameworks. Similarly, the use of hashtables and shared data repositories create an indirection on the dataflow. Both indirection mechanisms lead to extra difficulties in verifying the cooperation between components in component-based software compositions and constitute an obstacle to easily verify larger scale software applications.
96
6.2
Conclusion
Future Work
This section briefly discusses two topics for extending the research presented in this work, and signals opportunities to generalise our solution to achieve concernspecific software verification. One of the limitations of the presented verification approach in reactive systems is the assumption of server-side, sequential processing of the incoming requests. In practice however, web servers typically use thread pools to process incoming requests in a multi-threaded way. To make the proposed solution also sound in multi-threaded environments, coarse-grained concurrency control can relatively easy be added to the Web Application Firewall. For example, by taking a lock on the user’s session, the Web Application Firewall can serialise the requests belonging to the same user session. Since this serialisation may affect the performance and responsiveness of the application, it would be interesting to investigate more fine-grained concurrency models and their impact on the performance. Another topic for future research is the granularity of the labelled state machine. In the current approach, a transition is triggered by a web request for a certain URL, but in a more fine-grained state machine of a web application, the values of the supplied input parameters can also influence the resulting end state. In addition, component specifications can be extended to report the reading of these input parameters or can have preconditions based on their values. Combining the extended contracts and the more fine-grained state machine results in a more powerful verification process in which the passing of input parameters between the client and the server is also taken into account. In particular, the verification process would, for example, also be able to verify whether server-side input validation checks are sufficient to protect a particular software composition. In this thesis we attempt to reduce the number of run-time errors due to implementation-specific bugs. To do so, we use a problem-specific specification language to formally specify and verify one particular composition property, i.e. the no broken data dependencies property. A quite similar approach is also applied in our research group to prevent race conditions and deadlocks in multi-threaded applications [63] and to prevent stack-inspection based bugs [108]. Therefore, it would be interesting to generalise this idea of concern-specific verification in Java-like programs and investigate which type of run-time errors can be detected or prevented by using formal verification of concern-specific annotations. A first attempt to such concern-specific verification in Java-like programs is presented in [30]. In addition, further research is also needed to investigate the composability of several concern-specific annotations and the static verification of these contracts in real-life software systems.
6.3 Developer-centric Verification
6.3
97
Developer-centric Verification
In this thesis, formal specification and static verification is used as a basic instrument to ensure quality characteristics in component-based software compositions. Nevertheless, the problem statement and the proposed solutions are still close to a developer’s view on component-based software development and the formal tools are used as black box solution enablers. This viewpoint resulted in a pragmatic approach to apply formal techniques. Firstly, in this thesis, existing formal specification and verification tools are used from an end-user perspective. This allows us to build a rapid prototype of our solution without the need to build a sound, static verifier from scratch, and without deep inside knowledge of the verification tool used. In addition, using existing verification tools in larger software projects also gives us better insights in the power of these tools on the one hand and in the limitations and shortcomings on the other hand. The drawback of this end-user’s perspective is however that in our prototype we have to avoid some shortcomings of the current version of the verification tool used, such as the limited reasoning on hashtable indirections and known unsoundness issues. In a next generation prototype of the proposed solutions, it would be interesting to further investigate the advantages of a dedicated software verifier or a dedicated type system in order to guarantee the no broken data dependencies property, especially, now that the current prototype already illustrates the feasibility of the verification process. Secondly, since the main goal of this research is to reduce the number of runtime errors by verifying that a particular composition property holds (i.e. the no broken data dependencies composition property), we use partial specification of the component’s behaviour in a concern-specific annotation language to significantly reduce the annotation overhead for the developer. However, this partial specification also implies that, as a consequence, the framing conditions of the different components are partially specified as well, and that the traditional framing verification is not feasible in such scenarios. In this thesis, we use a pragmatic framing approach to only verify state changes on the shared data repository. Although this particular framing approach rather is an ad hoc solution, and it is as such not applicable to verify other concern-specific annotations, it would be an interesting feature for verification tools to support more generally this kind of limited framing scope, i.e. only a well-defined part of state changes is relevant in verifying a component’s frame condition. Thirdly, the developer-centric point of view also influences the solutions proposed in this thesis. Every step of the solution aims to achieve the right balance between the power of formal verification techniques on the one hand and limited overhead for the developer and software composer and applicability to real-life software applications on the other hand. Moreover, valorising this research in a concrete developer’s tool would probably even shift this balance further in favour
98
Conclusion
of the developer and composer in several steps of the solution: Concern-specific annotation and contract verification To reduce the annotation overhead for the software developer, the concern-specific annotation of a component can be deduced from its implementation. This specification inference significantly lowers the impact on the development process, but also implies that some implementation bugs will only be identified during the property verification phase (typically close to deployment), rather than during the contract verification phase in which mismatches can already be detected between the concern-specific annotation and the component’s implementation. Moreover, the specification deduction eliminates the need for checking the compliance between concern-specific annotation and the component’s implementation. Application-specific protocol specification and run-time enforcement In reactive systems, the application-specific protocol is needed as input artefact for the static verification and run-time checking. This protocol can be expressed in different ways, such as a labelled state machine or an EBNF notation. At this moment, the labelled state machine is constructed manually, based on the expected use of the web application and the URLs generated by the different web pages. In order to reduce the composer’s involvement in the verification process, this application specific protocol can also be constructed automatically based on a representative client implementation (e.g. in case of a rich web client), by analysing the hyperlinks generated by the different web pages or by observing legitimate web traffic as is often done in Web Application Firewalls. Static property verification Chapter 4 and 5 already illustrated that automatic static verification of the composition property is feasible. But, to be more composer-friendly, extra support is needed to give useful feedback in case the verification process fails. This is especially the case if the software composer is almost completely shielded of the underlying specification and verification process. Integrated tool To be practically useful for software developers, providing an integrated tool is essential. The different subtasks of the solution can be integrated in a single framework-specific tool, and is preferably embedded in the developer’s Integrated Development Environment (IDE).
6.4 Concluding Thoughts
6.4
99
Concluding Thoughts
In this thesis, only a small subset of the appealing challenges for formal verification in loosely-coupled software systems have been explored. The application of formal verification techniques to the indirect sharing problem led to an interesting feasibility study of state-of-the-art verification tools in medium-sized software applications. The encouraging results of this feasibility study strengthen us in our belief that software verification will become a key enabler for several tracks in future software engineering. In addition, we think that a developer-centric view on software verification as illustrated in this thesis can help to get formal techniques more widely accepted. From an industry perspective, we observe a strong-growing need for software verification. In several application domains such as Enterprise Applications and Telecommunication, the current requirements to build very flexible, services-based, open software systems stretch current vendors to the limits of their know-how and even beyond. Building such extremely flexible systems in a reliable and secure fashion becomes almost infeasible, and is nevertheless a key requirement in upcoming business models. For instance, the Networked European Software and Services Initiative (NESSI) started quite recently to bundle European research efforts in services-based software engineering and defined research priorities, quite similar to the ones of the seventh framework programme of the European Union (FP7). They aim to fill the gaps in current service architectures and to create an open environment that will facilitate the development and dynamic composition of services. Therefore they drive research towards providing safe, reliable and secure delivery of new services across different domains. In this research space, we see several opportunities and challenges for formal verification techniques. At the same time, we identify recent trends in software composition that push the loose coupling of components even a step further. In recent software frameworks, additional flexibility is introduced in the application, for instance, by introducing indirections in both the control flow and the dataflow. This additional flexibility seriously reduces the number of syntactical and semantical consistency checks that can be done by current compilers and type systems. This results in less reliable software compositions and in an increase of run-time errors. More generally stated: the looser the coupling, the more additional verification is needed to verify if compositions are well composed. In this thesis we investigated indirect data sharing and declarative control flow, but also a broad spectrum of other flexibility techniques is widely used. For example, similar to the indirect sharing technique, the use of an event bus enables anonymous cooperation of loosely-coupled components. Software systems can also achieve high configuration flexibility by applying late composition of the components, i.e. compositions are described in XML based configuration files or
100
Conclusion
the software system allows run-time reconfigurations, and reflection techniques are used to build the composition at run time. Similarly, services can be dynamically composed in services-based applications to cope among others with varying enduser requirements and the availability of services. Also the composition of filters in an interception architecture or the application of aspects in aspect-oriented software development impacts the execution of the application and may require additional verification of the overall composition. Finally, we observe an increasing contribution from the formal verification community to software engineering research. Formal verification is already able to leverage reliability and security research in several software engineering tracks. In particular, we see very appealing synergy opportunities between formal verification and end-to-end attestation, and between formal verification and model driven engineering. The technologies of formal software verification and trusted computing provide complementary assurance guarantees and the combination of both technologies can result in more powerful assurances. Based on supporting hardware, the virtual machine monitor (VMM) of the trusted computing platform allows multiple virtual machines to be isolated from each other and to run in parallel. An attestation service in the VMM can then provide assurance that a specific program is actually running with a specific configuration. This technology is a key enabler for many security solutions, such as Digital Rights Management (DRM) and sealed storage. In the past, the two technologies have mainly been studied in isolation. We see three concrete opportunities in the combination of both technologies. Firstly, combining formal verification of certain software properties with remote attestation of the produced soundness proofs results in end-to-end attestation of the verified properties. Secondly, formal verification can be applied to verify more complex attestation services in the VMM or security services in isolated virtual machines. Finally, services in the VMM can fulfil the needed run-time verification in cases where static and dynamic verification needs to be combined, or they can ease the verification process by providing additional infrastructural assurances, such as atomic execution or adherence to a fairness protocol. The combination of model driven engineering and formal verification can lead to easier software verification and more reliable software systems. Model driven engineering is based on the extensive use of models to explicitly document artefacts in different phases of the software development lifecycle, in combination with (semi-)automatic transformations between these models. In model driven software, formal verification can be applied to verify whether the transformations used are property preserving. If so, the verification of software properties in a specific application can be done based on the explicitly modelled artefacts. To conclude, we see several challenging opportunities for formal verification techniques in multiple research domains and consider formal verification as one
6.4 Concluding Thoughts
101
of the key enablers in future software engineering. We also believe that the work presented in this thesis is a small but important contribution in this evolution.
102
Conclusion
Bibliography [1] GatorMail, Webmail at the University of Florida. http://webmail.ufl.edu/. [2] ISO/IEC 17799:2000 – Information technology – Code of practice for information security management. [3] SSE–CMM : Systems Security Engineering – Capability Maturity Model (homepage). http://www.sse-cmm.org/. [4] Franz Achermann and Oscar Nierstrasz. Applications = Components + Scripts — A Tour of Piccola. In Mehmet Aksit, editor, Software Architectures and Component Technology, pages 261–292. Kluwer, 2001. [5] Alcatel Bell and K.U.Leuven. Towards Service Centric Access Networks (SCAN), IWT research project #010319, August 2001–2003. [6] Jonathan Aldrich. Using Types to Enforce Architectural Structure. PhD thesis, University of Washington, August 2003. [7] Robert Allen and David Garlan. A formal basis for architectural connection. ACM Trans. Softw. Eng. Methodol., 6(3):213–249, 1997. [8] Jo ao P. A. Almeida, Maarten Wegdam, Marten van Sinderen, and Lambert Nieuwenhuis. Transparent Dynamic Reconfiguration for CORBA. In Proceedings of the 3rd International Symposium on Distributed Objects and Applications (DOA’01), pages 197–207. IEEE Computer Society, September 2001. [9] Eric Armstrong, Jennifer Ball, Stephanie Bodoff, Debbie Bode Carson, Ian Evans, Dale Green, Kim Haase, and Eric Jendrock. The J2EE 1.4 Tutorial. Sun Microsystems, Inc., December 2005. [10] Izhar Bar-Gad. Web application firewalls protect data. http://www. networkworld.com/news/tech/2002/0603tech.html, March 2002. 103
104
BIBLIOGRAPHY
[11] Mike Barnett, K. Rustan M. Leino, and Wolfram Schulte. The Spec# Programming System: An Overview. Lecture Notes in Computer Science, 3362, 2004. [12] Len Bass, Paul Clements, and Rich Kazman. Software Architecture in Practice. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1998. [13] Elisa Bertino, Danilo Bruschi, Stefano Franzoni, Igor Nai-Fovino, and Stefano Valtolina. Designing a Secure Database in a Web Application. In Communications and Multimedia Security, pages 159–171. International Federation of Information Processing (IFIP), Springer, 2005. [14] Antoine Beugnard, Jean-Marc J´ez´equel, No¨el Plouzeau, and Damien Watkins. Making components contract aware. In IEEE Software, pages 38–45, june 1999. [15] Stephen W. Boyd and Angelos D. Keromytis. Sqlrand: Preventing sql injection attacks. In ACNS, pages 292–302, 2004. [16] Lilian Burdy, Yoonsik Cheon, David Cok, Michael Ernst, Joe Kiniry, Gary T. Leavens, K. Rustan M. Leino, and Erik Poll. An overview of JML tools and applications. International Journal on Software Tools for Technology Transfer (STTT), 7(3):212–232, June 2005. [17] David Chadwick. Threat Modelling for Active Directory. In Communications and Multimedia Security, pages 173–182. International Federation of Information Processing (IFIP), Springer, 2005. [18] Paul C. Clements. A survey of architecture description languages. In Proceedings of the 8th International Workshop on Software Specification and Design, page 16. IEEE Computer Society, 1996. [19] Danny De Cock, Karel Wouters, Dries Schellekens, Dave Singelee, and Bart Preneel. Threat Modelling for Security Tokens in Web Applications. In Communications and Multimedia Security, pages 183–193. International Federation of Information Processing (IFIP), Springer, 2005. [20] David R. Cok. ESC/Java2 Implementation Notes. [21] Web Application Security Consortium. The Web Hacking Incidents Database. http://www.webappsec.org/projects/whid/. [22] Eric M. Dashofy, Andr´e Van der Hoek, and Richard N. Taylor. A highlyextensible, xml-based architecture description language. In Proceedings of the Working IEEE/IFIP Conference on Software Architecture (WICSA’01), page 103. IEEE Computer Society, 2001.
BIBLIOGRAPHY
105
[23] Eric M. Dashofy, Andr´e van der Hoek, and Richard N. Taylor. An infrastructure for the rapid development of xml-based architecture description languages. In Proceedings of the 24th International Conference on Software Engineering, pages 266–276. ACM Press, 2002. [24] John D. Day and Hubert Zimmermann. The OSI Reference Model. IEEE, 71:1334–1340, December 1983. [25] Chrysanthos Dellarocas. Toward a design handbook for integrating software components. In Proceedings of the 5th International Symposium on Assessment of Software Tools (SAST ’97), page 3. IEEE Computer Society, 1997. [26] Lieven Desmet. Adaptive System Software with the DiPS Component Framework. Master’s thesis, K.U.Leuven, Dept. of Computer Science, Leuven, Belgium, May 2002. [27] Lieven Desmet, Liesbeth Jaco, Koenraad Mertens, and Tine Verhanneman. COTS, the safety nightmare of component-oriented frameworks. Report CW 367, Department of Computer Science, K.U.Leuven, Leuven, Belgium, September 2003. [28] Lieven Desmet, Bart Jacobs, Frank Piessens, and Wouter Joosen. A Generic Architecture for Web Applications to Support Threat Analysis of Infrastructural Components. In Communications and Multimedia Security, pages 125– 130. International Federation of Information Processing (IFIP), Springer, 2005. [29] Lieven Desmet, Bart Jacobs, Frank Piessens, and Wouter Joosen. Threat Modelling for Web Services Based Web Applications. In Communications and Multimedia Security, pages 131–144. International Federation of Information Processing (IFIP), Springer, 2005. [30] Lieven Desmet, Bart Jacobs, Frank Piessens, Wolfram Shulte, Jan Smans, and Dries Vanoverberghe. Concern-specific annotation languages to support static detection of bugs in Java-like programs. In Proceedings of the 5th International Symposium on Formal Methods for Components and Objects (FMCO 2006), November 2006. [31] Lieven Desmet, Sam Janssens, Nico Michiels, , Frank Piessens, , Wouter Joosen, and Pierre Verbaeten. Towards Preserving Correctness in SelfManaged Software Systems. In Proceedings of the 2004 ACM SIGSOFT Workshop On Self-Managed Systems (WOSS 2004), pages 34–38, Newport Beach, CA, USA, 2004. ACM SIGSOFT, ACM press.
106
BIBLIOGRAPHY
[32] Lieven Desmet, Frank Piessens, Wouter Joosen, and Pierre Verbaeten. Infrastructural support for data dependencies in data-centered software systems. In Proceedings of the Third AOSD Workshop on Aspects, Components, and Patterns for Infrastructure Software. [33] Lieven Desmet, Frank Piessens, Wouter Joosen, and Pierre Verbaeten. Static verification of composition properties. http://www.cs.kuleuven. be/∼lieven/research/SC2006/. [34] Lieven Desmet, Frank Piessens, Wouter Joosen, and Pierre Verbaeten. Improving software reliability in data-centered software systems by enforcing composition time constraints. In Proceedings of ICSE 2004 Third Workshop on Architecting Dependable Systems (WADS), pages 32–36. ICSE, The Institution of Electrical Engineers (IEE), 2004. [35] Lieven Desmet, Frank Piessens, Wouter Joosen, and Pierre Verbaeten. Improving software reliability in data-centered software systems by enforcing composition time constraints. In Proceedings of Third Workshop on Architecting Dependable Systems (WADS2004), pages 32–36, Edinburgh, Scotland, May 2004. [36] Lieven Desmet, Frank Piessens, Wouter Joosen, and Pierre Verbaeten. Dependency analysis of the Gatormail webmail application. Report CW 427, Department of Computer Science, K.U.Leuven, Leuven, Belgium, September 2005. URL = http://www.cs.kuleuven.ac.be/publicaties/rapporten/cw/CW427.pdf. [37] Lieven Desmet, Frank Piessens, Wouter Joosen, and Pierre Verbaeten. Bridging the Gap Between Web Application Firewalls and Web Applications. In Proceedings of the 2006 ACM Workshop on Formal Methods in Security Engineering, pages 67–77, November 2006. [38] Lieven Desmet, Frank Piessens, Wouter Joosen, and Pierre Verbaeten. Static Verification of Indirect Data Sharing in Loosely-coupled Component Systems. In Software Composition, volume 4089 of Lecture Notes in Computer Science, pages 34–49. Springer Berlin / Heidelberg, 2006. [39] K.U.Leuven DistriNet Research Group. DiPS home page. http://www.cs.kuleuven.ac.be/cwis/research/distrinet/projects/DIPS/. [40] Drake Emko, Sandy McArthur, Todd Williams, and Stephen L. Ulmer. GatorMail WebMail. http://sourceforge.net/projects/gatormail/. [41] Roy Fielding, Jim Gettys, Jeffrey Mogul, Henrik Frystyk, Larry Masinter, Paul Leach, and Tim Berners-Lee. Hypertext Transfer Protocol – HTTP/1.1. http://www.ietf.org/rfc/rfc2616.txt, 1999. Request For Comments: 2616 (Category: Standards Track).
BIBLIOGRAPHY
107
[42] Rune Fredriksen, Monica Kristiansen, Bjørn Axel Gran, Ketil Stølen, Tom Arthur Opperud, and Theo Dimitrakos. The CORAS framework for a model-based risk management process. 2434:94–??, 2002. [43] Free Software Foundation, Inc. CVS - Open Source Version Control. http: //www.nongnu.org/cvs/. [44] Eric Freeman, Ken Arnold, and Susanne Hupfer. JavaSpaces Principles, Patterns, and Practice. Addison-Wesley Longman Ltd., 1999. [45] David Garlan, Jeff Kramer, and Alexander L. Wolf, editors. Proceedings of the First ACM SIGSOFT Workshop on Self-Healing Systems (WOSS’02), Charleston, SC, USA, 2002. ACM Press, New York, NY, USA. [46] David Garlan, Jeff Kramer, and Alexander L. Wolf, editors. Proceedings of the 2004 ACM SIGSOFT Workshop on Self-Managing Systems (WOSS 2004), Newport Beach, CA, USA, 2004. ACM Press, New York, NY, USA. [47] Steven Gevers. Security analysis of web-based business applications and implementation of a countermeasure in the tomcat web container. Master’s thesis, K.U.Leuven, Dept. of Computer Science, Leuven, Belgium, May 2005. [48] Korosh Golnabi, Richard K. Min, Latifur Khan, and Ehab Al-Shaer. Analysis of Firewall Policy Rules Using Data Mining Techniques. In 10th IEEE/IFIP Network Operations and Management Symposium (NOMS 2006), April 2006. [49] R¨ uidiger Grimm and Henrik Eichst¨adt. Threat Modelling for ASP.NET. In Communications and Multimedia Security, pages 145–158. International Federation of Information Processing (IFIP), Springer, 2005. [50] William G. Griswold, Jimmy J. Yuan, and Yoshikiyo Kato. Exploiting the map metaphor in a tool for software evolution. In ICSE ’01: Proceedings of the 23rd International Conference on Software Engineering, pages 265–274, Washington, DC, USA, 2001. IEEE Computer Society. [51] Zigmunt Haas and Siamak Tabrizi. On some challenges and design choices in ad-hoc communications. In IEEE MILCOM ’98, Bedford, Massachusetts, 1998. [52] Vivek Haldar, Deepak Chandra, and Michael Franz. Dynamic taint propagation for java. acsac, 0:303–311, 2005. [53] William G. J. Halfond and Alessandro Orso. Amnesia: analysis and monitoring for neutralizing sql-injection attacks. In ASE ’05: Proceedings of the 20th IEEE/ACM international Conference on Automated software engineering, pages 174–183, New York, NY, USA, 2005. ACM Press.
108
BIBLIOGRAPHY
[54] Michael Howard and David LeBlanc. The STRIDE Threat Model. From the Book ’Writing Secure Code’. Microsoft Press, 2002. [55] Yao-Wen Huang, Fang Yu, Christian Hang, Chung-Hung Tsai, Der-Tsai Lee, and Sy-Yen Kuo. Securing web application code by static analysis and runtime protection. In WWW ’04: Proceedings of the 13th international conference on World Wide Web, pages 40–52, New York, NY, USA, 2004. ACM Press. [56] J. Hunter and W. Crawford. Java Servlet Programming. O’Reilly, second edition, April 2001. [57] University of Southern California Information Sciences Institute. Internet protocol specification (ip). http://www.ietf.org/rfc/rfc0791.txt, 1981. Request for Comments: 791 (Category: Standards Track). [58] University of Southern California Information Sciences Institute. Transmission control protocol specification (tcp). http://www.ietf.org/rfc/rfc0793.txt, 1981. Request for Comments: 793 (Category: Standards Track). [59] Paola Inverardi and Massimo Tivoli. Automatic synthesis of deadlock free connectors for com/dcom applications. In Proceedings of the 8th European software engineering conference held jointly with 9th ACM SIGSOFT international symposium on Foundations of software engineering, pages 121–131. ACM Press, 2001. [60] Ioana S ¸ ora, Pierre Verbaeten, and Yolande Berbers. Using Component Composition for Self-customizable Systems. In Proceedings of Workshop On Component-Based Software Engineering: Composing Systems from Components, pages 23–26. IEEE, 2002. [61] James Ivers, Nishant Sinha, and Kurt Wallnau. A Basis for Composition Language CL. Technical Report CMU/SEI-2002-TN-026, SEI, Carnegie Mellon University, 2002. [62] Bart Jacobs, K. Rustan M. Leino, Frank Piessens, and Wolfram Schulte. Safe concurrency for aggregate objects with invariants. In Proceedings of the Third IEEE International Conference on Software Engineering and Formal Methods, pages 137–146. IEEE Computer Society, 2005. [63] Bart Jacobs, Jan Smans, Frank Piessens, and Wolfram Schulte. A Statically Verifiable Programming Model for Concurrent Object-Oriented Programs. In Proceedings of the Eighth International Conference on Formal Engineering Methods (ICFEM 2006), 2006.
BIBLIOGRAPHY
109
[64] Gerrit Janssen. Implementation of IPv6 in DiPS. Master’s thesis, K.U.Leuven, Dept. of Computer Science, Leuven, Belgium, May 2002. [65] Nico Janssens, Lieven Desmet, Sam Michiels, and Pierre Verbaeten. NeCoMan: middleware for safe distributed service deployment in programmable networks. In Middleware 2004 companion workshop proceedings, pages 256– 261. ACM/IFIP/USENIX, ACM Press, 2004. [66] Nico Janssens, Wouter Joosen, and Pierre Verbaeten. NeCoMan: Middleware for Safe Distributed-Service Adaptation in Programmable Networks. IEEE Distributed Systems Online, 6(7), July 2005. [67] Nico Janssens, Sam Michiels, Tom Mahieu, and Pierre Verbaeten. Towards Hot-Swappable System Software: The DiPS/CuPS Component Framework. In Proceedings - The Seventh International Workshop on Component Oriented Programming (ECOOP-WCOP 2002), 2002. [68] Java servlet technology. http://java.sun.com/products/servlet/. [69] Jini. http://www.jini.org/. [70] Jean-Marc Jzquel and Bertrand Meyer. Design by contract: The lessons of ariane. Computer, 30(1):129–130. [71] Karl Forster, Lockstep Systems, Inc. Why Firewalls Fail to Protect Web Sites. http://www.lockstep.com/products/webagain/ why-firewalls-fail.pdf. [72] KindSoftware. The Extended Static Checker for Java version 2 (ESC/Java2). http://secure.ucd.ie/products/opensource/ESCJava2/. [73] Eddie Kohler, Robert Morris, Benjie Chen, John Jannotti, and Frans M. Kaashoek. The click modular router. ACM Transactions on Computer Systems, 18(3):263–297, August 2000. [74] Ingolf H. Kr¨ uger and Reena Mathew. Systematic Development and Exploration of Service-Oriented Software Architectures. In Proceedings of the 4th Working IEEE/IFIP Conference on Software Architecture (WICSA-4), Oslo, Norway, 2004. IEEE/IFIP, IEEE. [75] Gary T. Leavens, Albert L. Baker, and Clyde Ruby. Preliminary design of JML: A behavioral interface specification language for Java. Technical Report 98-06-rev28, Iowa State University, Department of Computer Science, July 2005.
110
BIBLIOGRAPHY
[76] Gary T. Leavens, Erik Poll, Curtis Clifton, Yoonsik Cheon, Clyde Ruby, David R. Cok, Peter M¨ uller, Joseph Kiniry, and Patrice Chalin. JML reference manual. Department of Computer Science, Iowa State University. Available from http://www.jmlspecs.org, August 2006. [77] K. Rustan M. Leino, Greg Nelson, and James B. Saxe. ESC/Java User’s Manual. [78] Bixin Li. Managing dependencies in component-based systems based on matrix model. In Proceedings Of Net.Object.Days 2003, 22-25, Sept. 2003, Erfurt, Germany, 2003. [79] Jeff Magee and Jeff Kramer. Dynamic structure in software architectures. In Proceedings of the 4th ACM SIGSOFT symposium on Foundations of software engineering, pages 3–14. ACM Press, 1996. [80] Nenad Medvidovic and Richard N. Taylor. A classification and comparison framework for software architecture description languages. IEEE Trans. Softw. Eng., 26(1):70–93, 2000. [81] Sam Michiels. Component Framework Technology for Adaptable and Manageable Protocol Stacks. PhD thesis, K.U.Leuven, Dept. of Computer Science, Leuven, Belgium, November 2003. [82] Sam Michiels, Lieven Desmet, Nico Janssens, Tom Mahieu, and Pierre Verbaeten. Self-adapting concurrency: The DMonA architecture. In D. Garlan, J. Kramer, and A. Wolf, editors, Proceedings of the First Workshop on SelfHealing Systems (WOSS’02), pages 43–48, Charleston, SC, USA, 2002. ACM SIGSOFT, ACM press. [83] Sam Michiels, Lieven Desmet, Wouter Joosen, and Pierre Verbaeten. The DiPS+ software architecture for self-healing protocol stacks. In Proceedings of the 4th Working IEEE/IFIP Conference on Software Architecture (WICSA-4), Oslo, Norway, 2004. IEEE/IFIP, IEEE. [84] Sam Michiels, Lieven Desmet, and Pierre Verbaeten. A DiPS+ Case Study: A Self-healing RADIUS Server. Report CW-378, Dept. of Computer Science, K.U.Leuven, Leuven, Belgium, February 2004. [85] Sam Michiels, Nico Janssens, Lieven Desmet, Tom Mahieu, Wouter Joosen, and Pierre Verbaeten. A component platform for flexible protocol stacks. In Component-Based Software Development for Embedded Systems: An Overview of Current Research Trends, volume 3778/2005 of Lecture Notes in Computer Science, pages 185–208. Springer-Verlag, GmbH, November 2005.
BIBLIOGRAPHY
111
[86] Microsoft, Universit Degli Studi Di Milano, the Technical University of Ilmenau, the University of Salford, and the DistriNet and COSIC research groups of the Katholieke Universiteit Leuven. Designing Secure Applications Project (DeSecA), 2004. [87] Andrew C. Myers. JFlow: Practical mostly-static information flow control. In Symposium on Principles of Programming Languages, pages 228–241, 1999. [88] National Institute of Standards and Technology (NIST). National vulnerability database. http://nvd.nist.gov/statistics.cfm. [89] Gleb Naumovich, George S. Avrunin, Lori A. Clarke, and Leon J. Osterweil. Applying static analysis to software architectures. In Proceedings of the 6th European conference held jointly with the 5th ACM SIGSOFT international symposium on Foundations of software engineering, pages 77–93. SpringerVerlag New York, Inc., 1997. [90] Anh Nguyen-Tuong, Salvatore Guarnieri, Doug Greene, Jeff Shirley, and David Evans. Automatically hardening web applications using precise tainting. In Ryˆ oichi Sasaki, Sihan Qing, Eiji Okamoto, and Hiroshi Yoshiura, editors, SEC, pages 295–308. Springer, 2005. [91] National Institute of Standards and Technology. Common criteria for information technology security, 1999. [92] Jeff Offutt, Ye Wu, Xiaochen Du, and Hong Huang. Bypass testing of web applications. In ISSRE, pages 187–197, 2004. [93] Open Web Application Security Project (OWASP). Top ten most critical web application vulnerabilities. http://www.owasp.org/documentation/ topten.html, 2005. [94] Mariela Pavlova, Gilles Barthe, Lilian Burdy, Marieke Huisman, and JeanLouis Lanet. Enforcing high-level security properties for applets. In CARDIS, pages 1–16, 2004. [95] Charles E. Perkins. IP mobility support for IPv4. http://www.ietf.org/rfc/rfc3220.txt, January 2002. Request for Comments: 3220 (Category: Standards Track). [96] Dewayne E. Perry and Alexander L. Wolf. Foundations for the study of software architecture. ACM SIGSOFT Software Engineering Notes, 17(4):40–52, 1992. [97] Steve Pettit. Anatomy of a web application: Security considerations. Technical report, Sanctum, Inc., July 2001.
112
BIBLIOGRAPHY
[98] Tadeusz Pietraszek and Chris Vanden Berghe. Defending against injection attacks through context-sensitive string evaluation. In Proceedings of the 8th International Symposium on Recent Advances in Intrusion Detection (RAID2005), pages 124–145, 2005. [99] Arun D. Raghavan and Gary T. Leavens. Desugaring JML method specifications. Technical Report 00-03e, Iowa State University, Department of Computer Science, May 2005. [100] Vijay Raghvendra. Session tracking on the web. Internetworking, 3(1), March 2000. [101] The DistriNet and COSIC research groups of the Katholieke Universiteit Leuven and Ubizen. Software security for network applications project (sobenet), October 2003–2007. [102] Carl Rigney. Radius accounting. http://www.cis.ohio-state.edu/cgibin/rfc/rfc2866.html, 2000. Request for Comments: 2866 (Category: Informational). [103] Carl Rigney, Allen Rubens, William Simpson, and Steve Willens. Remote authentication dial in user service specification (radius). http://www.cis.ohiostate.edu/cgi-bin/rfc/rfc2138.html, 1997. Request for Comments: 2138 (Category: Standards Track). [104] Ivan Ristic. Web application firewalls primer. (IN)SECURE, 1(5):6–10, January 2006. [105] Vipin Samar. Unified login with pluggable authentication modules (pam), 1996. [106] Lui Sha, R. Rajkumar, and M. Gagliardi. Evolving dependable real-time systems. In 1996 IEEE Aerospace Applications Conference. Proceedings, pages 335–46, Aspen, CO, 3–10 1996. IEEE New York, NY, USA. [107] Mary Shaw and David Garlan. Software Architecture - Perspectives on an emerging discipline. Prentice-Hall, 1996. [108] Jan Smans, Bart Jacobs, and Frank Piessens. Static verification of code access security policy compliance of .NET applications. Journal of Object Technology, 5(3), April 2006. [109] Ioana S¸ora, Frank Matthijs, Yolande Berbers, and Pierre Verbaeten. Automatic composition of systems from components with anonymous dependencies specified by semantic-unaware properties, volume 732 of The Kluwer International Series in Engineering and Computer Science. Kluwer Academic Publishers, 2003.
BIBLIOGRAPHY
113
[110] SourceForge.net. The world’s largest development and download repository of Open Source code and applications. http://sourceforge.net. [111] Judith A. Stafford and Alexander L. Wolf. Architecture-level dependence analysis in support of software maintenance. In Proceedings of the third international workshop on Software architecture, pages 129–132. ACM Press, 1998. [112] The struts framework. http://jakarta.apache.org/struts/. [113] Sun Microsystems, Inc. J2EE platform specification. http://java.sun.com/j2ee/.
Available at
[114] Clemens Szyperski. Component Software: Beyond Object-Oriented Programming. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2002. [115] Clemens Szyperski, Dominik Gruntz, and S. Stephan Murer. Component Software - Beyond Object-Oriented Programming. Addison Wesley Professional, 2nd edition, November 2002. [116] Clemens Szyperski and Cuno Pfister. Component-oriented programming: Wcop’96 workshop report, 1997. [117] Andrew S. Tanenbaum. Computer Networks. Prentice Hall, 1996. [118] David Tennenhouse, Jonathan M. Smith, David W. Sincoskie, David Wetherall, and Gary J. Minden. A survey of active network research. IEEE Communications, 35(1):80–86, 1997. [119] The Java Modeling Language (JML). http://www.jmlspecs.org/. [120] Sander Tichelaar. Modeling Object-Oriented Software for Reverse Engineering and Refactoring. PhD thesis, University of Berne, December 2001. [121] Tom´ as E. Uribe and Steven Cheung. Automatic analysis of firewall and network intrusion detection system configurations. In FMSE ’04: Proceedings of the 2004 ACM workshop on Formal methods in security engineering, pages 66–74, New York, NY, USA, 2004. ACM Press. [122] Kris Vandebroek. Development of an IPSec based VPN solution with the DiPS component framework. Master’s thesis, K.U.Leuven, Dept. of Computer Science, Leuven, Belgium, May 2004. [123] Marlon Vieira and Debra Richardson. Analyzing dependencies in large component-based systems. In Proceedings of the 17 th IEEE International Conference on Automated Software Engineering (ASE’02), page 241. IEEE Computer Society, 2002.
114
BIBLIOGRAPHY
[124] Web Application Security Consortium. Web Application Firewall Evaluation Criteria, version 1.0. http://www.webappsec.org/projects/wafec/, January 2006. [125] webScurity, Inc. The Weakest Link: Mitigating Web Application Vulnerabilities. http://www.webscurity.com/pdfs/webapp vuln wp.pdf. [126] M. Welsh, D. Culler, and E. Brewer. SEDA: An Architecture for WellConditioned, Scalable Internet Services. In Proceedings of the Eighteenth ACM Symposium on Operating Systems Principles (SOSP-18), pages 171– 180, Banff, Canada, 2001. ACM Press, New York, NY, USA. [127] Gary T. Wong, Matti A. Hiltunen, and Richard D. Schlichting. A configurable and extensible transport protocol. In Proceedings of INFOCOM 2001, pages 319–328, 2001.
List of Publications Book chapters • S. Michiels, N. Janssens, L. Desmet, T. Mahieu, W. Joosen, and P. Verbaeten, A component platform for flexible protocol stacks, Component-Based Software Development for Embedded Systems: An Overview of Current Research Trends, (Atkinson, C. and Bunse, C. and Gross, H-G. and Peper, C., eds.), vol. 3778/2005, Lecture Notes in Computer Science, Springer-Verlag, GmbH, pages 185–208, November 2005.
Contributions at international conferences, published in proceedings • S. Michiels, L. Desmet, N. Janssens, T. Mahieu, and P. Verbaeten, SelfAdapting Concurrency: The DMonA Architecture, Proceedings of the First ACM SIGSOFT Workshop on Self-Healing Systems (WOSS’02) (Garlan, D. and Kramer, J. and Wolf, A., eds.), pages 43–48, 2002, • L. Desmet, F. Piessens, W. Joosen, and P. Verbaeten, Improving software reliability in data-centered software systems by enforcing composition time constraints, Proceedings of ICSE 2004 Third Workshop on Architecting Dependable Systems (WADS) (de Lemos, R. and Gacek, C. and Romanovsky, A., eds.), pages 32–36, 2004. • L. Desmet, N. Janssens, S. Michiels, F. Piessens, W. Joosen, and P. Verbaeten, Towards preserving correctness in self-managed software systems, Proceedings of the 2004 ACM SIGSOFT Workshop on Self-Managing Systems (Garlan, D. and Kramer, J. and Wolf, A., eds.), pages 34–38, 2004. • N. Janssens, L. Desmet, S. Michiels, and P. Verbaeten, NeCoMan: middleware for safe distributed service deployment in programmable networks, 115
Middleware 2004 companion workshop proceedings (Kon, F. and de Lara, E. and Jacobsen, H. and de Camargo, R., eds.), pages 256–261, 2004. • S. Michiels, L. Desmet, W. Joosen, and P. Verbaeten, The DiPS+ software architecture for self-healing protocol stacks, Proceedings of the 4th Working IEEE/IFIP Conference on Software Architecture (Magee, J. and Szyperski, C. and Bosch, J., eds.), pages 233–242, 2004. • L. Desmet, B. Jacobs, F. Piessens, and W. Joosen, Threat modelling for web services based web applications, Communications and Multimedia Security (Chadwick, D. and Preneel, B., eds.), pages 131–144, 2005. • L. Desmet, B. Jacobs, F. Piessens, and W. Joosen, A generic architecture for web applications to support threat analysis of infrastructural components, Communications and Multimedia Security (Chadwick, D. and Preneel, B., eds.), pages 125–130, 2005. • L. Desmet, F. Piessens, W. Joosen, and P. Verbaeten, Static Verification of Indirect Data Sharing in Loosely- coupled Component Systems, Software Composition, vol. 4089/2006, Lecture Notes in Computer Science, Springer Berlin / Heidelberg, pages 34–49, August 2006. • L. Desmet, F. Piessens, W. Joosen, and P. Verbaeten, Bridging the Gap Between Web Application Firewalls and Web Applications, Proceedings of the 2006 ACM Workshop on Formal Methods in Security Engineering (FMSE 2006), pages 67–77, November 2006.
Contributions at international conferences, not published or only as abstract • L. Desmet, F. Piessens, W. Joosen, and P. Verbaeten, Infrastructural support for data dependencies in data-centered software systems, The Third AOSD Workshop on Aspects, Components, and Patterns for Infrastructure Software, ACP4IS 2004, March 22, 2004, Lancaster, United Kingdom. • L. Desmet, F. Piessens, W. Joosen, and P. Verbaeten, Bridging the Gap Between Web Application Firewalls and Web Applications: extended abstract, The 1st Benelux Workshop on Information and System Security, November 8-9, 2006, Antwerpen, Belgium. • L. Desmet, B. Jacobs, F. Piessens, W. Shulte, J. Smans, D. Vanoverberghe, Concern-specific annotation languages to support static detection of bugs in Java-like programs, 5th International Symposium on Formal Methods for Components and Objects (FMCO 2006), November 7-10, 2006, Amsterdam, The Netherlands.
Technical reports • L. Desmet, L. Jaco, K. Mertens, and T. Verhanneman, COTS, the safety nightmare of component-oriented frameworks, Department of Computer Science, K.U.Leuven, Report CW 367, Leuven, Belgium, September, 2003 • S. Michiels, L. Desmet, and P. Verbaeten, A DiPS+ case study: a self-healing RADIUS server, Department of Computer Science, K.U.Leuven, Report CW 378, Leuven, Belgium, February, 2004 • L. Desmet, F. Piessens, W. Joosen, and P. Verbaeten, Dependency analysis of the GatorMail webmail application, Department of Computer Science, K.U.Leuven, Report CW 427, Leuven, Belgium, September, 2005
Biography Lieven Desmet was born on January 16, 1979 in Roeselare (Belgium). He received a Bachelor of Applied Sciences and Engineering degree (Kandidaat Burgerlijk Ingenieur) and a Master of Applied Sciences and Engineering: Computer Science degree (Burgelijk Ingenieur in de Computerwetenschappen) from the Katholieke Universiteit Leuven in Belgium. He graduated magna cum laude in July 2002 with the thesis “Adaptive System Software with the DiPS Component Framework”, supervised by prof. Pierre Verbaeten. He started working as a Ph.D. student at the DistriNet (Distributed systems and computer Networks) research group of the Department of Computer Science at the Katholieke Universiteit Leuven in September 2002. Within DistriNet, he was active in both the networking and security task forces.
119
Statische en Dynamische Verificatie van het Indirect Delen van Data in Componentgebaseerde Toepassingen Nederlandse samenvatting
Beknopte samenvatting Moderne softwaresystemen worden steeds frequenter opgebouwd als composities van herbruikbare componenten. Dergelijke componentgebaseerde software promoot vaak eenvoudige syntactische interfaces om het bouwen van nieuwe composities zo eenvoudig mogelijk te maken. Hoewel deze componenten op syntactisch niveau gemakkelijk samen te stellen zijn tot nieuwe composities, blijken losgekoppelde componenten echter regelmatig semantisch van elkaar af te hangen. Dit is onder andere het geval bij composities met een gemeenschappelijke opslagplaats (shared repository) via dewelke componenten indirect data met elkaar kunnen delen. Typisch worden deze verborgen afhankelijkheden niet gecontroleerd tijdens de compilatie, en op die manier leiden inbreuken op deze afhankelijkheden in huidige softwaresystemen tot fouten tijdens de uitvoering. De hoofdbijdrage van deze thesis is een aanpak om in toepassingen met een gemeenschappelijke opslagplaats het aantal fouten tijdens de uitvoering sterk te reduceren. We hebben uitgebreid ervaring opgedaan met dergelijke toepassingen in twee toepassingsdomeinen: componentgebaseerde protocolstapels en web toepassingen. Deze ervaringen illustreren duidelijk dat gebroken data afhankelijkheden in dergelijke applicaties een relevant probleem zijn en dat de fouten die eruit voorkomen een belangrijke impact kunnen hebben op de robuustheid en veiligheid van de toepassing. Onze aanpak baseert zich op de formele verificatie van de geen gebroken data afhankelijkheden compositie eigenschap. De formele verificatie van deze eigenschap wordt bereikt in twee stappen. In een eerste stap verifi¨eren we statisch dat een gegeven compositie met deterministische controle overgangen voldoet aan de gewenste compositie eigenschap. Hiertoe worden de componenten uitgebreid met contracten die de interacties met de gemeenschappelijke opslagplaats beschrijven in een probleemspecifieke contracttaal. Vervolgens volgen twee statische verificatiestappen. Eerst wordt nagegaan of het opgestelde component contract overeenkomt met de implementatie van de
1
component. Ten slotte wordt de compositie eigenschap geverifieerd, steunend op de contracten van de verschillende componenten binnen de compositie. In een tweede stap breiden we de verificatie van de gewenste compositie eigenschap uit naar toepassingen met reactieve controle overgangen. Hierbij wordt de statische verificatie aanpak voor deterministische systemen gecombineerd met verificatie tijdens de uitvoering. Om de statische verificatie mogelijk te maken met de reactieve controle overgangen, defini¨eren we een bovengrens voor het reactieve gedrag. We noemen dit de vooropgestelde interacties tussen de web gebruiker en de toepassing, en drukken dit protocol uit in een gelabelde toestandsmachine. Daarna verifi¨eren we statisch of een gegeven compositie voldoet aan de compositie eigenschap, onder de veronderstelling dat elke interactie tussen de web gebruiker en de toepassing deel uitmaakt van de vooropgestelde interactie tussen de web gebruiker en de toepassing. We gebruiken hiervoor de oplossing voorgesteld voor deterministische controle overgangen. Ten slotte gebruiken we de protocolverificatie mogelijkheden van een Web Application Firewall om te garanderen dat enkel web aanvragen die deel uitmaken van de vooropgestelde interactie de web toepassing bereiken. Zo garanderen we dat de geen gebroken data afhankelijkheden compositie eigenschap geldt in een gegeven compositie. Ten slotte hebben we de formele verificatie van de geen gebroken data afhankelijkheden eigenschap gevalideerd in een toepassing met deterministische controle overgangen en een toepassing met reactieve controle overgangen. De statische verificatie uit de eerste stap werd succesvol toegepast op de webmail toepassing GatorMail. GatorMail is een middelgrote, open-source toepassing en bevat in totaal meer dan 1350 interacties met de gemeenschappelijke opslagplaats. Daarnaast werd de combinatie van statische en dynamische verificatie succesvol toegepast op de Duke’s Bookstore e-commerce toepassing. In beide validatie experimenten werd een beperkte annotatie- en verificatiekost gemeten. Bovendien illustreerden beide experimenten dat de voorgestelde aanpak lineair schaalt naar grotere, realistische softwaretoepassingen, dankzij de modulaire specificatie en verificatie.
i
1
Inleiding
Grote toepassingen volledig vanaf de grond op opbouwen vraagt veel tijd en is voor veel bedrijven niet kost-effectief. Als alternatief instelbare standaardsoftware aanschaffen beperkt soms de benodigde flexibiliteit en heeft een verminderd competitief voordeel naar concurrenten toe, die dezelfde standaardsoftware kunnen aanschaffen. Componentgebaseerde softwareontwikkeling combineert de voordelen van beide invalshoeken. Componenten zijn onafhankelijke bouwblokken, mogelijks aangeleverd door diverse leveranciers. Nieuwe toepassingen worden dan gebouwd door aangekochte en eigen-gecre¨eerde bouwblokken te combineren tot een nieuwe compositie [20]. In [21] defini¨eren Szyperski et al. softwarecomponenten als een eenheid van compositie met contractueel gespecificeerde interfaces en met expliciet beschreven context afhankelijkheden. In praktijk echter hanteren softwareontwikkelaars en leveranciers een minder strenge definitie van softwarecomponenten en componentgebaseerde toepassingen. Vaak worden componenten slechts voorzien van onvolledige of informele contracten, en beperken contracten zich vaak tot het syntactische niveau (en bevatten ze dus geen semantische informatie). Software systemen worden meer en meer missiekritisch. Computernetwerken, applicatie servers en vliegtuigsturingen hangen bijvoorbeeld sterk af van de beschikbaarheid en betrouwbaarheid van de softwaresystemen. In dergelijke systemen kunnen softwarefouten dan ook een gigantische impact hebben. Daarbij komt nog dat veel hedendaagse softwaresystemen hogere beveiligingsgaranties vragen. Meer en meer bedrijven realiseren bijvoorbeeld een deel van hun inkomsten via online handel, maar tegelijkertijd zijn hun web toepassingen vaak erg gevoelig voor fouten. Binnen de onderzoeksgroep DistriNet werken we aan diverse oplossingen om de betrouwbaarheid en veiligheid van modulair opgebouwde applicaties te verhogen. Zo werken leden van onze groep bijvoorbeeld aan de veilige herconfiguratie van dergelijke systemen tijdens de uitvoering [9], aan het verhinderen van race condities [8] en het nastreven van minimaal kwaliteitsverlies van de aangeboden diensten in geval van overbelasting [15]. In deze thesis verhogen we de betrouwbaarheid en veiligheid van componentgebaseerde toepassingen, waarbij de componenten indirect data met elkaar kunnen delen via een gemeenschappelijke opslagplaats (shared repository). Deze architecturale stijl wordt vaak toegepast om extra flexibiliteit te cre¨eren. Op die manier kunnen componenten zonder rechtstreekse interactie toch indirect data met elkaar delen. Hoewel componenten syntactisch niet van elkaar afhangen (wat betreft het delen van data), cre¨eert het indirect data delen wel semantische afhankelijkheden tussen componenten die eenzelfde data object delen. Vaak zijn deze semantische afhankelijkheden verborgen aanwezig in hedendaagse toepassingen, en leidt het breken van deze afhankelijkheden tot fouten tijdens de uitvoering. Veel gegooide uitzonderingen in deze context zijn bijvoorbeeld NullPointerExceptions of Class-
ii
CastExceptions. Het doel van deze thesis is gebroken data afhankelijkheden vroegtijdig te detecteren in composities met een gemeenschappelijke opslagplaats. Door formeel te verifi¨eren dat geen afhankelijkheden gebroken worden in een gegeven compositie, elimineren we verschillende types van fouten tijdens de compilatie of compositie, in plaats van tijdens de uitvoering. We werken in deze thesis formele verificatie oplossingen uit zowel voor toepassingen met deterministische controle overgangen, als voor toepassingen met reactieve controle overgangen. De rest van deze tekst is opgebouwd als volgt. Sectie 2 illustreert hoe softwarecomponenten indirect data met elkaar kunnen delen door gebruik te maken van een gemeenschappelijke opslagplaats. Tevens worden een aantal problemen beschreven die typisch kunnen voorvallen bij het gebruik van een dergelijke, gemeenschappelijke opslagplaats. Om een goed inzicht te krijgen in de typische interacties die plaatsvinden tussen componenten en een gemeenschappelijke opslagplaats, wordt in sectie 3 de reeds bestaande webmail toepassing GatorMail bestudeerd. Naast afhankelijkheden van componenten met betrekking tot de gemeenschappelijke opslagplaats, worden eveneens drie andere types van inter-component afhankelijkheden in kaart gebracht. De studie van deze realistische gevalsstudie illustreert onder meer de omvang van het indirect delen van data in recente toepassingen en de complexiteit om zonder bijkomende ondersteuning de afhankelijkheden van de verschillende componenten correct te beheren. In sectie 4 worden afhankelijkheden tussen componenten die indirect data delen via een gemeenschappelijke opslagplaats beschermd in toepassingen met deterministische controle overgangen. Dit zijn toepassingen waarbij de uitvoeringsdraad (thread of control ) doorheen het programma op een deterministische manier kan bepaald worden. Om de afhankelijkheden te beschermen, wordt gebruik gemaakt van softwarecontracten voor de verschillende componenten en statische verificatie. Vervolgens wordt in sectie 5 de oplossing voor deterministische softwarecomposities uitgebreid naar reactieve softwaretoepassingen. Hiervoor wordt de aanpak met softwarecontracten en statische verificatie uit sectie 4 gecombineerd met bijkomende verificatie van het indeterministische gedrag tijdens de uitvoering. Sectie 6 overloopt ten slotte kort de bijdragen van dit werk en de resultaten van de uitgevoerde validatie experimenten.
2 2.1
Het indirect delen van data in componentgebaseerde toepassingen Het indirect delen van data
In de architecturale stijl gemeenschappelijke opslagplaats (shared repository), bestaat een softwaresysteem uit een centrale opslagplaats (die de toestand van het systeem bevat) en een aantal afzonderlijke componenten die kunnen interageren met de centrale opslagplaats [18].
2.1
Het indirect delen van data
iii
Componenten kunnen data objecten wegschrijven naar de opslagplaats om ze zo beschikbaar te stellen voor andere componenten, of kunnen data ophalen van de opslagplaats om ze te gebruiken bij hun uitvoering. Op die manier kunnen componenten zonder rechtstreekse interactie toch indirect data delen. Deze architecturale stijl wordt onder meer gebruikt om extra flexibiliteit te cre¨eren in diverse raamwerken zoals JavaServlet containers [19], het PAM raamwerk (Pluggable Authentication Modules) [17] en JavaSpaces in Sun’s Jini [6, 10]. Een toepassing met een gemeenschappelijke opslagplaats is correct samengesteld als elke component van de toepassing tijdens zijn uitvoering de data terugvindt op de opslagplaats die hij verwacht terug te vinden. Dit betekent dat de goede werking van een component afhangt van de toestand van de gemeenschappelijke opslagplaats op het moment van zijn uitvoering. Dus in toepassingen met een gemeenschappelijke opslagplaats bestaan er impliciete afhankelijkheden tussen componenten die met eenzelfde data object op de opslagplaats interageren. Figuur 1 illustreert bijvoorbeeld dergelijke afhankelijkheden in een kleine toepassing met 2 gedeelde data objecten op de opslagplaats. Voor elk data object bestaat er een afhankelijkheid tussen de componenten die het data object lezen en schrijven.
Figuur 1: Data afhankelijkheden in toepassingen met een gemeenschappelijke opslagplaats Het typische gebruik van een gemeenschappelijke opslagplaats is verder ge¨ıllustreerd aan de hand van gevalsstudies, uitgevoerd in twee verschillende toepassingsdomeinen: componentgebaseerde web toepassingen en protocolstapels. Gevalsstudie 1: Componentgebaseerde web toepassingen. De eerste gevalsstudie bundelt ervaringen met servlet-gebaseerde web toepassingen. De Java Servlet technologie maakt deel uit van de J2EE specificatie en biedt een componentgebaseerde aanpak aan om web toepassingen te maken in Java [19, 7]. Servlets delen indirect data met elkaar op vijf verschillende gemeenschappelijke opslagplaatsen. Zo is er telkens ´e´en opslagplaats geassocieerd met elke component, met elke aanvraag, met elke individuele web gebruiker, met elke toepassing en met de
2.2
Doelstellingen van deze thesis
iv
server. In de gevalsstudie worden de data afhankelijkheden in kaart gebracht van een kleine e-commerce toepassing. Tevens worden verschillende manieren toegelicht waarop deze impliciete afhankelijkheden gebroken kunnen worden tijdens de uitvoering. Gevalsstudie 2: Componentgebaseerde protocolstapels. De tweede gevalsstudie bestudeert componentgebaseerde protocolstapels gebaseerd op de DiPS+ technologie [14, 16]. Deze technologie laat toe om op een snelle manier herconfigureerbare, componentgebaseerde systeemsoftware te bouwen zoals protocolstapels en bestandsystemen. In DiPS+ worden protocolstapels gemodelleerd als sequenti¨ele pijpen van componenten (conform de pipe-and-filter architecturale stijl), waarin de componenten gegroepeerd zijn per protocollaag. Een protocolstapel verwerkt een netwerkpakket door het pakket doorheen de pijp te sturen, waarbij de componenten ´e´en na ´e´en op het pakket kunnen inwerken. De componenten delen indirect data met elkaar via een gemeenschappelijke opslagplaats die geassocieerd is met het pakket. Elk pakket heeft dus zijn eigen opslagplaats, en deze opslagplaats is enkel beschikbaar voor de component die op dat moment op het pakket aan het inwerken is. In de gevalsstudie worden de data afhankelijkheden bestudeerd in drie protocollagen: IPv4, IPSec en Radius.
2.2
Doelstellingen van deze thesis
De hoofddoelstelling van deze thesis is het verminderen van fouten tijdens de uitvoering ten gevolge van gebroken data afhankelijkheden in toepassingen met een gemeenschappelijke opslagplaats. Dit beperkt dan ook het bereik van de thesis tot softwaresystemen met de volgende eigenschappen: • Componentgebaseerde softwaresystemen waarin componenten hergebruikt worden in verschillende composities. • Toepassingen waarin componenten indirect data delen via een gemeenshappelijke opslagplaats. • Zowel toepassingen met deterministische als reactieve controle overgangen behoren tot het bereik van deze thesis. Meer concreet defini¨eren we de volgende gewenste compositie eigenschap. Door formele garanties te bieden dat deze compositie eigenschap niet geschonden wordt in een compositie, worden in de interactie met de gemeenschappelijke opslagplaats uitvoeringsfouten als een NullPointerException of een ClassCastException ge¨elimineerd. Geen gebroken data afhankelijkheden: Voor elk data object op de gemeenschappelijke opslagplaats geldt dat alvorens het data object wordt uitgelezen
v
door een component, het data object eerst daadwerkelijk op de gemeenschappelijke opslagplaats is geschreven. Tevens geldt voor elke leesinteractie dat het data object dat wordt opgehaald door een component van het data type is dat door deze component verwacht wordt. Het reduceren van fouten tijdens de uitvoering verhoogt de betrouwbaarheid van de toepassing, maar om echt bruikbaar te zijn voor de programmeur zijn ook de volgende extra-functionele criteria belangrijk: Beperkte overhead De extra kost voor de softwareontwikkelaar is best zo klein mogelijk, zowel in termen van bijkomende werklast als van verificatietijd. Bovendien is het belangrijk dat de oplossing eenvoudig te begrijpen is door de doorsnee ontwikkelaar. Hoe minder overhead en hoe eenvoudiger, hoe meer kans dat de oplossing effectief wordt toegepast. Toepasbaarheid in realistische toepassingen De toepasbaarheid van de voorgestelde oplossing mag niet gelimiteerd zijn tot kleine, didactische voorbeelden. De oplossing moet ook toepasbaar zijn in grotere, realistische softwaretoepassingen. Dit houdt onder andere in dat de voorgestelde oplossing schaalt naar grotere toepassingen, en dat de oplossing niet beperkt wordt door de gebruikte taal of door het onderliggende raamwerk.
3
Studie van de verschillende afhankelijkheden in de webmail toepassing GatorMail
In deze sectie wordt de complexiteit ge¨ıllustreerd van inter-component afhankelijkheden in hedendaagse software. Hiervoor worden de afhankelijkheden bestudeerd tussen de componenten van de reeds bestaande webmail toepassing GatorMail [5]. Er worden in het bijzonder vier types van afhankelijkheden in kaart gebracht, die vaak voorkomen in toepassingen gebaseerd op het Struts raamwerk [22]. Dit wijdverspreide raamwerk bovenop J2EE technologie wordt vaak gebruikt om bijkomende ondersteuning te bieden aan de programmeur en om een duidelijke splitsing tussen logica en opmaak af te dwingen.
3.1
GatorMail
GatorMail is een Struts-gebaseerde, open-source webmail toepassing ontwikkeld door de University of Florida. Net als bij de meeste andere webmail toepassingen, is de functionaliteit opgesplitst in een aantal subsystemen: de authenticatie functionaliteit, de acties op email mappen, de acties op email berichten, de adresboek functionaliteit en de voorkeurinstellingen. De GatorMail toepassing bestaat uit 36 verschillende verwerkingscomponenten en 29 JSP opmaakpagina’s. Dit komt overeen met ongeveer 14350 lijnen Java code
3.2
Studie van vier types inter-component afhankelijkheden
vi
en 6100 lijnen JSP opmaak. De verschillende componenten en pagina’s worden hergebruikt in 52 verschillende composities die elk de nodige logica bevatten om ´e´en type web aanvragen te verwerken.
3.2
Studie van vier types inter-component afhankelijkheden
De inter-component afhankelijkheden worden op twee niveaus bestudeerd. In eerste instantie worden de afhankelijkheden tussen componenten aan de serverkant in kaart gebracht. In tweede instantie worden ook de afhankelijkheden bestudeerd die het gevolg zijn van de interacties tussen de web gebruiker en de web server. We verwijzen hier verder naar als de interne en externe afhankelijkheden in GatorMail. 3.2.1
Interne afhankelijkheden
De interne afhankelijkheden worden ge¨ıllustreerd aan de hand van een klein voorbeeldje van een compositie uit GatorMail. In figuur 2 is de compositie weergegeven die de voorkeurinstellingen van een gebruiker weergeeft en verwerkt. De compositie bestaat uit 1 verwerkingscomponent PreferencesAction en 1 opmaakpagina preferences.jsp, beide voorgesteld als afgeronde rechthoeken in figuur 2.
Figuur 2: Interacties met de gemeenschappelijke opslagplaats in Struts
Declaratieve controle overgangen In het Struts raamwerk vormen uitvoeringscomponenten in een compositie een sequenti¨ele pijp van componenten. De uitvoeringscomponenten worden tijdens de uitvoering aaneengeschakeld op basis van de terugkeerwaarde van hun uitvoeringsmethode. Na de uitvoering van een
3.2
Studie van vier types inter-component afhankelijkheden
vii
component wordt dus op basis van de terugkeerwaarde de volgende component geselecteerd en uitgevoerd. Dit proces wordt herhaald tot het einde van de pijp bereikt is en een opmaakpagina een antwoord terugstuurt naar de web gebruiker. De manier waarop de volgende component wordt geselecteerd kan per configuratie aangepast worden. Het configuratiebestand van een specifieke toepassing (dat per instantiatie verschillend kan zijn) beschrijft immers declaratief welke terugkeerwaardes tijdens de uitvoering overeenkomen met welke volgende uitvoeringscomponent. Deze extra flexibiliteit zorgt ervoor dat componenten gemakkelijk in nieuwe composities hergebruikt kunnen worden. Afhankelijk van de noden in een specifieke compositie wordt de uitvoeringsdraad doorheen het programma aangepast door de declaratieve beschrijving te wijzigen, en dus niet door de componenten zelf te wijzigen. In figuur 2 is de horizontale pijl tussen PreferencesAction en preferences.jsp een voorbeeld voor een dergelijke declaratief beschreven overgang van controle in de toepassing. Na de uitvoering van de uitvoeringsmethode van PreferencesAction zijn er twee terugkeerwaardes mogelijk, namelijk input en success. In het bijhorende configuratiebestand komen beide terugkeerwaardes overeen met de opmaakpagina preferences.jsp. In een dergelijke compositie is de correcte werking van de toepassing afhankelijk van een goeie overeenkomst tussen de declaratief beschreven overgangen in het configuratiebestand en de terugkeerwaardes die de componenten tijdens de uitvoering kunnen teruggeven. Indien een compoent immers een terugkeerwaarde produceert die in de configuratie niet beschreven is zal dit typisch leiden tot een fout tijdens de uitvoering. Het indirect delen van data Naast de declaratief beschreven controle overgangen, ondersteunt het Struts raamwerk ook het indirect delen van data tussen de verschillende uitvoeringscomponenten en opmaakpagina’s. In de compositie van figuur 2 wordt er indirect data gedeeld op drie niveaus: per inkomende aanvraag, per gebruikerssessie en per toepassing. Elk van de niveaus is in de figuur aangeduid met een grijs ovaal, en de data objecten die gedeeld worden zijn voorgesteld door witte rechthoeken. Zoals reeds aangehaald in sectie 2.1, introduceert het gebruik van een gemeenschappelijke opslagplaats voor elk data object afhankelijkheden tussen de componenten die interageren met het specifieke data object. Iedere component die data objecten van de gemeenschappelijke opslagplaats ophaalt, verwacht immers dat deze data objecten enerzijds aanwezig zijn op de opslagplaats, en anderzijds ook van de gewenste types zijn. 3.2.2
Externe afhankelijkheden
Om de externe afhankelijkheden in kaart te brengen wordt de interactie tussen de web gebruiker en de web server bestudeerd. Ook hier worden de afhankelijkheden
3.2
Studie van vier types inter-component afhankelijkheden
viii
opgesplitst in afhankelijkheden met betrekking tot de controle overgang, en afhankelijkheden ten gevolge van het delen van data tussen de web gebruiker en de web server. Reactieve controle overgangen Naar analogie met andere client-server toepassingen bestaat er ook tussen de web gebruiker en web toepassing een protocol zoals is weergegeven in figuur 3. Het protocol bestaat typisch uit een opeenvolging van asynchrone aanvraag-antwoord stappen, die elk verantwoordelijk zijn voor de aanvraag van 1 URL van de web toepassing. De volgorde waarin de verschillende URL’s worden opgevraagd is niet deterministisch bepaald, maar is afhankelijk van de hyperlinks die de gebruiker wenst te volgen in zijn browser. Men spreekt dan ook van een reactieve toepassing wanneer de controle overgangen het gevolg zijn van niet-deterministisch gedrag van de gebruiker. De uitvoering van de web toepassing is dus afhankelijk van de niet-deterministische controle overgangen, die door de gebruiker vrij te kiezen zijn tijdens de uitvoering.
Figuur 3: Interacties tussen de web gebruiker en de web toepassing Toch is het in vele concrete web toepasingen niet onlogisch om de controle overgangen in het protocol tussen web gebruiker en web toepassing te beperken tot een logische subset waarin de gebruiker nog steeds reactief de overgangen kan kiezen. Zo is het bijvoorbeeld in de webmail toepassing niet logisch om een bijlage toe te voegen aan een bericht alvorens het bericht aan te maken, of om gevoelige informatie te raadplegen alvorens de authenticatieprocedure doorlopen te hebben. Het delen van data tussen web gebruiker en web server Tijdens de reactieve controle overgangen wordt eveneens data uitgewisseld tussen de web gebruiker en de web toepassing. Bij het aanvragen van een URL aan de toepassing kan de web gebruiker extra invoerparameters meesturen met de aanvraag. Voorbeelden van invoerparameters zijn invoervelden van formulieren, verborgen velden, of parameters toegevoegd aan de URL. Bij het wijzigen van de voorkeurinstellingen van de webmail toepassing bijvoorbeeld worden 8 invoerparameters meegestuurd (figuur 4).
3.3
Overzicht van de resultaten
ix
De verwerking van aanvragen in de web toepassing is dus afhankelijk van de meegestuurde data. Daarnaast wordt er ook data uitgewisseld tussen de toepassing en de gebruiker bij het terugsturen van een antwoord. Deze laatste datastroom wordt echter buiten beschouwing gelaten in onze studie.
Figuur 4: Het delen van data tussen de web gebruiker en de web toepassing
3.3
Overzicht van de resultaten
Ten slotte worden in deze subsectie kort de resultaten van de afhankelijkheidsstudie toegelicht. Een vollediger rapport van deze afhankelijkheidsstudie is te vinden in [4]. Interne afhankelijkheden De 36 uitvoeringscomponenten en 39 JSP opmaakpagina’s in GatorMail worden in 52 composities hergebruikt. De studie identificeerde hierbij 147 declaratief beschreven, interne controle overgangen. Daarnaast werden 1369 interacties met de gemeenschappelijke opslagplaats in kaart gebracht. Externe afhankelijkheden Bij de interactie tussen de web gebruiker en de GatorMail toepassing worden 133 verschillende invoerparameters meegestuurd. Daarnaast werden 549 mogelijke externe controle overgangen ge¨extraheerd uit de opmaakpagina’s die naar de gebruiker worden teruggestuurd. Tijdens de studie werden vier eigenschappen met betrekking tot de afhankelijkheden ge¨ıdentificeerd: Het verspreid zijn doorheen de code. De interacties met de gemeenschappelijke opslagplaats zijn heel verspreid doorheen de code van de uitvoeringscomponenten en de opmaakpagina’s. Dit zorgt ervoor dat het heel moeilijk is om op een eenvoudige manier een volledig beeld te krijgen van alle interacties. Ook de externe controle overgangen zijn sterk verspreid in de code van de opmaakpagina’s.
3.3
Overzicht van de resultaten
x
Het grote aantal ge¨ıdentificeerde afhankelijkheden. Het aantal in kaart gebrachte afhankelijkheden in GatorMail ligt vrij hoog. Met onder andere 1369 interacties met de gemeenschappelijke opslagplaats en 549 externe controle overgangen is het duidelijk dat de complexiteit om afhankelijkheden te beheren in een middelgrote toepassing als GatorMail niet te onderschatten is. Figuur 5 visualiseert bijvoorbeeld de interacties met de gemeenschappelijke opslagplaats voor de compositie die adressen opslaat in het adresboek van GatorMail.
Figuur 5: Grafische voorstelling van de afhankelijkheden in /saveAddresses.do De complexiteit van identificatie. De complexiteit om de verschillende afhankelijkheden te identificeren hangt sterk af van hoe expliciet deze beschreven zijn, bijvoorbeeld in documentatie of configuratiebestanden, of hoe verspreid deze zijn in de code. Zo zijn de declaratieve controle overgangen in de compositie expliciet beschreven en eenvoudig in kaart te brengen. De interne interactie met de gemeenschappelijke opslagplaats en de externe controle overgangen daarentegen zijn nergens beschreven, en hun verspreiding in de code zorgt ervoor dat ze heel moeilijk te identificeren zijn. Een goed beheer van deze afhankelijkheden is dan ook allesbehalve evident, zeker in het geval van software evolutie. Complementariteit van de afhankelijkheden. In deze studie werden vier types van afhankelijkheden ge¨ıdentificeerd. Om een volledig beeld te krijgen van de afhankelijkheden is het belangrijk de vier types te combineren. Zo zijn
xi
de interacties met de gemeenschappelijke opslagplaats pas zinvol als ook de onderliggende uitvoeringsdraad doorheen de toepassing duidelijk is. Bijkomend geven de externe controle overgangen ook extra informatie over hoe de verschillende ge¨ısoleerde composities op basis van de externe controle overgangen de uitvoeringsdraad doorheen de complete toepassing beschrijven, en hoe data die door de gebruiker doorgestuurd wordt naar de toepassing gecorreleerd is met de data die terecht komt op de gemeenschappelijke opslagplaats.
4 4.1
Statische verificatie van het indirect delen van data Gewenste eigenschappen van een compositie
Zoals toegelicht in sectie 3, is het beheer van afhankelijkheden in Struts gebaseerde toepassingen niet evident. Meer algemeen zoeken we in deze sectie oplossingen om de interne afhankelijkheden beter te beheren in toepassingen met een gemeenschappelijke opslagplaats of met declaratieve controle overgangen. We zijn voornamelijk ge¨ınteresseerd in oplossingen die het aantal fouten tijdens de uitvoering kunnen terugbrengen door fouten in de compositie reeds tijdens de compilatie of tijdens de compositie te detecteren. Op basis van de studie van GatorMail defini¨eren we dan ook volgende gewenste eigenschappen in componentgebaseerde toepassingen met een gemeenschappelijke opslagplaats en/of declaratief controle overgangen: Geen ongedefinieerde controle overgangen: Voor elke interne controle overgang tussen uitvoeringscomponenten tijdens de uitvoering komt een declaratieve beschrijving overeen in de compositie configuratie. Op die manier zal het raamwerk tijdens de uitvoering nooit in een situatie terechtkomen waarbij voor een terugkeerwaarde geen overeenkomstige volgende uitvoeringscomponent gevonden kan worden. Geen gebroken data afhankelijkheden: Voor elk data object op de gemeenschappelijke opslagplaats geldt dat alvorens het data object wordt uitgelezen door een uitvoeringscomponent, het data object eerst daadwerkelijk op de gemeenschappelijke opslagplaats is geschreven. Tevens geldt voor elke leesinteractie dat het data object dat wordt opgehaald door de uitvoeringscomponent van het data type is dat door deze component verwacht wordt. Op deze manier worden tijdens de uitvoering de typische NullPointerException en ClassCastException fouten vermeden.
4.2
4.2
Overzicht van de oplossing
xii
Overzicht van de oplossing
In deze sectie wordt een oplossing voorgesteld om op een statische manier te verifi¨eren dat een compositie voldoet aan beide hierboven beschreven eigenschappen. Een overzicht van deze oplossing is grafisch weergegeven in figuur 6.
Figuur 6: Overzicht van de oplossing In deze oplossing worden beide eigenschappen samen geverifieerd. Hiervoor wordt in een eerste stap voor elke component een specificatie opgesteld in de vorm van een component contract. Deze specificatie bevat alle relevante informatie over de interacties tussen de component en de gemeenschappelijke opslagplaats, en over de mogelijke terugkeerwaardes horende bij de specifieke component. Vervolgens wordt voor elke component statisch geverifieerd dat de implementatie van de component overeenkomt met de specificatie van de component. In een tweede stap wordt statisch geverifieerd of er in een specifieke compositie geen inbreuken mogelijk zijn op beide gewenste eigenschappen. Deze laatste verificatiestap gebeurt op basis van de compositie configuratie en de component specificaties van elk van de componenten die deel uitmaken van de compositie. De twee stappen van de oplossing worden verder toegelicht aan de hand van de voorbeeldcompositie in figuur 7.
Figuur 7: Een voorbeeld compositie: het plannen van een vergadering
4.2
Overzicht van de oplossing
4.2.1
xiii
Stap 1: Component specificatie en verificatie
De component specificaties worden uitgedrukt in een probleemspecifieke contracttaal. Listing 1 geeft een voorbeeld van een dergelijke probleemspecifieke contracttaal. Deze contracten worden dan automatisch vertaald in de meer courant gebruikte contracttaal JML (de Java Modeling Language (de Java Modeling Language)) [12], zodat bestaande verificatieprogramma’s gebruikt kunnen worden. In de rest van de uitleg worden de vertaalde JML contracten gebruikt ter illustratie. E´en van de belangrijkste voordelen van JML is de grote hoeveelheid beschikbare ondersteunende programma’s [2]. Zo zijn er programma’s beschikbaar voor contract verificatie tijdens de uitvoering, voor test generatie, voor statische verificatie en voor het infereren van contracten. Listing 1: Het probleemspecifieke contract van AddMeetingAction //spec: forwards {”success”,” fail ”}; //spec: writes {Meeting meeting}; //spec: on forward == ”fail” also writes {Vector conflicts };
Om de eigenschap geen ongedefinieerde controle overgangen te kunnen verifi¨eren is het nodig dat de component specificaties voldoende informatie bevatten over de mogelijke declaratieve overgangen. Dit kan eenvoudig gebeuren door de mogelijke terugkeerwaardes op te nemen als onderdeel van de postconditie van de component. Listing 2 drukt via het ensures pragma uit dat success de enige terugkeerwaarde van de EmailNotificationAction is. Listing 2: Het JML contract van EmailNotificationAction public class EmailNotificationAction extends Action { //@ also //@ requires request != null ; //@ requires request .getDataItem(”meeting”) instanceof Meeting; //@ ensures \result == ”success”; public String execute(Request request, Form form); }
Voor de verificatie van de geen gebroken data afhankelijkheden eigenschap is het nodig dat een component contract de interacties beschrijft tussen de component en de gemeenschappelijke opslagplaats. De leesinteracties worden uitgedrukt met behulp van een preconditie die de vereiste toestand van de gemeenschappelijke opslagplaats beschrijft. In listing 2 wordt bijvoorbeeld uitgedrukt dat de EmailNotificationAction component voor uitvoering vereist dat het data object meeting op de gemeenschappelijke opslagplaats niet-null is, en dat het object van het data type Meeting is. Een preconditie wordt uitgedrukt met behulp van een requires pragma. De toestand van de gemeenschappelijke opslagplaats wordt hierin beschreven door de getDataItem methode.
4.2
Overzicht van de oplossing
xiv
Voor schrijfinteracties wordt in de postconditie van de component specificatie uitgedrukt welke data objecten op de gemeenschappelijke opslagplaats beschikbaar zijn na uitvoering, en van welk type deze objecten zijn. Opnieuw worden deze uitgedrukt in functie van de getDataItem methode. Listing 3 specificeert bijvoorbeeld dat na uitvoering van de AddMeetingAction het data object meeting niet-null zal zijn, en van het data type Meeting. In listing 3 is ook een conditionele postconditie opgenomen. Er wordt enkel gegarandeerd dat het data object conflicts op de gemeenschappelijke opslagplaats beschikbaar is na uitvoering als fail de terugkeerwaarde van de execute methode is. Listing 3: Het JML contract van AddMeetingAction public class AddMeetingAction extends Action { //@ also //@ requires request != null ; //@ ensures request.getDataItem(”meeting”) instanceof Meeting; //@ ensures \result == ”fail” ==> request.getDataItem(”conflicts”) // instanceof Vector; //@ ensures \result == ”success” || \ result == ”fail”; public String execute(Request request, Form form); }
Vervolgens worden bestaande verificatieprogramma’s voor JML gebruikt om na te gaan of de opgestelde specificatie overeenkomt met de daadwerkelijke implementatie van de component. 4.2.2
Stap 2: Statische verificatie van de gewenste eigenschappen
In een tweede stap moeten de gewenste eigenschappen geverifieerd worden in een specifieke compositie. Ook hiervoor wordt gebruik gemaakt van bestaande verificatieprogramma’s voor JML. Om de gewenste eigenschappen te verifi¨eren wordt een compositiespecifieke verificatiemethode gegenereerd, uitgedrukt in Java. Deze verificatiemethode initialiseert eerst de verschillende gebruikte componenten. Daarna wordt uit alle mogelijke controle overgangen voor elke component een statische boomstructuur gemaakt die alle mogelijke uitvoeringsdraden doorheen de toepassing beschrijft. Dit gebeurt op basis van de declaratieve beschrijving uit het configuratiebestand van de compositie. Deze boomstructuur omvat alle mogelijke uitvoeringspaden die tijdens de uitvoering van de compositie kunnen voorkomen. Om na te gaan dat geen inbreuken voorkomen op de geen ongedefinieerde controle overgangen eigenschap wordt op elke splitsing van de boom een extra tak toegevoegd met een unreachable pragma. Deze takken zijn enkel bereikbaar indien de component een terugkeerwaarde teruggeeft die niet opgenomen is in de declaratieve beschrijving.
4.3
Validatie in de webmail toepassing GatorMail
xv
Daarna verifi¨eren we de gegeneerde verificatiemethode met een bestaand verificatieprogramma. Een dergelijk verificatieprogramma zal voor alle mogelijke uitvoeringspaden doorheen de methode verifi¨eren of voor elk statement in het uitvoeringspad de precondities van de opgeroepen methode vervuld zijn. Dus impliciet zal het verificatieprogramma voor elke component controleren of de gemeenschappelijke opslagplaats op dat moment in de uitvoering in de toestand is die de component verwacht. Om dit te verifi¨eren steunt het verificatieprogramma onder andere op de postcondities van componenten die vroeger in het pad uitgevoerd zijn. Het slagen van de verificatie van de gegenereerde verificatiemethode impliceert dus dat de compositie voldoet aan de geen gebroken data afhankelijkheden eigenschap. Het verificatieproces zal slechts slagen als het verificatieprogramma kan uitsluiten dat er ooit een unreachable pragma bereikt wordt tijdens het doorlopen van alle mogelijke uitvoeringspaden. Dit impliceert dat een compositie waarvoor de gegenereerde verificatiemethode verifieert eveneens voldoet aan de geen ongedefinieerde controle overgangen eigenschap.
4.3
Validatie in de webmail toepassing GatorMail
De voorgestelde aanpak werd gevalideerd in de webmail toepassing GatorMail. Hiervoor werden 12 uitvoeringscomponenten en 8 opmaakpagina’s van specificatie voorzien. Met deze component contracten hebben we voor 17 bestaande composities uit GatorMail zowel de geen ongedefinieerde controle overgangen eigenschap als de geen gebroken data afhankelijkheden eigenschap gevalideerd. Hiervoor werd gebruik gemaakt van het verificatieprogramma ESC/Java2 [11]. Het grote voordeel van dit verificatieprogramma is dat deze de verificatie kan uitvoeren zonder interactie met de gebruiker. Nadeel is wel dat het verificatieprogramma onvolledig is, en een aantal bronnen van onbetrouwbaarheid heeft waarmee we tijdens de uitwerking van de oplossing rekening moesten mee houden [13, 3]. We hebben de resultaten van deze experimenten gebruikt om enerzijds de hoeveelheid vereiste specificatie te meten, en om anderzijds de snelheid van de verificatie in kaart te brengen. Hoeveelheid specificatie Tijdens het experiment werden ten hoogste 15 lijnen JML specificatie gebruikt per component. Uitgedrukt in termen van de meer condense, probleemspecifieke contracten beslaan de specificaties hoogstens 4 lijnen per component. Tevens zijn de probleemspecifieke contracten eenvoudiger neer te schrijven en te begrijpen door een programmeur, en kunnen ze automatisch vertaald worden naar de meer algemeen gebruikte JML contracten. Verificatiesnelheid Het meten van de verificatiesnelheid gebeurde op een stan-
xvi
daard laptop1 . De tijd die nodig is om te verifi¨eren of een component specificatie overeenkomt met de implementatie van de component, bedroeg maximaal 15 seconden. Indien eveneens extra frame condities werden geverifieerd, liep de verificatietijd op tot 700 seconden. De verificatie van de twee eigenschappen in een gegeven compositie slaagde in maximaal 8 seconden. De verificatiesnelheid is aanvaardbaar, zeker omdat de controle met de frame condities minder frequent dient uitgevoerd te worden. Bovendien gebeurt de verificatie modulair (dit wil zeggen component per component, compositie per compositie), wat maakt dat het verificatieproces lineair schaalt voor grotere projecten. Dit is onder voorbehoud dat de complexiteit van de componenten zelf niet stijgt.
5
Statische en dynamische verificatie van het indirect delen van data in reactieve toepassingen
In de vorige sectie hebben we toepassingen bestudeerd met deterministische, sequenti¨ele controle overgangen. In deze sectie bestuderen we toepassingen met reactieve controle overgangen. Om te kunnen garanderen dat ook in dergelijke toepassingen geen data afhankelijkheden gebroken worden, breiden we de oplossing uit sectie 4 uit met dynamische verificatie. E´en van de gekende aanvallen op web toepassingen is het doorbreken van de reactieve controle overgangen, wat forceful browsing wordt genoemd. Hierbij volgt de web gebruiker geen van de via hyperlinks aangeboden controle overgangen uit de opmaakpagina, maar vraagt een URL aan naar keuze. Deze manier van werken kan op diverse manieren de veiligheid of robuustheid van de toepassing in gevaar brengen. Afhankelijk van de toepassing kunnen dergelijke aanvallen onder andere leiden tot het omzeilen van toegangscontrole en tot corruptie van de toestand op de server. Meer algemeen kan het doorbreken van de vooropgestelde reactieve controle overgangen de toepassingslogica breken of implementatiefouten aan de serverkant uitbuiten. Web Application Firewalls (WAF’s) die een strikt protocol kunnen afdwingen (zoals voorgesteld in criterium 4.7 van de Web Application Firewall Evaluation Criteria (WAFEC) [23]) worden algemeen aanvaard als een effectieve maatregel tegen forceful browsing. Toch is de bescherming van een WAF vaak heuristisch geconfigureerd en is het onmogelijk om een formele garantie te geven dat er ondanks de WAF geen data afhankelijkheden zullen worden gebroken in toepassingen met een gemeenschappelijke opslagplaats. Met de aanpak beschreven in dit hoofdstuk kunnen we de kloof tussen de WAF configuratie en de eigenlijke implementatie van de web toepassing verklei1 Standaard laptop (Hardware: Pentium M 1.4 Ghz met 512 MB RAM; Software: Debian linux als besturingssyteem, versie 1.4.2 09 van Java, en ESC/Java2 versie 2.0a9)
5.1
Gewenste eigenschappen van een compositie
xvii
nen. Hierbij combineren we de protocolverificatie mogelijkheden van een WAF met de statische verificatiemogelijkheden uit sectie 4. Als resultaat kunnen we formele garanties bieden dat een gegeven WAF configuratie een gegeven toepassing dermate beschermt zodat er geen data afhankelijkheden gebroken worden tijdens de uitvoering, onafhankelijk welke inbreuken de web gebruikers pleegt op het de vooropgestelde reactieve controle overgangen.
5.1
Gewenste eigenschappen van een compositie
Concreet beschrijven we de gewenste compositie eigenschap in een web toepassing met reactieve controle overgangen als: Geen gebroken data afhankelijkheden op de gemeenschappelijke opslagplaats eigen aan een gebruiker: Geen enkele aanvraag verstuurd door de web gebruiker heeft als gevolg dat een data object van de gemeenschappelijke opslagplaats, eigen aan de gebruiker, wordt opgehaald alvorens het daadwerkelijk door een component op de opslagplaats is opgeslagen. Tevens geldt voor elke leesinteractie dat het data object dat beschikbaar is op de gemeenschappelijke opslagplaats tijdens de leesinteractie van het type is dat verwacht wordt door de lezende component.
5.2
Overzicht van de oplossing
Figuur 8 toont een overzicht van onze oplossing bij web toepassingen met reactieve controle overgangen. Het verificatieproces bestaat uit drie stappen. De drie stappen samen garanderen dan dat een gegeven compositie voldoet aan de eigenschap geen gebroken data afhankelijkheden op de gemeenschappelijke opslagplaats eigen aan de gebruiker.
Figuur 8: Overzicht van de oplossing
5.2
Overzicht van de oplossing
xviii
De eerste verificatiestap is analoog aan de oplossing voor deterministische controle overgangen uit sectie 4. Eerst worden de interacties met de gemeenschappelijke opslagplaats expliciet beschreven in de component contracten. Vervolgens wordt met behulp van statische verificatie nagegaan of de component contracten overeenkomen met de implementatie van de contracten. In een tweede stap wordt op basis van de configuratie informatie geverifieerd of aan de gewenste compositie eigenschap voldaan is. Hierbij maken we de veronderstelling dat we voor elke web gebruiker een logische bovengrens kunnen vastleggen voor het toegelaten verkeer tussen de web gebruiker en de web toepassing. Deze bovengrens is het protocol dat we verwachten dat zal plaatsvinden tussen de web gebruiker en deze specifieke web toepassing. Deze bovengrens kan bijvoorbeeld de policy zijn die een Web Application Firewall afdwingt om forceful browsing tegen te gaan. Deze verificatiestap garandeert dan dat aan de gewenste compositie eigenschap voldaan is mits het protocol voldoet aan deze bovengrens tijdens de uitvoering. De derde verificatiestap zal tijdens de uitvoering van de toepassing aanvragen filteren zodat het protocol tussen de web gebruiker en de web toepassing voldoet aan de bovengrens. 5.2.1
Stap 1: Component specificatie en verificatie
De eerste verificatiestap is heel gelijkaardig aan de eerste stap van de statische oplossing uit sectie 4. Een component contract beschrijft voor elke component zijn interacties met de gemeenschappelijke opslagplaats eigen aan de gebruiker. Daarna wordt voor elke component via statische verificatie de overeenkomst gecontroleerd tussen het component contract en de implementatie van de component. Opnieuw wordt hier een probleemspecifieke contracttaal gebruikt, zoals ge¨ıllustreerd in listing 4. In dit voorbeeld zijn de contracten semantisch rijker geworden. Zo kan een component aangeven dat het ophalen van informatie vanop de gemeenschappelijke opslagplaats defensief geprogrammeerd is (aangeduid door Nullable) en kan de component ook schrijfinteracties uitdrukken in functie van de toestand van de gemeenschappelijke opslagplaats. Listing 4: Probleem-specifieke specificatie van ShowCartServlet //spec: reads {ResourceBundle messages, Nullable<ShoppingCart> cart, // Nullable currency} from session; //spec: writes {cart == null => ShoppingCart cart} on session; //spec: possible writes {currency == null => Currency currency} on session;
5.2
Overzicht van de oplossing
5.2.2
xix
Stap 2: Toepassing specifieke statische verificatie van de gewenste eigenschap
Om statische verificatie van de gewenste eigenschap mogelijk te maken wordt een bovengrens gedefinieerd voor de interacties tussen de web gebruiker en de specifieke web toepassing. Deze kan voorgesteld worden op verschillende manieren, zoals een reguliere expressie, een EBNF notatie of een gelabelde toestandsmachine. Figuur 9 geeft bijvoorbeeld de toestandmachine weer voor de Duke’s BookStore toepassing, waarin elke transitie overeenkomt met een mogelijke web aanvraag tussen de web gebruiker en de toepassing. Een dergelijke toestandsmachine kan bijvoorbeeld afgeleid worden op basis van een bestaand client programma, uit de analyse van legitiem netwerkverkeer of uit de hyperlinks van de diverse opmaakpagina’s.
Figuur 9: Client/server interactieprotocol Vervolgens wordt met behulp van statische verificatie nagegaan of aan de gewenste compositie eigenschap voldaan is voor elk mogelijk protocol tussen de web gebruiker en de web toepassing (dat voldoet aan de gelabelde toestandsmachine). Dankzij de gedefinieerde bovengrens is deze verificatiestap gelijkaardig aan de statische verificatiestap uit de oplossing van sectie 4. Op basis van de gelabelde toestandsmachine wordt een toepassingsspecifieke verificatiemethode gegenereerd. Deze verificatiemethode bevat alle mogelijke paden doorheen de toestandsmachine, en elke transitie wordt vertaald naar de oproep van de overeenkomstige component aan de serverkant. Het reactieve of indeterministishe gedrag is in deze verificatiemethode gemodelleerd door gebruik te maken van de java.util.Random klasse, if-then-else takken, switch-cases and while-lussen. Listing 5 toont de toepassingsspecifieke verificatiemethode voor de Duke’s BookStore toepassing. Listing 5: Toepassingsspecifieke verificatiemethode //@ requires request != null ; //@ requires request . session .messages == null && request.session.cart == null // && request.session.currency == null; public void protocolCheck(HttpServletRequest request, HttpServletResponse response){
5.2
Overzicht van de oplossing
xx
try { Random random = new Random(); bookstore.doGet(request,response); while(random.nextBoolean()){ int randomInt = random.nextInt(); switch(randomInt){ case 0: showcart.doGet(request,response); break; case 1: banner.doGet(request,response); break; case 2: bookstore.doGet(request,response); break; case 3: bookdetail.doGet(request,response); break; default: break; } } if (random.nextBoolean()){ switch(random.nextInt()){ case 0: cashier .doGet(request,response); break; default: catalog.doGet(request,response); break; } while(random.nextBoolean()){ switch(random.nextInt()){ case 0: showcart.doGet(request,response); break; case 1: catalog .doGet(request,response); break; case 2: cashier .doGet(request,response); break; case 3: bookstore.doGet(request,response); break; case 4: bookdetail.doGet(request,response); break; case 5: banner.doGet(request,response); break; default: break; } } orderFilter . doFilter(request,response,null); receipt .doPost(request,response); } } catch(Exception e) { e.printStackTrace(); } }
5.2.3
Stap 3: Verificatie van het protocol tijdens de uitvoering
In een laatste stap wordt de bovengrens op het protocol tussen de web gebruiker en de web toepassing afgedwongen. Dit kan gebeuren door de protocol bovengrens te configureren op een WAF die een strikt protocol kan afdwingen, of door een eigen filter toe te voegen aan het verkeer tussen de web gebruiker en de web toepassing en aldus de aanvragen te filteren alvorens deze aankomen op de web toepassing.
5.3
5.3
Validatie in de Duke’s BookStore toepassing
xxi
Validatie in de Duke’s BookStore toepassing
De voorgestelde aanpak werd gevalideerd in de Duke’s BookStore toepassing, die deel uitmaakt van de J2EE tutorial [1]. De toepassing bestaat uit ongeveer 3500 lijnen code en implementeert de basisfunctionaliteit van een e-commerce toepassing met behulp van 6 componenten. Deze componenten hebben we van de nodige specificatie voorzien en de overeenkomst met hun implementatie geverifieerd met behulp van ESC/Java2. Op basis van de configuratie informatie en de gelabelde toestandsmachine uit figuur 9, werd de toepassingsspecifieke verificatiemethode gegenereerd. Een deel van deze verificatiemethode is voorgesteld in listing 5. Gegeven de bovengrens voor het protocol tussen de web gebruiker en de toepassing (uitgedrukt in de gelabelde toestandsmachine), werd vervolgens geverifieerd dat de eigenschap geen gebroken data afhankelijkheden op de gemeenschappelijke opslagplaats eigen aan de gebruiker op geen enkel ogenblik geschonden wordt. Hiervoor werd de toepassingsspecifieke verificatiemethode geverifieerd met behulp van ESC/Java2. Als laatste stap werd een J2EE filter gebruikt in de web toepassing om te garanderen dat enkel de aanvragen die voldoen aan de gelabelde toestandsmachine effectief worden verwerkt door de toepassing. We hebben de resultaten van deze experimenten gebruikt om enerzijds de hoeveelheid vereiste specificatie te meten, en om anderzijds de snelheid van de verificatie in kaart te brengen (zowel van de statische als de dynamische verificatie). Hoeveelheid specificatie Tijdens het experiment werden ten hoogste 4 lijnen probleemspecifieke specificatie gebruikt per component. Daarnaast werd door het toepassen van een heel pragmatische framing ook het aantal methoden uit bibliotheken dat dient geannoteerd te worden sterk gereduceerd. Verificatiesnelheid Het meten van de verificatiesnelheid gebeurde op een standaard laptop2 . De tijd die nodig is om te verifi¨eren of een component specificatie overeenkomt met de implementatie van de component, bedroeg maximaal 225 seconden. De verificatie van de toepassingsspecifieke verificatiemethode nam 11 seconden in beslag. Performantiekost dynamische verificatie Om de kost op de performantie in te schatten van de dynamische verificatie werd het volgende experiment uitgevoerd met en zonder de dynamische verificatie. We simuleerden 1000 sequenti¨ele gebruikers die elk een protocol uitvoeren bestaande uit 6 aanvragen. 2% van de bezoekers paste hierbij forceful browsing toe. In dit experiment maten we een performantiekost van 1.3%. 2 Standaard laptop (Hardware: Pentium M 1.4 Ghz met 512 MB RAM; Software: Debian linux als besturingssyteem, versie 1.4.2 09 van Java, en ESC/Java2 versie 2.0a9)
xxii
6
Besluit
Deze sectie somt de specifieke bijdragen van deze thesis op en vat de resultaten van de uitvoerde validatie-experimenten samen.
6.1
Bijdragen van deze thesis
We zijn de thesis gestart met het verkrijgen van goede inzichten in het beheer van afhankelijkheden in componentgebaseerde toepassingen met een gemeenschappelijke opslagplaats. We hebben in het bijzonder de afhankelijkheden bestudeerd tussen componenten die via de gemeenschappelijke opslagplaats met elkaar een data object delen. We hebben een aantal typische compositieproblemen ge¨ıdentificeerd in twee verschillende toepassingsdomeinen: componentgebaseerde protocolstapels en servlet-gebaseerde web toepassingen. Ten slotte hebben we ook de complexiteit van het beheer van afhankelijkheden in kaart gebracht door een diepgaande afhankelijkheidsstudie van de middelgrote, webmail toepassing GatorMail. Na de identificatie van een aantal vaakvoorkomende compositieproblemen in toepassingen met een gemeenschappelijke opslagplaats, hebben we de geen gebroken data afhankelijkheden compositie eigenschap gedefinieerd. Het bereiken van deze eigenschap elimineert een substantieel deel van de ge¨ıdentificeerde compositieproblemen in dergelijke composities. Op basis hiervan werd de hoofddoelstelling van de thesis geformuleerd als het reduceren van het aantal fouten tijdens de uitvoering door formele garanties te bieden dat een gegeven compositie op geen enkel moment tijdens de uitvoering een inbreuk pleegt op de geen gebroken data afhankelijkheden compositie eigenschap. De formele verificatie van de geen gebroken data afhankelijkheden compositie eigenschap wordt bereikt in twee stappen. In een eerste stap verifi¨ eren we statisch dat een gegeven compositie met deterministische controle overgangen voldoet aan de gewenste compositie eigenschap. Hiertoe worden de componenten uitgebreid met contracten die de interacties met de gemeenschappelijke opslagplaats beschrijven. Vervolgens werden deze contracten als basis gebruikt om samen met de configuratie informatie te verifi¨eren dat zich geen inbreuken kunnen voordoen op de compositie eigenschap. In een tweede stap breiden we de verificatie van de gewenste compositie eigenschap uit naar toepassingen met reactieve controle overgangen. Hierbij wordt de statische verificatie aanpak voor deterministische systemen gecombineerd met verificatie tijdens de uitvoering. Om de statische verificatie mogelijk te maken met de reactieve controle overgangen, defini¨eren we een bovengrens voor het reactief gedrag. We noemen dit de vooropgestelde interacties tussen de web gebruiker en de toepassing. Daarna verifi¨eren we statisch of een gegeven compositie voldoet aan de compositie eigenschap, onder de veronderstelling dat elke interactie tussen de web gebruiker en de toepassing deel uitmaakt van de voorop-
6.2
Validatie experimenten
xxiii
gestelde interactie tussen de web gebruiker en de toepassing. Ten slotte beperken we tijdens de uitvoering de aanvragen naar de server, zodat de aanvragen die de toepassing bereiken deel uitmaken van de vooropgestelde interactie tussen de web gebruiker en de toepassing.
6.2
Validatie experimenten
We hebben de formele verificatie van de geen gebroken data afhankelijkheden eigenschap gevalideerd zowel in een deterministische als in een reactieve softwarecompositie. Naast de haalbaarheid van de aanpak waren we in deze experimenten ook ge¨ınteresseerd hoe goed aan de twee extra-functionele criteria uit sectie 2 werd voldaan: beperkte overhead en toepasbaarheid in realistische toepassingen. De oplossing voor toepassingen met deterministische controle overgangen (voorgesteld in sectie 4) werd gevalideerd door deze toe te passen op de middelgrote, webmail toepassing GatorMail. In dit eerste experiment hebben we aangetoond dat, dankzij de modulaire verificatie, de voorgestelde oplossing lineair schaalt naar grotere toepassingen, mits de verificatiecomplexiteit van elke methode stabiel blijft. Verder hebben we in dit experiment een beperkte annotatie overhead vastgesteld (maximaal 4 lijnen probleemspecifieke specificatie per component) en een verificatietijd per component van maximaal 15 seconden gemeten. De oplossing voor toepassingen met reactieve controle overgangen (voorgesteld in sectie 5) werd gevalideerd door deze toe te passen op de Duke’s BookStore toepassing. Deze toepassing is een didactisch voorbeeld van een e-commerce site en wordt samen met de J2EE tutorial gebundeld. In dit tweede experiment maten we opnieuw een beperkte annotatie overhead (maximaal 4 lijnen probleemspecifieke specificatie per component) en een nog steeds haalbare verificatietijd (tot maximaal 4 minuten per component). Bovendien toonde het experiment aan dat de verificatie tijdens de uitvoering slechts een perfomantiekost van 1.3% met zich meebracht. Beide experimenten toonden aan dat het verbeteren van de betrouwbaarheid en veiligheid van web toepassingen door gebruik te maken van bestaande verificatieprogramma’s hoopvol lijkt. We beperkten in beide gevallen de overhead, zowel in termen van inspanning door de programmeur als in termen van verificatietijd. Bovendien geloven we dat met behulp van ondersteunende programma’s de inspanningen van de programmeur nog verder teruggeschroefd kunnen worden. Ten slotte illustreerden beide experimenten dat de voorgestelde oplossingen lineair schalen naar grotere, realistische toepassingen dankzij de modulaire specificatie en verificatie.
REFERENTIES
xxiv
Referenties [1] Eric Armstrong, Jennifer Ball, Stephanie Bodoff, Debbie Bode Carson, Ian Evans, Dale Green, Kim Haase, and Eric Jendrock. The J2EE 1.4 Tutorial. Sun Microsystems, Inc., December 2005. [2] Lilian Burdy, Yoonsik Cheon, David Cok, Michael Ernst, Joe Kiniry, Gary T. Leavens, K. Rustan M. Leino, and Erik Poll. An overview of JML tools and applications. International Journal on Software Tools for Technology Transfer (STTT), 7(3):212–232, June 2005. [3] David R. Cok. ESC/Java2 Implementation Notes. http: //secure.ucd.ie/products/opensource/ESCJava2/ESCTools/docs/ Escjava2-ImplementationNotes/Escjava2-ImplementationNotes.pdf. [4] Lieven Desmet, Frank Piessens, Wouter Joosen, and Pierre Verbaeten. Dependency analysis of the Gatormail webmail application. Report CW 427, Department of Computer Science, K.U.Leuven, Leuven, Belgium, September 2005. [5] Drake Emko, Sandy McArthur, Todd Williams, and Stephen L. Ulmer. GatorMail WebMail. http://sourceforge.net/projects/gatormail/. [6] Eric Freeman, Ken Arnold, and Susanne Hupfer. JavaSpaces Principles, Patterns, and Practice. Addison-Wesley Longman Ltd., 1999. [7] J. Hunter and W. Crawford. Java Servlet Programming. O’Reilly, second edition, April 2001. [8] Bart Jacobs, Jan Smans, Frank Piessens, and Wolfram Schulte. A Statically Verifiable Programming Model for Concurrent Object-Oriented Programs. In Proceedings of the Eighth International Conference on Formal Engineering Methods (ICFEM 2006), 2006. [9] Nico Janssens, Wouter Joosen, and Pierre Verbaeten. NeCoMan: Middleware for Safe Distributed-Service Adaptation in Programmable Networks. IEEE Distributed Systems Online, 6(7), July 2005. [10] Jini. http://www.jini.org/. [11] KindSoftware. The Extended Static Checker for Java version 2 (ESC/Java2). http://secure.ucd.ie/products/opensource/ESCJava2/. [12] Gary T. Leavens. jmlspecs.org/.
The Java Modeling Language (JML).
http://www.
REFERENTIES
xxv
[13] K. Rustan M. Leino, Greg Nelson, and James B. Saxe. ESC/Java User’s Manual. [14] Sam Michiels. Component Framework Technology for Adaptable and Manageable Protocol Stacks. PhD thesis, K.U.Leuven, Dept. of Computer Science, Leuven, Belgium, November 2003. [15] Sam Michiels, Lieven Desmet, Wouter Joosen, and Pierre Verbaeten. The DiPS+ software architecture for self-healing protocol stacks. In Proceedings of the 4th Working IEEE/IFIP Conference on Software Architecture (WICSA4), Oslo, Norway, 2004. IEEE/IFIP, IEEE. [16] Sam Michiels, Nico Janssens, Lieven Desmet, Tom Mahieu, Wouter Joosen, and Pierre Verbaeten. A component platform for flexible protocol stacks. In Component-Based Software Development for Embedded Systems: An Overview of Current Research Trends, volume 3778/2005 of Lecture Notes in Computer Science, pages 185–208. Springer-Verlag, GmbH, November 2005. [17] Vipin Samar. Unified login with pluggable authentication modules (pam), 1996. [18] Mary Shaw and David Garlan. Software Architecture - Perspectives on an emerging discipline. Prentice-Hall, 1996. [19] Sun Microsystems, Inc. Java Servlet Technology. http://java.sun.com/ products/servlet/. [20] Clemens Szyperski, Dominik Gruntz, and S. Stephan Murer. Component Software - Beyond Object-Oriented Programming. Addison Wesley Professional, 2nd edition, November 2002. [21] Clemens Szyperski and Cuno Pfister. Component-oriented programming: Wcop’96 workshop report, 1997. [22] The Apache Software Foundation. The Struts Framework. http://jakarta. apache.org/struts/. [23] Web Application Security Consortium. Web Application Firewall Evaluation Criteria, version 1.0. http://www.webappsec.org/projects/wafec/, January 2006.