CSA
Computer Systemen en Applicaties, part 1 of the lecture notes
Patrick van der Smagt
Second edition
September 1995
© 1994, 1995 The University of Amsterdam. Permission is granted to distribute single copies of this book for non-commercial use, as long as it is distributed as a whole in its original form, and the names of the authors and the University of Amsterdam are mentioned. Permission is also granted to use this book for non-commercial courses, provided the authors are notified of this beforehand. The authors can be reached at:

University of Amsterdam
Faculty of Mathematics & Computer Science
Kruislaan 403, NL-1098 SJ Amsterdam
THE NETHERLANDS
Phone: +31 20 525 7463
Fax: +31 20 525 7490
email:
[email protected]
Contents

1 Introduction
  1.1 How to understand a computer?
  1.2 Structure 1: Layers of computation
  1.3 Overview of this book

2 Operating systems
  2.1 Tasks of an operating system
    2.1.1 Process management
    2.1.2 Memory management
    2.1.3 File management
    2.1.4 Communication
  2.2 Operating system structure
    2.2.1 The monolithic structure
    2.2.2 The client-server structure

3 Networks
  3.1 Introduction
    3.1.1 Goals of computer networks
    3.1.2 Building blocks of a network
    3.1.3 Types of networks
  3.2 The ISO/OSI model
    3.2.1 The application layer
    3.2.2 The presentation layer
    3.2.3 The session and transport layers
    3.2.4 The network layer
    3.2.5 The data link layer
    3.2.6 The physical layer
    3.2.7 Summing up: The reality

4 Computer architecture
  4.1 Generations
    4.1.1 Architecture of a computer
  4.2 The bus
  4.3 Memory
    4.3.1 Bits, bytes, sizes
    4.3.2 Memory types
  4.4 Processors
    4.4.1 Measuring processor speed
    4.4.2 CISC vs. RISC processors
    4.4.3 Pipelining, superscalar and vector processors
    4.4.4 Odd man out
  4.5 I/O
    4.5.1 Video displays

A Technologietrends in de informatica
  A.1 Inleiding
  A.2 Wat is een computer?
  A.3 Mijlpalen
    A.3.1 Hardware
    A.3.2 Programmeertalen en -methoden
    A.3.3 Bedrijfssystemen
  A.4 Het exponentiele technologie-groei model
    A.4.1 Voorbeeld van exponentiele ontwikkeling: geheugenelementen
    A.4.2 Waarvoor wordt die rekenkracht gebruikt?
    A.4.3 Waarom steeds kleiner?
    A.4.4 Chips van dichtbij bekeken
    A.4.5 Andere aspecten
  A.5 De levensloop van een computer
  A.6 Toekomstige ontwikkelingen

B Glossary

Bibliography
List of Figures

1.1 Three examples of logical gates: OR, AND, and NOT.
2.1 An example process tree.
2.2 Virtual vs. real memory space.
2.3 An example file system on a Unix computer.
2.4 The execution of a system call in a monolithic operating system.
2.5 The client-server structure of the MINIX operating system.
2.6 The client-server operating system lends itself perfectly for a transparently distributed architecture.
3.1 A local area network, consisting of an ethernet connecting various devices together and with the outside world.
3.2 Two hosts A and B communicating over an OSI network.
4.1 A picture of the inside of a 1994 workstation.
4.2 Schematic drawing of (part of) the inside of a computer.
4.3 The inside of a CPU.
4.4 Pipelining in a processor.
B.1 Virtual machines stacked on top of each other.
B.2 A typical disk structure (yet the track width is much enlarged).
List of Tables

1.1 A possible anatomy of the modern digital computer (after (Tanenbaum, 1984)).
2.1 BSD and SysV implementations of Unix operating systems.
3.1 Names of the highest-level Internet domains.
3.2 The ISO/OSI reference model for networking.
4.1 The history of computer architecture generations.
4.2 Comparison of types of memory.
4.3 Various processors throughout the ages.
4.4 Number of dhrystones per second for various computers.
B.1 Typical data for a Sun SCSI harddisk.
8
September 1994
LIST OF TABLES
Computer Systemen en Applicaties
Chapter 1
Introduction

1.1 How to understand a computer?

What makes a computer run? There are many possible answers to this question, depending on the point of view of the questioner. For instance, an electrical engineer who is used to building circuits himself may have a totally different view than a secretary typing letters for the boss on a Mac. In this chapter, and throughout this book, we'll adopt a not-so-shocking view of computer systems: the hierarchical, layered structure. This layered structure describes the increasing levels of complexity at which a computer system can be viewed. In this we follow Tanenbaum (Tanenbaum, 1984) and many authors since then. Off we go.
1.2 Structure 1: Layers of computation

To get a better understanding of the computer's inner workings, the layered structure shown in table 1.1 is useful: it breaks a computer system down into smaller parts.

    Level   Abstraction
    6       high-level programming
    5       imperative programming
    4       assembly language
    3       operating system kernel
    2       machine language
    1       microprogram
    0       digital logic

Table 1.1: A possible anatomy of the modern digital computer (after (Tanenbaum, 1984)).

Although the dissection, as presented in the table, will not be unfamiliar or controversial to the initiated, the distinction between the levels is not as clear-cut as the table may suggest. There is one important difference, however, which may be regarded as a guideline: in general, the translation between the higher levels occurs through compilation; the translation among the lower levels through interpretation. That is to say, a Pascal program at level 5 will, in general, be compiled to level 4 before it is executed. On the other hand, a program at level 2 (in machine language) is actually interpreted by the microprogram at level 1, instead of compiled. By dissecting a digital computer in such a way, this text will attempt to present a view of that computer which, on the one hand, considers it level-by-level, but on the other, also examines
level-to-level. Let's consider the levels in turn.

6: High-level programming. Programming languages have developed along several lines. At the top level we therefore distinguish three (mutually incomparable) programming styles: visual programming, declarative programming and object-oriented programming. A language may belong to several of these categories. Each of the three categories can be explained best by comparing it to imperative programming at level 5 (where you will read: a program in an imperative language is an ordered sequence of statements and procedure calls specifying the steps the computer must follow). Visual programming allows the user to use graphics to create a program. Some types of complex programs, such as those that use concurrent processes or deal with real-time systems, are difficult to describe with textual languages; graphical specifications may be more appropriate. For instance, part of the ExSpect language that we use in the programming lab of this course is graphical and is typically used to specify networks of interacting components. A program in declarative (non-procedural) style tells the computer only what is to be accomplished, but not how to do it. Logical languages like Prolog, functional languages, and algebraic specification languages form subclasses of declarative languages. Whereas programs in an imperative style are process-oriented, object-oriented programming focuses on the description of objects: data structures with the related operations. Object-oriented programming naturally leads to a modular style of programming. Some languages with features supporting the object-oriented programming paradigm are Simula, Smalltalk, C++ and Eiffel.

5: Imperative programming. Imperative or procedural programming is still the prevalent style. A program in an imperative language is an ordered sequence of statements and procedure calls specifying the steps the computer must follow. The statements reflect basic machine operations. The number and size of the required steps vary among imperative languages. Well-known languages in this class are Pascal, Fortran, Algol, COBOL, BASIC and C.

4: Assembly language. This is the lowest level that the user usually sees, if at all. Assembly language programs consist of long sequences of coded instructions such as add, jump, load. Each instruction directly corresponds with a processor instruction. Therefore assembly languages are low-level languages: they are totally machine dependent. This in contrast with the languages at levels 5 and 6, called high-level languages, which can usually be ported from one machine to another without much (or any) adaptation. In assembly languages, there are only a few data types available (for instance, floating point, character, and integer), and no instructions to build new types from the existing ones, such as is possible in all high-level languages. Secondly, there are instructions for jumping to other instructions and, often, the possibility to call subroutines (procedures, functions). It is also possible to make system calls by jumping to system subroutines: read a keyboard character, read a block from disk, etc. Such calls are handled by the operating system.

3: Operating system kernel. This level can be seen as an extension of level 2. The language available at that level is still available here, but is complemented with the operating system, which hides much of the large intestine of the computer. Chapter 2 discusses operating systems.

2: Machine language.
Machine language is like level 4 (assembly), but does not provide calls to the operating system. This is bare-bones assembly, and gives the full power of the machine to the user. It is pretty cumbersome to program at this level, and we will not go into it any further.

1: Microprogram. On some computers, instructions at level 2 are interpreted by the microprogram into microcode at level 1. Consider the following example. An instruction to add two numbers, one in memory location a and the other in memory location b, and store the result in a, might be interpreted as follows:

1. read the next instruction stored at the memory address indicated by the program counter.
2. decode the instruction; in this case, it is found that the values of a and b must also be fetched.
3. fetch the address of a.
4. fetch the address of b.
5. get the contents of memory location a.
6. get the contents of memory location b.
7. add the two numbers together in the processor.
8. store the result back in memory location a.
9. increment the program counter to skip to the next instruction.

The program counter is a register in the processor in which the address of the next instruction stored in memory is kept. In some processor designs (to be precise, CISC processors, which are discussed in section 4.4), the allowable instructions are so complex that they cannot be directly executed in hardware but are broken into smaller, simpler instructions that can be executed in hardware. We will say no more about this level either.
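The interpretation idea can be made concrete with a toy example. The C sketch below is not taken from any real processor or microprogram; the instruction encoding is invented purely to show the fetch-decode-execute cycle that the nine steps above describe.

/* A minimal sketch of an interpreter loop for a toy ADD instruction;
 * the opcodes and memory layout are invented for this example. */
#include <stdio.h>

enum { OP_HALT = 0, OP_ADD = 1 };

int main(void) {
    int memory[16] = { 0 };
    memory[10] = 3;                  /* location a */
    memory[11] = 4;                  /* location b */

    /* program: ADD a,b ; HALT  (encoded as opcode, addr_a, addr_b) */
    int program[] = { OP_ADD, 10, 11, OP_HALT };
    int pc = 0;                      /* program counter */

    for (;;) {
        int opcode = program[pc];    /* 1. fetch the next instruction    */
        if (opcode == OP_HALT) break;
        int a = program[pc + 1];     /* 2. decode; 3-4. fetch addresses  */
        int b = program[pc + 2];
        int va = memory[a];          /* 5. get contents of location a    */
        int vb = memory[b];          /* 6. get contents of location b    */
        memory[a] = va + vb;         /* 7-8. add and store the result    */
        pc += 3;                     /* 9. increment the program counter */
    }
    printf("memory[10] = %d\n", memory[10]);   /* prints 7 */
    return 0;
}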
0: Digital logic. The final level, that of the digital logic, represents the computer program and data by 0 and 1 values on so-called logical gates. These gates perform boolean operations such as AND and OR on their inputs. Some examples are shown in figure 1.1.

[Figure 1.1: Three examples of logical gates: OR, AND, and NOT. For each gate the figure gives its symbol, the C notation, the standard notation, and the truth table:
  OR:  y = x1 | x2 (C notation), y = x1 + x2 (standard notation); the output is 1 unless both inputs are 0.
  AND: y = x1 & x2 (C notation), y = x1 x2 (standard notation); the output is 1 only when both inputs are 1.
  NOT: the output is the inverse of the single input (0 becomes 1, 1 becomes 0).]

The AND, OR, and NOT gates together suffice (in principle) to build a computer (you can't build a clock, though, without a crystal). Indeed, computers consist for a large part
from these logical gates. By combining several types of gates, you can build adders (i.e., circuits that add two numbers) and memory. To design circuitry using these gates, Boolean algebra is used. This algebra, in which all numbers are either 0 or 1, is named after George Boole (1815-1864), who first published on the subject. The gates are functions in that algebra, and each function can be described by a truth table as shown in the figure. These truth tables give the output of the gate for each input; this is possible because the inputs are binary.
Example 1 Show how a single memory cell (i.e., a piece of circuitry that can store a 0 or 1) can be designed from two NANDs (a NAND is an AND followed by a NOT, which therefore outputs a 0 when both inputs are 1, and 1 otherwise; in symbols, y = !(x1 & x2)).
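One possible answer, shown here as a small simulation rather than as real circuitry: two cross-coupled NAND gates form an SR latch, a one-bit memory cell. The C sketch below is an addition to the text; the input sequence is chosen only to demonstrate setting, holding, and resetting the stored bit.

/* A minimal sketch for Example 1: two cross-coupled NAND gates form an
 * SR latch (S = set, R = reset, both active-low), i.e. a one-bit memory. */
#include <stdio.h>

static int nand(int x1, int x2) { return !(x1 && x2); }

int main(void) {
    int q = 0, qbar = 1;            /* the stored bit and its complement */
    int inputs[][2] = {
        { 0, 1 },  /* set   (S low)  -> q becomes 1 */
        { 1, 1 },  /* hold           -> q stays 1   */
        { 1, 0 },  /* reset (R low)  -> q becomes 0 */
        { 1, 1 },  /* hold           -> q stays 0   */
    };

    for (int i = 0; i < 4; i++) {
        int s = inputs[i][0], r = inputs[i][1];
        for (int k = 0; k < 4; k++) {   /* iterate until the gates settle */
            q    = nand(s, qbar);
            qbar = nand(r, q);
        }
        printf("S=%d R=%d  ->  stored bit q=%d\n", s, r, q);
    }
    return 0;
}

The "hold" input combination is what makes this a memory cell: with both inputs at 1, the output simply keeps whatever value was last set.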
Let us now take another, somewhat more high-level, view of computer systems.
1.3 Overview of this book

Clearly, the authors try to cram a whole curriculum into one course. First of all we want to look at the levels of computation in more detail. Chapter 2, for instance, looks at the operating system, around level 3 and groping upwards and downwards. Chapter 3 makes, literally and figuratively, a side-step by placing a few of the level-hierarchies of computer systems in a network, and seeing how they connect. This leads to a new layered system known as OSI. Back to basics in chapter 4, which concentrates on design issues occurring below level 3: the how, what, and why of computer architectures. The next set of levels, the programming languages, remains to be written about in this introduction. So, apart from walking along the road of boring fact after fact, we also want to elucidate the underlying theoretical concepts. How do we model the connection between two processes running on one computer? How do we prevent deadlocks? How do we ensure secure computation (nice when you're operating a nuclear plant or a submarine)? Each chapter will make side-steps to such topics. Thus we will not only go into the what of computer science, but also the why. And the when is covered in appendix A. Appendix B gives a glossary of topics which would go too far to explain in detail, but are interesting to know anyway.
Chapter 2
Operating systems

Early computing[1] in the 1940's and 1950's was a bit like working on a powerful pocket calculator: the operator programmed the machine by providing it with a set of instructions, which were directly executed by the hardware. Programming occurred in machine language, coded as binary or hexadecimal numbers. With the development of technology during the 1960's, computers became larger, faster, and more complex. Soon it became infeasible to have a single person control all parts of the computer system by hand. Part by part these tasks were automated to ease the job of the operator: the first operating systems were born. In the 1960's computers became powerful enough to serve multiple users at the same time, in order to decrease the time that someone had to wait before he could get access to the computer. Thus time-sharing, also known as multiprogramming, was born. MIT, Bell Labs, and General Electric made an attempt at writing an operating system to support time-sharing. This operating system, called MULTICS (MULTiplexed Information and Computing Service), was to allow several hundreds of users to connect simultaneously to a single computer.

[1] The first section in chapter 4 also gives a concise history of computing, not very different from this one. Appendix A is less concise.

As the complexity of computers increased, operating systems increased in size and complexity, too, in order to keep the computer usable and understandable for a large variety of users. Indeed, a current-day computer without an operating system is unthinkable and virtually unusable. In the layered structure of a computer system shown in table 1.1, the middle layer is represented by the operating system. This middle layer is very important, since it shields the particularities of the underlying hardware from the layers above it as much as possible (and desired). The operating system presents the user with a virtual machine (glossary p. 67) such that this user can do the same operations on very different computers. This asks for the existence of standards, such that your programs can be used on a Sun, Silicon Graphics, NeXT, PC, or whatever machine without much extra work; or, that you can update to a next, faster, version of the same computer without having any side effects. Unfortunately, standards are usually de facto. That is to say, someone writes a program or makes a computer architecture which is subsequently well-marketed, its popularity booms, and it will be used ever after. With patches, extensions, and alterations till Kingdom Come, to keep up with new trends. Unix is a very good example (see (Goodheart & Cox, 1994) for a wonderful description of the history of Unix). At the time that MULTICS was written, it was too complex a project; Bell Labs and General Electric bailed out, and the operating system never came much further than running on a few computers at MIT. However, Ken Thompson from Bell Labs, who had been working on the MULTICS project, simplified MULTICS to provide access to a computer for one user (in fact, himself, running a PDP-7), leading to an operating system which he dubbed UNICS (UNiplexed Information and Computing Service). Together with Dennis Ritchie, also from Bell Labs, he rewrote, redesigned, and respelled the operating
system to Unix in the high-level programming language C designed by Ritchie. Why is Unix so popular? Well, when Thompson and Ritchie had written Unix, AT&T licensed the operating system to universities almost for free, and thus it gained very much in popularity. Since then many flavours of Unix have seen the light, and a few attempts at standardisation led to an increasing mess. The two main standardisations are BSD and SysV (also known as SVR4, for System V Release 4; see (Goodheart & Cox, 1994)). BSD stands for Berkeley Software Distribution and is a flavour of Unix that was devised at the University of California at Berkeley. SysV (System Five) was designed at AT&T Bell Labs, and intended as a standardisation of Unix which would surpass previous hacks, a goal that has mostly been reached. One of the most important features of SysV Unix is that it combines BSD, SVR3, SunOS, and other Unix versions, taking the best from all worlds. Table 2.1 shows some Unix implementations, and whether they're BSD or SysV.

Table 2.1: BSD and SysV implementations of Unix operating systems.

    OS name                     machine                 type
    NeXTStep                    NeXT, 486, Sun, ...     BSD
    SunOS (= Solaris 1)         Sun                     BSD
    Solaris (v. 2 and higher)   Sun                     SysV
    Irix                        Silicon Graphics        SysV
    Linux                       PC                      SysV

To find out which operating system you're using, type uname -a. If it says SunOS 5.2 or higher, then it's Solaris.

So, what is the difference between these two flavours? Well, it
would be possible to provide a list of details where the one differs from the other (we won't; don't worry. If you really need such a list, consult (Frisch, 1991) and (Goodheart & Cox, 1994)). But... fundamental differences? According to Perry Hutchison,

    The fundamental difference is that one is broken, the other is not. The flames arise from strenuous disagreement as to which is the broken one.
But, of course, many people disagree. As the table shows, Solaris 2.3 is SysV based. Most of the examples presented in this book are for this environment, but in a few cases differences between such a SysV environment and the SunOS 4.1.3 BSD environment may be brought to light. The material in this chapter is for a large part based on (Tanenbaum, 1987) and (Hwang, 1993), but also derived from discussions with various know-it-alls. For this, the Internet is extremely useful. The structure of this chapter is as follows. In section 2.1, the real work is begun: it gives a modular description of the tasks that have to be performed by an operating system. All parts are combined in section 2.2, which shows a good and a not-so-good way of setting up an operating system. More practical issues are covered in the programming lab manual, which also covers the shell in detail. Since the shell is actually not a part of the operating system but rather an interface towards it, it is not covered here.
2.1 Tasks of an operating system

As we have seen in section 1.2, a computer system can be dissected into a number of layers. One of the layers is the operating system, which again consists of a few layers[2]. In this chapter we will study the layered structure of an operating system. Traditionally the `layeredness' of operating systems was more a principle than a fact. After all, if you really want to separate parts of the functionality of an operating system, you will also have to provide a means of coupling those parts together. In a truly modular system that means communication, and communication means overhead: it simply costs time to communicate, and if that time becomes noticeable, users will tend to choose a faster operating system. Yet, with the increase of memory size (see section 4.3) and computing power (see section 4.4) of computers, modern operating systems can indeed assume a layered structure, where the parts of the system are completely separated and cooperate by passing messages. The introduction of this chapter attempted to show why you really need an operating system. In the subsequent sections we will study some tasks an operating system has to perform. In order to simplify the understanding of an operating system, the tasks it has to perform can be dissected into a number of classes. These classes are related to parts of the computer: the CPU (process management), I/O (file management), and the memory (memory management). We will discuss these three parts in turn. Note that this trisection is only exemplary; for example, a text describing a computer dedicated to databases might have included a section on database management, which could also be part of the operating system. Or, a system used for image processing may have a special part of the operating system dedicated to specific operations for image processing. Put all parts together and you have an operating system.
2.1.1 Process management
In the most simple operating systems, such as MSDos, there is always one program running at any time. For example, the user starts to edit a program, and when finished quits the editor. Next, the compiler can be started to compile the program. Then the user runs it. This scheme may be sufficient for such simple use, but is very inefficient and annoying when more complicated problems have to be tackled. For instance, a minimum requirement would be to be able to use a computer while a text is being printed. This example is easily extended to the desire to run a time-consuming computation while the computer can still be used for editing, compiling, or whatever. This also leads to a much more efficient use of a computer: for instance, during editing the editor spends in the order of 99.6%[3] of its running time waiting for the user to press the next key, valuable time which can be used for other things. Perhaps the most important single task of an operating system is to provide the user with process management. Below the operating-system level, a digital computer consists of a `stream' of simple commands in machine language. These simple commands will be recognised by the processor and lead to some kind of action: jump (conditionally) to a part of the program stored in memory, or call a procedure (i.e., retain the return address); add or multiply; assign values. Clearly, at this level it would not be easy to make a program which allows a user to run multiple programs at the same time. Therefore an operating system offers process management. A process, within computer science terminology, is a program in action. Even though we would like to run multiple processes, we often have only one processor at our disposal. This problem is solved with time slicing. The running time of the processor is sliced into very small parts, in the order of 10-500 ms.
[2] And the shell? Well, the shell actually is an interpreter of a declarative language. It may therefore be placed at level 5, although this is not really what is meant by that level.
[3] This was measured by monitoring various vi processes running on Suns.
On a Unix machine, the program ps shows a list of processes running under that shell. To get a complete list of processes, type ps -waxu (BSD) or ps -edf (SysV). The top program continuously shows the running processes. Most important are the owner, name, process ID, and status that are shown. A process can be created by, for example, starting a program from the shell. E.g., when you type top, you will also see top as an active process. You can send this (or any other process of which you are the owner) a signal. E.g., you can kill the top process by typing, in another window, kill -KILL procnr, with procnr the process ID of top.

During each time slice, a process is chosen by the scheduler and subsequently executed. At the end of the slice, the process is put in the waiting state, and a next process is chosen. Switching from one process to the next is called a context switch or process switch and may take in the order of 100 µs on a Sun Sparc. So the processor continuously switches from running one process to another one. The time that one process is run depends on various factors: how many processes are running, how important the process is, how long a context switch takes. For instance, when 100 processes are running and each process is allowed to run 100 ms, then it will take 10 s before your editor will display the character you typed, well above the irritation limit. An operating system where the scheduler continuously switches from one runnable process to another is said to implement preemptive scheduling. This as opposed to a so-called run-to-completion system, where each process is completed before another one is started. As we shall see below, although preemptive scheduling is a necessity in modern operating systems, it can lead to race conditions when more than one process needs to share resources (e.g., the disk). A considerable effort needs to be put into the operating system to solve these facts of life. Another disadvantage of preemptive scheduling is that the computer cannot be used for process control anymore. The running time of any process cannot be predicted, since it depends on other processes. The system is not real-time anymore; when you try to control a nuclear plant or a robot with such a system you may end up with nasty surprises when some important process had to wait a few ms because new mail arrived![4]

[4] Of course, processes can be given a priority level. On Solaris it is even possible (for trusted users) to start real-time processes. The scheduler will always run a real-time process to completion before starting preemptive scheduling again. Anyway, care must be taken.

A process is always started by another process using one of various system calls. A system call is like an ordinary function call, but the function is coded in the operating system. When process A creates process B, A is called the parent process and B the child. This leads to a process tree as shown in figure 2.1. For each process that is running on a single machine an entry in the process table is maintained. Each process has a process number, a process state, and a parent process number. The process number is a unique identification of that process, and is used to change the running priority of the process, kill a process or send it another signal, and so on. The process state (see below) indicates whether a process is running, sleeping, etc. The parent process number indicates which process started this one; for instance, for processes that were started by the user (e.g., ls, vi) this is the shell process in which the process was initiated. The importance of the process concept is that one can describe the computer in terms of autonomous entities.
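As a small illustration (not part of the original text), the C sketch below creates a child process with the fork() system call; the parent and child each print their process ID, and the parent sleeps until the child has finished. The PIDs printed will differ from run to run.

/* A minimal sketch of process creation: fork() gives the parent the
 * child's PID and gives the child 0, so both end up in the process table. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();              /* system call: create a child process */

    if (pid == 0) {                  /* child branch */
        printf("child:  pid %d, parent %d\n", (int)getpid(), (int)getppid());
    } else if (pid > 0) {            /* parent branch */
        printf("parent: pid %d, child %d\n", (int)getpid(), (int)pid);
        wait(NULL);                  /* sleep until the child terminates */
    } else {
        perror("fork");
        return 1;
    }
    return 0;
}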
Process state and scheduling.
[Figure 2.1: An example process tree. Started from the kernel are sched, pageout, fsflush, and init; init starts rpcbind, keyserv, syslogd, and ttymon; the ttymon process is replaced by an X window session (startX, xinit, X) when the user decides to run X windows; the xterm windows each run a tcsh shell, which in turn runs programs such as xclock, xcalc, and xtetris.]

On Unix systems, init is always the first process. Init is therefore the root of the tree (not counting the three other processes started from the kernel: the scheduler, and two processes for the memory manager and file system). Finally the user ends up with xterm, which in fact starts a shell.
As was already mentioned, any process can be in one of several states. These states include:

running: a process is running, i.e., it has been selected by the scheduler. Also known as active.

runnable: a runnable process is able to run, but is currently not chosen by the scheduler. Also known as ready or waiting.

sleeping: a sleeping process is a process waiting for an interrupt, or for another process to return control to this process. For instance, when a process issues a system call to read data from disk, the process will be put in the sleeping state until the requested data is available. A sleeping process will never be chosen by the scheduler. Also known as blocked.

Thus the state of a process depends on what the process is doing, and whether the scheduler chose it to run or not. For example, when a process a wants to read some data from disk, the process will be put into the sleeping state while its request is being handled, such that other processes may utilise the processor during the time that a is waiting for its data. When is a process chosen for execution? There are several scheduling algorithms, which have to comply with a few rules. Usually, short-running processes take precedence over long-running ones, in order to keep the user, who just wants to list a directory, from waiting too long. Secondly, processes may have a high (0) or low (127) priority, depending on how important they are, how long they have run already, and other similar factors.
2.1.2 Memory management
When many processes are loaded (even though they may be dormant), each of these processes takes up a part of the computer's memory. Not only is it wasteful that inactive processes use that memory, but there simply may not be enough memory to hold them all. There are two important methods for solving this problem: swapping and paging. The issue of security will not be treated here; however, it should be noted that this issue is equally important: not every user is allowed to read from or write to each memory address, or else it would be easy to steal or change someone else's data!
Run, on Solaris (SysV), ps -el (make sure you have the right one: this ps is stored in /usr/bin; when you execute the shell built-in command which ps it should show /usr/bin/ps. If not, specify the full path. The alternative might be the BSD /usr/ucb/ps, which has the same functionality but different flags (c.q., -waxu)). Every process will have one of the following states:

O running;
R runnable, i.e., it may be selected by the scheduler to run;
S sleeping, i.e., waiting for an event to occur (e.g., a disk interrupt);
I idle: the process is being created;
Z zombie: this occurs when a process has finished, but its parent process is killed. The child process is not alive anymore (and therefore cannot be killed; hence the name `zombie'), but is still present in the process table;
T traced: the process has received a SIGSTOP signal because the parent is tracing it;
X the process is waiting for a memory page to be put into memory.

Of each process the ID (process number) of its parent is also shown, in the PPID column. The first four processes are: sched is the scheduler, init is the parent of all following processes, pageout is part of the memory manager and writes pages of memory back to disk at intervals, and fsflush writes disk blocks back to disk at regular intervals (`flushing'). Note that the process tree has init at its root.
In Unix a process can be assigned a nice value, which is an integer between -20 and +19 (values less than 0 can only be set by the super-user). Normally, processes run at nice-value 0; a process with a higher nice-value has a lower chance to be picked by the scheduler for execution. At value 19, a process is only executed if no other process (with a lower nice-value) is waiting. Nice-values can be checked and set with top.
Swapping

When a process A is sleeping, and the memory it reserves is needed by another process B, A can be removed from the main memory (swapped out) and written to a partition of the local disk, called the swap space. The free space, when B does not need more memory than A did, can now be occupied by B. Because hard disks are very slow in comparison with main memory (table 4.2), much swapping makes a computer slow. Whereas it takes only a few ms for a CPU to switch from running one process to running another, swapping out a few megabytes of memory takes tens up to hundreds of milliseconds.
Paging

But what do you do when a single program is too large to fit in the available memory? MSDos, when a single program is too large to fit in the main memory, uses overlays, in which a program is split up; something which is not done by the operating system but by the programmer. First the first part of a program (the .exe) is loaded and run; this part is mainly the control part. It loads the first overlay (an .ovl file, usually) from disk into memory and runs it. When finished, the second overlay is loaded and run, and so on. Of course this is very cumbersome and inflexible; therefore, mature operating systems support a system called paging. The available memory of the computer is divided into a number of pages. The amount of virtual memory that the operating system presents to the user is arbitrarily[5] larger. At any time, only a few of the pages of the virtual memory are actually present in real memory (figure 2.2). When a block of memory is required that is not in real memory, another block will be moved out and replaced by the requested block. Various methods exist to optimally implement a page replacement technique (Tanenbaum, 1987).

[Figure 2.2: Virtual vs. real memory space. The virtual memory is divided into pages (page 1, page 2, page 3, ...), only some of which are present in real memory at any time.]
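As a rough illustration of the idea, the C sketch below translates virtual addresses to real addresses through a small, made-up page table; the page size and table contents are invented for the example and do not correspond to any particular system.

/* A minimal sketch of paging: a virtual address is split into a page
 * number and an offset; the page table says which real frame (if any)
 * holds that page. All numbers here are illustrative only. */
#include <stdio.h>

#define PAGE_SIZE 4096
#define NUM_PAGES 8

/* frame number per virtual page; -1 means "not in real memory" */
static int page_table[NUM_PAGES] = { 3, -1, 0, -1, 1, -1, -1, 2 };

static long translate(long vaddr) {
    long page   = vaddr / PAGE_SIZE;
    long offset = vaddr % PAGE_SIZE;
    if (page >= NUM_PAGES || page_table[page] < 0) {
        printf("page fault on page %ld: load it from swap first\n", page);
        return -1;
    }
    return (long)page_table[page] * PAGE_SIZE + offset;
}

int main(void) {
    long r = translate(0x2010);      /* page 2 is resident in frame 0 */
    if (r >= 0) printf("virtual 0x2010 -> real 0x%lx\n", r);
    translate(0x1010);               /* page 1 is not resident: page fault */
    return 0;
}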
[5] Naturally, there exist design considerations. Consult, e.g., (Tanenbaum, 1987) to learn more about paging and segmentation.

2.1.3 File management

Without explicitly saying so, the previous section discussed only one kind of memory, known as main memory or RAM (Random Access Memory): the memory chips in your computer. But there are other types of memory for different purposes. This section discusses how another type of memory, viz. the disk, is used.
Because a disk is a very slow medium compared to main memory (i.e., the transfer of data is much slower), it is used for different purposes. Data on disks is usually accessed through files. A file is a program or a piece of text or data on a disk. For example, a file may be a C program, or a letter formatted with a word processor, or indeed that word processor itself. Files have names and contents. However, a file has nothing to do with how a disk (glossary p. 68) actually operates; a disk does not store files but blocks of data; it is the task of the operating system to let the user access the data on the disk through files. The operating system translates these file transfers into transfers of blocks of data. Files are grouped together in directories. Directories can be seen as a special kind of file. A directory is a node in the directory tree and may contain various files and directories. The root directory in figure 2.3 is called / in Unix.

[Figure 2.3: An example file system on a Unix computer. The figure shows a directory tree with the root directory at the top and directories such as opt, share, man, devices, include, and home below it, with the user directory adam under home.]

The root directory is called /. On Solaris, it contains, for instance, the directory opt in which optional software is stored. The subdirectory man stores man pages, and include is used for C include files. The directory /home is used for storing the so-called home directories of the users on that computer. Each user again may have her or his own tree of directories to store programs and data.

Subdirectories can be accessed through
their absolute path, e.g., the games directory of Adam can be accessed as /home/adam/games. However, when Adam has logged in on the system, his current working directory (CWD) will most likely be /home/adam (which is known as Adam's home directory). Adam can then move to his games directory by typing cd games instead of cd /home/adam/games; he need only specify the relative path.
I/O management
Apart from the directories, the files we have considered so far are known as normal files in Unix terminology. But two other types of file exist: character special files and block special files. These special files are associated with I/O (Input/Output) devices. For instance, whereas the keyboard of a computer is a device, it can be treated as if it were a file, and it is possible to read from it and write to it just as if it were a file stored on a disk; no special programs or system calls are required. The keyboard is an example of a character special file: transfer goes one character at a time. Block special files are typically disks or tapes. Transfer to and from these devices goes per block of typically 512 or 1,024 bytes.
The program pwd (print working directory) walks up the directory tree, starting from the current working directory, to the root, and prints the path it traversed. There is also a shell built-in command dirs which prints the current working directory as it is kept up to date by successive change-directory (cd) commands. The basic program to read from or write to a file is cat. Since the `keyboard input', known as standard input or stdin, is also a file, as is the `screen output', known as standard output or stdout, the simplest operation of cat is to copy stdin to stdout. Typing cat at the shell prompt echoes the input, line by line, until the end-of-file character ^D is given. Type cat file to print the contents of file to stdout. On Unix machines, the special files are kept in the directory /dev. Get a `long list' of the directory using ls -l /dev. You will notice that character special files are marked by a `c' and block special files by a `b'. Mark, for instance, the file /dev/floppy: this is the file associated with the floppy disk (provided your machine has one). You will have direct access to the disk through this block special file: for instance, you can read from the disk by typing more /dev/floppy.
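The same idea can be shown in a few lines of C. The sketch below (an addition to the text) copies a named file, or stdin when no argument is given, to stdout using the read() and write() system calls, which work for normal and special files alike.

/* A minimal sketch of cat: copy a file (or stdin) to stdout, one block
 * at a time; the operating system handles the underlying disk blocks. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    int fd = 0;                               /* default: standard input */
    if (argc > 1 && (fd = open(argv[1], O_RDONLY)) < 0) {
        perror(argv[1]);
        return 1;
    }

    char buf[1024];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)   /* also works for /dev files */
        write(1, buf, (size_t)n);                 /* 1 = standard output       */

    if (fd != 0) close(fd);
    return 0;
}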
Interrupt handlers. In a previous section we discussed the scheduler, which is a piece of the operating system which repeatedly chooses a process from the process queue and allows it to run for a short while. One question we did not ask, however: who schedules the scheduler? After all, the scheduler is just another process. The answer to that question is the following. When the scheduler has finished choosing a process for execution, it issues a sleep() system call, which allows it to go to sleep for the specified amount of time. When this time has elapsed, the system clock will generate an interrupt. This interrupt is generated by, in hardware, switching one of the processor's interrupt lines `on.' The processor will now be interrupted, stop running the process it was just working on, and read the next instruction from the device that initiated the interrupt. This instruction tells the processor which interrupt handler it will have to execute. This interrupt handler is a piece of code in the operating system. Typically, an interrupt handler for the hard disk, for the clock, etc. would exist. In the case of the scheduler, the clock interrupt handler would cause the scheduler to be run again. When the scheduler is finished playing around with the process table, the next process will be allowed to run. A similar thing occurs when a disk controller generates an interrupt (e.g., because it has finished loading a block from the disk to memory). In this case, however, the interrupt handler will start running the device driver for the disk, and only return control to the interrupted program when that device driver is finished.

Device drivers. With each I/O device is associated a device driver. In each operating
system one device driver exists for each connected type of device; i.e., one for the floppy drive, one for the keyboard, one for the screen, etc. The device drivers contain the device-dependent software necessary to control that device. For instance, for a screen that may be code to write a byte to the screen, to scroll the screen one line up, to clear the screen, etc. The device drivers provide the interface between the device-independent part of the operating system and the I/O devices. For instance, a user process may issue a read() system call. The device-independent part of the I/O software in the operating system will, depending on where the issued read is supposed to read from (a file, the keyboard, a network, ...), forward the call to the corresponding device driver.
2.1.4 Communication
A very important subject that we have to consider is communication. Communication within computer systems exists at various levels: communication between the various parts of the operating system; communication between two processes running on a single processor; communication between two processors in a parallel computer; communication between processes running on two computers connected together; communication between two processes running on computers thousands of miles apart. Chapter 3 will study this topic in more detail. However, in this section we want to discuss a few important issues of communication encountered in computer science. In particular, we will only discuss those issues that are encountered in inter-process communication or IPC: the communication between various processes on one computer (or, in some cases, between two computers in a network). Consider the following case. Suppose there are two operators, Zoe, working in a travel agency in Zierikzee, and Zachary, working in Zurich, who both want to book a seat on flight 521 for their respective customer. The database for all flight bookings is centrally maintained by the airline.
As it happens, Zoe and Zachary both make their reservation via a modem at the same time. According to Murphy's law[6], the following will happen. When Zoe requests the seating availability of the plane, this data will be sent to her. She passes it to her customer, who needs a few seconds to decide whether he wants a window or an aisle seat. While he is attempting to make up his mind, Zachary gets the same information from the central database, and his customer then decides she wants to take seat 11a, which is still free. Ditto Zoe's customer! So, Zoe writes back a reservation for seat 11a, and so does Zachary, thus overwriting Zoe's request. Who's going to get the seat? Situations like this are known as race conditions. With a little imagination it is easy to see that this case is also very likely to occur in an operating system, for instance when two processes have simultaneous access to a disk block. When race conditions are not properly taken care of, deadlocks are bound to occur. A deadlock is a condition where process a is waiting for process b while process b is waiting for a. In a situation like this, neither process will ever continue operation: a deadlock. Naturally, deadlocks should always be prevented.
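The booking scenario can be reproduced in miniature with two threads sharing one variable. The C sketch below uses POSIX threads, which are an anachronism with respect to the text but make the race easy to see: both "agents" may read the seat as free before either writes its reservation back.

/* A minimal sketch of the race condition described above; compile with
 * something like: cc -pthread race.c. The names and sleep are illustrative. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static int seat_11a_taken = 0;        /* shared booking database entry */

static void *agent(void *name) {
    if (!seat_11a_taken) {            /* read the seating availability */
        usleep(100000);               /* the customer thinks for a moment */
        seat_11a_taken = 1;           /* write back the reservation */
        printf("%s booked seat 11a\n", (char *)name);
    }
    return NULL;
}

int main(void) {
    pthread_t zoe, zachary;
    pthread_create(&zoe, NULL, agent, "Zoe");
    pthread_create(&zachary, NULL, agent, "Zachary");
    pthread_join(zoe, NULL);
    pthread_join(zachary, NULL);
    return 0;    /* typically both lines are printed: the seat is sold twice */
}

Run it a few times: without a locking primitive such as the semaphores discussed below, both threads regularly claim the same seat.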
Critical sections

How can these race conditions be avoided? A solution to this problem is the introduction of critical sections. A critical section is that part of a program where shared read/write resources are accessed; for example, access to shared memory or shared databases. When it can be guaranteed that at no time two processes are simultaneously in their critical sections, race conditions can be prevented. Throughout the years of computing many ways of implementing critical sections have been invented. Most operating system books list many of these inventions, but we will keep the list down and discuss only two principal methods.
Disabling interrupts. The easiest method, which would work on a single-user single-process machine: disable interrupts, i.e., do not allow the current process to be interrupted by the scheduler. All computers have an instruction to this effect. However, such a powerful instruction could never be given to a user on a time-sharing system: she would have the machine entirely to herself, and finally the machine would come to a grinding halt! Fortunately, better solutions exist.

[6] Murphy's first law reads: `everything that can go wrong will go wrong.' That's the one we refer to. His second law, also not to be forgotten, maintains: `Everything that can't go wrong, will go wrong nonetheless.'
Semaphores. Semaphores, introduced by E. W. Dijkstra in 1965, are counters which can be incremented and decremented by the instructions up and down, respectively. The down instruction on a semaphore checks if its value is 0. If it is, the process issuing that down instruction is put to sleep; otherwise, the semaphore is decremented and the process continues. The up instruction, on the other hand, increments a semaphore. If one or more processes were sleeping on that semaphore, one of these processes is woken up and can complete its down instruction.

Consider the following example. It consists of a single person (cf. the processor in your computer) who continuously alternates between being a baker and a cookie-monster. There are two processes that have to be executed: the baker's job, in which a cookie is baked and brought to the store; and the cookie-monster's job, in which the cookie has to be bought and subsequently eaten. There are a few shared variables: cookie, store-is-empty, store-is-full, and don't-interrupt. The cookie variable indicates the shelf containing the cookies. The next two variables indicate whether the store is full (no more cookies can be stored) or empty (no more cookies can be bought). Finally, don't-interrupt has to make sure that the cookie monster does not decrement the cookie counter (i.e., by buying a cookie) when the baker is just incrementing it. The program looks like this:

process BAKER's-JOB:
  repeat
    bake-cookie(cookie);
    down(store-is-empty);
    down(don't-interrupt);
    bring-to-store(cookie);
    up(don't-interrupt);
    up(store-is-full);
  forever

process COOKIE-MONSTER's-JOB:
  repeat
    down(store-is-full);
    down(don't-interrupt);
    buy(cookie);
    up(don't-interrupt);
    up(store-is-empty);
    eat-cookie(cookie);
  forever
Note that the down and up instructions may never be interrupted! In an operating system they would be implemented as system calls; these system calls could then be written in such a way that they are never interrupted by the scheduler. However, you must be very, very careful when programming with semaphores. For instance, if the two down's in the baker's and cookie monster's jobs were reversed, deadlocks would be bound to occur (why?).
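For comparison, the same baker/cookie-monster structure can be written with POSIX semaphores, where sem_wait corresponds to down and sem_post to up. This sketch is an addition to the text; the shelf size, thread count and output are illustrative only.

/* A minimal sketch of the example above with POSIX semaphores;
 * compile with something like: cc -pthread cookies.c */
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define SHELF_SIZE 3

static sem_t store_is_empty;   /* counts free places on the shelf     */
static sem_t store_is_full;    /* counts cookies waiting to be bought */
static sem_t dont_interrupt;   /* protects the shared shelf (a mutex) */
static int shelf = 0;          /* the shared "cookie" counter         */

static void *baker(void *arg) {
    (void)arg;
    for (int i = 0; i < 10; i++) {
        sem_wait(&store_is_empty);   /* down(store-is-empty)  */
        sem_wait(&dont_interrupt);   /* down(don't-interrupt) */
        shelf++;                     /* bring-to-store(cookie) */
        printf("baked, shelf now %d\n", shelf);
        sem_post(&dont_interrupt);   /* up(don't-interrupt)   */
        sem_post(&store_is_full);    /* up(store-is-full)     */
    }
    return NULL;
}

static void *cookie_monster(void *arg) {
    (void)arg;
    for (int i = 0; i < 10; i++) {
        sem_wait(&store_is_full);
        sem_wait(&dont_interrupt);
        shelf--;                     /* buy(cookie) */
        printf("eaten, shelf now %d\n", shelf);
        sem_post(&dont_interrupt);
        sem_post(&store_is_empty);
    }
    return NULL;
}

int main(void) {
    pthread_t b, m;
    sem_init(&store_is_empty, 0, SHELF_SIZE);
    sem_init(&store_is_full, 0, 0);
    sem_init(&dont_interrupt, 0, 1);
    pthread_create(&b, NULL, baker, NULL);
    pthread_create(&m, NULL, cookie_monster, NULL);
    pthread_join(b, NULL);
    pthread_join(m, NULL);
    return 0;
}

Reversing the two sem_wait calls in baker() reproduces the deadlock hinted at above.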
Interprocess communication

Now that we have looked into race conditions between two processes running on one processor, let us shortly look into some communication primitives to broaden the perspective. The above
solution will only work when the processes sharing resources run on one or more processors which have access to common memory, i.e., a piece of memory they can all write to and read from. When you move this idea to a distributed system in which each processor has its own bit of memory, that solution is not feasible anymore. Somehow, the information must be shared between remote processes.
Message passing. To this end, two communication primitives are introduced:
send(destination, message) and receive(source, message), which are both system calls to the operating system. One process sends a message to another process, and may block until an answer arrives. The receiving process could be serving the first process, and may block all the time until a message arrives.
Remote procedure call. In some operating systems, interprocess communication is done through remote procedure call or RPC. In RPC, a message is passed by a client process to a server process by calling a procedure called the client stub procedure. This procedure
sends a message to the server stub procedure, which then calls the required procedures within the server to handle the message. Then, a message is sent back, the client stub will receive the message, and the client process will continue operation. Naturally, the nice part about this is that the communication is generally transparent to the client and server. For all they care, the stub procedures can be implemented using semaphores. `Generally,' since some things can go wrong. For instance, the server may crash before the client gets its answer.
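As a minimal illustration of message passing between two processes on one machine, the C sketch below uses a Unix pipe as the communication channel; it is not the send()/receive() primitives described above, merely an analogous example added to the text.

/* A minimal sketch of message passing: the parent (the "client") sends a
 * message through a pipe, and the child (the "server") blocks until it
 * arrives. The message text is illustrative only. */
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int fd[2];
    char buf[64];

    if (pipe(fd) == -1) return 1;         /* create the communication channel */

    if (fork() == 0) {                    /* child: the "server" */
        read(fd[0], buf, sizeof buf);     /* receive(): blocks until a message */
        printf("server got: %s\n", buf);
        _exit(0);
    }
    /* parent: the "client" */
    strcpy(buf, "book seat 11a");
    write(fd[1], buf, strlen(buf) + 1);   /* send(destination, message) */
    wait(NULL);                           /* block until the server is done */
    return 0;
}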
2.2 Operating system structure

In the previous section some of the more important tasks that an operating system has to perform have been laid out. Assuming that all these tasks can be separately solved, all of these solutions have to be brought together into a single operating system. In this section we will discuss two ways of doing so: the monolithic structure, and the client-server structure.
2.2.1 The monolithic structure

An operating system with a monolithic structure has, in fact, no structure at all. All parts are put together into one big chunk of program. This appears to be the most simple choice: make separate source codes for the different parts, compile those into object files, and link all object files together into one executable. Any programmer with a bit of experience in designing large programs will immediately recognise that this most obvious choice is also the worst one, if only from the point of view of maintenance, portability, clarity, and so on. Yet it is the structure followed by many operating systems, such as MSDos and, to a lesser extent, early versions of Unix. This is because of computational efficiency: when all parts are linked together into one program, they can share data through global variables, instead of having to send messages to and fro. However, with the necessarily increasing complexity of operating systems, the price of this overhead trades very well against the price of maintaining monolithic operating systems. Also, the cost of the overhead (the operating system runs slower because of the time needed to pass messages between its parts) decreases with increasing computer power. Let us shortly study how a monolithic Unix operating system works. The machine can either be running in user mode or kernel mode (in some operating systems, such as Solaris, a third mode, real-time mode, exists, which has the highest precedence). User processes normally run in user mode. When a user process requests some action from the operating system, such as opening a file or reading a block, the user process issues a system call. So, this system call
transfers control to the operating system (see figure 2.4), i.e., the program issuing the system call will be put in the sleeping state and the operating system in the running state. Next, the computer will be set in kernel mode: the process now running will have many more permissions, e.g., to do I/O control, kill processes, and so on. The requested action will be performed by the operating system kernel, which possibly has to wait for I/O devices to complete a read/write operation. When the kernel is finished with handling the request, control is returned to the user process (i.e., it will be put in the runnable state). In due time, it will be chosen for execution again by the scheduler.

[Figure 2.4: The execution of a system call in a monolithic operating system. A user program at the user level calls the system call handler at the kernel level, which invokes the memory manager or the file system; these use the hardware control layer at the hardware level, which drives the hardware itself.]
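To make the path from user mode to kernel mode concrete, the short C program below (an addition to the text) issues open(), read() and write() system calls; each call traps into the kernel, and the process sleeps while the kernel waits for the disk. The file name is chosen arbitrarily.

/* A minimal sketch of a user process issuing system calls: the kernel
 * performs the I/O on behalf of the process and then returns control. */
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    char buf[512];
    int fd = open("/etc/motd", O_RDONLY);         /* system call: enter the kernel */
    if (fd < 0) return 1;

    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)   /* process sleeps until data is ready */
        write(STDOUT_FILENO, buf, (size_t)n);     /* another system call */

    close(fd);
    return 0;
}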
2.2.2 The client-server structure We have already come across the client-server structure between communicating processes. The idea is very general: there is one server which handles requests of many clients. This structure is also rapidly increasing popularity (or, in fact, already becoming standard) in operating systems. The reason that this structure is only now becoming popular is that the communication overhead (message passing or IPC) that it leads to is only now becoming negligible due to the fast computers that are available nowadays. PC owners who have ever experimented with running MINIX7 on a 8088 or a 80286 processor will quickly learn to appreciate their MSDos operating system, even though it is user-unfriendly beyond comparison. MINIX, albeit an operating system with a crystal-clear organisation, is just too slow for such archaic computer architectures. The idea behind the client-server structure for operating systems is the following. One: put as little as possible code into the kernel. The kernel, which is now reduced to a microkernel, operates as a server to handle requests of various clients. The task that is left to this microkernel is very limited: handle the communication between the processes, and give the primitives necessary to prevent race conditions. Two: migrate as much as possible functionality to stand-alone clients, such as the process server8 , le server, memory manager, etc. The remaining microkernel is left with only one task: 7 MINIX is a Unix-like operating system (BSD) written by A. S. Tanenbaum (Tanenbaum, 1987) for PC's. The reason he wrote it was that, in the 1980's, AT&T became aware of the distributability of Unix, and prohibited the distribution and therefore study of their operating system. Tanenbaum then decided to rewrite a similar product from scratch, partly for teaching purposes, and ended up with a very well-structured product. The popularity of MINIX as an operating system is limited, however, due to the fact that it is rather slow|the communication overhead, while threads (see below) are not available, neck it when too many processes are active. 8 In this case the word `server' has another connotation than the same word when used for the client-server
handle the communication between the various clients.

As a very neat example of a client-server based operating system we will consider MINIX. Although MINIX is not actually used on many computers, its clear structure aids in the understanding of how operating systems work. The structure of the MINIX operating system is built up out of four levels, as depicted in figure 2.5.

[Figure 2.5 layers, from top to bottom: user processes; the memory manager and file system; device drivers for the clock, disk, camera, etc.; process management and message passing.]

Figure 2.5: The client-server structure of the MINIX operating system. The micro-kernel only performs process management and handles the communication (passing of messages) between the memory manager, file system, etc. I.e., the kernel serves the other processes.

At the lowest level, the communication between the processes is taken care of, and traps and interrupts (explained below) are caught. This lowest level is very similar to the microkernel discussed before, except that process management is also done at this level and not by a separate client. At the second lowest level are the I/O device drivers. These processes are all completely stand-alone and communicate with each other by message passing through the microkernel. Layer 3, containing (among others) the memory manager and file system, presents the user with system calls. For instance, all system calls for memory management (e.g., fork for starting a new process or brk for changing the amount of memory that is reserved for a process to store its data) are calls to functions in the memory manager, which will subsequently send messages to the lower levels. At layer 4 the user processes run. In general, user processes have access to the kernel in three different ways:
interrupt An interrupt, as we saw before, is a condition that is not influenced by the running program but rather by something external, e.g., a disk, or the clock. Clock interrupts are used, among other things, by the scheduler to decide when to run the next process. As another interrupt example, assume a program A is running while the disk hardware completes the transfer of a block of data. The disk controller chip will generate an interrupt. The processor will then stop executing program A and transfer execution to the disk interrupt handler. When this program is finished, control will be transferred back to A. So, an interrupt can be seen as a transparent procedure call, instantiated by external influences.
trap A trap is similar to an interrupt, but is generated directly by the running program itself. There are two kinds of traps: those resulting from an error of some kind, and those resulting from a system call, e.g., reading from a file. Such a system call makes the processor execute a special instruction called the trap instruction, also known as a kernel call. This trap serves to transfer control from a user program to the operating system's kernel, and to put the machine in kernel mode. The former, error-generated, traps result for instance when a division by 0 occurs, or a floating-point overflow, an illegal instruction, or a reference to a part of memory where the program is not allowed to read or write.
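As a small illustration of an error-generated trap, the C sketch below divides by zero: the processor traps, the kernel turns the trap into a SIGFPE signal, and the program's handler catches it. (This is a sketch; the exact behaviour of signal() and of division by zero may differ slightly between systems.)

    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>

    static void on_fpe(int sig) {
        (void)sig;
        printf("caught SIGFPE: the kernel handled the trap and signalled us\n");
        exit(1);                     /* returning would re-execute the division */
    }

    int main(void) {
        signal(SIGFPE, on_fpe);      /* install the handler */
        volatile int zero = 0;
        printf("%d\n", 1 / zero);    /* division by 0: error-generated trap */
        return 0;
    }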
Example 2 Naturally, the above story is less than half of the real story. Though telling the full story, which would of course depend on the specific processor chip that is being used, would be going too far, the following principle applies in general. What happens exactly when a trap or interrupt occurs? The difference between the two is, at the lowest level, minimal. When, say, an interrupt is generated by the floppy disk's controller chip (the FDC) (the story would be very similar for another kind of interrupt or for a trap), the processor's interrupt line will be activated by that chip. The FDC will put a memory address on the bus. The processor will read this address and start executing the piece of software located at that address: the interrupt handler associated with the floppy disk. This interrupt handler will perform the required task, e.g., switch off the disk's motor. When it is finished it will execute the `return from interrupt' instruction, and the previously running user process can resume execution. Traps are so similar to interrupts that `trap handlers' do not exist as such; they are simply called `interrupt handlers.' Phew. Got that? Remember, it's only slightly more than half the truth, and someone surely might punch a hole here and there, but it's true in principle. If you want to know more, consult any book on computer architecture and get even more confused.
A foremost advantage of the client-server operating system is extendibility. First, it is trivial to replace, for instance, the file server with a new or advanced release. Ditto, adding a new server (such as the often mentioned database manager) is very easy. Finally, the kernel itself can be replaced without having to adapt the task handlers and managers. In other words, those task handlers and managers can be unaware of the exact implementation of the underlying kernel. This also allows considerable independence from the computer architecture. This is, of course, a tremendous advantage: for instance, the underlying architecture may be a distributed computer, with every node running a copy of the microkernel, which is heavily tuned towards communication, and each node running the file system, or the database manager, or what-not. This is shown in figure 2.6.
[Figure 2.6 sketch: four computers (#1 to #4) connected by an ethernet, each running its own copy of the kernel; computer #1 runs the file system, the others the memory manager, a database server, and so on.]
Figure 2.6: The client-server operating system lends itself perfectly to a transparently distributed architecture.
Chapter 3
Networks

More and more, communication between computers is becoming an important issue. Whereas electronic mail used to be almost exclusively used by researchers at universities, in the 90's electronic mail and hypertext are becoming so widespread that many individuals can get an Internet connection for only a few $$ per month, and exchange electronic mail, read and post news, and play virtual tourist over the World-Wide-Web.

But how is this communication realised? The simplest kind of communication, well-known to many home-computer owners, is via a modem. `Modem' is an acronym for modulator-demodulator; it is a machine that translates digital signals to analog audio (and back) which can be transmitted over a telephone line. This form of communication, while often sufficient for a single user, is very slow and inefficient for the following reasons. First, most telephone systems have a frequency range of only 4000Hz, and can be used for data transfer of up to 9600bps (bits per second) or, sometimes, 19,200bps. This is, of course, terribly slow. The second problem is that telephone companies use circuit switching: when two users connect, there is a physical wire which connects them. Most of the time, however, the whole bandwidth of the wire is unused since no data needs to be transferred; it would be much more efficient (and cheaper) to split the data into small packets which are then transmitted over the telephone lines. The path followed by two packets from A to B then need not always be the same, but can be routed such that congestion is avoided. Packet switching is therefore much more efficient and cheap for data transfer1.

So, how do packet switching networks work? This, and the problems resulting from such networks, are investigated in this chapter. As is customary when things get tough, we are going to tackle this topic by describing a layered view of computer communication. Communication, like computing itself, can be discussed at various levels. At the lowest level is the communicating hardware itself; the copper wires, and all of that. At the highest level are the applications (such as email, mosaic, distributed file servers, etc.), the only level that most users will experience.

This chapter is organised as follows. The introduction defines what a network is and describes the goals of computer networks. It also discusses local vs. wide area networks, and gives a few examples. Things get serious in section 3.2, which discusses the layered ISO/OSI network architecture and briefly visits every layer.

Much of the text of this chapter was derived from (Tanenbaum, 1981). Furthermore, (Black, 1991) added detailed information, while (Krol, 1992) provided information on the Internet. But the largest source of information was, of course, the Internet itself. Most notably, the Computational Science Education Project (CSEP)2, edited by Verena Umar, was a valuable source of information.

1 Nowadays telephone companies are starting to introduce packet switching for voice connections, too; ATM networks (explained below) are fast enough and can thus be used much more efficiently.
2 http://csep1.phy.ornl.gov/csep.html
3.1 Introduction

We follow (Tanenbaum, 1981) in the definition of computer networks: A computer network is an interconnected collection of autonomous computers. The stress of this definition is on autonomous: all computers in a computer network are essentially similar; no hierarchy is present as in a client-server setup.
3.1.1 Goals of computer networks
Before going into the `how' of computer networks, it is important to clearly state the goals of this exercise. Why do we need computer networks? We cite a few reasons:
1. end the `tyranny of geography';
2. reliability (graceful degradation);
3. price of communication vs. computation.
Ad 1. This is a very obvious reason: many companies are geographically widely distributed, yet have to share data between their various locations. Thus, a program should be able to access data independent of the location of its `physical' storage. A novel use of this is `tele-working': employees of a company need not always physically go to their office, but can work at home on their own computer and connect to the main computer via a network, thus saving both money and traffic-queue irritation.
Ad 2. For critical services such as banking or industrial process control, a main computer
which is out of operation for an hour can be devastating. If the load of such a computer can be taken over by a remote machine, all operations can simply continue as if the local computer were operable.
Ad 3. Finally, computer networking is much cheaper than buying large computing power. For those who need very powerful computers to do large calculations or access huge databases, it is much cheaper to buy a small computer with which to access a large computer or database over a network than to buy this large computer oneself.

An example of the use of a computer network is shown in the box on the next page; in this case, a network of computers was used to crack a Public Key Encryption method known as RSA, named after its inventors Ron Rivest, Adi Shamir, and Leonard Adleman3.
3.1.2 Building blocks of a network
What does a network consist of? At this level, the terminology is not really agreed upon. There exist many words for the same thing; we'll try to name a few. A network consists of a number of hosts, which are computers running user processes. These hosts are connected via a communication subnet (aka subnet, transport system, transmission system).

3 RSA works as follows: take two large primes p and q and find their product n = pq; n is called the modulus. Choose a number e < n which is relatively prime to (p - 1)(q - 1), and find its inverse d mod (p - 1)(q - 1), which means that ed = 1 mod (p - 1)(q - 1). e and d are called the public and private exponents, respectively. The public key is the pair (n, e); the private key is d. The factors p and q must be kept secret or destroyed.
From: [email protected] (Derek Atkins)
Subject: RSA-129
Date: 27 Apr 1994 04:06:25 GMT
Organization: Massachusetts Institute of Technology (MIT)
We are happy to announce that

RSA-129 = 114381625757888867669235779976146612010218296721242362562561842935706935245733897830597123563958705058989075147599290026879543541
        = 3490529510847650949147849619903898133417764638493387843990820577
        x 32769132993266709549961988190834461413177642967992942539798288533

The encoded message published was

96869613754622061477140922254355882905759991124574319874695120930816298225145708356931476622883989628013391990551829945157815154

This number came from an RSA encryption of the `secret' message using the public exponent 9007. When decrypted with the `secret' exponent

106698614368578024442868771328920154780709906633937862801226224496631063125911774470873340168597462306553968544513277109053606095

this becomes

200805001301070903002315180419000118050019172105011309190800151919090618010705.

Using the decoding scheme 01=A, 02=B, ..., 26=Z, and 00 a space between words, the decoded message reads:
THE MAGIC WORDS ARE SQUEAMISH OSSIFRAGE

To find the factorization of RSA-129, we used the double large prime variation of the multiple polynomial quadratic sieve factoring method. The sieving step took approximately 5000 MIPS years, and was carried out in 8 months by about 600 volunteers from more than 20 countries, on all continents except Antarctica. Combining the partial relations produced a sparse matrix of 569466 rows and 524338 columns. This matrix was reduced to a dense matrix of 188614 rows and 188160 columns using structured Gaussian elimination. Ordinary Gaussian elimination on this matrix, consisting of 35489610240 bits (4.13 gigabyte), took 45 hours on a 16K MasPar MP-1 massively parallel computer. The first three dependencies all turned out to be `unlucky' and produced the trivial factor RSA-129. The fourth dependency produced the above factorization. We would like to thank everyone who contributed their time and effort to this project. Without your help this would not have been possible.

Derek Atkins, Michael Graff, Arjen Lenstra, Paul Leyland
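To make footnote 3 concrete, here is a toy RSA round trip in C with deliberately tiny, insecure numbers (p = 61, q = 53, hence n = 3233, e = 17, d = 2753); RSA-129 above uses the same scheme with numbers of well over a hundred digits.

    #include <stdio.h>

    /* modular exponentiation: (base^exp) mod m */
    static long powmod(long base, long exp, long m) {
        long result = 1;
        base %= m;
        while (exp > 0) {
            if (exp & 1) result = (result * base) % m;
            base = (base * base) % m;
            exp >>= 1;
        }
        return result;
    }

    int main(void) {
        long n = 3233, e = 17, d = 2753;         /* toy public and private keys  */
        long message = 65;                       /* plaintext, as a number < n   */
        long cipher  = powmod(message, e, n);    /* encrypt with the public key  */
        long decoded = powmod(cipher, d, n);     /* decrypt with the private key */
        printf("message=%ld cipher=%ld decoded=%ld\n", message, cipher, decoded);
        return 0;
    }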
The subnet consists of two components: switching elements and transmission lines (circuits, channels). The switching elements (called IMP's, for Interface Message Processors, or communication computers, packet switches, nodes, data switching exchanges) are, in fact, communication-specialised computers.
3.1.3 Types of networks
In computer networks we distinguish local area networks or LAN's and long haul networks, also known as wide area networks or WAN's.
Local area networks

LAN's are networks of computers within one organisation, e.g., one university campus. They usually consist of ethernets, a system developed at Xerox PARC in the early 1970's. The improvement that the ethernet brought over existing systems was the possibility for a connected node to detect whether another node is transmitting, and thus to delay sending its own data until the ethernet is free. This was a great improvement over the Aloha system used at the University of Hawaii, in which computers broadcast their messages over radio as soon as they had data available, instead of waiting for other transmitters to finish, thus leading to many collisions.
Figure 3.1: A local area network, consisting of an ethernet connecting various devices together and with the outside world.

Figure 3.1 shows various devices connected together via an ethernet. A gateway connects the ethernet to the outside world (WAN). Ethernets, which consist of `twisted pair' cables of several meters up to hundreds of meters long, have a typical transfer rate of 10Mbit/s. That may seem like much, but as the size of computers, programs, and data files increases rapidly, LAN's tend to get clogged. The problem with ethernet is that, since one cable connects all machines together, only one machine may be transmitting data at a time. Though this situation is improved by separating several ethernets and connecting them with routers, interference is still a problem. A new technique called ATM, for Asynchronous Transfer Mode, with a typical transfer rate of 10Gbit/s (i.e., 1000 times as fast as ethernet), solves this problem by connecting computers point-to-point, and is therefore rapidly gaining in popularity.
The command traceroute can be used to investigate how a message reaches another machine. For instance, traceroute sun.com shows which machines a message may go through from your machine to sun.com. Or test this one:

    UNIX> traceroute fwi.uva.nl
    traceroute to fwi.uva.nl (146.50.4.20), 30 hops max, 40 byte packets
     1  fwi-gw-edu (146.50.7.16)  15 ms  3 ms  2 ms
     2  fwi-gw.fwi.uva.nl (146.50.6.1)  155 ms  27 ms  269 ms
     3  mail.fwi.uva.nl (146.50.4.20)  294 ms  169 ms  216 ms
This list shows a few things: first, the IP-addresses (`Internet-Protocol') of the machines: the machine mail.fwi.uva.nl has the address 146.50.4.20; both the name and the IP-address point to the same thing. Second, while we requested fwi.uva.nl we in fact got mail.fwi.uva.nl. Therefore, the former must be an alias for the latter. (You can verify this with the program /usr/sbin/nslookup, giving it as argument fwi.uva.nl or mail.fwi.uva.nl; you'll get the same IP-address in both cases.) When you run traceroute twice, it need not follow the same route! This is a result of having a packet switching network.
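The name-to-address mapping that nslookup performs can also be done from a program with the standard resolver library; a small sketch (the host name fwi.uva.nl is simply the example used in the text):

    #include <stdio.h>
    #include <netdb.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    int main(void) {
        struct hostent *h = gethostbyname("fwi.uva.nl");   /* ask the resolver */
        if (h == NULL) {
            fprintf(stderr, "lookup failed\n");
            return 1;
        }
        /* print the official name and the first IP-address in dotted notation */
        printf("%s has address %s\n",
               h->h_name, inet_ntoa(*(struct in_addr *)h->h_addr_list[0]));
        return 0;
    }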
Wide area networks
Wide area networks connect several LAN's together. The largest WAN is known as the Internet. The Internet (Krol, 1992) is a connection of several smaller, national networks such as SURFnet, NSFNET, BITNET, ARPANET, etc. The Internet is not governed by anyone; there is no single authority figure for the Internet as a whole. Instead, there are a few organisations for setting standards, allocating resources, etc.; all of these organisations are run by volunteers. And... who pays for the Internet? No-one directly. Instead, NSF pays for NSFNET, NASA pays for the NASA Science Internet, etc. The connection of these sub-networks is regulated by those networks themselves. Then, those parties connected to the networks pay a fee for their connection (e.g., those on EUnet pay EUnet for their connection); EUnet in turn is a part of the Internet.
The domain system structure

Each computer within the Internet has its own name. For instance, the file server of the education group at the faculty of Mathematics and Informatics of the University of Amsterdam is gene.fwi.uva.nl. In this case, gene is the name of the actual machine, which resides in the domain fwi, the domain of the faculty of Mathematics and Informatics. fwi is a portion of the domain of the University of Amsterdam, uva, which in turn lies in the domain of the Netherlands, nl. Each group can decide what is in its domain; e.g., the fwi group may autonomously decide the names of its sub-domains or, in this case, machine names. The top-level domains were created by fiat when the domain system was invented. It doesn't take many guesses as to where this system was invented (table 3.1).
3.2 The ISO/OSI model

One thing must be ensured: all communicating computers must speak the same language when they transmit data to each other. That is to say, there has to be an agreed communication protocol between the connected computers.
Table 3.1: Names of the highest-level Internet domains.

  domain          usage
  edu             US educational organisation
  com             US commercial organisation
  gov             US government organisation
  mil             US military
  org             other US organisations
  net             network resources
  2-letter code   non-US country codes (e.g., nl=Netherlands)
Table 3.2: The ISO/OSI reference model for networking.

  level 7: application layer
  level 6: presentation layer
  level 5: session layer
  level 4: transport layer
  level 3: network layer
  level 2: data link layer
  level 1: physical link layer
One standardisation of a communication protocol is the OSI communication model. Table 3.2 shows the ISO/OSI reference model for computer communication. This layered structure, known as OSI (short for Open Systems Interconnection), was proposed by ISO (the International Standards Organization) in the 1970's as a first step towards standardisation of the various protocols. The principles that ISO applied to arrive at this layered structure were the following (Zimmerman, 1980):
1. a layer should be created where a different level of abstraction is needed;
2. each layer should perform a well-defined function;
3. the function of each layer should be chosen so as to define internationally standardised protocols;
4. the layer boundaries should be chosen such that the information flow across these boundaries is minimised;
5. the number of layers should not be so large as to result in an unwieldy system, yet should be large enough to hold the distinct functions, one in each layer.

So... what does that layered communication structure mean? How do those things communicate? The following analogy might help. Imagine two biologists, one in Kenya and one in Indonesia, who want to communicate about their findings. The scientists speak no language besides their mother tongue; so, both hire translators, each of whom contacts an engineer. One evening, after a good Safari dinner, biologist A decides to relate her love for oryctolagus cuniculus to her colleague. To this end she passes the message `Ninapenda sungura' (in Swahili) across the 2/3 interface to her translator. Depending on the protocol used in layer 2, the translator may then translate this message to `I like rabbits' or `J'aime les lapins' or `Ik houd van konijnen.' After translation, the translator gives the resulting message to his engineer for transmission,
by fax, telephone, computer network, or whatever, depending on what the two engineers have agreed upon in advance (the layer 1 protocol). After the message has arrived at the second translator, he translates it into Indonesian, `Saya suka kelinci', and passes it across the 2/3 interface to biologist B. Note that each protocol is completely independent of the others as long as the interfaces are not changed. The translators can switch from French to Dutch at will, provided that they both agree, and neither changes his interface with either layer 1 or layer 3.

That should be clear enough. We are still faced with the details, however. Therefore we will briefly discuss the layers of the ISO/OSI model in the following sections, starting at the uppermost layer and working our way down. At each level we also note the type of unit that is exchanged at that level.
3.2.1 The application layer
Exchanged unit: message.
The top level of the OSI model, the communication level that users see, is called the application layer: electronic mail, shared file systems, etc. In the 1990's, the application layer got more and more attention and expanded rapidly. This was caused by the fact that the underlying layers had matured considerably, and the possibilities of a world-wide network of computers slowly became apparent. The distinction between the application and presentation layers is very important: it distinguishes every-day users from network maintainers. At the application layer, the user sees her network-shared data in a cooked form: electronic mail (email) messages are deposited in the user's mailbox and can be accessed with the mail4 program. World-wide-web data can be accessed with the Mosaic program. News can be read with nn or xrn. Files can be transferred from one machine to another with the ftp (`file transfer protocol') program. And so on and so forth. The best way to find out the workings of the network at this level is to try it yourself; the 90's are, after all, the years of the virtual tourist.
3.2.2 The presentation layer
Exchanged unit: message.

The presentation layer only performs some transformations on the data, such as compression, encryption, etc. Especially the latter point is of the utmost importance. For instance, when two computers communicate via a satellite, a third party can easily listen in on the conversation by pointing his satellite dish in the right direction. Problems of user authentication as well as encryption, as discussed in the previous chapter, must be tackled.
3.2.3 The session and transport layers
Exchanged unit: message.
The session layer, which is absent in many implementations because its few tasks are easily integrated into other layers, does not do much besides taking care of crash recovery. When a connection between two machines has been established at the transport layer level, and one of the hosts through which the communication takes place crashes, it is up to the session layer to re-establish the connection through other hosts, by instructing the transport layer to make a new connection, while the higher layers do not notice any change in service (except, perhaps, a short delay).

4 Or xmail or elm or mh or mailtool or ...
But most important is the transport layer. The transport layer takes care that communication is end-to-end, i.e., that machine A in a network can communicate with machine B in the same network, even though they are not physically connected. Below the transport layer, all communication goes from machine to adjacent (physically connected) machine; at the transport layer level, every machine is virtually connected with every other machine. The software for the transport, and higher, layers only runs on the host machines; the lower layers run both on the hosts and on the IMP's, since that is the only level that IMP's are bothered with. The complexity of the transport layer depends on the service that the subnet provides. When the subnet is able to deliver messages in order from sender to receiver, without errors, loss of packets, or packet duplication, the transport level can remain relatively simple. However, often the subnet only provides datagram service, so that the transport level must make sure that the right packets arrive at the destination in the right order. An extra problem is that the network over which the transport layer communicates has a storage capacity: when a packet is sent from host A to B, it may arrive a few seconds later, or 30 seconds later, or even longer, or not at all.
3.2.4 The network layer

Exchanged unit: packet.

The network layer controls the operation of the communication subnet. It determines how packets are routed from the sender to the receiver, and determines the protocol between the host and the IMP.

In order to determine the efficiency of an ethernet, the following calculation might help. Let q be the number of systems connected to a network. The probability P that one system gains exclusive access to the network is calculated as follows: each of the q systems has probability 1/q of gaining access at a time when the other q - 1 systems, each with probability 1 - 1/q, do not access the network. In equation form,

    P = q * (1/q) * (1 - 1/q)^(q-1) = (1 - 1/q)^(q-1).

With this definition we can determine the mean time a host must wait before it accesses the network. This time is expressed in time units called the slot time: the amount of time a system waits before retransmitting when it finds the network busy. The probability of waiting 0 slots is P; the probability of waiting 1 slot is P(1 - P). In general, the probability of waiting i slots equals

    p_i = P (1 - P)^i.

The mean of this probability distribution over all i gives the expected number of slots a system will have to wait before acquiring exclusive access to the network:

    W = (1 - P) / P.

Finally, the efficiency of the network is the percentage of time that there are packets on the net. Let the packet size be s bits, let C be the peak capacity (bits per second) of the network, and let T be the length of a time slot. Then the efficiency of the ethernet equals the time spent sending a packet as a fraction of the total time:

    E = (s/C) / (W*T + s/C).
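As a sketch, the formulas above can be evaluated for some assumed numbers (q = 32 stations, 1500-byte packets, a 10 Mbit/s ethernet, and a 51.2 microsecond slot time; these values are illustrative, not measurements):

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double q = 32.0;                 /* number of connected systems       */
        double s = 1500.0 * 8.0;         /* packet size in bits               */
        double C = 10e6;                 /* peak capacity: 10 Mbit/s ethernet */
        double T = 51.2e-6;              /* slot time in seconds              */

        double P = pow(1.0 - 1.0 / q, q - 1.0);  /* prob. of exclusive access */
        double W = (1.0 - P) / P;                /* expected number of slots  */
        double E = (s / C) / (W * T + s / C);    /* fraction of time sending  */

        printf("P = %.3f, W = %.3f slots, E = %.3f\n", P, W, E);
        return 0;
    }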
3.2.5 The data link layer

Exchanged unit: frame.

One of the most important layers is the data link layer. Starting from the underlying physical layer, it builds a service which is free from transmission errors. It does this by splitting the message up into data frames, sending those frames to the other host, and waiting for acknowledgement frames. The receiving host has to translate the incoming stream of bits into frames and send acknowledgement frames back. Of course, both data and acknowledgement frames can be damaged or even completely lost. The protocol at this level provides a mechanism to recover from damaged or lost frames: it asks for damaged frames to be resent, and resends lost frames when an acknowledgement does not arrive within a certain amount of time.
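The core of this mechanism is the classic stop-and-wait scheme. The C sketch below shows the idea; send_frame() and wait_for_ack() are hypothetical helpers standing in for the real driver code, not part of any actual API.

    #include <stdbool.h>
    #include <stdio.h>

    typedef struct { int seq; const char *data; } Frame;

    /* hypothetical: transmit the frame via the physical layer */
    static void send_frame(const Frame *f) { printf("sending frame %d\n", f->seq); }

    /* hypothetical: true if an acknowledgement for `seq` arrived in time */
    static bool wait_for_ack(int seq) { (void)seq; return true; }

    static void send_reliably(const char *data, int seq) {
        Frame f = { seq, data };
        for (;;) {
            send_frame(&f);            /* push the frame onto the wire    */
            if (wait_for_ack(seq))     /* acknowledgement frame came back */
                return;
            /* no ACK within the timeout: assume the frame (or its ACK) was
               lost or damaged, and simply send it again */
        }
    }

    int main(void) {
        send_reliably("hello", 0);
        return 0;
    }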
3.2.6 The physical layer
Exchanged unit: bit.

Finally we come to the copper wires themselves. There is a lot to be calculated at this level, in order to get the best network possible. At this level the problem is how to get one bit of data from one end to the other, as reliably and as fast as possible. A bottleneck here is formed by the monopolies that telephone companies in many countries have on selling physical connections; a monopoly that is currently, by and by, being broken. For a good description of the problems at this level, check out (Tanenbaum, 1981).
3.2.7 Summing up: The reality
Now, if you understood all of the above, you'll be able to understand figure 3.2, too. If not, go back to the beginning of this chapter and re-read.
[Figure 3.2 sketch: host A and host B each run the full stack of application, presentation, session, transport, network, data link, and physical layers; the two IMP's of the communication subnet in between carry only the network, data link, and physical layers.]
Figure 3.2: Two hosts A and B communicating over an OSI network. Virtual links are indicated by dotted arrows.
The above model is all very well and nice, but, unfortunately, reality is slightly different. While OSI was being developed, the Internet was already in use, and widely so. How could they do that, without such a wonderful protocol??? The answer is that a protocol had already been
in use for many years, known as TCP/IP. It in fact consists of two protocols on top of each other: a transport layer protocol called TCP, for Transmission Control Protocol, and a subnet protocol called IP, for Internet Protocol. The Internet Protocol is very simple: it sends packets of data from one host to another, not taking care of lost packets, duplicated packets, or damaged packets. Each packet, which has a size of up to 1500 bytes, is prepended with an IP header. This header carries the information required to reach the destination machine from the sender via hosts and IMP's. On top of IP, TCP is a transport layer protocol which assumes a lossy protocol underneath it: it accepts data from the host, breaks it up into packets, takes care of the right sequencing of those packets, and deals with loss and damage of sent packets. In order to do this, each packet gets a header before it is handed over to the IP level. The total header for each packet is at least 40 bytes long; for a maximum-size packet of 1500 bytes, this means about 3% communication overhead.

It must be clear from the above that, although OSI is a very neat protocol, it is rather complex and wasteful. Nevertheless it is being pushed by the United States government, and nowadays many modern communication computers run both TCP/IP and OSI. It may be that the Internet will gradually switch in the years to come; on the other hand, since OSI does not offer considerably more than TCP/IP, the impetus to switch is rather low.
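Seen from the application, all of this is hidden behind a reliable byte stream. The sketch below (standard C socket interface; the host name and the daytime port 13 are merely assumptions for the example) opens a TCP connection and reads from it, while IP underneath moves the individual packets.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netdb.h>
    #include <sys/socket.h>

    int main(void) {
        struct addrinfo hints, *res;
        memset(&hints, 0, sizeof hints);
        hints.ai_family = AF_INET;
        hints.ai_socktype = SOCK_STREAM;            /* TCP: reliable, ordered  */

        if (getaddrinfo("fwi.uva.nl", "13", &hints, &res) != 0) return 1;

        int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) < 0) return 1;

        char buf[256];
        ssize_t n = read(fd, buf, sizeof buf - 1);  /* TCP reassembles packets */
        if (n > 0) { buf[n] = '\0'; printf("%s", buf); }

        close(fd);
        freeaddrinfo(res);
        return 0;
    }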
Chapter 4
Computer architecture

What do you need to know about a computer in order to use it? Arguably very little; depending on what kind of computer you have, of course, but also depending on what you want to do with it. For instance, a secretary typing letter after letter for his boss may not need to be told whether he has sRAM or dRAM in his machine, and whether the processor is a vectorised CISC, nor should it bother him too much that his data bus is half as wide as his address bus... Yet a researcher writing programs for scientific calculations, or that person who wants to buy ten computers in order to set up a department investigating digitised images of clouds for neural network classification, should take heed that she not be sent into the lion's den uninitiated.

Like the previous chapters, this chapter tries to bring you up-to-date with current technology by taking a practical view, but not in such a way that you'd be left to your own devices two years from now with outdated information; hence, theoretical issues will also be touched upon. So, after the introduction we will talk about the bus (section 4.2), then about memory (4.3) and processors (4.4), to conclude with standard I/O hardware (section 4.5). The information sources used for this chapter mainly consisted of (Hwang, 1993), (Tanenbaum, 1984), and CSEP.
4.1 Generations

Almost any text on computer architecture starts with a historical perspective of computers. So why be different here? Consider table 4.1. This table shows the traditional summation of past computer generations.

The first generation: computers were huge machines consuming large amounts of energy in order to do (compared to current-day technology, but not compared to the computers of generation 0) tiny calculations. They consisted of relays1 and vacuum tubes, and, situated in the basements of large office buildings, eliminated the need for a central heating system. Also, considering the MTBF (Mean Time Between Failure) of relays and vacuum tubes, there was a smaller chance that the machine operated than not; repair was constantly underway.

1 According to an on-line Webster dictionary (not the Merriam-Webster), a relay is an electromagnetic device for remote or automatic control actuated by variation in conditions of an electric circuit and operating in turn other devices (as switches) in the same or a different circuit. And, it says, a vacuum tube is an electron tube evacuated to a high degree of vacuum. But that doesn't help much, I guess. Relays consist of a coil wound around a metal core, acting as an electro-magnet. When the magnet is switched on, it attracts a piece of metal connected to a switch; when it is off, the switch returns to the rest position. In fact, this is a bit like what a transistor does. Most relays have a size in the order of a few cm3. A vacuum tube is the old version of a chip, with anodes (positive poles) and cathodes (negative poles) firing electrons at each other. Depending on the (potential of the) shield(s) between those poles, electrons will or will not flow.
Table 4.1: The history of computer architecture generations.

  generation          technology                             software                          examples
  first  (1945-54)    vacuum tubes and relay memories        machine language (1's and 0's)    ENIAC
  second (1955-64)    transistors                            FORTRAN                           IBM 7090
  third  (1965-74)    MSI (medium-scale integration);        time-sharing, parallel            PDP-8, IBM 370
                      pipelining; cache                      processing
  fourth (1975-90)    VLSI (very large scale integration),   multiprocessor OS; compiler       Sun SPARC, Cray X-MP
                      vectorisation, multicomputers          environments for parallel
                                                             processing
  fifth  (1991-now)   ULSI (ultra-large scale integration)   massively parallel processing     Cray MPP, CM2
Nevertheless, those machines were computers, and they were used for useful things2. In those days programming was not very different from handling a switchboard: switches represented bits (having a 0 or 1 value) and had to be set right for the computer to operate. Indeed, this idea of programming is not very appealing to current-day computer users.

Things got a lot better in the second generation: the unreliable relays and vacuum tubes were replaced by transistors, and the machines got more powerful (i.e., more memory and faster computation) as well as cheaper. Conceptually, too, much changed: the first compilers (for FORTRAN) ran successfully. Yet programming and compilation were not as easy as they are nowadays. Consider a programmer in the late 1950's. In order to run a program, this person would have to go through the following steps. First her FORTRAN program had to be coded in. To do this, the programmer would type her program in on a cardwriter, one line at a time. When a line was finished, the corresponding holes would be punched in a card the size of an airline ticket, and the card was spit out. This process was continued up to the end of the program. Next, the programmer would take her stack of cards, careful not to drop them, to the computer room which she had previously reserved for an hour or two. First the FORTRAN compiler cards were read into the computer; next, the user's program. Some compilers required more than one pass over the program; i.e., it had to be fed in more than once. If no syntax errors were found by the compiler, the computer would spit out more cards representing the compiled code. These cards were then carried to the card reader, and the computer could commence execution. Of course, in many cases the program would contain an error; the programmer then had to study the memory contents of the computer to see what went wrong and, after having solved the problem a few days later, come back with the stack of cards (with a few cards replaced or swapped) and try again.

Naturally, the above scheme needed improvement. By automating the operator's work via an operating system, much of the work load was eased and processing was much faster. This operating system, which was running in the computer at all times, was accessed by the programmer by issuing special control cards to the computer. The operating system, compiler,

2 As an example, towards the end of the Second World War the ENIAC (an acronym for Electronic Numerical Integrator and Computer) was used by allied forces to decipher enemy communication codes. With success, a feat that definitely influenced the lives of many.
etc. were all available to the computer on magnetic or ticker tape3. Progress went fast once users were able to connect directly to a computer via a keyboard and run their commands on it, while getting the results back on their teletype or, later, screen. These time-sharing systems usually were mainframes or, later, mini-computers4.

We have landed in the third generation, where transistors are replaced by IC's (Integrated Circuits), commonly known as chips. The scale of integration here is Medium; MSI chips contain in the order of 10 to 100 gates. After MSI chips came the LSI (Large Scale Integrated) ones: approximately 100 to 100,000 gates per chip. They were used in microcomputers, which were introduced in the late 1970's with the PET and the TRS-805. These machines, with 8-bit processors (16-bit address lines), running at a clock speed of a few MHz, and with a typical memory size of 16-64 Kb, were so cheap (not very different from the price of current-day PC's) that they could be (and were) purchased by personal users. (Did those facts baffle you? No worry, we'll explain them by and by.) In the following years, mini and microcomputers rapidly grew closer to each other.

The fourth generation, with VLSI chips of 100,000-1,000,000 gates, brought the workstation as well as the powerful personal computer, which put the mini-computer out of business. In the 90's the last super-computers, such as the Cray X-MP, were built; their price/performance ratio is rather silly compared to workstations.

The end of this trend was foreseen a long time ago. Nevertheless, it seems to continue. Physical characteristics of materials and electrons indeed pose a not-so-distant upper limit; when that limit is reached, only parallel processing can save the day.
4.1.1 Architecture of a computer

Just to let you know what we're talking about, figure 4.1 shows the inside of a 1994 workstation. Some of the parts of that picture are labeled. Stare at it for a few seconds before you proceed reading. A schematic drawing makes things clearer. A few components connected by a bus are shown in figure 4.2. In order to orchestrate the different components, a small piece of circuitry generates the clock signal. This is a clock pulse with a width between 1 µs and 10 ns, i.e., a frequency between 1 and 100 MHz (one Hz is one oscillation per second). The beginning of a clock pulse is usually taken as the signal to start a new action: fetch data from memory, decode an instruction, ask for the bus, etc. Naturally, the faster the clock, the faster the computer (which does not mean that a computer with a 16MHz 80486DX processor is slower than one with a 20MHz 80386SX processor, since the former processor is smarter; you can only compare clock speeds within the same computer architecture).

3 Watch the motion picture The Sting (1973) with Robert Redford and Paul Newman to see a ticker tape in action. The Hudsucker Proxy (1994; also with Paul Newman) also features one, but it doesn't play such a prominent role in that movie.
4 The first mini-computer, the DEC PDP-7, was introduced in 1970. Back then, a mini-computer was defined as a computer costing less than $50,000.
5 The birth of the TRS-80 is an anecdote worth telling. Apparently, the first machine (TRS-80 Level 1) had been designed by a Tandy salesperson, who pushed electronic chips and switches from his store to assemble the computer in his garage at home. Naturally he was found out, and was left with two choices by the Tandy/Radio Shack corporation: be liable for prosecution, or give the rights to the computer to the company. He did the latter, leading to the sales of a rather successful microcomputer. Although the TRS-80 was designed and marketed about a year later than the PET, Tandy already had a world-wide chain of stores and could market its machine much better. This led to the popularity of that machine.
[Figure 4.1 labels: RAM, CPU, cache, ethernet connection.]
Figure 4.1: A picture of the inside of a 1994 workstation.
[Figure 4.2 sketch: a CPU, memory, and three I/O devices, all connected to a common bus.]
Figure 4.2: Schematic drawing of (part of) the inside of a computer.
4.2 The bus

The bus is that part of the computer which connects the various parts, such as the I/O controllers, the processor(s), and the memory, together. Typically, the bus consists of just a bunch of wires in your computer. The bus consists of three parts: the data bus, the address bus, and the control bus. Typical use of the bus would be as follows. When the processor needs the contents of a memory address A, it will first put A on the address bus. Next, it will set the control line READ on; this is an indication for the memory that the contents of the memory location indicated by the value on the address bus are requested. While the memory is working on getting the data out, it switches on the WAIT control line; when the data is obtained it is put on the data bus.

Each line of the bus can transfer one bit. A bit is a piece of memory which can have value 0 (false, off) or 1 (true, on), and thus represents two states. Two bits can represent 2 x 2 = 4 states; n bits can represent 2^n states. Although four bits are known as a nibble, this term has gone out of fashion since no computer works in quantities of four bits anymore; the smallest data size that is usually operated on is a byte, which is 8 bits wide. The smallest addressable memory element is known as a word, the size of which depends on the computer architecture. Word sizes may have any value, with typical values being 8, 16, 32, or 64 bits.

The width (number of lines) of the address bus indicates the maximum amount of memory that can be addressed. For instance, with an address bus of width 20, 2^20 = 1,048,576 different positions can be addressed; i.e., the addressable memory cannot be larger than 1M (= 1024 x 1024) words. When the word size is 16 bits, that particular processor can address up to 2Mbytes of memory, and not more.

Although a bus sounds like a rather simple device, the length of the bus is something that has to be taken into account. With electrical currents flowing at an approximate speed of 10^8 m/s, or 1 m per 10 ns, limits on wire length have to be taken into consideration. With the emergence of ATM6 networking, traditional buses are just as fast as such a network. Therefore such a bus can be replaced by an ATM network, leading to computer architectures which are completely modular.
mentioned but not fully explained. Let me take this second chance. Why are ATM networks so fast compared to ethernets? Well, an ethernet has a bus architecture: it consists of a single wire to which all computers are connected. In fact, ethernet is a circuit switching network: when one machine is talking to another, the rest has to shut up and wait till the line (after all, there's only one `line') is free. With additional problems of collisions when two machines want to transmit at the same time. Most ethernets, over which packets of 1530 bytes are transmitted, have a communication bandwidth of 10 Mbps (bits per second); an ethernet is never faster than 100 Mbps. ATM networks are point-to-point. An ATM network has a so-called star topology: the centre of the network is the ATM switch, while every computer in the network has a direct connection to this ATM switch. Packets are small: 48 bytes long. When computer A wants to send a message to computer B, it sends it to the ATM switch which then forwards it to B. Since packets are so small, it is possible to design switches which can immediately forward incoming packets to the destination without having to store them|therefore memory is not necessary, thus making the switch aordable (memory is always the most expensive part of a computer). ATM networks come at 34 Mbps (an IBM standard) and 155 Mbps (often used) for copper communication channels, and 622 Mbps, 2 Gbps, and 4 Gbps for glass bre. ATM networks are not only going to be used for computer communication, but also voice transfer. ATM (short for Asynchronous Transfer Mode) is therefore also known as Broadband ISDN or BISDN, where ISDN means Integrated Services Digital Network. It is therefore also supported by telephone companies around the world, and has a real future. It can be used for voice transfer since its bandwidth is guaranteed: collisions do not occur! There are still some problems remaining with ATM. First, in the original setup it is assumed that packet-loss does not occur. This is, unfortunately, ction; therefore a protocol such as TCP/IP is necessary. You can imagine, however, that TCP/IP with 48-byte packets requires much overhead in order to disassemble messages in 48-byte packets before they are sent, and reassemble them at the destination computer. Secondly, what will happen when the channel from computer A to the ATM switch is faster than the channel from the switch to B?? Why are ATM packets 48 bytes long? Because 48 is the average of 32 and 64. Those two standards were proposed by two powerful parties in the ATM standardisation committee, so a compromise was found :::
4.3 Memory

When referring to memory in the above text, the authors meant the part of memory known as the main memory, which is often also referred to as RAM or Random Access Memory. The increase in the amount of RAM that a typical small computer has goes hand-in-hand with the decrease in the price of that memory. In the late 1970's and early 1980's, microcomputers would typically have 16-64Kb of RAM; nowadays, 16Mb is rather little for many applications. Ditto for secondary storage: typical backup storage for a 1980 microcomputer was a 64Kb floppy disk; now it's a 200Mb (or several Gb) harddisk.

Apart from RAM, computers usually also have a section of memory called the ROM. ROM stands for Read-Only Memory, and does what it says; furthermore, the contents of ROM are not erased when the appliance is switched off. The ROM is generally used for storing those parts of the operating system which start up the computer. The original ROM memories consisted of etched silicon surfaces and were rather expensive to manufacture, since for each application (i.e., program stored in the ROM) a different etching mask had to be made. Therefore ROM was replaced by PROM (Programmable ROM), soon to be followed by EPROM (Erasable PROM): by applying ultra-violet light to it, its contents could be erased and it could be re-programmed. The final step was the EEPROM (Electrically Erasable PROM), which can be programmed and erased while in the computer. The difference between RAM and EEPROM is that the time to write a byte in the latter is much, much longer than for RAM.

Note that all the numbers given in this section will change heavily in time; the amounts of memory available to a computer user are quickly increasing, and it is not unlikely that within a few years each PC will have enough storage capacity to store all available data (can you imagine that? Yet, what to do with it is quite another question...)
4.3.1 Bits, bytes, sizes

As we said above, the smallest quantity of information in a computer, the bit, is a piece of memory which can have value 0 (false, off) or 1 (true, on), and thus represents two states. The smallest data element that is usually operated on is a byte, which is 8 bits wide; a byte therefore holds 2^8 = 256 different values. Memory sizes are usually indicated in bytes with a suffix K, M, G, etc. In computer terminology, K is not exactly 1,000 as in physics, but rather K = 2^10, i.e., 1,024. Thus M = K x K = 2^20, and G = 2^30.
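A one-minute check of these numbers in C:

    #include <stdio.h>

    int main(void) {
        printf("1 byte (8 bits) holds %lu values\n", 1UL << 8);   /* 256 */
        printf("K = %lu, M = %lu, G = %lu bytes\n",
               1UL << 10, 1UL << 20, 1UL << 30);
        return 0;
    }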
4.3.2 Memory types

As we said above, the faster the memory, the more expensive it is. We'd prefer to have gigabytes or terabytes of the fastest memory available, but just cannot afford it (table 4.2 will show what I mean; don't look at it now, look at it later). Dilemma: we want fast but cheap memory. Solution: make a pyramid: very little of the fastest memory around, a bit more of the second fastest, etc. By moving frequently used memory contents from slower to faster memory (that's called paging, about which you read in chapter 2) we end up with a reasonably fast machine for a reasonably low price. So what types of memory am I actually talking about? We distinguish several types of memory, depending on how fast their access is:
registers Registers are small pieces of memory within the processor chip. Within one processor
we distinguish two kinds of registers: the address registers, which are as large as the number of bits that a memory address is wide, and data registers, having the size of one word.
It is clear that the decimal number system is not optimal for computing with binary numbers. The decimal (base 10) number system exists because humans have ten fingers; the hexadecimal (base 16) number system exists because 16 x 16 = 2^8 (8 is the number of bits in a byte), and the octal (base 8) number system because 8, too, is a power of two. In hexadecimal (usually nicknamed `hex'), numbers are indicated with the symbols 0-9 and A-F; in octal, 0-7. The following table compares the first sixteen (I spell it out on purpose, so no confusion arises) numbers:

  binary  hex  octal  decimal     binary  hex  octal  decimal
  0000    0    0      0           0001    1    1      1
  0010    2    2      2           0011    3    3      3
  0100    4    4      4           0101    5    5      5
  0110    6    6      6           0111    7    7      7
  1000    8    10     8           1001    9    11     9
  1010    A    12     10          1011    B    13     11
  1100    C    14     12          1101    D    15     13
  1110    E    16     14          1111    F    17     15

The table clearly shows that hex or octal are better for representing computer-associated numbers. In UNIX the convention is as follows: addresses are represented in hex, written in C notation using a leading 0x, as in 0x00FF, while character codes are represented in octal using a leading 0, as in 0377.
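In C this is merely a matter of printf conversion specifiers; for example:

    #include <stdio.h>

    int main(void) {
        int value = 255;
        /* prints: decimal 255 = hex 0xFF = octal 0377 */
        printf("decimal %d = hex 0x%X = octal 0%o\n", value, value, value);
        return 0;
    }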
Registers. In every chip there are some special-purpose registers: the program counter (PC), the stack pointer (SP), and the instruction register (IR). The program counter always contains the memory address where the instruction is stored which has to be executed next. When the flow of control has to be changed (i.e., the program jumps from one location to another), this results in a change of the program counter; otherwise, it is just incremented after an instruction is executed. The stack7 pointer contains the current address of the stack. The value of this register is usually changed by pushing something onto the stack, or popping a value from the stack. The instruction register contains the current instruction; when an instruction is fetched from memory, it is stored in this register before it is decoded.

Registers are also used to store temporary data used in calculations. For instance, to compute n^m, store n from RAM in data registers A and B, and m in data register C. Now multiply A by B, storing the result in A, and decrement C. Repeat this process until C is 1. The result, n^m, is stored in A, which is written back to RAM.
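Written out in C, with ordinary variables playing the role of the registers A, B and C, the recipe reads:

    #include <stdio.h>

    int main(void) {
        long n = 3, m = 4;
        long a = n, b = n, c = m;     /* load registers A, B and C from "RAM" */
        while (c > 1) {               /* repeat until the counter reaches 1   */
            a = a * b;                /* multiply A by B, result back into A  */
            c = c - 1;                /* decrement the counter                */
        }
        printf("%ld^%ld = %ld\n", n, m, a);   /* 3^4 = 81 */
        return 0;
    }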
cache Cache memory has become very important in increasing the performance of computers.
The only difference between cache and main memory is the access time: cache is much faster. This difference is obtained by using a different memory technology, which is more expensive. Cache memories usually are tens to hundreds of kilobytes large. The paging algorithms described in section 2.1.2 also apply here.
7 What's a stack? One of the best explanations has been given by rabbi Ra, the great Talmud-teacher of the middle ages. Ra comments on the biblical story of Jacob and Esau and the sold birthright: Think of a deep, very narrow pit. Throw some stones in it and take them out again. That stone which was thrown into the pit last will be removed as the first one, and that stone which was thrown in first will be the last one to be taken out. To Jacob and Esau this means that Jacob in reality was from the first drop of semen and Esau from the second; therefore Esau was born as the first and Jacob as the second. So Jacob was right to deprive Esau of his birthright.
Table 4.2: Comparison of types of memory. In 1994, a 5Gb disk cost around $2,000. A video tape, used for tape backup, could be obtained for around $4 (see footnote 8).

  characteristics    registers   cache      main memory   disk              tape
  access time        10 ns       25-40 ns   60-100 ns     12-20 ms          minutes
  capacity           512 bytes   128 Kb     128 Mb        5 Gb (per disk)   1 Tb (per tape)
  cost (cents/Kb)    18,000      72         5.6           5 x 10^-4         4 x 10^-9
  bandwidth (Mb/s)   400-800     250-400    80-133        3-5               0.18-0.23
As we said, we'd like to have only cache and no slow main memory, but that's just too expensive.
main memory The next step up is main memory (also called RAM). A PC with less than 8Mb of main memory is not really up-to-date anymore; one year from now that number may be 16Mb or 64Mb. Workstations are typically equipped with 16-128Mb. RAM is about five to ten times as slow as cache memory, but also ten times as cheap.
disk A harddisk may contain from several tens of Mb to several (tens of) Gb. Access is, compared to RAM, slow due to the mechanical construction involved; a disk is about three orders of magnitude slower than RAM and the same order cheaper.
tape Finally, tape: used to store huge amounts of data. Tape, CD-ROM, and similar devices
may contain many gigabytes of data and are very cheap. Tapes are often used for storage backup, storing data from experiments, and exchanging data. A CD-ROM player is becoming standard equipment of even personal computers.
In table 4.2 the mentioned types of memory are compared.
4.4 Processors

In the days when computers were still considered `smart' and `magic,' the processor or CPU (Central Processing Unit) was often referred to as the brain of the computer. Of course, processors are as much like brains as scissors are like space-shuttles. The processor is a chip which, in general, reads instructions and data from the bus, performs calculations on the data, and writes the data back on the bus. Figure 4.3 shows what a CPU chip looks like inside. What you can't see in that picture is that a CPU basically consists of three parts: a control unit, an Arithmetic Logical Unit (ALU), and a set of registers. As long as it is switched on, a CPU endlessly performs the following steps:
1. fetch the next instruction from memory into the instruction register;
2. decode the instruction and fetch additional data from memory, if needed;
3. execute the instruction, and write data back to memory if needed;
4. increase the program counter register;
5. go back to 1.

8 The comparison is not entirely honest. After all, for $4 you only get the tape, and not the tape drive. Perhaps it would be more fair to compare the tape price with the price of a floppy disk. Still cheaper, though.
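The loop can be mimicked in a few lines of C for a made-up two-instruction machine; the opcodes below are inventions for the sake of illustration, not a real instruction set.

    #include <stdio.h>

    enum { HALT = 0, LOAD = 1, ADD = 2 };

    int main(void) {
        /* "memory": each instruction is an opcode followed by an operand */
        int memory[] = { LOAD, 5, ADD, 7, ADD, 30, HALT, 0 };
        int pc = 0;                        /* program counter      */
        int acc = 0;                       /* accumulator register */

        for (;;) {
            int ir = memory[pc];           /* 1. fetch into the instruction register */
            int operand = memory[pc + 1];  /* 2. decode and fetch additional data    */
            pc += 2;                       /* 4. increase the program counter        */
            switch (ir) {                  /* 3. execute                             */
            case LOAD: acc = operand; break;
            case ADD:  acc += operand; break;
            case HALT: printf("result: %d\n", acc); return 0;   /* prints 42 */
            }
        }
    }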
Figure 4.3: The inside of a CPU.
This is the microSPARC chip, blown up approximately 5 times. You can't get much useful information from this picture.
If you want to see what your C program looks like in assembly, give your compiler the -S flag; most compilers will then generate a file with a suffix .s or .asm containing the assembly code. If you're really curious, compile it on two different computers, e.g., on a PC and a Sun Sparc: the result will be very different. Especially on the latter, don't try to understand the code! That's part of the work of the control unit. The ALU takes care of the computation: it performs a boolean operation on its two inputs, or some arithmetic such as addition or multiplication. The input for the ALU comes from registers, and the output is also written to a register.

Each processor has an instruction set and a large manual describing precisely what each instruction does. There exist only a few basic kinds of instruction: fetch a word from memory or write it back, jump to an address, push something on the stack, or perform simple arithmetic on numbers. Most processors have in the order of 100-200 instructions. The instructions are represented by mnemonics: a symbolic representation (e.g., `LD A,E' to store the contents of register E in A) which is easier for humans to read than the instruction's binary encoding (01111011, or 0x7B). A program written using these mnemonics is called an assembly language program; it is the language that your compiler translates your C program into.

So that's the basics of processors. Naturally, there is an ongoing quest for faster computers, meaning: faster processors. Increasing the clock speed doesn't do it all: since an electron can only travel in the order of 10^7 m/s (in fact, a bit less), it takes more than 1 ns for an electron to travel 1 cm. Buses (in computers) are longer than that, so a clock speed of 1000 MHz is out of the question. (It is for this reason that optical computers are of interest; light travels over an order of magnitude faster than electrons do in metals.) It has been (successfully) attempted to put a whole computer on a single chip; however, since all the components are concentrated on a few square cm of silicon, all the power is also consumed by that piece of silicon, and the same amount of heat as in a normal computer is dissipated on that very small area. Without very expensive cooling, this technique is (currently) not feasible.

Secondly, processors are also required to address an ever-increasing amount of memory. A computer with only 64 Kb of memory needs only a 16-bit address bus (since 2^16 = 65,536 = 64 x 1,024) to address each byte. A computer with 128 Mb needs a 27-bit address bus to address each byte; of course, memories are word-addressed, i.e., with a 32-bit data bus and a 27-bit address bus, 4 x 128 Mb can be accessed.
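A small illustration of this address-bus arithmetic (the little function below is ours, written for this text, and assumes plain byte addressing):

    /* how many address lines are needed to address a given number of bytes? */
    #include <stdio.h>

    int address_bits(unsigned long bytes)
    {
        int bits = 0;
        while ((1UL << bits) < bytes)
            bits++;
        return bits;
    }

    int main(void)
    {
        printf("64 Kb  -> %d address bits\n", address_bits(64UL * 1024));          /* 16 */
        printf("128 Mb -> %d address bits\n", address_bits(128UL * 1024 * 1024));  /* 27 */
        return 0;
    }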
Before getting into specific processor architectures, let's review a few chips of the past and present. In table 4.3 a list of several well-known processors is provided. So, how do you make processors faster? There are several techniques, each of which has specific application areas where its merit is most noticed. We discuss RISC vs. CISC, pipelining techniques to speed up processing, vector and array processing, and a few odd beasties.
4.4.1 Measuring processor speed

As said, specifying the clock speed is not a good way to express computer speed; it would only work in advertisements for the uninformed. Therefore there are a few standard methods to express computer speed. A first manufacturer-specified datum is the number of MIPS and MFLOPS their computer runs. MIPS stands for Million Instructions Per Second, and FLOPS for Floating point Operations Per Second. Secondly, several benchmark programs exist which measure computer speed (a benchmark is, in its widest context, a point of reference; the term is used in computer science to indicate a program which measures the quality of the computer in some sense). Two well-known standards are dhrystones and whetstones. The difference between the two is that the latter also measures the speed of the computer in floating point computation, whereas the former does not. To illustrate how these numbers change, table 4.4 shows how many dhrystones several computers can run per second. Note the bad performance of the expensive Cray 2: this machine was not intended for the kind of computation done in a dhrystone benchmark.
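As a toy illustration of the idea behind such figures (this is not dhrystone or whetstone, just a loop whose iterations are counted against the clock; the numbers it prints mean little by themselves):

    /* a crude speed measurement: count simple integer additions per second */
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        volatile long sum = 0;          /* volatile keeps the compiler from deleting the loop */
        long i, iterations = 100000000L;
        double seconds;
        clock_t start = clock();

        for (i = 0; i < iterations; i++)
            sum += 1;

        seconds = (double) (clock() - start) / CLOCKS_PER_SEC;
        if (seconds > 0)
            printf("roughly %.0f million additions per second\n",
                   iterations / seconds / 1e6);
        return 0;
    }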
4.4.2 CISC vs. RISC processors

The early processors were very simple, having only a small set of instructions. This was due to the high price of hardware, while the low price of software made it necessary to put much of the functionality not in the CPU but in the programs (e.g., the once-popular Zilog Z80 microprocessor, heart of the TRS-80 and other early microcomputers, does not even have an instruction for multiplication). Later on, however, the roles reversed: hardware became cheap, software expensive. Thus the trend in the 1970s and 1980s was to make complex processors which were able to do increasingly complex tasks.
CISC
Application area: general.
What is it: CISC = Complex Instruction Set Computer.
Well-known implementations: Intel 80x86 (PC's), Motorola 680x0 (NeXT, Macintosh).
Thus came the CISC processors, used for instance both in PC's and in Macintoshes. CISC instruction sets typically encompass between 120 and 350 instructions. In most CISCs, instructions are usually not directly executed by specific parts of the CPU, but are interpreted by a microprogram. Thus one processor instruction requires more than one clock cycle. Nowadays the complex instructions are also hardwired in the chip, leading to better performance since only one clock cycle per instruction is needed.
Table 4.3: Various processors throughout the ages.

    4-bit words (from 1970)
      4004              Intel                        calculators, watches, etc.

    8-bit words (from 1975)
      6502              Signetics, Western Digital   Apple II, BBC, Commodore 64
      8080              Intel
      Z80               Zilog                        TRS-80, Sinclair

    16-bit words (from 1980)
      8088              Intel                        IBM PC
      8086              Intel                        Olivetti PC
      80186             Intel                        Philips YES
      80286             Intel                        IBM PC/AT
      T2                INMOS                        transputers

    32-bit words (from 1985)
      68000             Motorola                     Apple Mac, Atari ST, Amiga
      68010             Motorola
      68020             Motorola                     Amiga, Apple Mac, Sun2, Sun3
      68030             Motorola                     Amiga, Apple Mac, NeXT, Sun3
      68040 (+)         Motorola                     Amiga, Apple Mac, NeXT
      68040LC (*)       Motorola                     Apple Mac
      88000             Motorola                     Data General machines (DG/UX)
      T4, T8            INMOS                        transputers
      80386SX (*)       Intel, AMD                   IBM PC's
      80386DX (*)       Intel                        PC's
      80486SLC          Cyrix (TI?)
      80486SX (*)       Intel                        PC's
      80486DX (+)       Intel                        PC's
      Pentium (80586)   Intel                        PC's
      ARM               Acorn                        Apple notebook, Archimedes
      601, 603, 604     Motorola                     PowerPC

    64-bit words (from 1990)
      SPARC             TI, Weitek, Fujitsu          Sun4, SunSPARC1, 1+, 2
      MicroSPARC        TI                           Sun Classic, LX, SunSPARC5
      MIPS              MIPS (Silicon Graphics)      IRIS
      Alpha             DEC                          various DEC computers
      620               Motorola                     PowerPC

    128-bit words
      SuperSPARC        TI                           SunSPARC10, 20
      HyperSPARC        ROSS (Fujitsu)               Sun MBus modules

The processors are divided by word size (the mentioned dates are approximate, of course). The width of the address bus determines how many memory locations a chip can address; e.g., with a 16-bit address bus a total of 2^16 = 65,536 locations can be addressed (sometimes, however, the quoted address bus width also includes control lines such as those for parity checks). With a data bus (word size) of width 8, the Z80 thus can address 65,536 bytes. The width of the data bus should naturally never be more than the width of a word in that chip (it is, sometimes, to be abreast of future developments); when the data bus is narrower than the word width, the data bus width is not optimal for that chip. As another example, the Pentium chip can access 2^32 = 4,294,967,296 addresses, while words are four bytes long. Note also the difference between the 8088 and 8086 chips, both used in PC/XT machines. Since the 8088 has only an 8-bit data bus, transferring a word of 16 bits takes twice as long as with the 8086. A similar difference exists between the later 80x86 SX and DX families. (*) indicates that the processor does not contain a floating point unit (FPU); (+) that it does.
Making a table like this is a big pain; everybody knows a bit, and always disagrees with what others know. Yet we think it is rather accurate. Information has been provided by many people: Dick van Albada, Marcel Beemster, Anuj Dev, Randy Kruszka, John Latala, Frans Lotty, Andy Pimentel, Gert Poletiek, Murphy A. Sewall, and Edwin Steffens. The authors spent many an hour trying to sieve out what's true and what's not when trying to create this table...
Table 4.4: Number of dhrystones per second for various computers.

    computer           processor   clock       dhrystones
    Commodore 64       6510        1 MHz       36
    Atari 520ST        68000       8 MHz       846
    PDP-11/73          KDJ11-AA    15 MHz      981
    VAX 11/750         w/FPA                   997
    IBM PC             8088        4.77 MHz    390
    IBM PC/XT          8088        4.77 MHz    403
    IBM PC/XT          8086        9.54 MHz    909
    IBM PC/AT          80286       6 MHz       1086
    IBM PC/AT          80286       7.5 MHz     1315
    IBM PC/AT          80286       9 MHz       1556
    `386'              80386                   4,000
    Mac+               68000       7.8 MHz     909
    MacII              68020                   2,500
    SUN 2/120          68010       10 MHz      1315
    SUN 3/180          68020       16.67 MHz   3846
    SUN 4/260          SPARC                   18,000
    SUN Sparc 2        SPARC                   35,000
    SUN LX             SPARC                   27,500
    SUN Sparc 10/30    SPARC                   92,000
    SGI Crimson        MIPS                    113,000
    Cray 2                                     48,000
RISC
Application area: general.
What is it: RISC = Reduced Instruction Set Computer.
Well-known implementations: Sun SPARC (Sun), PowerPC (IBM and Apple), DEC Alpha (DEC).

After the early, simple processors had gradually been replaced by CISC processors, it was realised that 95% of the processor time was spent on 25% of the instruction set; i.e., 75% of the instruction set was virtually unused. This seems rather inefficient. Some chip designers thought: perhaps it is better to move this 5% of the execution time into the software, and use the now vacant chip space for a more efficient implementation of the remaining, small instruction set. RISC was born, with an instruction set of typically less than 100 instructions. The whole processor can be implemented on a single chip, resulting in a higher clock rate and thus better performance. Currently it seems that RISC will outrun CISC; however, the last word in this battle has not yet been said.
4.4.3 Pipelining, superscalar and vector processors

There are several techniques to improve the performance of RISC and CISC processors. We will list them in this section.
Pipelining

A typical cycle consists of four tasks: fetch instruction, decode instruction, execute instruction, write back result. Although all of these tasks have to be executed in sequence for one instruction, there is no reason that, while decoding instruction i, the next instruction i + 1 cannot be fetched from memory. Figure 4.4 shows how pipelining works: instead of performing the steps in a processor sequentially, they are performed in parallel. Thus every clock cycle one instruction can be completed, instead of requiring four clock cycles for each instruction. Of course, when instruction i + 1 depends on instruction i, the pipeline will sometimes have to wait; so linear speedup is not obtained. Nevertheless it results in a considerable speedup, and is a standard technique in processor architecture.

    without pipelining:
    cycle:  1        2         3          4             5          6           7            8
            fetch i  decode i  execute i  write result  fetch i+1  decode i+1  execute i+1  write result

    with pipelining:
    cycle:  1        2          3           4             5
            fetch i  decode i   execute i   write result
                     fetch i+1  decode i+1  execute i+1   write result
                                fetch i+2   decode i+2    execute i+2  ...

Figure 4.4: Pipelining in a processor.
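A rough back-of-the-envelope way to see the gain (standard textbook arithmetic, not taken from this text): with a pipeline of k stages, n instructions take about k + n - 1 cycles instead of k * n, provided no instruction has to wait for another.

    /* back-of-the-envelope pipeline speedup: k stages, n instructions */
    #include <stdio.h>

    int main(void)
    {
        int  k = 4;                      /* fetch, decode, execute, write back      */
        long n = 1000;                   /* number of instructions                  */
        long serial    = (long) k * n;   /* no pipelining: k cycles per instruction */
        long pipelined = k + n - 1;      /* pipelined, ignoring stalls              */

        printf("speedup = %.2f\n", (double) serial / pipelined);   /* close to 4 */
        return 0;
    }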
Superscalar
Application area: numerical computations.
What is it: parallelism in the processor.
Well-known implementations: IBM RS/6000.
Superscalar processors go a step further than pipelined processors. A scalar processor executes one instruction per cycle; a superscalar processor executes more than one, so multiple results are generated per clock cycle. This is simply done by having multiple instruction queues. Naturally, only independent instructions can be executed in parallel. If you were, for instance, to add two numbers in one instruction, and subtract a third from the result in the next, those two instructions could not be executed in parallel. On average, a parallelism degree of 2 is optimal in superscalar architectures. Most superscalar chips allow a degree ranging from 2 to 5 parallel instructions.
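The dependency argument can be made concrete with a tiny C fragment (the variables are ours, chosen only for illustration):

    /* dependency example: the first pair of statements is independent,
       the second pair is not (t needs s) */
    void example(int a, int b, int c, int d)
    {
        int x, y, s, t;

        x = a + b;      /* independent of the next line:            */
        y = c + d;      /* a superscalar CPU can issue both at once  */

        s = a + b;      /* t below needs s, so these two             */
        t = s - c;      /* cannot be executed in parallel            */

        (void) x; (void) y; (void) t;   /* silence 'unused' warnings */
    }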
Vectorised
Application area: numerical computations.
What is it: computation on vectors of data.
Well-known implementations: Cray, CDC Cyber.
A vector processor is especially designed to perform computations on vectors of data; i.e., the same computation is performed on more than one number. This allows for parallelisation: if you want to compute the addition of two vectors of length n, the fastest way to do this is to have n adders, put the vectors in, and get the result one clock cycle later. This method of speedup has been the basis for the success of the Cray supercomputers.
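The kind of loop meant here looks, in C, something like the sketch below; every iteration is independent of the others, which is exactly what a vector unit exploits (the function is our own illustration, not Cray code):

    /* element-wise addition of two vectors: c = a + b */
    void vector_add(const double *a, const double *b, double *c, int n)
    {
        int i;
        for (i = 0; i < n; i++)
            c[i] = a[i] + b[i];   /* all n additions are independent */
    }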
4.4.4 Odd man out

We will finally discuss two types of processor which are rather uncommon but still in production.
Symbolic
Application area: artificial intelligence.
What is it: very dedicated.
Well-known implementations: Symbolics 3600 Lisp, Philips L-Neuro.
Instead of dealing with numerical computation, the symbolic processor is designed for dealing with, e.g., lists, logic, and neural networks. Since it is specialised, special hardware can be included for typical tasks such as computing the length of a list.
VLIW
Application area: not so general; scientific calculations.
What is it: VLIW = Very Long Instruction Word.
`Well-known' implementations: Imagine.
The VLIW architecture combines superscalar processing with parallel (i.e., horizontal) microcoding. The idea is to have very, very complex instructions which can be executed by that parallel microcode. The Imagine (`Image Engine') chip, introduced in June 1994 by Arcobel Graphics (The Netherlands), is dedicated to real-time graphics. By using the VLIW approach a very high performance can be attained (a manufacturer-specified speedup of up to a factor of 20).
4.5 I/O

The final devices that have access to the bus are the I/O (input/output) devices. To name a few: ethernet card, printer, disk, camera, robot, microphone, speaker, screen, keyboard. These devices usually operate in direct interaction with the CPU. Communication over the bus between I/O devices and the processor is usually memory-mapped. This means that parts of the address space of the processor do not correspond with RAM but with I/O devices. So, for instance, when the processor writes to address 0xFC89AB, the word written there will not end up in memory but may lead to an action on the floppy disk; e.g., writing 0x80 switches on the disk motor, while 0x00 switches it off.
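In C, such memory-mapped I/O is usually written with a volatile pointer. The sketch below reuses the hypothetical floppy-controller address and bit values from the example above; they do not belong to any real device.

    /* sketch of memory-mapped I/O; address and values are the text's hypothetical example */
    #define FLOPPY_CTRL ((volatile unsigned char *) 0xFC89AB)

    void floppy_motor(int on)
    {
        /* a store to this address goes to the I/O device, not to RAM;
           'volatile' keeps the compiler from optimising the store away */
        *FLOPPY_CTRL = on ? 0x80 : 0x00;
    }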
4.5.1 Video displays
Most users who generate high quality images will do so on workstations configured with extra hardware for creating and manipulating images. Almost every workstation manufacturer includes in its product line versions of their basic systems that are augmented with extra processors dedicated to drawing images. These extra processors work in parallel with the main processor in the workstation.

There are many different techniques for drawing images with a computer, but the dominant technology is based on a raster scan. A beam of electrons is directed at a screen that contains a quick-fading phosphor. The beam can be turned on and off very quickly, and it can be bent in two dimensions via magnetic fields. The beam is swept from left to right (from the user's point of view) across the screen. When the beam is on, a small white dot will appear on the screen where the beam is aimed, but when it is off the screen will remain dark. To paint an image on the entire screen, the beam is swept across the top row; when it reaches the right edge, it is turned off, moved back to the left and down one row, and then swept across to the right again. When it reaches the lower right corner, the process repeats again in the upper left corner.

The number of times per second the full screen is painted determines the refresh rate. If the rate is too low, the image will flicker, since the bright spots on the phosphor will fade before the gun comes back to that spot on the next pass. Refresh rates vary from 30 times per second up to 60 times per second. The individual locations on a screen that can be either painted or not are known as pixels (from `picture element'). The resolution of the image is the number of pixels per inch; a common screen size is 1280 pixels across and 1024 pixels high on a 16" or 19" monitor.
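A rough calculation of what such a raster display asks of the video hardware (our own illustration; blanking intervals are ignored, so the real figure is somewhat higher):

    /* pixels that must be painted per second for a 1280 x 1024 screen */
    #include <stdio.h>

    int main(void)
    {
        long width = 1280, height = 1024;
        int  refresh = 60;                   /* full-screen repaints per second */
        long pixels_per_second = width * height * refresh;

        printf("%ld pixels per second (about %ld million)\n",
               pixels_per_second, pixels_per_second / 1000000);
        return 0;
    }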
The controller for the electron gun decides whether a pixel will be black or white by reading information from a memory that has one bit per pixel. If the bit is a 1, the pixel will be painted; otherwise it will remain dark. This memory is a special frame buffer, which is filled by the main processor or a dedicated processor and read out by the video controller.

Colour displays are based on the same principles: a raster scan illuminates regions on a phosphor, with the information that controls the display coming from a frame buffer. However, instead of one gun there are three, one for each primary colour. When combining light, the primary colours are red, green, and blue, which is why these displays are known as RGB monitors. Since we need to specify whether or not each gun should be on for each pixel, the frame buffer will have at least three bits per pixel. To have a wide variety of colours, though, it is not enough just to turn a gun on or off; we need to control its intensity. For example, a violet colour can be formed by painting a pixel with the red gun at 61% of full intensity, green at 24%, and blue at 80%. Typically a system will divide the range of intensities into 256 discrete values, which means the intensity can be represented by an 8-bit number. 8 bits times 3 guns means 24 bits are required for each pixel.

Recall that high resolution displays have 1024 rows of 1280 pixels each, for a total of 1.3 million pixels. Dedicating 24 bits to each pixel would require almost 32 million bits, or about 4 Mb, of RAM for the frame buffer alone. What is done instead is to create a colour map with a fixed number of entries, typically 256. Each entry in the colour map is a full 24 bits wide. Each pixel only needs to identify a location in the map that contains its colour, and since a colour map of 256 entries requires only 8 bits per pixel to specify one of the entries, there is a saving of 16 bits per pixel. The drawback is that only 256 different colours can be displayed in any one image, but this is enough for most applications except those that need to create highly realistic images.
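The frame-buffer arithmetic above can be checked in a few lines of C (our own illustration, not a driver):

    /* frame-buffer size for a 1280 x 1024 display: true colour vs. 256-entry colour map */
    #include <stdio.h>

    int main(void)
    {
        long pixels     = 1280L * 1024L;              /* about 1.3 million pixels           */
        long truecolour = pixels * 24 / 8;            /* 24 bits per pixel, in bytes        */
        long indexed    = pixels * 8 / 8 + 256L * 3;  /* 8-bit indices plus a 256-entry,
                                                         24-bit colour map                  */

        printf("true colour: %ld bytes, with colour map: %ld bytes\n", truecolour, indexed);
        return 0;
    }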
Appendix A
Technology trends in computer science
E.H. Dooijes / June 1994

A.1 Introduction

In this chapter we give a picture of the how and why of the evolution of the computer since the ENIAC became operational in January 1946. This gigantic machine (see box 1) was a turning point in a development that had been under way long before 1946. An illustrious predecessor (which, however, never got beyond the design stage) was the Analytical Engine of Charles Babbage (1791-1871). Babbage's experiments with calculating machines [6] were only continued on any real scale in the late 1930s, when electrical engineering, by then fully grown, offered an alternative to the extremely costly and unwieldy mechanical systems on which Babbage had to rely. The first usable results, electromechanical calculating machines, were obtained around the beginning of the Second World War by K. Zuse in Germany and, in the U.S.A., by H. H. Aiken. The latter consciously built on Babbage's ideas, which were unknown to Zuse. Theoretical work that would prove to be of great importance in the context of computer science was done in the 1930s by the mathematicians A. Church and A. M. Turing.

Before 1945 a 'computer' was, in English usage, a human reckoner in the tradition of Willem Bartjens [2], who carried out his work with or without mechanical aids. The ENIAC was built to do the same work, but faster and without human intervention. The ENIAC differed from the reckoner's tools in speed, in programmability (especially after the introduction of Von Neumann's 'stored program' concept, an essential step forward), and in a gigantic difference in (logical) complexity compared with earlier information-processing machines, including those of Zuse and Aiken. Because of this, the operation of the ENIAC was not limited to simple, possibly repetitive operations; the machine could add a great deal of value to its input data, contradicting the saying, still heard today, that 'you get no more out than you put in'. This insight led as early as 1948 to a technical report [3] by Alan Turing on intelligent machines (the well-known article [4] in which Turing proposed the intelligence test for computers named after him dates from 1950). Incidentally, Babbage's ideas had in their day already inspired his contemporary Lady Ada Lovelace to similar speculations.

Before the Second World War, scientific computation was characterised by few calculations, supported by unlimited memory (scrap paper, books of tables); with the ENIAC this situation was exactly reversed. This aspect of the ENIAC gave a whole new turn to numerical mathematics. These and other developments set in motion by the ENIAC would eventually lead to the 'second industrial revolution'; that became a fact when the integrated circuit appeared on the scene, allowing the computer to become a widespread phenomenon in a short time.
1. The ENIAC

The ENIAC was developed from 1943 onwards by a group at the Moore School of Electrical Engineering, University of Pennsylvania, Philadelphia, led by J. P. Eckert and J. Mauchly. The machine was switched on for the first time in January 1946 and remained in service until 1955. The ENIAC contained 18,000 vacuum tubes. Its floor area was 233 m^2, its power consumption 140 kW, its weight 30 tons. Initially there were only 20 memory locations, intended for input and output data and intermediate results, represented as decimal numbers of 10 digits. In the original design, programming was done with toggle switches and pluggable connecting cables, as in a telephone exchange ('patching'). Von Neumann's stored-program concept was implemented in the course of 1946: the (by then extended) memory was from then on also used to store coded programs and subroutines. The first applications were for the Manhattan (hydrogen bomb) project. Solving a particular differential equation required 8 million multiplications and a comparable number of additions. This took 15 system hours, of which 20% was spent on test runs [1]. Even so, computing with the ENIAC was 1000 times as fast as with the means available at the time.

A.2 What is a computer?

In what follows I generally mean by 'computer' a physical device: the invariant of the computer systems that can be constructed with that computer as their basis. A computer becomes a computer system by adding software and, often, application-dependent peripherals. One must be aware that the boundary between hardware and software has never been sharp. Think of the 'patching' of the ENIAC and of modern concepts such as firmware, microcoding and the like; with neural networks the traditional concept of software no longer even applies. A computer in the sense used here can further be characterised by universal measures such as speed (number of logical operations per second), power consumption and so on. Besides digital computers, analogue computers (or differential analyzers) also played a significant role in industrial and scientific applications in the 1950s and 1960s. This type of machine will not be discussed further here.

In this context there is hardly any need to go into the details of how the digital computer works. We confine ourselves to the following fundamental observation: a computer is in essence an organised collection of switches of the 'gate' type. A gate passes or blocks an electrical current under the influence of another electrical current. In this way the state (open or closed) of one gate can influence the state of one or more other gates. How that happens depends on the architecture (the wiring) of the computer and on the data and programs fed into it. Every possible combination of gate states defines a unique state of the computer as a whole. The total number of possible states is therefore N = 2^P if there are P gates. P can be regarded as the number of bits of information that the machine can store and manipulate. Turing [2] quoted the following figures: Brunsviga (a mechanical calculator popular in its day): P = 90; ENIAC: P = 600. In terms of numbers of states, that is already an astronomical difference. Bear in mind that for an average present-day personal computer P is of the order of 10^6! The first computer prototypes contained electromechanical gates (relays, see box 2); the ENIAC used vacuum tubes, and since 1959 gates have been realised exclusively with transistors.

There is no intrinsic difference between data and programs; this is the essence of the stored-program concept.
2. Relays, tubes and transistors

A relay is an ordinary switch with one or more pairs of contacts, which can be connected or broken by a built-in electromagnet. In a vacuum tube there is a flow of electrons from an electrode heated by a filament to a second electrode. The electron flow can be interrupted by applying a negative voltage to a third (mesh-shaped) electrode located between the first two. A transistor rests broadly on the same principle, but in this case the electrons move through semiconducting material. Usually this is silicon into whose crystal lattice a small but very precisely dosed quantity of 'foreign' atoms has been built. Switching a gate takes time: at least 1 millisecond for a relay, 1 microsecond for a vacuum tube, and 1 nanosecond for a transistor (embedded in a modern VLSI circuit).

3. Numbers of active components in computers

Zuse Z3 (1943): 2,000 relays
ENIAC (1945): 18,000 vacuum tubes
IBM 709 mainframe (1959): the last tube machine, about 20,000 vacuum tubes
IBM 7090 mainframe (1959): 20,000 (separate!) transistors
Motorola 68000 processor (1980): 68,000 transistors on one chip
DEC Alpha processor (1992): 1.7 million transistors on one chip

The two processor chips mentioned form the heart of a computer, which, however, is not complete without memory. That requires millions more transistors, likewise housed on chips.

Not only numbers but also texts, abstract concepts and programs are represented in a digital computer by sequences of binary symbols, or bits. The two values that a bit can take are usually called on/off, true/false, or 1/0. Feeding in a stream of bits means that specific gates are forced into the initial state open or closed. The list in box 3 gives an impression of the numbers of active components that could be found in computers then and now. Bear in mind that the number of components is in general much larger than the effective number of gates. For example, in the ENIAC the tubes were used relatively inefficiently because the machine worked with decimal numbers, and a large number of tubes was also needed for supporting tasks.

As remarked earlier, the architecture of the computer specifies which (groups of) gates can influence one another's state. A very important consideration here is that it must be comprehensible to the human user how such a system can be programmed. That is one of the reasons why the Von Neumann model is still used almost exclusively: we distinguish groups of gates, communicating with one another, each of which has a narrowly defined, surveyable function (processor, memory, and so on). An extreme counterpart of the Von Neumann computer is the artificial neural network, whose architecture is inspired by that of biological neural networks. Programming in the traditional sense therefore does not apply to a neural network. Up to now the development of computer hardware has, broadly speaking, been a matter of scaling up the Von Neumann model; apparently that was enough to make possible the evolution of applications as we know it. This development gradually seems to have reached its limits, at least where larger systems are concerned. We therefore see rapidly growing interest in alternative architectures, usually characterised by a 'symbiosis' of (sometimes large) numbers of similar units working simultaneously and more or less independently.

Fundamental law of computer science: all computers can in principle do the same things, provided sufficient time and/or memory is available.
In practical cases this condition usually cannot be met! An artificial neural network can thus be simulated on a classical (Von Neumann) computer, and in fact that is what happens in most present-day applications of neural networks; the converse must also be possible!

A.3 Milestones

In the following lists we give a number of milestones in the history of the computer.

A.3.1 Hardware

1943 relay machines of Zuse (Germany) and Aiken (USA)
1946 ENIAC operational
1947 invention of the transistor (Bardeen & Brattain, Bell Labs)
1949 EDSAC (UK, Cambridge, the first full-sized English machine)
1951 Univac I, the first commercial machine (1 unit delivered to the US Bureau of the Census)
1952 IBM 701, the first series-built commercial machine, rented at $15,000/month; 19 units installed
1953 invention of core memory
1956 introduction of the hard disk (IBM): 5 MB on 50 platters of 24 inch
1958 invention of the planar (integrable) transistor (Fairchild Semiconductor Corp.)
1959 IBM 7090, the first commercial transistor machine (around 20,000 transistors at $80 each)
1961 first IC (4 transistors, Fairchild)
1964 IBM 360 mainframe
1965 the computer mouse conceived at Stanford
1965 DEC PDP-7 minicomputer (by definition a computer with a price below $50,000)
1970 Illiac IV, the first parallel supercomputer (64 processors, 20 MFLOPS, University of Illinois; in use until 1981)
1971 Intel 4004 microprocessor, 60,000 instructions per second, 2,300 transistors
1971 the first pocket calculator (Texas Instruments)
1972 Tektronix Direct View Storage Tube graphics terminal (starting point for the development of computer graphics)
1973 introduction of the Winchester disk (IBM), based on the still-current principle of the flying read/write head
1976 Cray-1 supercomputer (pipelining, vector processing, compact construction -> 100 MFLOPS)
1976 Ethernet (Xerox), Token Ring network (IBM)
1977 the first microcomputers, Apple II, TRS-80 and PET, use 8-bit microprocessors
1978 first 16-bit microprocessor (20,000 transistors)
1980 the VLSI era ushered in by the book Introduction to VLSI Design by Mead and Conway
1981 IBM PC
1981 Xerox 8010 (Star), the first commercial application of windows with mouse interaction (Graphical User Interface, GUI); not a commercial success because of its high price and size
1982 Inmos (UK) transputer, building block for parallel machines
1982 Apollo 6000, the first successful workstation, equipped with a window system
1984 Apple Macintosh, makes successful use of the GUI
1990 Connection Machine: massively parallel machine, up to 64,000 'basic' processors
1991 Cray Y-MP: small-scale parallel supercomputer, 3 GFLOPS
1993 Alpha, PowerPC, Pentium: 32-bit microprocessors

A.3.2 Programming languages and methods

A complementary aspect of the development of computer technology is the emergence and evolution of programming languages. Here we give only an annotated enumeration, which really does not do justice to the importance of this tool; without programming languages the applicability of computers, and with it an important driving force behind their development, would have been only marginal. It should be noted, however, that through the years there has been the problem that the possibilities of the hardware could not be fully exploited with the programming techniques and tools current at the same time. This 'software crisis' is still with us today, now that the Von Neumann architecture model is gradually being abandoned in favour of 'parallel' machines with many simultaneously working processors.

1946 stored program concept (J. von Neumann)
1951 'pseudocode compiler' for the Univac, in fact the first assembler
1954 Fortran (Backus, IBM): still widely used for numerical computation
1956 LISP: functional language; emphasis on symbolic computation
1959 Cobol: business data processing
1960 Algol 60: numerical computation
1963 PL/1 (IBM): general purpose, still in use here and there
1967 Simula: the first object-oriented language; discrete simulation
1971 Pascal: general purpose; education
1972 Prolog: artificial intelligence
1972 C: systems programming
1974 SQL (Structured Query Language): relational database language
1979 Ada (US Dept of Defense): very general purpose
A.3.3 Operating systems

Operating systems take care of the division of tasks among the various components of a computer system, and form the intermediary between the programmer and the hardware of a computer system. Well-known examples from past and present: IBM JCL (Job Control Language); Control Data NOS/BE; DEC RT-11; Bell Labs Unix; Microsoft DOS. To this day almost every manufacturer maintains its own system, although Unix (in various flavours) keeps gaining ground as a standard, also for smaller systems. In some cases the operating system was integrated with a programming language (BASIC, Forth, Smalltalk). Special characteristics of operating systems: batch processing (the normal way of working until well into the 1970s); multiprogramming (several jobs in memory at the same time, 1959); time-sharing (multiprogramming combined with interactive terminal use, 1970); multi-tasking (Unix). De facto standard protocols for interconnecting computers, such as TCP/IP (Department of Defense, 1970) and OSI (Open System Interconnection), make distributed processing possible.

A.4 The exponential technology-growth model

The experience of the past 30 years shows that the cost of computer equipment has decreased by roughly 20-30% per year. That amounts to a factor of 10^5 over 30 years! 'Cost' is to be read here as the price/performance ratio, the price per memory bit, and the like. In 1964 G. E. Moore predicted that the number of components per chip would increase by a factor of 1.5-2 every year. This law still seems to hold. (The area of a chip has always remained of the order of 1 cm^2; we shall see later why.) From these examples, to which many more could be added, it appears that the computer industry is a good example of exponential technology development, which is also found elsewhere and can quite generally be modelled as

    T(t) = T(t0) * r^(t - t0).

The technology measure T in year t is, for example, the price/performance ratio or the price per memory bit in the case of computers, the fuel consumption per watt of delivered power in the case of jet engines, or the light output per watt of consumed power in the case of incandescent lamps. Exponential growth is found when the following three conditions are satisfied together (Fusfeld 1973 [8]):

1. the product in question must be manufactured over a long period of time;
2. the manufacturing process is continually improved by learning effects;
3. the design is continually improved (another learning effect).

These conditions imply that the product has a short (economic) lifetime (fast turnaround and no market saturation), and that the growth is not limited by physical limits (as it is with jet engines and cars). Besides the 'push' of the semiconductor industry, IBM's strategy, until well into the 1980s, of migrating computer products from high price/small volume to low price/large volume played a large role in realising points 2 and 3. The interaction of the industry with its customers (academic and industrial computer users) has also been important here, for instance through the development of programming languages.

A.4.1 An example of exponential development: memory elements

From 1952 onwards magnetic cores were used for memory. Before that time delay lines, magnetic drums and 'Williams tubes' were used. The price per memory core was 20 cents in 1960 and decreased after that by 19% per year. In 1974 the 'turnover' to semiconductor memory (the 4 kbit chip) took place; at that moment both technologies had a bit price of 1 cent.
Core memory was still used until well into the 1980s for special (mainly military) applications, among other reasons because it is radiation-hard and retains its information when the computer is switched off. For semiconductor memory the empirical formula

    production price per bit (in cents) = 0.3 * 0.72^(t - 1974)

holds; in other words, the price decreases by 28% per year. Today (1994) a memory bit costs about 1 millicent, with 256 Mbit on a single chip. A consequence of the exponentially falling memory price (and of the fact that end users knew how to put it to good use!) was that the address space of processors had to be extended by one bit every 2-3 years. For many generations of machines this fact ultimately determined their lifetime. As an illustration: the PDP-11/40 minicomputer (1970) had 16 address bits; the current PowerPC chip has 32 address bits, which is enough to address 4 Gbyte of memory!
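As a small illustration of the empirical formula above, the C fragment below (ours, not part of the original text) simply evaluates it for a few years; for 1994 it yields roughly half a millicent per bit, the order of magnitude quoted above.

    /* evaluate the empirical memory-price formula:
       price per bit (in cents) = 0.3 * 0.72^(t - 1974) */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        int year;
        for (year = 1974; year <= 1994; year += 5) {
            double cents = 0.3 * pow(0.72, year - 1974);
            printf("%d: %.5f cents per bit\n", year, cents);
        }
        return 0;
    }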
A.4.2 What is all that computing power used for?

The world has never had any trouble putting to use the equipment offered by the computer manufacturers. As the machines become smaller, cheaper and more powerful, new groups of users keep presenting themselves. (Indeed, in the computer business it has mainly been a case of 'technology push', not of 'market pull'.) A recent example is the entry of the computer into the printing industry. Quite a few applications of the computer can be named for which improved performance will be welcome even in the distant future.

Hard problems: simulation, for instance for weather forecasting; visualisation and image processing (medical science, the printing industry, virtual reality); interactive systems, robots, automatic car and aircraft controls. What makes these problems hard is the very large amount of data that has to be manipulated, the requirement that this be done very fast ('real time') and, particularly in the last example, the requirement that it be done without errors. It should be clear that these requirements concern not only the computer hardware but equally the software!

Intrinsically hard problems: one is often confronted with these in situations where all possible combinations of things have to be examined, which leads to a 'combinatorial explosion'. The textbook example is the travelling salesman problem: find the shortest route along n cities. For n = 7 there are 360 routes; for n = 120 there are 10^179! Practical examples that are interesting in this context are the testing of chips and the design of the geometry of chips and chip carriers (printed circuit boards).

A.4.3 Why ever smaller?

That computers have become small is evident when we compare the physical properties of the ENIAC (box 1) with those of an average PC. Not only is there a gigantic difference in size, but also in electricity consumption (150 W for a PC). The computing power of the PC is hundreds of times that of the ENIAC. What becomes smaller in the first place as technology advances is the size of the components (such as transistors) placed on a chip. This naturally means that complete systems can also become smaller, but that is not the direct motive for this development. Roughly speaking, a square centimetre of silicon always costs the same, whatever is put on it (provided it is something that justifies large production volumes!). The performance/price ratio is therefore proportional to the level of integration. Furthermore, computers become faster and more reliable as the level of integration increases (these aspects are discussed in more detail below). Finally, there is an enormous range of applications for small and/or special computers, among them laptop computers and 'embedded' computers in measuring instruments, CD players, calculators, washing machines, smart cards and so on.
4. How are chips made?

A chip is made by the following lithographic process. A light-sensitive resist layer is applied to a slice of pure silicon (a wafer). The design (in the form of a mask or template), strongly reduced, is projected onto it with short-wavelength light or an electron beam. After development, as with photographic film, the resist can be washed away at the exposed places. At those places the silicon can then be doped (foreign atoms are built into the crystal lattice to change the electrical properties). This treatment can be repeated many times with different masks each time. A single wafer contains dozens of chips, which are separated from one another and then tested individually. It is usual that more than half of the chips have to be thrown away at this stage. The remaining chips are glued onto a carrier and provided with connecting wires. To reduce the rejection rate, redundant subsystems are sometimes placed on each chip; the spare subsystems are activated if the original subsystems turn out to be defective. Thanks to their regular structure, a much higher packing density can be achieved for memory chips than for processor chips.

A.4.4 A closer look at chips

The number of components per chip increases every year by a factor of 1.5-2 (Moore's law). The dimensions of a chip, on the other hand, have hardly grown over the years; to this day the area has remained of the order of 1 cm^2. The reason is that with a larger area the chance that the silicon contains a contamination or defect increases sharply. Larger chips would therefore cause an unacceptably high rejection rate. An increase in the number of components thus goes hand in hand with a reduction of the size of those components, and with an increase in packing density. There are several reasons why it is attractive to make the packing density of chips high:

- greater system speed, thanks to smaller transistors and shorter connections on the chip itself, and to fewer connections between chips;
- greater system reliability, because there are fewer (long and vulnerable) connections between chips, and fewer chips are needed for a given design;
- less heat generation, because many peripheral driver transistors become superfluous;
- relatively cheaper: the price of a chip is independent of what is on it (given mass production).

Heat generation on a chip is unavoidable. One of the important tasks of the chip's package is therefore to carry off heat, so that the temperature of the silicon does not exceed the maximum value of 75 to 85 degrees. With air cooling a few watts can be removed via the package. An impression of the problem is given by a 60 watt light bulb, which produces 0.5 W/cm^2 and is then much too hot to touch. The Alpha processor chip consumes 30 W and has an area of 2.3 cm^2!

The speed at which signals are transported over metal wires is 0.5-0.8 times the speed of light, i.e. 1.5-2.4 x 10^8 m/s. With a cycle time of the order of 10 ns the characteristic length is therefore L = 1.5-2.4 m. The differences in length between connections must stay within a fraction of L to guarantee synchronous action of the components of a computer system. This is typically a problem encountered in large supercomputers.
On a chip the situation is entirely different, because no metal connections are used there. The signal speed there can be up to 1000 times smaller than the speed of light, resulting in a characteristic length of the order of a few millimetres. This causes great problems, for example in distributing the clock signals that must ensure the synchronous operation of all the subsystems on the chip. On a processor chip the clock generator is therefore placed in the middle. For the connection of the chip to the outside world contact pins are needed, more of them as the level of integration of the chip increases. (An empirical relation between the number of gates on a chip and the number of pins required is given by Rent's rule [5].) This naturally adds to the complexity of the external wiring. Roughly speaking, a system containing n chips needs on the order of n^2 connections. The art, then, is to find the right balance between the level of integration used and the number of chips required, given that in modern computers 'wires' take up most of the space, form a speed bottleneck, and are very expensive (in the past the wiring was a relatively negligible factor).

A.4.5 Other aspects

The progress of computer technology has not been a matter of the evolution of the chip alone. Other parts of the computer have gone through their own, equally important development. Think of packaging technology, of communication technology (optical fibre), and of the improvement of mass storage: diskette, tape, hard disk, CD-ROM (here, too, the development has been exponential). The essential contribution of software science and the software industry is also severely under-exposed in this piece.

A.5 The life cycle of a computer

The technical and the economic lifetime of computers differ very widely. It was remarked earlier that a short economic lifetime is an essential ingredient of the exponential technology-development model. On the other hand we know that the reliability of a computer must be beyond any doubt: during their economic lifetime computers must not break down or wear noticeably. Household appliances such as washing machines, by contrast, are usually used until the wear problems threaten to get out of hand.

Can computers wear out at all? Let us first look at the elementary building blocks, the gates (vacuum tubes, transistors). The 18,000 vacuum tubes of the ENIAC had an average lifetime (MTBF, mean time before failure) of about one year. This means that the time between two failures of the machine as a whole was on average about half an hour. The ENIAC could therefore only be used for serious computations because preventive maintenance was carried out once a week, during which tubes that were about to fail were tracked down (with tubes it was still possible to detect such a thing...). We know that a modern microprocessor with 2 million transistors can function for years without failure under normal circumstances. If we assume a failure-free period of one year, and make the (usually correct) assumption that the failure of a single transistor renders the chip unusable, we arrive at an MTBF of a million years for the individual transistors! (This of course concerns the chips that passed the acceptance test during manufacture.) This astronomical figure is essential to the success of microelectronics. Semiconductor physics shows that transistors do in fact 'wear' with use, although this is hard to quantify; the rough figure above is therefore based on experience, not on theoretical considerations.

Despite the reliability of semiconductor electronics, in some cases it is desirable to perform a run-time check on correct operation. For example, RAM memory is quite generally provided with a 'redundancy check' mechanism.
One reason for this is that cosmic radiation can cause so-called soft errors: errors that are not accompanied by physical destruction of the memory element concerned. As for the other parts of a computer: the MTBF of mechanical parts such as the keyboard and the hard disk, and of relatively vulnerable electronic components (large capacitors, the picture tube), is nowadays also measured in years. The economic lifetime of a computer is therefore determined rather by the problems that manifest themselves in the course of time: reduced availability of (supported) software, high maintenance costs, lack of 'upward' compatibility, too high an energy consumption, too large a size and too great a weight. In practice this leads to computers being discarded at a high rate. We point out once more that this is an unavoidable side effect of the exponential growth pattern. Fortunately there is a growing awareness that all this has consequences for the environment: not every discarded computer ends up in a museum! The plastic housings of computers and peripherals are notorious trouble-makers. Efforts are now being made to extend the lifetime of computers by using modular architectures that allow parts to be extended and 'upgraded'. In this piece there is no room to go into the virus diseases of the computer. Another interesting and important aspect that we have had to skip is the lifetime of information carriers (from punched card to CD-ROM).

A.6 Future developments

As regards the future developments to be expected, we confine ourselves to a number of loose remarks. The present exponential development of the chip will stagnate around 2010, because by then transistors will have become so small that today's 'classical' electronics no longer applies (quantum phenomena come into play), quite apart from the technical problems that will arise in the manufacture of 'nanostructures'. It is therefore not correct to conclude, by extrapolating Moore's law [7], that a chip in 2022 will have 100 billion components (the number of neurons in the human brain). Nor is it justified to conclude from this that such a chip could then boast human-like intelligence; after all, we have already seen that constructing hardware and finding a method to do something sensible with it are two worlds apart. That this will be no small matter is also apparent from the unexpectedly small progress made in the field of artificial intelligence over the past 25 years. New roads in hardware technology are being explored under the headings of cryogenic, analogue, optical and molecular systems. It looks, however, as if progress in the years ahead will in the first place be sought in mastering the possibilities that present-day technology already offers. People already speak of the new computer revolution: the coming of age of parallel and distributed computing techniques (including work with neural networks). Unfortunately the software bottleneck manifests itself here in full force. Computer science will also take on a different face because of the prospect of producing cheap chips for very specific applications (and thus on a relatively small scale), thanks to automated design and manufacturing techniques.

References

[1] J.G. Brainerd, T.K. Sharpless: The ENIAC. Electrical Engineering 67 (1948). Reprinted in Proc. IEEE 72 (1984), pp. 1203-1212.
[2] W. Bartjens: Cyfferinge. Zwolle, 1637 (last reprinted 1839).
[3] A.M. Turing: Intelligent machinery. N.P.L. Report, 1948. Reprinted in Evans and Robertson (eds.): Key Papers in Cybernetics. Butterworth, 1968.
[4] A.M. Turing: Computing machinery and intelligence. Mind 59 (1950), no. 236.
[5] R.W. Keyes: Fundamental limits in digital information processing. Proc. IEEE 69 (1981), pp. 267-279.
[6] D.D. Swade: Redeeming Charles Babbage's mechanical computer. Scientific American 268 (1993), no. 2, pp. 62-68.
[7] A. Leijtens: Intelligente computer simuleert perfect. Computable, 18 March 1994, p. 23.
[8] C. Gordon Bell et al.: Computer Engineering: A DEC View of Hardware Systems Design. Digital Equipment Corporation, 1978.
Appendix B
Glossary

VIRTUAL MACHINE
The actual computer that you may be working on consists of a large number of chips with wires between them. Theoretically it would be possible, as with a pocket calculator, to connect switches and lights to the wires, and access the computer by continuously changing the switches and reading the lights. We'll call this actual machine a level 0 machine. Of course this kind of operation is very inefficient, so instead a virtual computer is created (at level 1), which has the same functionality as the actual computer (i.e., it can do the same tasks), but whose actions are first translated to actions that the actual computer can do. Typically one would have a set of instructions, each of which uniquely corresponds with a useful setting of the switches. On top of this virtual machine, another virtual (level 2) machine can be added, with again the same functionality, but with more complex instructions which may take multiple level 1 instructions to be performed. And so on. The distinction between the layers is strict, and one can access a machine at level n and write programs for it, just as if it really existed. Figure B.1 shows such a multilayer architecture.
    level n:  virtual machine
                 | translation via interpretation or compilation
    ...
    level 2:  virtual machine
                 | translation via interpretation or compilation
    level 1:  virtual machine
                 | translation via interpretation or compilation
    level 0:  actual machine
Figure B.1: Virtual machines stacked on top of each other.

The term `virtual' is pretty popular in computer science. For instance, the index of (Tanenbaum, 1984) lists virtual I/O, machine, machine exception, machine fault, and memory. The Merriam-Webster dictionary lists for virtual:
    vir.tu.al \'v*rch-(*-)w*l, 'v*r-ch*l\ \.v*r-ch*-'wal-*t-e-\ \'v*rch-(*-)w*-le-, 'v*rch-(*-)le-\ aj
    [ME, possessed of certain physical virtues, fr. ML virtualis, fr. L virtus strength, virtue]
    : being in essence or effect but not in fact - vir.tu.al.i.ty n

Figure B.2: A typical disk structure (the track width is much enlarged); one sector in the last track is highlighted.

Table B.1: Typical data for a Sun SCSI harddisk (number of cylinders, tracks per cylinder, sectors per track, bytes per sector, track-to-track and average seek time, rotation time).
DISK
The hardware. A disk consists of a number of cylinders, each with as many tracks as the disk drive has heads. A track, something like a groove on a record but much smaller, and not a spiral but one of a set of concentric circles, consists of a number of sectors. Refer to figure B.2. Table B.1 lists the number of cylinders, sectors, and timing for a typical harddisk used in workstations.
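As a small illustration of this geometry: the capacity of a disk is simply the product of the four numbers involved. The values used below are made up for the example; they are not the entries of Table B.1.

/* Sketch: computing the capacity of a disk from its geometry.
   All numbers are invented for the example.                    */
#include <stdio.h>

int main(void)
{
    long cylinders         = 1024;   /* assumed */
    long tracks_per_cyl    = 9;      /* one track per head, assumed */
    long sectors_per_track = 36;     /* assumed */
    long bytes_per_sector  = 512;    /* assumed */

    long bytes = cylinders * tracks_per_cyl
               * sectors_per_track * bytes_per_sector;

    printf("capacity: %ld bytes (about %ld MB)\n",
           bytes, bytes / (1024L * 1024L));
    return 0;
}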
The software. When a specific block from a disk is requested, a sequence of tasks has to be performed. First, for floppy disks, when the drive motor is not running, it must be started and the system has to wait until the required speed is reached. Next, the right cylinder has to be found. There are a few important algorithms for this, which are very similar to elevator scheduling algorithms. Why this similarity? Because the problems are essentially the same. When an elevator in a somewhat tall building is travelling up and down, it has to make smart decisions about which floor to go to first when five people are calling for the elevator from different floors at the same time. This is much like five requests for disk blocks situated on different tracks of the disk, all of which the disk head has to visit. Some algorithms:
First come, first served, or FCFS. Cylinder requests are handled in the order in which they come in. This is the simplest possible algorithm, and it is typically used on small systems where usually only one process is active.

Shortest seek first, or SSF. The cylinder requests are queued, and the nearest cylinder is always visited next. Although the average service time of all requests is drastically reduced, when many requests are queued the cylinders at the two extremes of the disk may have to wait very long.
Elevator algorithm. This algorithm is often used in elevators: the disk arm keeps moving in one direction until no further requests in that direction are pending; then the direction of the arm is reversed. Although the average performance is worse than that of SSF, the maximum waiting time is limited by twice the number of cylinders. (A small sketch comparing these strategies is given at the end of this glossary.)

Finally, after the right cylinder has been located, the driver waits until the right sector passes under the head and reads it into memory. Error correction is usually handled by the hardware, though in some cases (e.g., the IBM PC) it also has to be handled by the disk driver. The last task for floppy drivers is to set a timer that switches the drive motor off again.

PARTITION
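The sketch referred to under DISK above: a comparison, in C, of the total head movement needed by shortest seek first and by the elevator algorithm for one and the same queue of cylinder requests. The request queue and the starting position of the head are made up for the example.

/* Sketch: total head movement of shortest seek first (SSF) versus the
   elevator algorithm, on a made-up queue of cylinder requests.         */
#include <stdio.h>
#include <stdlib.h>

#define NREQ 6

static long ssf(int head, const int req[], int n)
{
    int done[NREQ] = { 0 };
    long moved = 0;
    int served, i, best;

    for (served = 0; served < n; served++) {
        best = -1;                   /* pick the nearest pending request */
        for (i = 0; i < n; i++)
            if (!done[i] &&
                (best < 0 || abs(req[i] - head) < abs(req[best] - head)))
                best = i;
        moved += abs(req[best] - head);
        head = req[best];
        done[best] = 1;
    }
    return moved;
}

static long elevator(int head, const int req[], int n)
{
    int done[NREQ] = { 0 };
    long moved = 0;
    int served = 0, dir = 1;         /* start moving towards higher cylinders */
    int i, best;

    while (served < n) {
        best = -1;                   /* nearest pending request in the current direction */
        for (i = 0; i < n; i++)
            if (!done[i] && (req[i] - head) * dir >= 0 &&
                (best < 0 || abs(req[i] - head) < abs(req[best] - head)))
                best = i;
        if (best < 0) {              /* nothing ahead any more: reverse direction */
            dir = -dir;
            continue;
        }
        moved += abs(req[best] - head);
        head = req[best];
        done[best] = 1;
        served++;
    }
    return moved;
}

int main(void)
{
    int req[NREQ] = { 1, 36, 16, 34, 9, 12 };   /* assumed cylinder requests */
    int start = 11;                             /* assumed head position */

    printf("SSF:      %ld cylinders of head movement\n", ssf(start, req, NREQ));
    printf("elevator: %ld cylinders of head movement\n", elevator(start, req, NREQ));
    return 0;
}

For this particular queue the two totals come out almost the same; the real difference between the strategies lies in how long individual requests may have to wait.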
Acknowledgments

These lecture notes were initially written in 1994 for the class `Computer Systemen en Applicaties' given by Patrick van der Smagt, Frank Tuijnman, and Emma van der Meulen. The current version includes only one third of the subjects (those presented by Patrick). So, although Patrick wrote it, much thought about the form etc. was contributed by Frank and Emma.

This book has had more input than is apparent from its cover. The authors whose work we used are thanked by getting a reference in the bibliography. But there were many others, whom we met through phone, informal chat, or email contact. Corrections, additions, and lots and lots of information were given by Anuj Dev, Jose Lagerberg, Frans Lotty, Edwin Steens, and Jan Wortelboer. Not to forget, last but not least, Gert Poletiek, who gave a tremendous amount of input and knows the answer to most any question about computers.

We want to thank Eddy, Jon Hawkins, Amy Parker, Joseph Nyachiro, and last but not least Gregory C. Gugel for their Swahili knowledge, and Sentiono Leowinata and Tina Wibawa for their Indonesian. Further help was given, free of charge, by Darius Bacon, Kurt Godden, Al Goldstein (alias Mr. Xfree), Brian Harvey, Perry Hutchison, Michael Korcuska, Barry Margolin, M. Sasikumar, and Tom Strickland Jr. Thank you, guys, for the help.

And the final thank-you is for Sander Bosman, who assisted in setting up the programming lab associated with this subject, aided with his clear no-nonsense view on matters (which, hopefully, will be appreciated by future readers and students), and added to the Primer on C. All of this just for the love of it.
Bibliography

Black, U. (1991). X.25 and Related Protocols. IEEE Computer Science Press, New York, NY 10017.
Frisch, A. (1991). Essential System Administration. O'Reilly, Sebastopol.
Goodheart, B., & Cox, J. (1994). The Magic Garden Explained. Prentice-Hall.
Hwang, K. (1993). Advanced Computer Architecture: Parallelism, Scalability, Programmability. McGraw-Hill, Inc.
Krol, E. (1992). The Whole Internet. O'Reilly & Associates, Inc.
Tanenbaum, A. S. (1981). Computer Networks. Prentice-Hall Inc., Englewood Cliffs, N.J. 07632.
Tanenbaum, A. S. (1984). Structured Computer Organization. Prentice-Hall Inc., Englewood Cliffs, N.J. 07632.
Tanenbaum, A. S. (1987). Operating Systems: Design and Implementation. Prentice-Hall Inc., Englewood Cliffs, N.J. 07632.
Zimmermann, H. (1980). OSI reference model -- the ISO model of architecture for open systems interconnection. IEEE Transactions on Communications, COM-28, 425-432.
Index

acknowledgement frame, 37 Algol, 10 Aloha, 32 ALU, see Arithmetic Logical Unit Arithmetic Logical Unit, 46 assembly language, 47 assembly programming, 10 asynchronous transfer mode (ATM), 32 ATM, 32, 43
critical section, 22 CSEP, 29 CWD, 20 data frame, 37 deadlock, 22 declarative programming, 10 device driver, 21, 26 dhrystone, 48 digital logic, 11-12 directory, 20 path, 20 root, 20 working, 20 dirs, 21 disk, 19, 46
BASIC, 10 benchmark, 48 BISDN, 43 bit, 43, 44 block special file, 20 boolean algebra, 12 boolean operations, 11 brk, 26 Broadband ISDN, 43 BSD, 14 vs. SysV, 14 bus, 43 byte, 43, 44
EEPROM, 44 Eiffel, 10 Eiffel Tower, 40 ENIAC, 40 EPROM, 44 Esau, 45 ethernet, 32, 43
C, 10 C++, 10 cache, 45-46 cat, 21 cd, 21 CD-ROM, 46 Central Processing Unit, see processor character special file, 20 child process, 16 chip, 41 circuit switching, 29 CISC, 48 clock, 41, 47 COBOL, 10 communication, 22-24 compilation vs. interpretation, 9 context switch, 16 CPU, see processor Cray, 41
FDC, 27 file, 20 block special, 20 character special, 20 normal, 20 file management, 19-21 FLOPS, 48
flushing, 18 fork, 26 FORTRAN, 40 Fortran, 10 gates, 11, 41 gateway, 32 hexadecimal number system, 45 high-level language, 10 home directory, 20 Hudsucker Proxy, The, 41
I/O management, 20 IC, see Integrated Circuit IMP, 32, 36 imperative programming, 10 instruction register, 45 instruction set, 47 Integrated Circuit, 41 inter-process communication, 22 Internet, 33 Internet Protocol, 38 interpretation vs. compilation, 9 interrupt, 21, 26 interrupt handler, 26, 27 IP, see Internet Protocol IP-address, 33 IPC, 22 IR, 45 ISDN, 43 ISO, 34 ISO/OSI, 33-34 Jacob, 45 kernel call, 26 kernel mode, 24 kill, 16 LAN, see local area networks local area networks, 32 long haul networks, see wide area networks low-level language, 10 LSI, 41 machine language, 10-11 mainframe computer, 41 memory, 44-46 cache, 45-46 EEPROM, 44 EPROM, 44 main, 19, 44 PROM, 44 RAM, 44, 46 register, 44-46 ROM, 44 secondary storage, 46 memory management, 17-19 memory-mapped I/O, 52 MFLOPS, 48 microcomputer, 41 microprogram, 48 microprogramming, 11 mini-computer, 41
MINIX, 25 MIPS, 48 mnemonic, 47 modem, 29 Mosaic, 35 MSI, 41 MULTICS, 13 multiprogramming, 13 network, 30 nibble, 43 nice, 18 normal file, 20 object-oriented programming, 10 octal number system, 45 operating system, 10 communication, 22-24 file management, 19-21 memory management, 17-19 process management, 15-17 operating systems, 13-27 OSI, 12, 34 overlay, 19 packet switching, 29, 33 paging, 19, 44 parent process, 16 Pascal, 10 PC, 45 PDP, 41 pop, 45 preemptive scheduling, 16 process child, 16 definition, 15 kernel mode, 24 nice value, 18 number, 16 parent, 16 priority, 17 runnable, 17 running, 17 sleeping, 17 state, 16 table, 16 user mode, 24 zombie, 18 process management, 15-17 process switch, 16 processor, 46-52 CISC, 48
instruction set, 47 RISC, 50 superscalar, 51 symbolic, 51 vectorised, 51 VLIW, 52 program counter, 45 PROM, 44 ps, 16, 18 push, 45 pwd, 21 race condition, 22 RAM, 19, 44, see Random Access Memory Random Access Memory, 44, 46 Read-Only Memory, 44 real-time, 16 register, 44-46 relative path, 20 relay, 39 remote procedure call, 24 RISC, 50 ROM, see Read-Only Memory router, 32 RPC, 24 RSA, 30 run to completion, 16 runnable process, 17 running process, 17 scheduler, 16, 17 signal SIGSTOP, 18 Simula, 10 sleeping process, 17 slot time, 36 Smalltalk, 10 SP, 45 stack, 45 stack pointer, 45 stdin, 21 stdout, 21 Sting, The, 41 stub procedure, 24 superscalar processor, 51 SVR4, 14 swapping, 19 symbolic processor, 51 system call, 16, 26 SysV, 14 vs. BSD, 14
Tandy Radio Shack, 41 TCP, see Transmission Control Protocol TCP/IP, 43 time slicing, 15 time-sharing system, 13, 41, 60 top, 16, 18 traceroute, 33 Transmission Control Protocol, 38 trap, 26 TRS-80, 41, 48 truth table, 12 TSS, see time-sharing system uname, 14 UNICS, 13
Unix
history, 13-14 user mode, 24
vacuum tube, 39 vector processor, 51 vi, 15 virtual memory, 19 virtual tourist, 35 visual programming, 10 VLIW, 52 VLSI, 41 WAN, see wide area networks whetstone, 48 which, 18 wide area networks, 32 word, 43 world wide web, 29, 35 WWW, see world wide web