Making reliable
distributed systems
in the presence of
software errors
Final version (with corrections) last update 20 November 2003
Joe Armstrong
A Dissertation submitted to
the Royal Institute of Technology
in partial fulfilment of the requirements for
the degree of Doctor of Technology
The Royal Institute of Technology
Stockholm, Sweden
December 2003
Department of Microelectronics and Information Technology
TRITA–IMIT–LECS AVH 03:09
ISSN 1651–4076
ISRN KTH/IMIT/LECS/AVH-03/09–SE
and
SICS Dissertation Series 34
ISSN 1101–1335
ISRN SICS–D–34–SE
© Joe Armstrong, 2003
Printed by Universitetsservice US-AB 2003
To Helen, Thomas and Claire
Abstract
The work described in this thesis is the result of a research program
started in 1981 to find better ways of programming Telecom applica-
tions. These applications are large programs which despite careful
testing will probably contain many errors when the program is put into
service. We assume that such programs do contain errors, and investigate
methods for building reliable systems despite such errors.
The research has resulted in the development of a new programming
language (called Erlang), together with a design methodology, and set of
libraries for building robust systems (called OTP). At the time of writing
the technology described here is used in a number of major Ericsson and
Nortel products. A number of small companies have also been formed
which exploit the technology.
The central problem addressed by this thesis is the problem of con-
structing reliable systems from programs which may themselves contain
errors. Constructing such systems imposes a number of requirements on
any programming language that is to be used for the construction. I discuss
these language requirements, and show how they are satisfied by Erlang.
Problems can be solved in a programming language, or in the stan-
dard libraries which accompany the language. I argue how certain of the
requirements necessary to build a fault-tolerant system are solved in the
language, and others are solved in the standard libraries. Together these
form a basis for building fault-tolerant software systems.
No theory is complete without proof that the ideas work in practice. To
demonstrate that these ideas work in practice I present a number of case
studies of large commercially successful products which use this technol-
ogy. At the time of writing the largest of these projects is a major Ericsson
product, having over a million lines of Erlang code. This product (the
AXD301) is thought to be one of the most reliable products ever made by
Ericsson.
Finally, I ask if the goal of finding better ways to program Telecom
applications was fulfilled—I also point to areas where I think the system
could be improved.
Contents
Abstract v
1 Introduction 1
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Ericsson background . . . . . . . . . . . . . . . . . . . . . 2
Chronology . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Thesis outline . . . . . . . . . . . . . . . . . . . . . . . . . 7
Chapter by chapter summary . . . . . . . . . . . . . . . . 7
2 The Architectural Model 11
2.1 Definition of an architecture . . . . . . . . . . . . . . . . . 12
2.2 Problem domain . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Philosophy . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4 Concurrency oriented programming . . . . . . . . . . . . 19
2.4.1 Programming by observing the real world . . . . . 21
2.4.2 Characteristics of a COPL . . . . . . . . . . . . . . 22
2.4.3 Process isolation . . . . . . . . . . . . . . . . . . . 22
2.4.4 Names of processes . . . . . . . . . . . . . . . . . 24
2.4.5 Message passing . . . . . . . . . . . . . . . . . . . 25
2.4.6 Protocols . . . . . . . . . . . . . . . . . . . . . . . 26
2.4.7 COP and programmer teams . . . . . . . . . . . . 26
2.5 System requirements . . . . . . . . . . . . . . . . . . . . . 27
2.6 Language requirements . . . . . . . . . . . . . . . . . . . . 28
2.7 Library requirements . . . . . . . . . . . . . . . . . . . . . 29
2.8 Application libraries . . . . . . . . . . . . . . . . . . . . . 30
2.9 Construction guidelines . . . . . . . . . . . . . . . . . . . 31
2.10 Related work . . . . . . . . . . . . . . . . . . . . . . . . . 32
3 Erlang 39
3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.2 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.3 Sequential Erlang . . . . . . . . . . . . . . . . . . . . . . . 44
3.3.1 Data structures . . . . . . . . . . . . . . . . . . . . 44
3.3.2 Variables . . . . . . . . . . . . . . . . . . . . . . . 46
3.3.3 Terms and patterns . . . . . . . . . . . . . . . . . 47
3.3.4 Guards . . . . . . . . . . . . . . . . . . . . . . . . 48
3.3.5 Extended pattern matching . . . . . . . . . . . . . 49
3.3.6 Functions . . . . . . . . . . . . . . . . . . . . . . . 50
3.3.7 Function bodies . . . . . . . . . . . . . . . . . . . 52
3.3.8 Tail recursion . . . . . . . . . . . . . . . . . . . . 52
3.3.9 Special forms . . . . . . . . . . . . . . . . . . . . . 54
3.3.10 case . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.3.11 if . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.3.12 Higher order functions . . . . . . . . . . . . . . . . 55
3.3.13 List comprehensions . . . . . . . . . . . . . . . . . 57
3.3.14 Binaries . . . . . . . . . . . . . . . . . . . . . . . . 58
3.3.15 The bit syntax . . . . . . . . . . . . . . . . . . . . 60
3.3.16 Records . . . . . . . . . . . . . . . . . . . . . . . . 63
3.3.17 epp . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.3.18 Macros . . . . . . . . . . . . . . . . . . . . . . . . 64
3.3.19 Include files . . . . . . . . . . . . . . . . . . . . . 66
3.4 Concurrent programming . . . . . . . . . . . . . . . . . . 66
3.4.1 register . . . . . . . . . . . . . . . . . . . . 67
3.5 Error handling . . . . . . . . . . . . . . . . . . . . . . . . 68
3.5.1 Exceptions . . . . . . . . . . . . . . . . . . . . . . 69
3.5.2 catch . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.5.3 exit . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.5.4 throw . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.5.5 Corrected and uncorrected errors . . . . . . . . . 72
3.5.6 Process links and monitors . . . . . . . . . . . . . 73
3.6 Distributed programming . . . . . . . . . . . . . . . . . . 76
3.7 Ports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.8 Dynamic code change . . . . . . . . . . . . . . . . . . . . 78
3.9 A type notation . . . . . . . . . . . . . . . . . . . . . . . . 80
3.10 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4 Programming Techniques 85
4.1 Abstracting out concurrency . . . . . . . . . . . . . . . . . 86
4.1.1 A fault-tolerant client-server . . . . . . . . . . . . . 92
4.2 Maintaining the Erlang view of the world . . . . . . . . . . 101
4.3 Error handling philosophy . . . . . . . . . . . . . . . . . . 104
4.3.1 Let some other process fix the error . . . . . . . . 104
4.3.2 Workers and supervisors . . . . . . . . . . . . . . 106
4.4 Let it crash . . . . . . . . . . . . . . . . . . . . . . . . . . 107
4.5 Intentional programming . . . . . . . . . . . . . . . . . . . 109
4.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5 Programming Fault-tolerant Systems 115
5.1 Programming fault-tolerance . . . . . . . . . . . . . . . . . 116
5.2 Supervision hierarchies . . . . . . . . . . . . . . . . . . . . 118
5.2.1 Diagrammatic representation . . . . . . . . . . . . 120
5.2.2 Linear supervision . . . . . . . . . . . . . . . . . . 121
5.2.3 And/or supervision hierarchies . . . . . . . . . . . 122
5.3 What is an error? . . . . . . . . . . . . . . . . . . . . . . . 123
5.3.1 Well-behaved functions . . . . . . . . . . . . . . . 126
6 Building an Application 129
6.1 Behaviours . . . . . . . . . . . . . . . . . . . . . . . . . . 129
6.1.1 How behaviours are written . . . . . . . . . . . . . 131
6.2 Generic server principles . . . . . . . . . . . . . . . . . . . 132
6.2.1 The generic server API . . . . . . . . . . . . . . . 132
6.2.2 Generic server example . . . . . . . . . . . . . . . 135
6.3 Event manager principles . . . . . . . . . . . . . . . . . . 137
6.3.1 The event manager API . . . . . . . . . . . . . . . 139
6.3.2 Event manager example . . . . . . . . . . . . . . . 141
6.4 Finite state machine principles . . . . . . . . . . . . . . . . 141
6.4.1 Finite state machine API . . . . . . . . . . . . . . 143
6.4.2 Finite state machine example . . . . . . . . . . . . 144
6.5 Supervisor principles . . . . . . . . . . . . . . . . . . . . . 146
6.5.1 Supervisor API . . . . . . . . . . . . . . . . . . . . 146
6.5.2 Supervisor example . . . . . . . . . . . . . . . . . 147
6.6 Application principles . . . . . . . . . . . . . . . . . . . . 153
6.6.1 Applications API . . . . . . . . . . . . . . . . . . . 153
6.6.2 Application example . . . . . . . . . . . . . . . . . 154
6.7 Systems and releases . . . . . . . . . . . . . . . . . . . . . 156
6.8 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
7 OTP 161
7.1 Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
8 Case Studies 167
8.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . 168
8.2 AXD301 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
8.3 Quantitative properties of the software . . . . . . . . . 171
8.3.1 System Structure . . . . . . . . . . . . . . . . . . . 174
8.3.2 Evidence for fault recovery . . . . . . . . . . . . . 177
8.3.3 Trouble report HD90439 . . . . . . . . . . . . . . 177
8.3.4 Trouble report HD29758 . . . . . . . . . . . . . . 180
8.3.5 Deficiencies in OTP structure . . . . . . . . . . . . 181
8.4 Smaller products . . . . . . . . . . . . . . . . . . . . . . . 185
8.4.1 Bluetail Mail Robustifier . . . . . . . . . . . . . . . 185
8.4.2 Alteon SSL accelerator . . . . . . . . . . . . . . . 188
8.4.3 Quantitative properties of the code . . . . . . . . . 189
8.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
9 APIs and Protocols 193
9.1 Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
9.2 APIs or protocols? . . . . . . . . . . . . . . . . . . . . . . 197
9.3 Communicating components . . . . . . . . . . . . . . . . . 198
9.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
10 Conclusions 201
10.1 What has been achieved so far? . . . . . . . . . . . . . . . 201
10.2 Ideas for future work . . . . . . . . . . . . . . . . . . . . . 202
10.2.1 Conceptual integrity . . . . . . . . . . . . . . . . . 202
10.2.2 Files and bang bang . . . . . . . . . . . . . . . . . 203
10.2.3 Distribution and bang bang . . . . . . . . . . . . . 204
10.2.4 Spawning and bang bang . . . . . . . . . . . . . . 205
10.2.5 Naming of processes . . . . . . . . . . . . . . . . . 205
10.2.6 Programming with bang bang . . . . . . . . . . . . 206
10.3 Exposing the interface - discussion . . . . . . . . . . . . . 207
10.4 Programming communicating components . . . . . . . . . 208
A Acknowledgments 211
B Programming Rules and Conventions 215
C UBF 247
D Colophon 275
References 277
1 Introduction
How can we program systems which behave in a reasonable manner
in the presence of software errors? This is the central question that I hope to answer in this thesis. Large systems will probably always be delivered containing a number of errors in the software; nevertheless such systems are expected to behave in a reasonable manner.
To make a reliable system from faulty components places certain requirements on the system. The requirements can be satisfied either in
the programming language which is used to solve the problem, or in the
standard libraries which are called by the application programs to solve
the problem.
In this thesis I identify the essential characteristics which I believe are necessary to build fault-tolerant software systems. I also show how these characteristics are satisfied in our system.
Some of the essential characteristics are satisfied in our programming
language (Erlang), others are satisfied in library modules written in Erlang.
Together the language and libraries form a basis for building reliable software systems which function in an adequate manner even in the presence
of programming errors.
Having said what my thesis is about, I should also say what it is not
about. The thesis does not cover in detail many of the algorithms used
as building blocks for constructing fault-tolerant systems—it is not the algorithms themselves which are the concern of this thesis, but rather the
programming language in which such algorithms are expressed. I am also
not concerned with hardware aspects of building fault-tolerant systems, nor
with the software engineering aspects of fault-tolerance.
The concern is with the language, libraries and operating system requirements for software fault-tolerance. Erlang belongs to the family of
pure message passing languages—it is a concurrent process-based language
having strong isolation between concurrent processes. Our programming
model makes extensive use of fail-fast processes. Such techniques are com-
mon in hardware platforms for building fault-tolerant systems but are not
commonly used in sodware solutions. This is mainly because conven-
tional languages do not permit dicerent sodware modules to co-exist in
such a way that there is no interference between modules. The commonly
used threads model of programming, where resources are shared, makes
it extremely diecult to isolate components from each other—errors in one
component can propagat e to another component and damage the internal
consistency of the system.
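This contrast can be made concrete in Erlang itself, where processes share no memory and one process can observe another's failure only as a message. The following sketch is illustrative rather than taken from the thesis; the module name isolation and the function demo are invented:

```erlang
-module(isolation).
-export([demo/0]).

%% Illustrative sketch: fail-fast processes with strong isolation.
%% One spawned process crashes immediately; an unrelated process is
%% untouched, and the parent (which traps exits) receives the crash
%% as an ordinary message instead of being damaged by it.
demo() ->
    process_flag(trap_exit, true),
    Survivor = spawn(fun() -> receive stop -> ok end end),
    Failing  = spawn_link(fun() -> exit(crashed) end),
    receive
        {'EXIT', Failing, Reason} ->
            %% The failing process is dead; the isolated one lives on.
            {Reason, is_process_alive(Survivor)}
    end.
```

Running demo() yields the crash reason together with confirmation that the unrelated process is still alive: there was no shared state for the failing process to corrupt.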
1.1 Background
The work reported in this thesis started at the Ericsson Computer Science Laboratory (CSLab) in 1981. My personal involvement started in 1985 when I joined the CSLab. The work reported here was performed in the period 1981–2003. During this time the Erlang programming language and OTP were developed by the author and his colleagues and a number of large applications were written in Erlang.
The system as we know it today is the result of the collective effort of a large number of people. Without their talents and the feedback from our users Erlang would not be what it is today. For many parts of the system it is difficult to say with precision exactly who did what and when, and exactly who had the original ideas for a particular innovation. In the acknowledgements I have tried to credit everybody as accurately as possible.
The chronology of this work is as follows:
1981 The Ericsson CSLab was formed. One goal of this laboratory was “to suggest new architectures, concepts and structures for future processing systems developments” [29].
1986 I start work on the language that was to become Erlang, though at the time the language had no name. The language started as an experiment to add concurrent processes to Prolog—this work is described in [10]. At this stage I did not intend to design a new programming language; I was interested in how to program POTS (Plain Old Telephony Service)—at the time the best method for programming POTS appeared to be a variant of Prolog augmented with parallel processes.
1987 First mention of Erlang. By 1987 the term Erlang had been coined (probably by the head of the CSLab, Bjarne Däcker). By the end of the year a Prolog implementation of Erlang was available. This version of Erlang was embedded in Prolog using Prolog infix operators and did not yet have its own syntax.

Towards the end of 1987 the first major experiment with Erlang started—a group of Ericsson engineers at Bollmora, led by Kerstin Ödling, started work on a prototyping project. They chose Erlang to prototype something called ACS/Dunder. ACS was an architecture which was designed for implementing the Ericsson MD110 private automatic branch exchange (PABX).
The project was to implement a number of typical PABX features in Erlang using the ACS architecture and compare the programming effort with the time it was estimated that the same job would have taken in PLEX.¹

Many of the ideas found in the current Erlang/OTP system can be traced back to this project.

¹PLEX was the programming language used to program the MD110.

1988 Some time during 1988 it became clear that Erlang was suitable for programming Telecoms systems—so while the Bollmora group wrote applications in Erlang, the CSLab group, now augmented by Robert Virding and Mike Williams, worked on improving the Erlang system.
An attempt was made to improve the efficiency of Erlang by cross compilation to the parallel logic programming language Strand. The compilation of Erlang to Strand is described in chapter 13 of [37]. Cross compilation to Strand improved the performance of Erlang by a factor of six and the project was viewed as a dismal failure.²
1989 The ACS/Dunder project began to produce results. The Bollmora group showed that using ACS/Dunder with Erlang led to an improvement in design efficiency of a factor of somewhere between 9 and 22 times less than the corresponding effort in PLEX. This result was based on the experience of prototyping some 10% of the functionality of the Ericsson MD110—these figures were hotly debated (depending on whether you believed in Erlang or not).

The Bollmora group estimated that they would need a factor seventy in performance improvement (which we had rashly promised) in order to turn the ACS/Dunder prototype into a commercial product.
To improve the performance of Erlang I designed the JAM machine (Joe's abstract machine). The design of the JAM machine was loosely based on the Warren abstract machine [68]. Since Erlang started as an extension to Prolog it seemed reasonable that the techniques used to efficiently implement Prolog could also be applicable to Erlang. This intuition proved correct. The JAM machine was similar to the WAM with the addition of parallel processes, message passing and failure detection and with the omission of backtracking. Compilation of pattern matching was essentially the same as in the WAM. The original JAM instruction set and details of the compilation process were published in [9].
The design of the JAM was completed in 1989. The first implementation was an instruction set emulator written in Prolog which emulated the JAM machine. This was very inefficient and could evaluate approximately 4 reductions per second, but it was sufficient to evaluate and test the virtual machine and to allow the Erlang compiler to be written in Erlang itself.

²A fact not recorded in the Strand book!
Once the design of the JAM was complete I started a C implementation of the virtual machine—which was soon abandoned after Mike Williams read some of my C code—after which I worked on the compiler. Mike Williams wrote the virtual machine emulator and Robert Virding worked on the Erlang libraries.
1990 By 1990 the JAM machine was working well and had surpassed the original goal of being seventy times faster than the original Prolog interpreter. Erlang now had its own syntax (up to now it could be regarded as a dialect of Prolog) and could be regarded as a language in its own right.
1991 Claes Wikström added distribution to Erlang. The JAM machine was now stable and had replaced the Prolog implementation of Erlang.
1992 A decision was made at Ericsson Business Systems (EBC) to develop a product based on ACS/Dunder. This product was called the Mobility Server—Ericsson presented Erlang developments [1, 33] at the XIV International Switching Symposium in Yokohama, Japan.
1993 Ericsson starts a wholly owned subsidiary company called
Erlang Systems AB. The purpose of this company was to market
and sell Erlang to external customers and to provide training and
consulting services to both internal and external customers. Support
for Erlang itself was performed by the Ericsson Computer Science
Laboratory. The first commercial version of Erlang was released.
1995 The Ericsson AXE-N project collapsed [55]. AXE-N was a project to build a “next generation switch” to replace the Ericsson AXE-10. This extremely large project ran from 1987–95.
After the AXE-N project collapsed a decision was made to “restart” the project using Erlang. This project eventually resulted in the development of the AXD301 switch.

This project was on a much larger scale than any previous Erlang project. For this reason a new group was created to provide support to the AXD project, and the Erlang libraries were renamed OTP (The Open Telecom Platform).
1996 In order to provide Erlang users with a stable software base a project called OTP (The Open Telecom Platform) was started. OTP was to be used primarily in the newly started AXD project; all existing projects were to migrate to OTP. The OTP project consolidated a number of ideas derived from experience with Erlang and in particular from an earlier set of libraries developed for use in the Mobility Server.
1997 The OTP project turned into the OTP product unit which was started in order to take over formal responsibility for Erlang. Prior to that, the CSLab had formally been responsible for Erlang. I moved from the CSLab to the OTP group where I worked as the chief technical co-ordinator. During the period 1996–1997 a three-person group (myself, Magnus Fröberg and Martin Björklund) redesigned and implemented the OTP core libraries.
1998 Ericsson delivered the first AXD301. The AXD301 is the
subject of one of our case studies in Chapter 8. At the time of
writing (2003) the AXD301 has over 1.7 million lines of Erlang code
which probably makes it the largest system ever to be written in a
functional style of programming.
In February 1998 Erlang was banned for new product development within Ericsson—the main reason for the ban was that Ericsson wanted to be a consumer of software technologies rather than a producer.
In December 1998 Erlang and the OTP libraries were released subject to an Open Source License. Since that date it has been freely available for download from http://www.erlang.org/.

In 1998 I left Ericsson together with a number of the original Erlang group to found a new company, Bluetail AB—in all 15 people left Ericsson. The idea behind Bluetail was to use the Erlang technology to program products which make Internet services more reliable.
1999 Bluetail produced two products written in Erlang: the Mail Robustifier [11] and the Web Prioritizer. Ericsson produced a number of Erlang products (including the AXD301 and GPRS systems).
2000 Bluetail is acquired by Alteon Web Systems [3] and subsequently Alteon is acquired by Nortel Networks.
> 2001 The Erlang/OTP technology is well established. By now there are so many projects that nobody knows the exact number. Erlang products developed by Nortel are selling for “Hundreds of millions of kronor per year” [51]—the Ericsson AXD301 is one of Ericsson's most successful new products and there are a dozen or so small companies using Erlang for product development.
1.2 Thesis outline
This thesis is organized into the following chapters:
Chapter 1 introduces the main problem area that the thesis addresses, and gives the background to and a chronology of the work, together with a detailed chapter plan.
Chapter 2 introduces an architectural model that is the basis for the
later chapters in the thesis. I define what is meant by an architecture,
and specify which components must be present in an architecture.
I talk about the problem domain that my architecture is designed
for. I talk about the underlying philosophy behind the architecture
and I introduce the idea of “Concurrency Oriented Programming”
(COP).
8 CHAPTER 1. INTRODUCTION
I develop the idea of COP and state the desirable properties that a programming language and system must have in order to support a concurrency oriented style of programming.

I review some previous related work, showing the similarities and differences between this prior work and the material presented in this thesis.
Chapter 3 describes the programming language Erlang. I describe
a reasonably large sub-set of the Erlang programming language, and
motivate some of the design decisions made in Erlang.
Chapter 4 gives some examples of Erlang programming techniques. I show how to “factor” a design into its functional and non-functional components. I show how to factor out notions of concurrency and fault-tolerance, and how to program a generic client–server model. I describe a technique for maintaining the illusion that “everything is an Erlang process,” and give examples of how to write code which handles errors.
Chapter 5 gets to the central question of the thesis. It is concerned with how to program systems which behave in a reasonable manner even in the presence of errors. Central to the idea of fault tolerance is the notion of an error—I describe what is meant by an error and what I mean when I talk about a “fault-tolerant” system. I describe a strategy based on the idea of “Supervision trees” which can be used for writing fault-tolerant software.
Chapter 6 links the general principles of programming a fault-tolerant system, developed in the previous chapter, to a number of specific programming patterns developed for programming fault-tolerant systems. These programming patterns are central to the understanding of the OTP system, and of how to build fault-tolerant software in Erlang.
I give a complete example, involving the use of a client–server model, an event-handler and a finite-state machine. These three components are added to a supervision tree, which will monitor their progress and restart them in the event of an error.

The entire program is packaged into a single OTP “application.”
Chapter 7 describes the OTP system. OTP stands for “Open Telecoms Platform” and is an application operating system (AOS) for programming fault-tolerant applications, together with the delivery platform for the Erlang programming language. It includes a large set of libraries for implementing fault-tolerant systems, together with documentation and guides, etc., for understanding the system.

In this chapter I briefly describe the OTP architecture and give details of the main components in the system.
Chapter 8 is the acid-test of our technology. Did the ideas work in practice? In this chapter I analyse a number of large commercially successful products that make use of OTP. The intention of this chapter is to see if we have achieved our goals of programming a system which functions reliably in the presence of software errors.

One of the projects studied in this chapter is the Ericsson AXD301, a high-performance highly-reliable ATM switch. This project is interesting in its own right, since it is one of the largest programs ever written in a functional style.
Chapter 9 is concerned with APIs and protocols. I ask how we can
specify the interfaces to modules or the interfaces between commu-
nicating components.
In Chapter 10 I ask broader questions. Did the ideas work? Did
they work well or badly? Where can things be improved? What can
we look for in the future and how are we going to get there?
2 The Architectural Model
There is no standard, universally-accepted definition of the term, for software
architecture is a field in its infancy, ... While there is no standard definition,
there is also no shortage of them ...
The Carnegie Mellon Institute of Software Engineers
This chapter presents an architecture for building fault-tolerant systems. While everybody has a vague idea of what the word architecture means, there are few widely accepted definitions, which leads to many misunderstandings. The following definition captures the general idea of what is meant by a software architecture:
An architecture is the set of significant decisions about the organi-
zation of a software system, the selection of the structural elements
and their interfaces by which the system is composed, together with
their behaviour as specified in the collaborations among those ele-
ments, the composition of these structural and behavioural elements
into progressively larger subsystems, and the architectural style that
guides this organization—these elements and their interfaces, their
collaborations, and their composition.
Booch, Rumbaugh, and Jacobson [19]
2.1 Definition of an architecture
At the highest level of abstraction an architecture is “a way of thinking about the world.” To be useful, however, we have to turn our way of thinking about the world into a practical rulebook, and a set of procedures, that tells us how to construct a particular system using our particular way of looking at the world.
Our software architecture is characterised by descriptions of the following things:
1. A problem domain What type of problem is the architecture designed to solve? Software architectures are not general purpose but are designed for solving specific problems. No description of an architecture is complete without describing the type of problem that is supposed to be solved using the architecture.
2. A philosophy What is the reasoning behind the method of software construction? What are the central ideas in the architecture?
3. A set of construction guidelines How do we program a system? We need an explicit set of construction guidelines. Our systems will be written and maintained by teams of programmers—it is important that all the programmers, and system designers, understand the system architecture, and its underlying philosophy. For practical reasons this knowledge is conveniently maintained in the form of construction guidelines. The full set of guidelines includes sets of programming rules, examples, course material, etc.
4. A set of pre-defined components Design by “choosing from a set
of pre-defined components” is far easier than “design from scratch.”
The Erlang OTP libraries contain a complete set of pre-defined com-
ponents (called behaviours) with which commonly used system com-
ponents can be built. Some examples of these are the gen_server
behaviour which can be used to build client-server systems, or the
gen_event behaviour which can be used to build event-based programs. The pre-defined components will be discussed more fully in section 6.1.
Section 6.2.2 gives a simple example of how to program a server using the gen_server behaviour.
5. A way of describing things How can we describe the interfaces
to a component? How can we describe a communication protocol
between two components in our system? How can we describe
the static and dynamic structure of our system? To answer these
questions we will introduce a number of different, and specialised, notations: some for describing program APIs, and other notations for describing protocols, and system structure.
6. A way of configuring things How can we start, stop, and configure the system? How can we re-configure the system while it is in operation?
2.2 Problem domain
Our system was originally designed for building telecoms switching systems. Telecoms switching systems have demanding requirements in terms of reliability, fault-tolerance, etc. Telecoms systems are expected to operate “forever,” they should exhibit soft real-time behaviour, and they should behave reasonably in the presence of software and hardware errors. Däcker [30] gave ten requirements for the properties of a telecoms system:
1. The system must be able to handle very large numbers of concurrent
activities.
2. Actions must be performed at a certain point in time or within a
certain time.
3. Systems may be dis tributed over several computers.
4. The system is used to control hardw are.
5. The software systems are very large.
6. The system exhibits complex functionality such as feature interaction.
7. The systems should be in continuous operation for many years.
8. Software maintenance (reconfiguration, etc.) should be performed
without stopping the system.
9. There are stringent quality and reliability requirements.
10. Fault tolerance, both to hardware failures and software errors, must
be provided.
We can motivate these requirements as follows:
Concurrency: switching systems are inherently concurrent since in
a typical switch many tens of thousands of people may simultaneously
interact with the switch. This implies that the system should
be able to efficiently handle many tens of thousands of concurrent
activities.
Soft real-time: in a telecommunications system many operations
have to be performed within a specific time. Some of these timed
operations are strictly enforced, in the sense that if a given operation
does not succeed within a given time interval then the entire
operation will be aborted. Other operations are merely monitored with
some form of timer, the operation being repeated if the timer event
triggers before the operation has completed.
Programming such systems requires manipulating many tens of
thousands of timers in an efficient manner.
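In Erlang the per-operation timers described above are expressed with the after clause of receive. The following sketch is not from the thesis: retry/3 is a hypothetical helper which runs an operation in a worker process, aborts it if no result arrives within Timeout milliseconds, and repeats it a bounded number of times.

```erlang
%% Run Op (a zero-arity fun) in a fresh process; abort and retry if it
%% does not reply within Timeout milliseconds.
retry(_Op, 0, _Timeout) ->
    {error, too_many_attempts};
retry(Op, AttemptsLeft, Timeout) ->
    Self = self(),
    Worker = spawn(fun() -> Self ! {self(), Op()} end),
    receive
        {Worker, Result} ->
            {ok, Result}
    after Timeout ->
        exit(Worker, kill),                 %% abort the timed-out attempt
        retry(Op, AttemptsLeft - 1, Timeout)
    end.
```

Because each timer belongs to one receive in one process, a system with tens of thousands of concurrent operations has tens of thousands of independent timers without any global timer bookkeeping in the application code.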
Distributed: switching systems are inherently distributed, so our
system should be structured in such a way that it is easy to go from a
single-node system to a multi-node distributed system.
Hardware interaction: switching systems have large amounts of
peripheral hardware which must be controlled and monitored. This
implies that it should be possible to write efficient device drivers,
and that context switching between different device drivers should
be efficient.
Large software systems: switching systems are large; for example,
the Ericsson AXE10 and the AT&T 5ESS switch have several million
lines of program code [71]. This means that our software systems
must work with millions of lines of source code.
Complex functionality: switching systems have complex functionality.
Market pressure encourages the development and deployment
of systems with large numbers of complex features. Often systems
are deployed before the interaction between such features is well
understood. During the lifetime of a system the feature set will
probably be changed and extended in many ways. Feature and software
upgrade must be performed "in place," that is, without stopping the
system.
Continuous operation: telecommunications systems are designed
for many years of continuous operation. This implies that operations
like software and hardware maintenance must be performed
without stopping the system.
Quality requirements: switching systems should run with an
acceptable level of service even in the presence of errors. Telephone
exchanges are expected to be extremely reliable, typically having less
than two hours of down-time in 40 years [48].
Fault tolerance: switching systems should be "fault tolerant." This
means that from the outset we know that faults will occur, and that
we must design a software and hardware infrastructure that can deal
with these faults, and provide an acceptable level of service even in
the presence of faults.
While these requirements came originally from the telecoms world they
are by no means exclusive to that particular problem domain. Many mod-
ern Internet services (for example, web servers) would have a strikingly
similar list of requirements.
2.3 Philosophy
How can we make a fault-tolerant software system which behaves in a
reasonable manner in the presence of software errors? Answering this will
take the rest of this thesis. I will start by giving a short answer which will
be refined in the remainder of the thesis.
To make a fault-tolerant software system which behaves reasonably in
the presence of software errors we proceed as follows:
1. We organise the software into a hierarchy of tasks that the system has
to perform. Each task corresponds to the achievement of a number
of goals. The software for a given task has to try to achieve the
goals associated with the task.
Tasks are ordered by complexity. The top level task is the most
complex; when all the goals in the top level task can be achieved
then the system should function perfectly. Lower level tasks should
still allow the system to function in an acceptable manner, though it
may offer a reduced level of service.
The goals of a lower level task should be easier to achieve than the
goals of a higher level task in the system.
2. We try to perform the top level task.
3. If an error is detected when trying to achieve a goal, we make an
attempt to correct the error. If we cannot correct the error we
immediately abort the current task and start performing a simpler task.
Programming a hierarchy of tasks needs a strong encapsulation method.
We need strong encapsulation for error isolation. We want to stop
programming errors in one part of the system adversely affecting software in
other parts of the system.
We need to isolate all the code that runs in order to achieve a goal in
such a way that we can detect if any errors occurred when trying to achieve
the goal. Also, when we are trying to simultaneously achieve multiple goals,
we do not want a software error occurring in one part of the system to
propagate to another part of the system.
The essential problem that must be solved in making a fault-tolerant
software system is therefore that of fault-isolation. Different programmers
will write different modules; some modules will be correct, others will have
errors. We do not want the errors in one module to adversely affect the
behaviour of a module which does not have any errors.
To provide fault-isolation we use the traditional operating system
notion of a process. Processes provide protection domains, so that an error in
one process cannot affect the operation of other processes. Different
programmers write different applications which are run in different processes;
errors in one application should not have a negative influence on the other
applications running in the system.
This is, of course, only true to a first approximation. Since all processes
use the same CPU and memory, processes which try to hog the CPU or
which try to use excessive memory can negatively affect other processes in
the system. The extent to which processes can interfere with each other
depends upon the design characteristics of the operating system.
In our system processes, and concurrency, are part of the programming
language and are not provided by the host operating system. This has a
number of advantages over using operating system processes:
Concurrent programs run identically on different OSs—we are not
limited by how processes are implemented on any particular operating
system. The only observable difference when moving between
OSs and processors should be due to different CPU speeds, memory
sizes, etc. All issues of synchronization and inter-process
interaction should be the same irrespective of the properties of the
host operating system.
Our language-based processes are much lighter-weight than conventional
OS processes. Creating a new process in our language is a
highly efficient operation, some orders of magnitude faster than process
creation in most operating systems [12, 14], and orders of magnitude
faster than thread creation in most programming languages.
Our system has very little need of an operating system. We make
use of very few operating system services, thus it is relatively easy
to port our system to specialised environments such as embedded
systems.
Our applications are structured using large numbers of communicating
parallel processes. We take this approach because:
1. It provides an architectural infrastructure: we can organize our
system as a set of communicating processes. By enumerating all the
processes in our system, and defining the message passing channels
between the processes, we can conveniently partition the system into
a number of well-defined sub-components which can be independently
implemented and tested. This is the methodology implied
by the top level of the SDL [45] system design methodology.
2. Potential efficiency: a system which is designed to be implemented
as a number of independent concurrent processes can be implemented
on a multi-processor or run on a distributed network of
processors. Note that the efficiency is only potential, and works best
when the application can be partitioned into a number of truly
independent tasks. If there are strong data dependencies between the
tasks this might not always be possible.
3. Fault isolation: concurrent processes with no data sharing provide
a strong measure of fault isolation. A software error in a concurrent
process should not influence processing in the other processes in the
system.
Of these three uses of concurrency, the first two are non-essential
characteristics and can be provided by some kind of in-built scheduler which
provides various forms of pseudo-parallel time sharing between processes.
The third characteristic is essential for programming fault-tolerant
systems. Each independent activity should be performed in a completely
isolated process. Such processes should share no data, and only
communicate by message passing. This is to limit the consequences of a software
error.
As soon as two processes share any common resource, for example
memory, a pointer to memory, or a mutex, the possibility exists that
a software error in one of the processes will corrupt the shared resource.
Since eliminating all such software errors for large software systems is an
unsolved problem, I think that the only realistic way to build large reliable
systems is by partitioning the system into independent parallel processes,
and by providing mechanisms for monitoring and restarting these
processes.
2.4 Concurrency oriented programming
In our system concurrency plays a central role, so much so that I have
coined the term Concurrency Oriented Programming to distinguish this style
of programming from other programming styles, such as Object Oriented
Programming, which models the world in terms of objects, Functional
Programming, which uses functions, or Logic Programming, which uses
relations.
In Concurrency Oriented Programming the concurrent structure of
the program should follow the concurrent structure of the application. It
is particularly suited to programming applications which model or interact
with the real world.
Concurrency Oriented Programming also provides the two major
advantages commonly associated with object-oriented programming. These
are polymorphism and the use of defined protocols having the same
message passing interface between instances of different process types.
When we partition a problem into a number of concurrent processes
we can arrange that all the processes respond to the same messages (i.e.
they are polymorphic), and that they all follow the same message passing
interface.
The word concurrency refers to sets of events which happen
simultaneously. The real world is concurrent, and consists of a large number
of events many of which happen simultaneously. At an atomic level our
bodies are made up of atoms and molecules in simultaneous motion.
At a macroscopic level the universe is populated with galaxies of stars in
simultaneous motion.
When we perform a simple action, like driving a car along a freeway,
we are aware of the fact that there may be several hundreds of cars within
our immediate environment, yet we are able to perform the complex task
of driving a car, and avoiding all these potential hazards, without even
thinking about it.
In the real world sequential activities are a rarity. As we walk down the
street we would be very surprised to find only one thing happening; we
expect to encounter many simultaneous events.
If we did not have the ability to analyze and predict the outcome of
many simultaneous events we would live in great danger, and tasks like
driving a car would be impossible. The fact that we can do things which
require processing massive amounts of parallel information suggests that
we are equipped with perceptual mechanisms which allow us to intuitively
understand concurrency without consciously thinking about it.
When it comes to computer programming things suddenly become
inverted. Programming a sequential chain of activities is viewed as the norm,
and in some sense is thought of as being easy, whereas programming
collections of concurrent activities is avoided as much as possible, and is
generally perceived as being difficult.
I believe that this is due to the poor support which is provided for
concurrency in virtually all conventional programming languages. The vast
majority of programming languages are essentially sequential; any
concurrency in the language is provided by the underlying operating system, and
not by the programming language.
In this thesis I present a view of the world where concurrency is
provided by the programming language, and not by the underlying operating
system. Languages which have good support for concurrency I call
Concurrency Oriented Languages, or COPLs for short.
2.4.1 Programming by observing the real world
We often want to write programs that model the world or interact with the
world. Writing such a program in a COPL is easy. Firstly, we perform an
analysis which is a three-step process:
1. We identify all the truly concurrent activities in our real world
activity.
2. We identify all message channels between the concurrent activities.
3. We write down all the messages which can flow on the different
message channels.
Now we write the program. The structure of the program should
exactly follow the structure of the problem. Each real world concurrent
activity should be mapped onto exactly one concurrent process in our
programming language. If there is a 1:1 mapping of the problem onto the
program we say that the program is isomorphic to the problem.
It is extremely important that the mapping is exactly 1:1. The reason
for this is that it minimizes the conceptual gap between the problem and
the solution. If this mapping is not 1:1 the program will quickly degenerate,
and become difficult to understand. This degeneration is often observed
when non-CO languages are used to solve concurrent problems. Often
the only way to get the program to work is to force several independent
activities to be controlled by the same language thread or process. This
leads to an inevitable loss of clarity, and makes the programs subject to
complex and irreproducible interference errors.
In performing our analysis of the problem we must choose an
appropriate granularity for our model. For example, if we were writing an instant
messaging system, we might choose to use one process per user, and not
one process for every atom in the user's body.
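The instant-messaging example above can be sketched directly in Erlang: one process per user, with the process holding that user's state. The function and message names here (start_user, user_loop, {msg, ...}) are invented for illustration, not part of any real system.

```erlang
%% One concurrent process per user: a 1:1 mapping of the problem
%% (users) onto the program (processes).
start_user(Name) ->
    spawn(fun() -> user_loop(Name) end).

%% Each user process waits for messages addressed to it and then
%% loops, carrying its state (here just the user's name) as arguments.
user_loop(Name) ->
    receive
        {msg, From, Text} ->
            io:format("~s received ~p from ~p~n", [Name, Text, From]),
            user_loop(Name);
        stop ->
            ok
    end.
```

Because the mapping is 1:1, adding a user means spawning a process; no shared session table has to be threaded through unrelated code.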
2.4.2 Characteristics of a COPL
COPLs are characterised by the following six properties:
1. COPLs must support processes. A process can be thought of as a
self-contained virtual machine.
2. Several processes operating on the same machine must be strongly
isolated. A fault in one process should not adversely affect another
process, unless such interaction is explicitly programmed.
3. Each process must be identified by a unique unforgeable identifier.
We will call this the Pid of the process.
4. There should be no shared state between processes. Processes
interact by sending messages. If you know the Pid of a process then you
can send a message to the process.
5. Message passing is assumed to be unreliable with no guarantee of
delivery.
6. It should be possible for one process to detect failure in another
process. We should also know the reason for failure.
Note that COPLs must provide true concurrency; thus objects
represented as processes are truly concurrent, and messages between processes
are true asynchronous messages, unlike the disguised remote procedure
calls found in many object-oriented languages.
Note also that the reason for failure may not always be correct. For
example, in a distributed system, we might receive a message informing
us that a process has died, when in fact a network error has occurred.
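Properties 3, 4, and 6 can be illustrated in a few lines of Erlang. This sketch uses monitors; links, which the thesis discusses later, serve a similar purpose. The function name demo and the messages crash and badarg are invented for the example.

```erlang
%% spawn/1 returns an unforgeable Pid (property 3); knowing the Pid
%% lets us send it a message (property 4); a monitor lets us detect
%% its failure, together with the reason (property 6).
demo() ->
    Pid = spawn(fun() -> receive crash -> exit(badarg) end end),
    Ref = erlang:monitor(process, Pid),
    Pid ! crash,
    receive
        {'DOWN', Ref, process, Pid, Reason} ->
            {failure_detected, Reason}      %% Reason is badarg here
    end.
```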
2.4.3 Process isolation
The notion of isolation is central to understanding COP, and to the
construction of fault-tolerant software. Two processes operating on the same
machine must be as independent as if they ran on physically separated
machines.
Indeed, the ideal architecture to run a CO program on would be a
machine which assigned one new physical processor per software process.
Until this ideal is reached we will have to live with the fact that multiple
processes will run on the same machine. We should still think of them as
if they ran on physically separated machines.
Isolation has several consequences:
1. Processes have "share nothing" semantics. This is obvious since they
are imagined to run on physically separated machines.
2. Message passing is the only way to pass data between processes.
Again, since nothing is shared, this is the only means possible to
exchange data.
3. Isolation implies that message passing is asynchronous. If process
communication were synchronous then a software error in the receiver
of a message could indefinitely block the sender of the message,
destroying the property of isolation.
4. Since nothing is shared, everything necessary to perform a
distributed computation must be copied. Since nothing is shared, and
the only way to communicate between processes is by message
passing, we will never know if our messages arrive (remember we
said that message passing is inherently unreliable). The only way to
know if a message has been correctly sent is to send a confirmation
message back.
Programming a system of processes subject to the above rules may
appear at first sight to be difficult—after all, most concurrency extensions to
sequential programming languages provide facilities for almost exactly the
opposite, providing things like locks and semaphores, and provision for
shared data and reliable message passing. Fortunately, the opposite turns
out to be true—programming such a system turns out to be surprisingly
easy, and the programs you write can be made scalable and fault-tolerant
with very little effort.
Because all our processes are required to be completely isolated, adding
more processes cannot affect the original system. The software must have
been written so as to handle collections of isolated processes, so adding a
few more processors is usually accomplished without any major changes
to the application software.
Since we made no assumptions about reliable message passing, and
must write our application so that it works in the presence of unreliable
message passing, it should indeed work in the presence of message passing
errors. The initial effort involved will reward us when we try to scale up
our systems.
2.4.4 Names of processes
We require that the names of processes are unforgeable. This means that it
should be impossible to guess the name of a process, and thereby interact
with that process. We will assume that processes know their own names,
and that processes which create other processes know the names of the
processes which they have created. In other words, a parent process knows
the names of its children.
In order to write COPLs we will need mechanisms for finding out the
names of the processes involved. Remember, if we know the name of a
process, we can send a message to that process.
System security is intimately connected with the idea of knowing the
name of a process. If we do not know the name of a process we cannot
interact with it in any way, thus the system is secure. Once the names of
processes become widely known the system becomes less secure. We call
the process of revealing names to other processes in a controlled manner
the name distribution problem; the key to security lies in the name
distribution problem. When we reveal a Pid to another process we will say that
we have published the name of the process. If a name is never published
there are no security problems.
Thus knowing the name of a process is the key element of security.
Since names are unforgeable, the system is secure only if we can limit the
knowledge of the names of the processes to trusted processes.
In many primitive religions it was believed that humans had powers
over spirits if they could command them by their real names. Knowing
the real name of a spirit gave you power over the spirit, and using this
name you could command the spirit to do various things for you. COPLs
use the same idea.
2.4.5 Message passing
Message passing obeys the following rules:
1. Message passing is assumed to be atomic, which means that a
message is either delivered in its entirety or not at all.
2. Message passing between a pair of processes is assumed to be
ordered, meaning that if a sequence of messages is sent and received
between any pair of processes then the messages will be received in
the same order they were sent.
3. Messages should not contain pointers to data structures contained
within processes—they should only contain constants and/or Pids.
Note that point two is a design decision, and does not reflect any
underlying semantics in the network used to transmit messages. The underlying
network might reorder the messages, but between any pair of processes
these messages can be buffered, and re-assembled into the correct order
before delivery. This assumption makes programming message passing
applications much easier than if we had to always allow for out-of-order
messages.
We say that such message passing has send and pray semantics. We
send the message and pray that it arrives. Confirmation that a message has
arrived can be achieved by returning a confirmation message (sometimes
called round-trip confirmation). Interestingly, many programmers only
believe in round-trip confirmation, and use it even if the underlying transport
layers are supposed to provide reliable data transport, and even if such
checks are supposedly irrelevant.
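Round-trip confirmation under send-and-pray semantics can be sketched as follows. The helper name send_and_confirm, the ack atom, and the 1000 ms limit are arbitrary choices for this example, not part of Erlang or the thesis.

```erlang
%% Send Msg to Pid and wait for an explicit acknowledgement. The send
%% itself gives no delivery guarantee; only the returned ack tells us
%% the message arrived.
send_and_confirm(Pid, Msg) ->
    Pid ! {self(), Msg},
    receive
        {Pid, ack} -> ok
    after 1000 ->
        {error, no_confirmation}
    end.
```

The receiver is expected to reply with {self(), ack} when it has handled the message; without that reply the sender can only assume the message was lost.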
Message passing is also used for synchronisation. Suppose we wish to
synchronise two processes A and B. If A sends a message to B then B
can only receive this message at some point in time after A has sent the
message. This is known as causal ordering in distributed systems theory.
In COPLs all interprocess synchronisation is based on this simple idea.
2.4.6 Protocols
Isolation of components, and message passing between components, is
architecturally sufficient for protecting a system from the consequences of
a software error, but it is not sufficient to specify the behaviour of a system,
nor, in the event of some kind of failure, to determine which component
has failed.
Up to now we have assumed that failure is a property of a single
component; a single component will either do what it is supposed to do or
fail as soon as possible. It might happen, however, that no components are
observed to fail, and yet the system still does not work as expected.
To complete our programming model, we therefore add one more
thing. Not only do we need completely isolated components that
communicate only by message passing, but we also need to specify the
communication protocols that are used between each pair of components that
communicate with each other.
By specifying the communication protocol that should be obeyed
between two components we can easily find out if either of the components
involved has violated the protocol. Guaranteeing that the protocol is
enforced should be done by static analysis, if possible, or failing this by
compiling run-time checks into the code.
2.4.7 COP and programmer teams
Building a large system involves the work of many programmers;
sometimes many hundreds of programmers are involved. To organise their
work these programmers are organised into smaller groups or teams. Each
group is responsible for one or more logical components in the system. On
a day-to-day basis, the groups communicate by message-passing (e-mail
or phone) but do not regularly meet. In some cases the groups work in
different countries, and never meet. It is amusing to note that not only
is the organisation of a software system into isolated components which
communicate by pure message passing desirable for a number of reasons,
but also that it is the way that large programming groups are organised.
2.5 System requirements
To support a CO style of programming, and to make a system that can
satisfy the requirements of a telecoms system, we arrive at a set of
requirements for the essential characteristics of a system. These requirements are
for the system as a whole—here I am not interested in whether these
requirements are satisfied in a programming language or in the libraries and
construction methods which accompany the language.
There are six essential requirements on the underlying operating
system and programming languages.
R1. Concurrency: Our system must support concurrency. The
computational effort needed to create or destroy a concurrent
process should be very small, and there should be no penalty
for creating large numbers of concurrent processes.
R2. Error encapsulation: Errors occurring in one process must
not be able to damage other processes in the system.
R3. Fault detection: It must be possible to detect exceptions both
locally (in the process where the exception occurred) and
remotely (we should be able to detect that an exception has
occurred in a non-local process).
R4. Fault identification: We should be able to identify why an
exception occurred.
R5. Code upgrade: There should exist mechanisms to change
code as it is executing, and without stopping the system.
R6. Stable storage: We need to store data in a manner which
survives a system crash.
It is also important that systems satisfying the above requirements are
efficiently implemented—concurrency is not much use if we cannot reliably
create many tens of thousands of processes. Fault identification is not much
use if it does not contain enough information to allow us to correct the
error at a later date.
Satisfying the above requirements can be done in a number of
different ways. Concurrency, for example, can be provided as a language
primitive (as, for example, in Erlang), or in the operating system (for
example, Unix). Languages like C or Java which are not concurrent can
make use of operating system primitives which give the user the illusion
that they are concurrent; indeed, concurrent programs can be written in
languages which are not themselves concurrent.
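When concurrency is a language primitive, as in Erlang, requirement R1 amounts to the following being cheap. The helper spawn_many is invented for this sketch; each spawned process simply waits for a message.

```erlang
%% Create N processes, each blocked in a receive. In Erlang process
%% creation is a lightweight language operation, so N can be in the
%% tens of thousands without special arrangements.
spawn_many(N) ->
    [spawn(fun() -> receive die -> ok end end)
     || _ <- lists:seq(1, N)].
```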
2.6 Language requirements
The programming language which we use to program the system must
have:
Encapsulation primitives: there must be a number of mechanisms
for limiting the consequences of an error. It should be possible to
isolate processes so that they cannot damage each other.
Concurrency: the language must support a lightweight mechanism
to create parallel processes, and to send messages between the
processes. Context switching between processes, and message passing,
should be efficient. Concurrent processes must also time-share the
CPU in some reasonable manner, so that CPU-bound processes do
not monopolise the CPU, and prevent progress of other processes
which are "ready to run."
Fault detection: primitives which allow one process to observe
another process, and to detect if the observed process has terminated
for any reason.
Location transparency: If we know the Pid of a process then we
should be able to send a message to the process.
Dynamic code upgrade: It should be possible to dynamically
change code in a running system. Note that since many processes
will be running the same code, we need a mechanism to allow
existing processes to run "old" code, and for "new" processes to run the
modified code at the same time.
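The fault-detection requirement in the list above is met in Erlang by links: a linked observer which traps exits receives a message when the observed process terminates, whatever the reason. The function name observe is invented for this sketch.

```erlang
%% Spawn Fun as a linked process and wait for it to terminate.
%% Because we trap exits, the termination arrives as an ordinary
%% {'EXIT', Pid, Reason} message rather than killing the observer.
observe(Fun) ->
    process_flag(trap_exit, true),
    Child = spawn_link(Fun),
    receive
        {'EXIT', Child, Reason} ->
            {terminated, Child, Reason}
    end.
```

Reason carries the fault-identification information: normal for a normal exit, or an error term describing why the process died.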
Not only should the language satisfy these requirements, but it should
also satisfy them in a reasonably efficient manner. When we program we
do not want to limit our freedom of expression by "counting processes" etc.,
nor do we want to worry about what will happen if a process inadvertently
tries to monopolise the CPU.
The maximum number of processes in the system should be sufficiently
large that for programming purposes we do not have to consider this
maximum number a limiting factor. We might need, for example, to create of
the order of one hundred thousand processes (assuming ten processes per
session) in order to make a switching system which maintains ten thousand
simultaneous user sessions.
This mix of features is needed to simplify applications programming.
Mapping the semantics of a distributed set of communicating components
onto an Erlang program is greatly simplified if we can map the concurrent
structure of the problem in a 1:1 manner onto the process structure of the
application program which solves the problem.
2.7 Library requirements
Language is not everything—a number of things are provided in the
accompanying system libraries. The essential set of library routines must
provide:
Stable storage: this is storage which survives a crash.
Device drivers: these must provide a mechanism for communication
with the outside world.
Code upgrade: this allows us to upgrade code in a running system.
Infrastructure: for starting and stopping the system, logging errors,
etc.
Observe that our library routines, which are mostly written in Erlang,
provide most of the services which are conventionally provided by an
operating system.
Since Erlang processes are isolated from each other, and communicate
only by message passing, they behave very much like operating system
processes which communicate through pipes or sockets.
Many of the features which are conventionally provided by an
operating system have moved from the operating system into the programming
language. The remaining operating system only provides a primitive set of
device drivers.
2.8 Application libraries
Stable storage etc. is not provided as a language primitive in Erlang, but
is provided in the basic Erlang libraries. Having such libraries is a
precondition for writing any complex application software. Complex
applications need much higher-level abstractions than storage etc. To build such
applications we need pre-packaged software entities to help us program
things like client-server solutions etc.
The OTP libraries provide us with a complete set of design patterns
(called behaviours) for building fault-tolerant applications. In this thesis I
will talk about a minimal set of behaviours which can be used for building
fault-tolerant applications. These are:
supervisor: a supervision model.
gen_server: a behaviour for implementing client-server
applications.
gen_event: a behaviour used for implementing event handling
software.
gen_fsm: a behaviour used for implementing finite state machines.
Of these, the central component that is used for programming fault-
tolerant applications is the supervision model.
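A minimal sketch of the supervision model looks as follows. The module names my_sup and my_server are hypothetical; only the supervisor behaviour and the shape of its child specification come from OTP.

```erlang
-module(my_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, my_sup}, my_sup, []).

init([]) ->
    %% one_for_one: restart only the child that crashed, allowing at
    %% most 3 restarts in any 10-second window before giving up.
    RestartStrategy = {one_for_one, 3, 10},
    Child = {my_server,
             {my_server, start_link, []},   %% how to (re)start the child
             permanent, 5000, worker, [my_server]},
    {ok, {RestartStrategy, [Child]}}.
```

The supervisor embodies the philosophy of section 2.3: workers try to achieve their goals and fail as soon as an uncorrectable error is detected, while the supervisor detects the failure and restarts them.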
2.9 Construction guidelines
In addition to explaining a general philosophy of programming fault-
tolerant applications, we need more specific guidelines that apply to the
programming languages that we wish to use to program our applications.
We also need example programs, and examples of how to use the library
routines.
The open source Erlang release contains such guidelines, which have
been used as the basis for systems with millions of lines of Erlang code.
Appendix B reproduces the programming guidelines which can be found
in the Erlang open source release. This thesis contains additional
guidelines, which are organized as follows:
The overall philosophy of our architecture is described in this chapter.
The notion of an error is discussed in several places. Sections 5.3 and 4.3 describe what is meant by an error; section 4.4 gives advice on the correct programming style to use when programming for errors in Erlang.
Examples of how to program simple components can be found in Chapter 4, and examples of how to use the OTP behaviours in Chapter 6.
2.10 Related work
The inability to isolate software components from each other is the main reason why many popular programming languages cannot be used for making robust system software.
It is essential for security to be able to isolate mistrusting programs from one another, and to protect the host platform from such programs. Isolation is difficult in object-oriented systems because objects can easily become aliased.
—Bryce [21]
Bryce goes on to say that object aliasing (where at least two other objects hold a reference to the same object) is difficult, if not impossible, to detect in practice, and recommends the use of protection domains (akin to OS processes) to solve this problem.
In a paper on Java, Czajkowski and Daynès, from Sun Microsystems, write:
The only safe way to execute multiple applications, written in the Java programming language, on the same computer is to use a separate JVM for each of them, and to execute each JVM in a separate OS process. This introduces various inefficiencies in resource utilization, which downgrades performance, scalability, and application startup time. The benefits the language can offer are thus reduced mainly to portability and improved programmer productivity. Granted these are important, but the full potential of language-provided safety is not realized. Instead there exists a curious distinction between “language safety” and “real safety”. [28]
In this paper they introduce the MVM (an extension to the JVM) where
their goal is:
... to turn the JVM into an execution environment akin to an OS. In particular, the abstraction of a process, offered by modern OSes, is the role model in terms of features: isolation from other computations, resource accountability and control, and ease of termination and resource reclamation.
To achieve this they conclude that:
... tasks cannot directly share objects, and that the only way
for tasks to communicate is to use standard, copying commu-
nication mechanisms, ...
These conclusions are not new. Very similar conclusions were arrived at some two decades earlier by Jim Gray, who described the architecture of the Tandem computer in his highly readable paper “Why do computers stop and what can be done about it?”. He says:
As with hardware, the key to software fault-tolerance is to hierarchically decompose large systems into modules, each module being a unit of service and a unit of failure. A failure of a module does not propagate beyond the module.
...
The process achieves fault containment by sharing no state
with other processes; its only contact with other processes is
via messages carried by a kernel message system. [38]
Languages which support this style of programming (parallel processes, no shared data, pure message passing) are what Andrews and Schneider [4] refer to as “message-oriented languages.” The language with the delightful name PLITS (“Programming language in the sky,” 1978) [35] is probably the first example of such a programming language:
The fundamental design decision in the implementation of RIG (a small system written in PLITS) was to allow a strict message discipline with no shared data structures. All communication between user and server processes is through messages which are routed by the Aleph kernel. This message discipline has proved to be very flexible and reliable. [35]
Turning away from languages for a while, we ask: what properties should an individual process have?
Schneider [60, 59] answered this question by giving three properties that he thought a hardware system should have in order to be suitable for programming a fault-tolerant system. These properties Schneider called:
1. Halt on failure: in the event of an error a processor should halt instead of performing a possibly erroneous operation.
2. Failure status property: when a processor fails, other processors in the system must be informed. The reason for failure must be communicated.
3. Stable storage property: the storage of a processor should be partitioned into stable storage (which survives a processor crash) and volatile storage (which is lost if a processor crashes).
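In Erlang, the failure status property is directly visible to the programmer: a process can monitor another process and is then sent a message containing the reason for any failure. A small sketch (the module name and the reason disk_error are invented for this example):

```erlang
%% Sketch: Schneider's failure status property, Erlang style.
%% erlang:monitor/2 delivers a 'DOWN' message carrying the
%% reason for the failure of the monitored process.
-module(failstatus).
-export([demo/0]).

demo() ->
    Pid = spawn(fun() -> receive go -> exit(disk_error) end end),
    Ref = erlang:monitor(process, Pid),   % monitor before it dies
    Pid ! go,
    receive
        {'DOWN', Ref, process, Pid, Reason} ->
            Reason                        % the reason for the failure
    end.
```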
A processor having these properties Schneider called a fail-stop processor. The idea is that if a failure (by which Schneider means an error which cannot be corrected) occurs, it is pointless to continue. The process should halt instead of continuing and possibly causing more damage. In a fail-stop processor, state is stored in either volatile or stable memory. When the processor crashes all data in volatile memory is lost, but all state that was stored in stable storage should be available after the crash.
A remarkably similar idea can be found in [38], where Gray talks about “fail-fast processes”:
The process approach to fault isolation advocates that the process software be fail-fast: it should either function correctly or it should detect the fault, signal failure and stop operating.
Processes are made fail-fast by defensive programming. They check all their inputs, intermediate results and data structures as a matter of course. If any error is detected, they signal a failure and stop. In the terminology of [Cristian], fail-fast software has small fault detection latency. [38]
Both Schneider and Gray have the same essential idea: one is talking about hardware, the other about software, but the underlying principles are the same.
Renzel argued that it is important that processes fail as soon as possible after an uncorrectable error has occurred:
A fault in a software system can cause one or more errors. The latency time, which is the interval between the existence of the fault and the occurrence of the error, can be very high, which complicates the backwards analysis of an error ...
For an effective error handling we must detect errors and failures as early as possible. [58]
Bearing in mind these arguments, and our original requirements, I advocate a system with the following properties:
1. Processes are the units of error encapsulation: errors occurring in a process will not affect other processes in the system. We call this property strong isolation.
2. Processes do what they are supposed to do or fail as soon as possible.
3. Failure, and the reason for failure, can be detected by remote processes.
4. Processes share no state, but communicate by message passing.
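In Erlang these properties can be sketched with links (the module name and the exit reason bang are invented for this example):

```erlang
%% Sketch: a watcher links to a worker. The worker's failure does
%% not crash the watcher (it traps exits); instead the failure and
%% its reason arrive as an ordinary message (properties 1 to 4).
-module(watch).
-export([demo/0]).

demo() ->
    process_flag(trap_exit, true),              % convert exits to messages
    Pid = spawn_link(fun() -> exit(bang) end),  % worker fails at once
    receive
        {'EXIT', Pid, Reason} ->
            {worker_died, Reason}               % {worker_died, bang}
    end.
```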
A language and system with such properties has the necessary preconditions for writing fault-tolerant software. Later in this thesis we will see how these properties are satisfied by Erlang, and the Erlang libraries.
Many of the ideas in this thesis are not new—the fundamental principles for making a fault-tolerant system are described in Gray’s paper [38]. Many of the features of the Tandem computer bear a striking similarity to the design principles in the OTP system, and to the fundamental principles of Concurrency Oriented Programming which were discussed earlier.
Here are two quotes from the paper, firstly the design principles on page 15 of [38]:
The keys to this software fault-tolerance are:
Software modularity through processes and messages.
Fault containment through fail-fast software modules.
Process-pairs to tolerate hardware and transient software faults.
Transaction mechanisms to provide data and message integrity.
Transaction mechanisms combined with process-pairs to ease exception handling and tolerate software faults.
Software modularity through processes and messages. As with hardware, the key to software fault-tolerance is to hierarchically decompose large systems into modules, each module being a unit of service and a unit of failure. A failure of a module does not propagate beyond the module.
There is considerable controversy about how to modularize software. Starting with Burroughs Espol and continuing through languages like Mesa and Ada, compiler writers have assumed perfect hardware and contended that they can provide good isolation through static compile-time type checking. In contrast, operating systems designers have advocated run-time checking combined with the process as the unit of protection and failure.
Although compiler checking and exception handling provided by programming languages are real assets, history seems to have favored the run-time checks plus the process approach to fault-containment. It has the virtue of simplicity—if a process or its processor misbehaves, stop it. The process provides a clean unit of modularity, service, fault containment and failure.
Fault containment through fail-fast software modules. The process achieves fault containment by sharing no state with other processes; its only contact with other processes is via messages carried by a kernel message system. [38]
If we compare this to our current Erlang system we see many striking similarities. There are certain differences—in Erlang “defensive programming” is not recommended, since the compiler adds the necessary checks to make this style of programming unnecessary. Gray’s “transaction mechanism” is provided by the Mnesia database, which is itself written in Erlang. The containment and processing of errors is managed by the “supervision tree” behaviours in the OTP libraries.
The idea of “fail-fast” modules is mirrored in our guidelines for programming, where we say that processes should do only what they are supposed to do according to the specification, otherwise they should crash. The supervision hierarchies in our system correspond to the hierarchies of modules that Gray refers to. This idea can also be found in the work of Candea and Fox [22], who talked about “crash-only software”—they argue that allowing components to crash and then restart leads to a simpler fault model and more reliable code.
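A node in such a supervision hierarchy can be sketched as follows (the child module counter and the restart parameters are invented for this example):

```erlang
%% Sketch of a node in a supervision tree: if the worker crashes,
%% the supervisor restarts it, mirroring Gray's fail-fast modules.
-module(my_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, my_sup}, my_sup, []).

init([]) ->
    %% one_for_one: restart only the crashed child; give up if it
    %% crashes more than 3 times in 10 seconds.
    {ok, {{one_for_one, 3, 10},
          [{counter, {counter, start_link, []},
            permanent, 5000, worker, [counter]}]}}.
```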
More modern work with object-oriented systems has also recognised the importance of isolating software components from each other. In [21] Bryce and Razafimahefa argue that it is essential to isolate programs from one another, and from the programs which run in the host operating system. This, they consider, is the essential characteristic that any object system must have. As they point out in their paper, this is a difficult problem in an object-oriented context.
3 Erlang
This chapter introduces Erlang. The treatment of the language is not intended to be complete. For a fuller treatment the reader is referred to [5]. Developments to Erlang since [5] can be found in the OTP documentation [34]. A more formal treatment of Erlang can be found in the Erlang specification [17] and in the Core Erlang specification [23].
Erlang belongs to the class of message-oriented languages [4]. Message-oriented languages provide concurrency in the form of parallel processes. There are no shared objects in a message-oriented language. Instead, all interaction between processes is achieved by sending and receiving messages.
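A minimal sketch of this style of interaction (the module and message names are invented for the example):

```erlang
%% Sketch: all interaction is by message passing. A process is
%% spawned, sent a message with Pid ! Msg, and replies to the
%% sender, whose identity travels inside the message itself.
-module(echo).
-export([demo/0]).

demo() ->
    Pid = spawn(fun loop/0),
    Pid ! {self(), hello},    % send: asynchronous, data is copied
    receive                   % receive: selective, blocking
        {Pid, hello} -> got_reply
    end.

loop() ->
    receive
        {From, Msg} -> From ! {self(), Msg}   % echo back to sender
    end.
```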
In this chapter, I present a subset of the language which provides
enough detail to understand the Erlang examples in this thesis.
3.1 Overview
The Erlang view of the world can be summarized in the following state-
ments:
Everything is a process.
Processes are strongly isolated.
Process creation and destruction is a lightweight operation.
Message passing is the only way for processes to interact.
Processes have unique names.
If you know the name of a process you can send it a message.
Processes share no resources.
Error handling is non-local.
Processes do what they are supposed to do or fail.
The use of processes as the basic unit of abstraction is motivated by the desire to make a language which is suitable for writing large fault-tolerant software systems. The fundamental problem which must be solved in writing such software is that of limiting the consequences of an error—the process abstraction provides an abstraction boundary which stops the propagation of errors.
It is, for example, precisely this inability to limit the consequences of errors that makes Java unsuitable for programming “safe” (sic) applications (see page 32 for further discussion of this point).
If processes are truly isolated (which they must be to limit the conse-
quences of an error) then most of the other properties of a process, like, for
example, that the only way for processes to interact is by message passing,
etc, follow as a natural consequence of this isolation.
The statement about error handling is perhaps less obvious. When we make a fault-tolerant system we need at least two physically separated computers. Using a single computer will not work: if it crashes, all is lost. The simplest fault-tolerant system we can imagine has exactly two computers; if one computer crashes, then the other computer should take over what the first computer was doing. In this simple situation even the software for fault-recovery must be non-local; the error occurs on the first machine, but is corrected by software running on the second machine.
The Erlang view of the world is that “everything is a process”; when we model our physical machines as processes we retain the idea that error handling should be non-local. Actually, this is a modified truth: remote error handling only occurs if a local attempt to fix the error fails. In the event of an exception a local process may be able to detect and correct the fault which caused the exception, in which case, as far as any other process in the system is concerned, no error has occurred.
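Local error correction of this kind is typically written with catch; a sketch (the function name safe_div is invented for the example):

```erlang
%% Sketch: a local attempt to fix an error. If evaluation fails,
%% catch turns the exception into a term the caller can handle;
%% no other process ever learns that an error occurred.
safe_div(X, Y) ->
    case catch X / Y of
        {'EXIT', _Reason} -> {error, badarith};  % corrected locally
        Value             -> {ok, Value}
    end.
```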
Viewed as a concurrent language, Erlang is very simple. Since there are no shared data structures, no monitors or synchronised methods, etc., there is very little to learn. The bulk of the language, and possibly the least interesting part of the language, is the sequential subset of the language. This sequential subset can be characterised as a dynamically typed, strict functional programming language, which is largely free from side-effects. In the sequential subset there are a few operations with side-effects, but they are virtually never needed.
The remainder of this chapter deals first with the sequential subset of the language. This is followed by sections on concurrent and distributed programming and error handling. Finally I describe a type notation for specifying Erlang data and function types.
To jump start the description, I start with an example of sequential
Erlang code.
3.2 Example
Figure 3.1 has a simple Erlang program. The program has the following
structure:
1. The program starts with a module definition (line 1) followed by export and import declarations and then by a number of functions.
2. The export declaration (line 2) says that the function areas/1 is to
be exported from this module. The notation areas/1 means the
function called areas which has one argument. The only functions
which can be called from outside the module are those which are
contained in the export list.
3. The import declaration in line 3 says that the function map/2 can be
found in the module lists.
1 -module(math).
2 -export([areas/1]).
3 -import(lists, [map/2]).
4
5 areas(L) ->
6 lists:sum(
7 map(
8 fun(I) -> area(I) end,
9 L)).
10
11 area({square, X}) ->
12 X*X;
13 area({rectangle,X,Y}) ->
14 X*Y.
Figure 3.1: An Erlang module
4. Lines 5 to 14 have two function definitions.
5. Line 6 is a call to the function sum in the module lists.
6. Lines 7 to 9 are a call to the function map/2 in the module lists. Note the difference between this call to sum and the call to map—both these functions are in the same module; one call uses a fully qualified name (that is, lists:sum) whereas the other call uses an abbreviated call sequence (that is, map(...) instead of lists:map(...)). The difference is accounted for by the import declaration in line 3, which says that the function map/2 is to be found in the module lists.
7. Line 8 creates a fun which is the first argument to map.
8. Lines 11 to 14 contain the function area/1. This function has two clauses. The first clause is in lines 11 to 12, the second in lines 13 to 14; the clauses are separated by a semicolon.
9. Each clause has a head and a body. The head and body are separated
from each other by a -> symbol.
10. A function head consists of a pattern in each argument position and a possible guard (see Section 3.3.4). In line 13 the pattern is {rectangle,X,Y}. In this pattern the curly brackets denote a tuple. The first element of the tuple is an atom (namely rectangle) and the second and third elements are variables. Variables start with capital letters; atoms start with small letters.
To run this program we start an Erlang shell, compile the program, and enter some requests to evaluate functions, as shown in Figure 3.2. In this figure all user input is underlined. The Erlang shell prompt is the character > meaning that the system is waiting for input.
Line 1 in figure 3.2 starts an Erlang shell.
Line 5 compiles the module math.
1 $ erl
2 Erlang (BEAM) emulator version 5.1 [source]
3
4 Eshell V5.1 (abort with ^G)
5 1> c(math).
6 {ok,math}
7 2> math:areas([{rectangle, 12, 4}, {square, 6}]).
8 84
9 3> math:area({square, 10}).
10 ** exited: {undef,[{math,area,[{square,10}]},
11 {erl_eval,expr,3},
12 {erl_eval,exprs,4},
13 {shell,eval_loop,2}]} **
Figure 3.2: Compiling and running a program in the shell
Line 7 requests a function evaluation; the shell accepts the request, evaluates the function and prints the result in line 8.
In line 9 we try to evaluate a function which was not exported from the module math. An exception is generated and printed (lines 10 to 13).
3.3 Sequential Erlang
3.3.1 Data structures
Erlang has eight primitive data types (also called constants):
Integers: integers are written as sequences of decimal digits; for example, 12, 12375 and -23427 are integers. Integer arithmetic is exact and of unlimited precision.
Atoms: atoms are used within a program to denote distinguished values. They are written as strings of consecutive alphanumeric characters, the first character being a small letter. Atoms can contain any character if they are enclosed within single quotes, and an escape convention exists which allows any character to be used within an atom.
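For example (an illustrative fragment, not from the thesis):

```erlang
%% Unquoted atoms start with a small letter; quoted atoms may
%% contain any character, including spaces.
day(1) -> monday;
day(2) -> 'Tuesday';        % quoted: starts with a capital letter
day(_) -> 'unknown day'.    % quoted: contains a space
```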
Floats: floating point numbers are represented as IEEE 754 [43] 64-bit floating point numbers. Real numbers in the