A one-volume guide to the most essential techniques for designing and
building dependable distributed systems

Instead of covering a broad range of research works for each
dependability strategy, this useful reference focuses on only a selected
few (usually the most seminal works, the most practical approaches, or
the first publication of each approach), explaining each in depth,
usually with a comprehensive set of examples. Each technique is
dissected thoroughly enough that readers who are unfamiliar with
dependable distributed computing can actually grasp the technique after
studying the book.

Building Dependable Distributed Systems consists of eight chapters.
The first introduces the basic concepts and terminology of dependable
distributed computing, and also provides an overview of the primary
means of achieving dependability. Checkpointing and logging mechanisms,
which are the most commonly used means of achieving a limited degree of
fault tolerance, are described in the second chapter. Works on
recovery-oriented computing, focusing on the practical techniques that
reduce the fault detection and recovery times for Internet-based
applications, are covered in chapter three. Chapter four outlines the
replication techniques for data and service fault tolerance. This
chapter also pays particular attention to optimistic replication and the
CAP theorem. Chapter five explains a few seminal works on group
communication systems. Chapter six introduces the distributed consensus
problem and covers a number of Paxos-family algorithms in depth. The
Byzantine generals problem and its latest solutions, including the
seminal Practical Byzantine Fault Tolerance (PBFT) algorithm and a
number of its derivatives, are introduced in chapter seven. The final
chapter details the latest research results on
application-aware Byzantine fault tolerance, which represents an
important step forward in the practical use of Byzantine fault tolerance
techniques.