This book introduces a novel approach to the design and operation of
large ICT systems. It views the technical solutions and their
stakeholders as complex adaptive systems and argues that traditional
risk analyses cannot predict all future incidents with major impacts. To
avoid unacceptable events, it is necessary to establish and operate
anti-fragile ICT systems that limit the impact of all incidents and
learn from small-impact incidents how to function increasingly well in
changing environments.

The book applies four design principles and one operational principle to
achieve anti-fragility for different classes of incidents. It discusses
how systems can achieve high availability, prevent malware epidemics,
and detect anomalies. Analyses of Netflix's media streaming solution,
Norwegian telecom infrastructures, e-government platforms, and Numenta's
anomaly detection software show that cloud computing is essential to
achieving anti-fragility for classes of events with negative impacts.