This book takes its reader on a journey through Apache Giraph, a popular
distributed graph processing platform designed to bring the power of big
data processing to graph data. Designed as a step-by-step self-study
guide for everyone interested in large-scale graph processing, it
describes the fundamental abstractions of the system, its programming
models and various techniques for using the system to process graph data
at scale, including the implementation of several popular and advanced
graph analytics algorithms.
The book is organized as follows: Chapter 1 starts by providing
background on the big data phenomenon and a general introduction
to the Apache Giraph system, its abstraction, programming model and
design architecture. Next, Chapter 2 focuses on Giraph as a platform and
how to use it. Working from a sample job, it also explains more advanced
topics, such as monitoring the Giraph application lifecycle and the
different methods for monitoring Giraph jobs. Chapter 3 then provides an
introduction to Giraph programming, introduces the basic Giraph graph
model and explains how to write Giraph programs. In turn, Chapter 4
discusses in detail the implementation of some popular graph algorithms
including PageRank, connected components, shortest paths and triangle
closing. Chapter 5 focuses on advanced Giraph programming, discussing
common Giraph algorithmic optimizations, tunable Giraph configurations
that determine the system's utilization of the underlying resources, and
how to write custom graph input and output formats. Lastly, Chapter 6
highlights two systems that have been introduced to tackle the challenge
of large-scale graph processing, GraphX and GraphLab, and explains the
main commonalities and differences between these systems and Apache
Giraph.
This book serves as an essential reference guide for students,
researchers and practitioners in the domain of large-scale graph
processing. It offers step-by-step guidance, with several code examples
and the complete source code available in the related GitHub repository.
Students will find a comprehensive introduction to and hands-on practice
with tackling large-scale graph processing problems using the Apache
Giraph system, while researchers will discover thorough coverage of the
emerging and ongoing advancements in big graph processing systems.