Design and implement a modern data lakehouse on the Azure Data Platform
using Delta Lake, Apache Spark, Azure Databricks, Azure Synapse
Analytics, and Snowflake. This book teaches you the intricate details of
the Data Lakehouse Paradigm and how to efficiently design a cloud-based
data lakehouse with highly performant, cutting-edge Apache Spark
capabilities on Azure Databricks, Azure Synapse Analytics, and
Snowflake. You will learn to write efficient PySpark code for batch and
streaming ELT jobs on Azure. You will also follow along with practical,
scenario-based examples showing how to apply the capabilities of Delta
Lake and Apache Spark to optimize performance and to secure, share, and
manage high-volume, high-velocity, and high-variety data in your
lakehouse with ease.
The patterns of success that you acquire from reading this book will
help you hone your skills to build high-performing and scalable
ACID-compliant lakehouses using flexible and cost-efficient decoupled
storage and compute capabilities. Extensive coverage of Delta Lake
ensures that you are aware of and can benefit from all that this new,
open-source storage layer can offer. In addition to the in-depth
examples on Databricks, the book covers alternative platforms such as
Synapse Analytics and Snowflake so that you can make the right
platform choice for your needs.
After reading this book, you will be able to implement Delta Lake
capabilities, including Schema Evolution, Change Feed, Live Tables,
Sharing, and Clones, to enable better business intelligence and advanced
analytics on your data within the Azure Data Platform.
What You Will Learn
- Implement the Data Lakehouse Paradigm on Microsoft's Azure cloud
  platform
- Benefit from the new Delta Lake open-source storage layer for data
  lakehouses
- Take advantage of schema evolution, change feeds, live tables, and
  more
- Write functional PySpark code for data lakehouse ELT jobs
- Optimize Apache Spark performance through partitioning, indexing, and
  other tuning options
- Choose between alternatives such as Databricks, Synapse Analytics, and
  Snowflake
Who This Book Is For
This book is for data, analytics, and AI professionals at all levels,
including data architects and data engineers. It is also for data
professionals seeking patterns of success that keep them relevant as
they learn to build scalable data lakehouses for organizations and
customers that are migrating to the modern Azure Data Platform.