Dive into the world of SQL on Hadoop and get the most out of your Hive
data warehouses. This book is your go-to resource for using Hive:
authors Scott Shaw, Ankur Gupta, David Kjerrumgaard, and Andreas
Francois Vermeulen take you through learning HiveQL, the SQL-like
language specific to Hive, to analyze, export, and massage the data
stored across your Hadoop environment. From deploying Hive on your
hardware or virtual machine and setting up its initial configuration to
learning how Hive interacts with Hadoop, MapReduce, Tez, and other big
data technologies, Practical Hive gives you a detailed treatment of
the software.
In addition, this book discusses the value of open source software, Hive
performance tuning, and how to leverage semi-structured and unstructured
data.
What You Will Learn
- Install and configure Hive for new and existing datasets
- Perform DDL operations
- Execute efficient DML operations
- Use tables, partitions, buckets, and user-defined functions
- Discover performance tuning tips and Hive best practices
Who This Book Is For
Developers, companies, and professionals who deal with large amounts of
data and need software that can manage it efficiently. Readers are
assumed to have a working knowledge of SQL.