This course will help you understand Hive and prepare you for the CCA159 (Cloudera Big Data Analyst) certification.
You will start by delving into Hadoop and its distributed file system. Next, you’ll become well-versed in the most common commands you’ll need to work with Hadoop file systems. Later, you’ll explore Apache Hive, starting with an introduction before moving on to external and managed tables. The next few sections will take you through insert and multi-insert. As you progress, the course will provide insights into different kinds of functions, including collection, conditional, string, date, and mathematical functions. In addition, you’ll learn to work with different file formats and compression.
By the end of this course, you’ll have gained comprehensive knowledge of Hive and Sqoop, along with the skills you need to pass the CCA Data Analyst exam.
All code and supporting files are available at https://github.com/PacktPublishing/CCA-159-Expert-in-Big-Data-Analytics…
Delve into Hive analysis
Get to grips with the ALTER TABLE command
Explore joins, multi-joins, and map joins
Work with different file formats such as Parquet and Avro
Understand partitioning and bucketing
Focus on views
Get up to speed with lateral views/explode
Delve into window functions - Rank/Dense Rank/Lead/Lag/Min/Max
Explore the window specification