Apache Spark is a fast, unified analytics engine for big data and machine learning. Since its release, Apache Spark has seen rapid adoption by enterprises across a wide range of industries. Internet powerhouses such as Netflix, Yahoo, and eBay have deployed Spark at massive scale, and Spark has grown one of the largest open-source communities in big data. Mastering Apache Spark therefore opens up a wide range of professional opportunities.
This course starts with a brief introduction to Apache Spark and what it is. Then, you will install and start using Apache Spark. After that, you will look at the Spark execution model and architecture in detail. Next, you will learn the Spark programming model and developer experience. Following that, you will look at the foundations of the Spark Structured APIs, and at Spark data sources and sinks. Then, you will explore Spark DataFrame and Dataset transformations, along with aggregations in Apache Spark. Finally, you will look at Spark DataFrame joins in detail.
By the end of this course, you will understand Spark programming and apply that knowledge to build data engineering solutions.
All the resource files are uploaded on the GitHub repository at https://github.com/PacktPublishing/Spark-Programming-in-Scala-for-Begin…
Learn Apache Spark and Spark architecture
Explore data engineering and data processing in Spark
Work with data sources and sinks
Work with DataFrames, Datasets, and Spark SQL
Use IntelliJ IDEA for Spark development and debugging
Understand unit testing, managing application logs, and cluster deployment
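As a small taste of the DataFrame transformations, aggregations, and joins the course covers, here is a minimal self-contained Spark-in-Scala sketch. The object name, column names, and sample data are illustrative assumptions, not material from the course itself:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SparkTaste {
  def main(args: Array[String]): Unit = {
    // Local SparkSession for experimenting; the course also covers cluster deployment
    val spark = SparkSession.builder()
      .appName("SparkTaste")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical in-memory sample data standing in for a real data source
    val orders = Seq(("o1", "c1", 120.0), ("o2", "c2", 80.0), ("o3", "c1", 40.0))
      .toDF("order_id", "customer_id", "amount")
    val customers = Seq(("c1", "Alice"), ("c2", "Bob"))
      .toDF("customer_id", "name")

    // Transformation + aggregation: total spend per customer
    val totals = orders.groupBy("customer_id").agg(sum("amount").as("total"))

    // DataFrame join: attach customer names to the aggregated totals
    val report = totals.join(customers, Seq("customer_id"), "inner")

    // Sanity check: two customers, two rows in the report
    assert(report.count() == 2)
    report.show()

    spark.stop()
  }
}
```

In a real pipeline, the in-memory Seq would be replaced by a source such as spark.read.csv, and the result would be written to a sink with report.write.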
Before proceeding with the course, you will need basic knowledge of the Scala programming language.
The course will help you understand Spark programming and apply that knowledge to build data engineering solutions.