Spark Programming in Python for Beginners with Apache Spark 3

If you are looking to expand your knowledge in data engineering or want to level up your portfolio by adding Spark programming to your skillset, then you are in the right place. This course will help you understand Spark programming and apply that knowledge to build data engineering solutions. This course is example-driven and follows a working session-like approach. We will be taking a live coding approach and explaining all the concepts needed along the way.

In this course, we will start with a quick introduction to Apache Spark, then set up our environment by installing and using Apache Spark. Next, we will learn about the Spark execution model and architecture, and about the Spark programming model and developer experience. After that, we will cover the foundations of the Spark Structured API and then move on to Spark data sources and sinks.

Then we will cover Spark DataFrame and Dataset transformations. We will also cover aggregations in Apache Spark and, finally, Spark DataFrame joins.

By the end of this course, you will be able to build data engineering solutions using the Spark Structured API in Python.

All the resources for the course are available at https://github.com/PacktPublishing/Spark-Programming-in-Python-for-Begi…

Type
video
Category
publication date
2022-02-18
what you will learn

Learn the foundations of Apache Spark and the Spark architecture
Learn data engineering and data processing in Spark
Work with data sources and sinks
Work with data frames and Spark SQL
Use PyCharm IDE for Spark development and debugging
Learn unit testing, managing application logs, and cluster deployment

duration
395
key features
Build your own data engineering solutions using Spark structured API in Python
Gain an in-depth understanding of the Apache Hadoop architecture, ecosystem, and practices
Learn to apply Spark programming basics
approach
This course is example-driven and follows a working session-like approach. The course delivers live coding sessions and explains the concepts along the way.
audience
This course is designed for software engineers who want to develop data engineering pipelines and applications using Apache Spark; for data architects and data engineers who are responsible for designing and building the organization’s data-centric infrastructure; and for managers and architects who do not work directly on Spark implementation but work with the people who implement Apache Spark at the ground level.

This course does not require any prior knowledge of Apache Spark or Hadoop; only programming knowledge of Python is required.
meta description
Build data engineering solutions with Spark programming in Python
short description
Advance your data skills by mastering Spark programming in Python. This beginner-level course will help you understand the core concepts of Apache Spark 3 and show you how to apply those concepts to build data engineering solutions.
subtitle
Learn Data Engineering using Spark Structured API
keywords
Apache Spark, Spark 3.0, SQL, PyCharm, Python, Spark Structured API, Data Sources and Sinks, Data Engineering, Data Processing, Data Frames, Spark SQL
Product ISBN
9781803246161