Do you want a guide that helps you pick the right Big Data technology for your project? Or do you want a solid understanding of Big Data architecture and pipelines? This course will help you with both.
After outlining the course structure and learning objectives, the course takes you through the steps needed to set up the environment. Next, you will understand the Big Data logical architecture, study the evolution of Big Data technologies, and explore Big Data pipelines. You will then become familiar with ingestion frameworks, such as Kafka, Flume, NiFi, and Sqoop, followed by key storage frameworks, such as HDFS, HBase, Kudu, and Cassandra. Finally, you will go through the various data formats and uncover key data processing and data analysis frameworks.
By the end of this course, you will have a good understanding of the Big Data architecture and technologies and will have developed the skills to build real-world Big Data pipelines.
All the resources and support files for this course are available at https://github.com/PacktPublishing/Big-Data-for-Architects
Create a Google account and a Dataproc cluster
Understand the Big Data architecture and pipelines
Study the factors to consider when comparing ingestion frameworks
Gain a solid understanding of storage frameworks
Distinguish between text and binary data formats
Find the key differences between the Spark, Tez, and Flink frameworks
Build a scalable Extract, Transform, Load (ETL) pipeline with Kafka Connect