Unified Batch and Stream Processing with Apache Beam Training Course

Course Code

beam

Duration

14 hours (usually 2 days including breaks)

Requirements

  • Experience with Python programming.
  • Experience with the Linux command line.

Audience

  • Developers

Overview

Apache Beam is an open source, unified programming model for defining and executing parallel data processing pipelines. Its power lies in its ability to run both batch and streaming pipelines, with execution carried out by one of Beam's supported distributed processing back-ends: Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow. Apache Beam is useful for ETL (Extract, Transform, and Load) tasks such as moving data between different storage media and data sources, transforming data into a more desirable format, and loading data onto a new system.

In this instructor-led, live training (onsite or remote), participants will learn how to use the Apache Beam SDKs in a Java or Python application to define a data processing pipeline that decomposes a big data set into smaller chunks for independent, parallel processing.
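
For orientation, here is a minimal sketch of such a pipeline in the Python SDK. The file names are placeholders invented for this example, not course materials:

    import apache_beam as beam

    # A three-step ETL pipeline: read (extract), map (transform), write (load).
    with beam.Pipeline() as pipeline:
        (
            pipeline
            | "Read" >> beam.io.ReadFromText("input.txt")
            | "ToUpper" >> beam.Map(str.upper)
            | "Write" >> beam.io.WriteToText("output")
        )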

By the end of this training, participants will be able to:

  • Install and configure Apache Beam.
  • Use a single programming model to carry out both batch and stream processing from within their Java or Python application.
  • Execute pipelines across multiple environments.

Format of the Course

  • Part lecture, part discussion, exercises and heavy hands-on practice

Note

  • This course will be available in Scala in the future. Please contact us to arrange.

Course Outline

Introduction

  • Apache Beam vs MapReduce, Spark Streaming, Kafka Streams, Storm, and Flink

Installing and Configuring Apache Beam

Overview of Apache Beam Features and Architecture

  • Beam Model, SDKs, Beam Pipeline Runners
  • Distributed processing back-ends

Understanding the Apache Beam Programming Model

  • How a pipeline is executed
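
A sketch of the execution lifecycle, with illustrative data: the driver program builds the pipeline graph, run() hands it to the chosen runner, and wait_until_finish() blocks until execution completes:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions()      # parsed from the command line by default
    pipeline = beam.Pipeline(options=options)

    (pipeline
     | beam.Create([1, 2, 3])        # build the graph; nothing runs yet
     | beam.Map(lambda x: x * 2))

    result = pipeline.run()          # hand the graph to the runner
    result.wait_until_finish()       # block until execution completes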

Running a Sample Pipeline

  • Preparing a WordCount pipeline
  • Executing the pipeline locally
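
A minimal WordCount in the Python SDK might look like the following; the input file name is a placeholder, and the default DirectRunner executes it locally:

    import re

    import apache_beam as beam

    with beam.Pipeline() as p:   # no runner specified: DirectRunner, local
        (
            p
            | "Read" >> beam.io.ReadFromText("kinglear.txt")
            | "Split" >> beam.FlatMap(lambda line: re.findall(r"[\w']+", line))
            | "Count" >> beam.combiners.Count.PerElement()
            | "Format" >> beam.MapTuple(lambda word, n: f"{word}: {n}")
            | "Write" >> beam.io.WriteToText("counts")
        )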

Designing a Pipeline

  • Planning the structure, choosing the transforms, and determining the input and output methods
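
To illustrate structural planning, a single input PCollection can feed several independent branches, each with its own transforms and output. The file names and filter conditions below are invented for the example:

    import apache_beam as beam

    with beam.Pipeline() as p:
        lines = p | beam.io.ReadFromText("events.txt")   # one input

        # Two branches consume the same PCollection independently.
        errors = lines | "Errors" >> beam.Filter(lambda l: l.startswith("ERROR"))
        infos = lines | "Infos" >> beam.Filter(lambda l: l.startswith("INFO"))

        errors | "WriteErrors" >> beam.io.WriteToText("errors")
        infos | "WriteInfos" >> beam.io.WriteToText("infos")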

Creating the Pipeline

  • Writing the driver program and defining the pipeline
  • Using Apache Beam classes
  • Data sets, transforms, I/O, data encoding, etc.
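
A short sketch of the core classes: Create builds an in-memory data set, a DoFn holds element-wise logic, and ParDo applies it across a PCollection. The record format here is invented for the example:

    import apache_beam as beam

    class ParseRecord(beam.DoFn):
        """Turns one input element into zero or more output elements."""
        def process(self, element):
            name, value = element.split(",")
            yield {"name": name, "value": int(value)}

    with beam.Pipeline() as p:
        (
            p
            | beam.Create(["a,1", "b,2"])   # an in-memory data set
            | beam.ParDo(ParseRecord())     # element-wise transform
            | beam.Map(print)
        )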

Executing the Pipeline

  • Executing the pipeline locally, on remote machines, and on a public cloud
  • Choosing a runner
  • Runner-specific configurations
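
Because the runner is chosen through pipeline options rather than code, the same driver program can execute locally or on a public cloud. The Dataflow project, region, and bucket below are placeholders:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Swap "--runner=DataflowRunner" for DirectRunner, FlinkRunner, or
    # SparkRunner; the remaining flags are Dataflow-specific settings.
    options = PipelineOptions([
        "--runner=DataflowRunner",
        "--project=my-gcp-project",
        "--region=us-central1",
        "--temp_location=gs://my-bucket/tmp",
    ])

    with beam.Pipeline(options=options) as p:
        p | beam.Create([1, 2, 3]) | beam.Map(lambda x: x + 1)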

Testing and Debugging Apache Beam

  • Using type hints to emulate static typing
  • Managing Python Pipeline Dependencies
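
A sketch of the Python SDK's type-hint support: decorators on a DoFn and inline hints on a transform let Beam catch type mismatches when the pipeline is constructed, rather than deep inside a running job:

    import apache_beam as beam

    @beam.typehints.with_input_types(str)
    @beam.typehints.with_output_types(int)
    class ToLength(beam.DoFn):
        def process(self, element):
            yield len(element)

    with beam.Pipeline() as p:
        (
            p
            | beam.Create(["apache", "beam"])
            | beam.ParDo(ToLength())
            # An inline hint; feeding this step non-int elements would
            # raise an error at pipeline construction time.
            | beam.Map(lambda n: n * 2).with_input_types(int)
        )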

Processing Bounded and Unbounded Datasets

  • Windowing and Triggers
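
A sketch of windowing and triggering on a keyed collection; the keys, timestamps, and durations are illustrative only:

    import apache_beam as beam
    from apache_beam import window
    from apache_beam.transforms.trigger import (
        AccumulationMode, AfterProcessingTime, AfterWatermark)

    with beam.Pipeline() as p:
        (
            p
            | beam.Create([("user1", 1), ("user2", 1), ("user1", 1)])
            # Attach event timestamps (all zero here, purely for illustration).
            | beam.Map(lambda kv: window.TimestampedValue(kv, 0))
            | beam.WindowInto(
                window.FixedWindows(60),  # one-minute fixed windows
                trigger=AfterWatermark(late=AfterProcessingTime(30)),
                accumulation_mode=AccumulationMode.ACCUMULATING,
                allowed_lateness=120,
            )
            | beam.CombinePerKey(sum)     # per-key sums within each window
            | beam.Map(print)
        )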

Making Your Pipelines Reusable and Maintainable

Creating New Data Sources and Sinks

  • Apache Beam Source and Sink API
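
Besides the Source and Sink APIs, a common Python pattern writes a sink as a ParDo using the DoFn lifecycle hooks (setup/teardown are real Beam API; the store client below is a stand-in, not a real library):

    import apache_beam as beam

    class FakeStoreClient:
        """Stand-in for a real database or storage client."""
        def put(self, element):
            print("stored:", element)
        def close(self):
            pass

    class WriteToFakeStore(beam.DoFn):
        def setup(self):
            self.client = FakeStoreClient()   # open a connection once per worker
        def process(self, element):
            self.client.put(element)
        def teardown(self):
            self.client.close()

    with beam.Pipeline() as p:
        p | beam.Create(["a", "b"]) | beam.ParDo(WriteToFakeStore())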

Integrating Apache Beam with Other Big Data Systems

  • Apache Hadoop, Apache Spark, Apache Kafka
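
As one integration example, the Python SDK ships a cross-language Kafka connector. The broker address and topic name below are assumptions, and the transform needs a Java runtime for its expansion service:

    import apache_beam as beam
    from apache_beam.io.kafka import ReadFromKafka

    with beam.Pipeline() as p:
        (
            p
            | ReadFromKafka(
                consumer_config={"bootstrap.servers": "localhost:9092"},
                topics=["events"],
            )
            | beam.Map(print)   # each record arrives as a (key, value) pair
        )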

Troubleshooting

Summary and Conclusion
