Building Batch Data Pipelines on Google Cloud


What Will You Learn?

• Review the different methods of data loading (EL, ELT, and ETL) and when to use each
• Run Hadoop on Dataproc, leverage Cloud Storage, and optimize Dataproc jobs
• Build your data processing pipelines using Dataflow
• Manage data pipelines with Data Fusion and Cloud Composer
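To make the first bullet concrete, here is a minimal plain-Python sketch of the three loading patterns. It is not tied to any Google Cloud API; `clean`, `load`, and the in-memory "warehouse" lists are hypothetical stand-ins that only illustrate where the transform step sits relative to the load step.

```python
# Hypothetical helpers illustrating EL vs. ELT vs. ETL ordering.
# None of these names come from a Google Cloud SDK; they are stand-ins.

raw_rows = [" Alice ,30", "Bob,-1", "Carol,25"]

def clean(row):
    """Transform step: trim whitespace and drop rows with invalid ages."""
    name, age = row.split(",")
    name, age = name.strip(), int(age)
    return (name, age) if age >= 0 else None

def load(rows, warehouse):
    """Load step: append rows to an in-memory 'warehouse' table."""
    warehouse.extend(rows)

# EL: load exactly what was extracted (data assumed already clean/usable).
el_table = []
load(raw_rows, el_table)

# ELT: load the raw data first, then transform inside the warehouse.
elt_table = []
load(raw_rows, elt_table)
elt_table = [r for r in map(clean, elt_table) if r is not None]

# ETL: transform in the pipeline, load only the cleaned result.
etl_table = []
load([r for r in map(clean, raw_rows) if r is not None], etl_table)

print(el_table)   # raw rows, untransformed
print(elt_table)  # cleaned rows, transformed after loading
print(etl_table)  # cleaned rows, transformed before loading
```

ELT and ETL end with the same table; the difference the course explores is where the transform runs (inside the warehouse, e.g. BigQuery SQL, versus in a pipeline before loading).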

About This Course

Provider: coursera.org
Format: Online
Duration: Approximately 17 hours to complete
Target Audience: Intermediate
Learning Objectives: This course describes which paradigm to use, and when, for batch data. Learn new concepts from industry experts.
Course Prerequisites: None
Assessment and Certification: Earn a certificate upon completion from the relevant provider
Instructor: Google Cloud
Key Topics: Batch Data, Dataproc, EL, Dataflow
Topics Covered:
- Introduction
- EL, ELT, ETL
- Quality considerations
- How to carry out operations in BigQuery
- Shortcomings
- ETL to solve data quality issues
- The Hadoop ecosystem
- Cloud Storage instead of HDFS
- Optimizing Dataproc Storage
- Optimizing Dataproc Templates and Autoscaling
- Running Apache Spark jobs on Dataproc
- Dataflow
- Building Dataflow Pipelines in code
- Key considerations with designing pipelines
- Transforming data with PTransforms
- Aggregate with GroupByKey and Combine
- MapReduce in Beam
- Side Inputs and Windows of data
- Practicing Pipeline Side Inputs
- Creating and re-using Pipeline Templates
- Components of Cloud Data Fusion
- Cloud Data Fusion UI
- Explore data using Wrangler
- Orchestrate work between Google Cloud services with Cloud Composer
- Apache Airflow Environment
- DAGs and Operators
- Workflow scheduling
- Monitoring and Logging
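Several of the Dataflow topics above (PTransforms, GroupByKey, Combine, MapReduce in Beam) can be previewed without the SDK. The sketch below reimplements the GroupByKey and CombinePerKey semantics in plain Python; the function names mirror Beam concepts, but nothing here is the actual `apache_beam` API.

```python
from collections import defaultdict

def group_by_key(pairs):
    """Beam-style GroupByKey: (key, value) pairs -> (key, [values])."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def combine_per_key(pairs, combine_fn):
    """Beam-style CombinePerKey: reduce each key's values with combine_fn."""
    return [(key, combine_fn(values)) for key, values in group_by_key(pairs)]

# MapReduce shape: Map emits key/value pairs, Reduce combines per key.
words = ["spark", "beam", "spark", "dataflow", "beam", "spark"]
mapped = [(w, 1) for w in words]       # Map phase: emit (word, 1)
counts = combine_per_key(mapped, sum)  # Reduce phase: sum counts per word

print(counts)  # [('beam', 2), ('dataflow', 1), ('spark', 3)]
```

In the course itself these steps become Beam transforms in a Dataflow pipeline; this sketch only shows the data movement those transforms perform.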
