Description
Data lakes and data warehouses are the two main storage components of a data pipeline. This course highlights use-cases for each type of storage and examines the available data lake and data warehouse solutions on Google Cloud in technical detail. It also describes the role of a data engineer and the benefits of a successful data pipeline to business operations, and examines why data engineering should be done in a cloud environment.
Learning Objectives
- Differentiate between data lakes and data warehouses.
- Explore use-cases for each type of storage and the available data lake and warehouse solutions on Google Cloud.
- Discuss the role of a data engineer and the benefits of a successful data pipeline to business operations.
- Examine why data engineering should be done in a cloud environment.
Prerequisites
Basic proficiency with a common query language such as SQL.
Who Should Attend
This course is intended for developers who are responsible for querying datasets, visualizing query results, and creating reports.
Specific job roles include:
- Data engineers
- Data analysts
- Database administrators
- Big data architects
Course Outline
Module 1 – Introduction to Data Engineering
Topics:
- The role of a data engineer
- Data engineering challenges
- Introduction to BigQuery
- Data lakes and data warehouses
- Transactional databases versus data warehouses
- Partnering effectively with other data teams
- Managing data access and governance
- Building production-ready pipelines
- Google Cloud customer case study
Objectives:
- Discuss the role of a data engineer.
- Discuss benefits of doing data engineering in the cloud.
- Discuss the challenges of data engineering practice and how building data pipelines in the cloud helps address them.
- Review the purpose of a data lake versus a data warehouse, and when to use each.
Module 2 – Building a Data Lake
Topics:
- Introduction to data lakes
- Data storage and ETL options on Google Cloud
- Building a data lake by using Cloud Storage
- Securing Cloud Storage
- Storing all sorts of data types
- Cloud SQL as your OLTP system
Objectives:
- Discuss why Cloud Storage is a great option for building a data lake on Google Cloud.
- Explain how to use Cloud SQL for a relational data lake.
Module 3 – Building a Data Warehouse
Topics:
- The modern data warehouse
- Introduction to BigQuery
- Getting started with BigQuery
- Loading data into BigQuery
- Exploring schemas
- Schema design
- Nested and repeated fields
- Optimizing with partitioning and clustering
Objectives:
- Discuss the requirements of a modern data warehouse.
- Explain why BigQuery is the scalable data warehousing solution on Google Cloud.
- Discuss the core concepts of BigQuery and review options for loading data into BigQuery.