Course Description

A sound data engineering approach is the foundation on which all other data initiatives depend. Moving data through ETLs/ELTs into data warehouses that represent it in a way that serves the organization's data goals is a critical, ongoing process for any data-driven organization. However, ETLs carry risks to data integrity, quality, and provenance, and must be approached with best practices in mind.

This course provides an overview of data engineering approaches and their trade-offs in building different types of data warehouses. Tools for batch and stream data processing will be introduced. Common issues related to reliability, robustness, data loss, and data provenance will be explored.

This course is part of the Data Engineering track of the Advanced Data Science Certificate.

Course Objectives

Upon successful completion of the course, students will be able to:

  • Understand the different approaches to data warehousing across organizations of varying sizes.

  • Structure and organize data engineering initiatives.

  • Compose data pipelines that integrate a wide range of data tools, including Python scheduling libraries or services, distributed processing systems, and multiple data stores.

  • Articulate the issues related to data lineage and provenance, and evaluate workflow options that preserve both.

  • Design automated unit and end-to-end tests for data pipelines.

  • Understand anomaly detection and alerting techniques that make data pipelines more robust.
