In this session, we will delve into the essential building blocks of data engineering, placing a spotlight on the discovery process. From framing the problem statement through exploratory data analysis (EDA) and data modeling, working with Python, VS Code, Jupyter Notebooks, SQL, and GitHub, you'll gain a solid understanding of the fundamentals that drive effective data engineering projects.
#DevFest Series
1. Introduction:
The "Why": We'll discuss why understanding your data upfront is crucial for success.
The Problem: We'll introduce a real-world problem that will guide our exploration.
2. Data Loading and Preparation:
Loading: We'll demonstrate how to efficiently load data from an online source directly into our workspace.
Structuring: We'll prepare the loaded data for analysis, making it easy to work with.
3. Exploratory Data Analysis (EDA):
First Look: We'll learn how to quickly generate and interpret summary statistics for our data.
The Story: We'll use these statistics to understand the data's characteristics and identify any red flags or anomalies.
4. Data Cleaning and Modeling:
Cleaning: We'll identify and handle common data issues like missing values and inconsistencies.
Modeling: We'll organize our data into separate tables for dimensions (descriptive attributes) and facts (measurable values).
5. Visualization and Real-World Application:
Bringing it to Life: We'll create charts to visualize the data and find patterns.
Solving the Problem: We'll apply the insights gained to address our original problem and discuss practical solutions.
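The loading, EDA, cleaning, and modeling steps above can be sketched in a few lines of Python. This is a minimal, standard-library-only sketch under stated assumptions, not the code from the talk or the book: the sample CSV text, the `station`, `date`, and `entries` columns, and every function name are hypothetical, and the inline rows stand in for the online data source. It parses the rows, computes summary statistics, drops rows with missing values and exact duplicates, and splits the result into a dimension table and a fact table.

```python
import csv
import io
import statistics
import urllib.request

# Hypothetical sample data standing in for the online source used in the session.
SAMPLE_CSV = """station,date,entries
A St,2023-10-01,120
A St,2023-10-02,
B Ave,2023-10-01,95
B Ave,2023-10-02,101
A St,2023-10-01,120
"""


def fetch_csv_text(url):
    """Download a CSV file as text (hypothetical helper; not called below)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def load_rows(text):
    """Parse CSV text into a list of row dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def summarize(rows, column):
    """Summary statistics for a numeric column; blank cells count as missing."""
    values = [float(r[column]) for r in rows if r[column].strip()]
    return {
        "count": len(values),
        "missing": len(rows) - len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


def clean(rows, column):
    """Drop rows missing `column` and exact duplicate rows, keeping first occurrences."""
    seen = set()
    cleaned = []
    for r in rows:
        key = tuple(r.items())
        if not r[column].strip() or key in seen:
            continue
        seen.add(key)
        cleaned.append(r)
    return cleaned


def to_star_schema(rows):
    """Split rows into a station dimension (descriptive) and an entries fact table (measures)."""
    station_ids = {}
    facts = []
    for r in rows:
        # Assign a surrogate key the first time a station name is seen.
        station_id = station_ids.setdefault(r["station"], len(station_ids) + 1)
        facts.append({"station_id": station_id, "date": r["date"], "entries": int(r["entries"])})
    dim_station = [{"station_id": sid, "name": name} for name, sid in station_ids.items()]
    return dim_station, facts


if __name__ == "__main__":
    rows = load_rows(SAMPLE_CSV)
    print(summarize(rows, "entries"))
    dim_station, fact_entries = to_star_schema(clean(rows, "entries"))
    print(dim_station)
    print(fact_entries)
```

In a Jupyter notebook the same steps would more typically use pandas (for example `read_csv`, `describe`, `dropna`, and `drop_duplicates`); the standard library is used here only to keep the sketch dependency-free.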
Key Takeaways:
- A solid grasp of the foundational aspects of data engineering.
- Hands-on experience with EDA techniques, emphasizing the discovery phase.
- Appreciation for the value of a code-centric approach in the data engineering discovery process.
Upcoming Talks:
Join us for subsequent sessions in our Data Engineering Process Fundamentals series, where we will delve deeper into specific facets of data engineering, exploring topics such as data modeling, pipelines, and best practices in data governance.
This presentation is based on the book, "Data Engineering Process Fundamentals," which provides a more comprehensive guide to the topics we'll cover. You can find all the sample code and datasets used in this presentation on our popular GitHub repository.
GDG Organizer