Building Real-Time Data Pipelines: A Practical Guide - Data Engineering Process Fundamentals

GDG Broward County - FL


Jul 24, 4:00 – 4:45 PM (UTC)


Key Themes

Career Development, Community Building, Data, Google Cloud Next

About this event

Description:

This session builds on your existing batch data processing knowledge! We'll dive into data streaming and give you the skills to integrate a real-time pipeline into your data lake. Discover how to use Apache Kafka and Apache Spark to capture and process information as it is generated, enabling a continuous flow of data. Learn how this real-time data lands in your existing data lake and then feeds your data warehouse for deeper analysis. Combining batch and real-time approaches lets you make faster, better-informed decisions.

Agenda:

1. What is Data Streaming?

- Understanding the concept of continuous data flow.

- Real-time vs. batch processing.

- Benefits and use cases of data streaming.

2. Data Streaming Channels

- APIs (Application Programming Interfaces)

- Events (system-generated signals)

- Webhooks (HTTP callbacks triggered by events)

3. Data Streaming Components

- Message Broker (Apache Kafka)

- Producers and consumers

- Topics for data categorization

- Stream Processing Engine (Apache Spark Structured Streaming)

4. Solution Design and Architecture

- Real-time data source integration

- Leveraging Kafka for reliable message delivery

- Spark Structured Streaming for real-time processing

- Writing processed data to the data lake

5. Q&A Session

- Get your questions answered by the presenters.
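To make the agenda's producer, topic, and consumer vocabulary concrete, here is a minimal Python sketch. It is a toy in-memory stand-in, not the API of Apache Kafka or Apache Spark, and the names `ToyBroker`, `StreamingTotal`, `batch_total`, and the `sensor-readings` topic are hypothetical. A producer appends messages to a topic's append-only log; a consumer remembers its own offset and updates a running total as new messages arrive (the streaming approach), while `batch_total` waits for the complete dataset and computes the result once (the batch approach).

```python
from collections import defaultdict


class ToyBroker:
    """Hypothetical in-memory stand-in for a message broker (not Kafka's API)."""

    def __init__(self):
        # Each topic is an append-only log of messages.
        self.topics = defaultdict(list)

    def produce(self, topic, message):
        # Producers only ever append to the end of a topic's log.
        self.topics[topic].append(message)

    def consume(self, topic, offset):
        # Return messages from `offset` onward, plus the next offset to read.
        log = self.topics[topic]
        return log[offset:], len(log)


def batch_total(values):
    # Batch: wait until the full dataset exists, then compute once.
    return sum(values)


class StreamingTotal:
    """Hypothetical consumer that keeps a running total (streaming)."""

    def __init__(self):
        self.total = 0
        self.offset = 0  # the consumer tracks how far it has read

    def poll(self, broker, topic):
        # Read only the messages that arrived since the last poll.
        messages, self.offset = broker.consume(topic, self.offset)
        for message in messages:
            self.total += message["value"]
        return self.total


if __name__ == "__main__":
    broker = ToyBroker()
    consumer = StreamingTotal()

    for value in (5, 3):
        broker.produce("sensor-readings", {"value": value})
    print(consumer.poll(broker, "sensor-readings"))  # prints 8

    broker.produce("sensor-readings", {"value": 4})
    print(consumer.poll(broker, "sensor-readings"))  # prints 12

    print(batch_total([5, 3, 4]))  # prints 12, but only after all data exists
```

Both approaches reach the same total; the streaming consumer simply has an up-to-date answer after every poll. In a real pipeline, Kafka would persist and replicate the topic log, and Spark Structured Streaming would take the consumer role, tracking its Kafka offsets in a checkpoint location.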

Why Attend:

- Stay Ahead of the Curve: Gain a comprehensive understanding of data streaming, a crucial aspect of modern data engineering.

- Unlock Real-Time Insights: Learn how to leverage data streaming for immediate processing and analysis, enabling faster decision-making.

- Master Kafka and Spark: Explore the power of Apache Kafka as a message broker and Apache Spark Structured Streaming for real-time data processing.

- Build a Robust Data Lake: Discover how to integrate real-time data into your data lake for a unified data repository.

- Ask the Experts: Get your questions answered by data engineering professionals during the Q&A session.


Please RSVP to secure your spot for this session. We believe in fostering a welcoming and inclusive environment where everyone's unique perspectives are valued and contribute to our collective success.

Speaker

  • Oscar Garcia

    ozkary.com

    VP of Product Development

Organizer

  • Oscar Garcia

    GDG Organizer
