Introduction to working with Big Data - Hands On

Portland State University (Karl Miller Center), 615 SW Harrison St, Portland, OR 97201

GDG Portland

Learn to work with large datasets in this hands-on workshop. We will be exploring the fundamentals of data engineering and the unique challenges of working with large datasets.

May 31, 2024, 12:00 – 2:00 AM (UTC)

35 RSVP'd

Key Themes

Data

About this event

Join us for an exciting workshop on working with big data. This workshop will be led by GDG member Yaakov Bressler.

Bring your laptop!


Why is this course important?

  1. If your data is too big to handle, don't throw in the towel; the techniques in this workshop let you process it
  2. If you already process big data, there may be more efficient or cost-effective ways to do it
  3. Build the right solution for the right problem.

If you take this course:
You will know how to process big data in multiple ways and which is the best choice for you.


Themes:

  1. Multiple ways to process a file
    1. in memory
    2. in chunks
    3. streaming (sometimes the same as chunking, sometimes not)
    4. map reduce
    5. massively parallel processing (MPP) [out of scope]
  2. Big data is IO bound (when downloading/uploading big files)
    1. Compress when possible
    2. Move compute closer to the data (private network, VPC, access point, or the actual data center)
  3. Don't do things twice
    1. Caching (via disk) - don't download a file twice
    2. Incrementalism: use your data to determine offsets - don't process data twice
  4. Orchestrate pipelines instead of executing straight code
    1. Simplifies complex systems
    2. Allows delegation to other machines
  5. Big powerful tools can be expensive - but sometimes they are worth it
    1. Perhaps demonstrate how to process this all in Snowflake or BigQuery
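
As a rough illustration of the first theme, the sketch below counts values in a small CSV two ways: loading everything in memory, and reading it in fixed-size chunks whose partial counts are merged in a map-reduce style. It is not taken from the workshop materials. The sample data, the function names (count_in_memory, iter_chunks, count_in_chunks), and the chunk size are all assumptions made for this example, and it uses only the Python standard library.

```python
import csv
import io
from collections import Counter
from itertools import islice

# Hypothetical sample data standing in for a file too large to load at once.
SAMPLE = "city,amount\nPortland,3\nSalem,1\nPortland,2\nEugene,5\nSalem,4\n"


def count_in_memory(lines, column):
    """Load every row into a list, then count. Memory use grows with file size."""
    rows = list(csv.DictReader(lines))
    return Counter(row[column] for row in rows)


def iter_chunks(lines, chunk_size):
    """Lazily yield lists of at most chunk_size rows."""
    reader = csv.DictReader(lines)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def count_in_chunks(lines, column, chunk_size=2):
    """Count per chunk (the "map" step), then merge partial counts (the "reduce" step)."""
    total = Counter()
    for chunk in iter_chunks(lines, chunk_size):
        partial = Counter(row[column] for row in chunk)
        total.update(partial)  # Counter.update adds counts together
    return total


in_memory = count_in_memory(io.StringIO(SAMPLE), "city")
chunked = count_in_chunks(io.StringIO(SAMPLE), "city", chunk_size=2)
chunk_sizes = [len(c) for c in iter_chunks(io.StringIO(SAMPLE), 2)]

print(in_memory == chunked)
print(chunk_sizes)
```

Both approaches give the same counts, but the chunked version never holds more than chunk_size rows at a time. With a real file you would pass an open file handle instead of io.StringIO. Because each chunk's partial count is independent, the same shape could in principle be spread across several workers.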



Prerequisites: (Complete at least 1 day in advance)

  • Familiarity with the Python programming language
  • Familiarity with SQL
  • Complete the installation of the necessary software (following this guide)
    • Python installed on your machine
      • Install poetry (dependency management)
      • Install pyenv (Python version manager)
    • Install duckdb

Resources:


NOTE:

Due to limited space, we have very few spots available for this workshop. (Priority will be given to PSU students or alumni.) Feel free to join the waitlist and we'll let you know when space opens up.

👉 Want more? See all upcoming events: gdg-portland.dev

Speaker

  • Yaakov Bressler

    Headspace

    Senior Data Engineer

Facilitator

  • Keylan Petty

    Portland State University

    College of Computer Science

Host

  • Tyler C de Laguna

    GDG Portland

    Chapter Head

Organizers

  • Kyle Beechly

    Developer

  • Ian O'Gorman

    Software Developer, LATERAL.systems

  • Kaity Heflin

    Mentor

  • Tyler C de Laguna

    Chapter Head

  • Yaakov Bressler

    Headspace

    Sr. Data Engineer

  • Isil Berkun

    Founder of DigiFab AI

  • Cody Pika Tavita McGraw

    Guide on the Side, Llc.

    Event Organizer

  • Abigail Weisenbloom

    Developer

  • Evan Pierce

    Mobile Developer, Netflix

  • Shouryan Nikam

    Tektronix

    Software Engineer
