Martin Gorner is here: Massive-scale data processing with Google Cloud Dataflow

GDG Cloud Netherlands
Fri, Apr 21, 2017, 9:00 AM (CEST)

About this event

How can you handle the data coming from a fleet of NYC taxis reporting their location in real time? And how can you focus on the processing, not on deploying and babysitting servers? How can you compute accurate business data for reporting, as well as fast estimates for your real-time dashboard, with the same code and infrastructure?

Data processing technologies have evolved significantly in the past 10 years, from MapReduce (2004!) to today's low-latency parallel stream-processing tools. In this lab you will write a data pipeline using the "Dataflow model" embraced by both Google Cloud Dataflow, for a no-ops cloud deployment, and Apache Beam, for on-premise execution. You will test it on a really big data stream and watch it autoscale to tens of nodes to handle the load.
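A core idea of the Dataflow model is grouping timestamped events into event-time windows before aggregating. As a rough sketch of that idea in plain Python (not Beam itself; the 60-second window size and the (timestamp, payload) event shape are illustrative assumptions):

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # illustrative fixed window size, like Beam's FixedWindows(60)

def window_counts(events):
    """Count events per fixed event-time window.

    `events` is an iterable of (timestamp_seconds, payload) pairs; each
    element is assigned to the window containing its event time, then
    counts are aggregated per window, mirroring the Dataflow model's
    window-then-combine pattern.
    """
    counts = defaultdict(int)
    for ts, _payload in events:
        window_start = ts - ts % WINDOW_SECONDS  # start of this event's window
        counts[window_start] += 1
    return dict(counts)

# Simulated taxi position reports: (event time in seconds, taxi id)
events = [(3, "taxi-1"), (42, "taxi-2"), (61, "taxi-1"), (119, "taxi-3")]
print(window_counts(events))  # {0: 2, 60: 2}
```

In Beam, the same grouping would be expressed declaratively with a windowing transform followed by a combine, and the runner (Cloud Dataflow here) handles distribution and scaling.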

This lab uses real NYC taxi data to simulate a real-time stream of taxis reporting their position and status. With the help of Cloud Dataflow you will be able to aggregate the massive amount of incoming data and display it on a visually engaging dashboard map.
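One common way to aggregate positions for a dashboard map is to snap each latitude/longitude report to a grid cell and count reports per cell. A minimal sketch of that binning step (the 0.01-degree cell size and the function names are assumptions for illustration, not the lab's actual code):

```python
CELL_DEG = 0.01  # illustrative grid cell size (~1 km in NYC)

def grid_cell(lat, lon):
    """Snap a coordinate to the south-west corner of its grid cell,
    giving the per-cell key used when aggregating for a heat map."""
    return (round(lat // CELL_DEG * CELL_DEG, 2),
            round(lon // CELL_DEG * CELL_DEG, 2))

def cell_counts(positions):
    """Count taxi reports per grid cell from (lat, lon) pairs."""
    counts = {}
    for lat, lon in positions:
        key = grid_cell(lat, lon)
        counts[key] = counts.get(key, 0) + 1
    return counts

# Two reports near Times Square land in the same cell; one downtown does not.
positions = [(40.758, -73.985), (40.759, -73.986), (40.712, -74.006)]
print(cell_counts(positions))
```

In a streaming pipeline this keying-and-counting would run per window, so the dashboard always shows recent activity per map cell.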