GDG on Campus Vienna University of Technology
We explore trends in large-scale HPC systems, their role in ML training, and the latest research on optimizing GPU communication for faster distributed deep learning.
Location: https://maps.tuwien.ac.at/?q=HEEG02
Cost: Free, but limited capacity – please sign up!
Abstract:
High-Performance Computing (HPC) has evolved to support the growing demands of scientific computing and AI, leading to exascale supercomputers. This talk explores the trends in large-scale HPC systems, the role of HPC in AI training, and the infrastructure required for deep learning and large language models (LLMs).
We will also discuss distributed deep learning and the importance of efficient communication in GPU-accelerated clusters, and present our recent research on tuning the NVIDIA Collective Communications Library (NCCL) for faster distributed training.
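As a taste of the setting the talk addresses, below is a minimal sketch of data-parallel training with PyTorch's NCCL backend. NCCL_ALGO and NCCL_PROTO are standard NCCL tuning knobs, but the values shown, the toy model, and the script name are illustrative assumptions only, not methods or results from the research being presented.

# Minimal sketch: data-parallel training over NCCL with PyTorch DDP.
# The NCCL_* variables are real NCCL tuning knobs; the values here are
# examples, not recommendations from the talk.
import os

os.environ.setdefault("NCCL_ALGO", "Ring")     # collective algorithm (e.g. Ring, Tree)
os.environ.setdefault("NCCL_PROTO", "Simple")  # wire protocol (e.g. LL, LL128, Simple)

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun supplies RANK, WORLD_SIZE, and LOCAL_RANK for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Hypothetical toy model, just to exercise the communication path.
    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])  # gradients sync via NCCL all-reduce

    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    for _ in range(10):
        x = torch.randn(32, 1024, device=local_rank)
        loss = model(x).square().mean()
        opt.zero_grad()
        loss.backward()  # triggers the NCCL all-reduce of gradients
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

A script like this would be launched with, e.g., torchrun --nproc_per_node=4 train.py on a multi-GPU node; the gradient all-reduce in the backward pass is exactly the communication step that NCCL tuning targets.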
Speaker Bio:
Majid Salimibeni has been a postdoctoral researcher in the Parallel Computing research unit at TU Wien since 2024. Before that, he was a postdoctoral researcher at the University of Salerno, Italy, where he also earned his PhD in Computer Science.
His research focuses on High-Performance and Parallel Computing, particularly optimizing communication in large-scale HPC systems. During his PhD, he worked on improving message passing in HPC environments, and his current research explores GPU-based communication optimization for deep learning workloads.
Friday, March 14, 2025
4:00 PM – 8:00 PM (UTC)