Learn to build an open-set person search system using Google's Gemini Multimodal API and SigLIP. This session covers VLM foundations, a fine-tuning pipeline for natural language queries in video footage, dataset curation with FiftyOne, and deployment challenges. Join us for pizza and tech talk!
Discover how to combine Google’s Gemini Multimodal API and SigLIP to build a powerful open-set person search system for real-world applications like retail and security. This talk walks through the foundations of Vision-Language Models (VLMs), explains how they work under the hood, and demonstrates a real fine-tuning pipeline that enables natural language queries like "a woman with a blue jacket near the entrance" to find relevant people in video footage, even when they were never seen during training. We’ll also explore dataset curation with FiftyOne and practical challenges in deploying these systems.
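At its core, the open-set search described above embeds both the text query and each detected person crop into a shared vector space, then ranks crops by similarity. A minimal sketch of just that ranking step, using small made-up vectors in place of real SigLIP embeddings (all names and numbers here are illustrative assumptions, not the speaker's actual pipeline):

```python
import numpy as np

def cosine_similarity(query: np.ndarray, crops: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and each row of a matrix."""
    query = query / np.linalg.norm(query)
    crops = crops / np.linalg.norm(crops, axis=1, keepdims=True)
    return crops @ query

# Hypothetical embeddings, one per detected person crop in the footage.
# A real system would get these from SigLIP's image encoder.
person_embeddings = np.array([
    [0.9, 0.1, 0.0],   # crop 0
    [0.1, 0.8, 0.3],   # crop 1
    [0.2, 0.7, 0.6],   # crop 2
])

# Hypothetical text embedding for a query like
# "a woman with a blue jacket near the entrance".
query_embedding = np.array([0.15, 0.75, 0.5])

scores = cosine_similarity(query_embedding, person_embeddings)
ranked = np.argsort(scores)[::-1]  # indices of crops, best match first
print(ranked[0])  # → 2
```

Because matching happens in the shared embedding space rather than against a fixed label set, the query can describe people the model never saw during training, which is what makes the search "open-set".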
Come out to Atomic Robot's office Wednesday evening, have some pizza with like-minded tech aficionados, and enjoy this presentation from Google Developer Expert Adonai Vera!
June 18 – 19, 2025
10:30 PM – 12:30 AM (UTC)