GDG on Campus Texas Tech University - Lubbock, United States
Launching Attacks: Running a Greedy Coordinate Gradient (GCG) attack on Llama 3.2 3B and Phi 3.5 Mini Instruct to extract unsafe responses.
Exposing Vulnerabilities: Identifying flaws in response generation and safety mechanisms of these models.
Reinforcing Defenses: Fine-tuning both models using adversarial examples to resist future attacks.
Testing the Fixes: Reassess security
Fine-Art of Fine-Tuning: Attacking and Securing LLMs
• Perform a Greedy Coordinate Gradient (GCG) attack on Llama 3.2 3B and Phi 3.5 Mini Instruct to extract malicious responses
• Identify weaknesses in the response generation and safety mechanisms of both models
• Train Llama 3.2 3B and Phi 3.5 Mini Instruct using adversarial examples to resist prompt attacks
• Reassess model security by attempting new prompt attacks post-fine-tuning
PETR 110; 5:30 PM to 6:30 PM, Feb 13 & Feb 20
February 13–14, 2025
11:30 PM – 12:30 AM (UTC)