Fine-Art of Fine-Tuning: Attacking and Securing LLMs - Day 1

Bob L. Herd Department of Petroleum Engineering, 807 Boston Ave, Lubbock, 79409

GDG on Campus Texas Tech University - Lubbock, United States

🚀 Launching Attacks – Running a Greedy Coordinate Gradient (GCG) attack on Llama 3.2 3B and Phi 3.5 Mini Instruct to extract unsafe responses.
🔍 Exposing Vulnerabilities – Identifying flaws in the response generation and safety mechanisms of these models.
🛠️ Reinforcing Defenses – Fine-tuning both models using adversarial examples to resist future attacks.
💡 Testing the Fixes – Reassess security

Feb 13, 11:30 PM – Feb 14, 12:30 AM (UTC)

Key Themes

AI, Data, Design, Gemini

About this event

Fine-Art of Fine-Tuning: Attacking and Securing LLMs

● Perform a Greedy Coordinate Gradient (GCG) attack on Llama 3.2 3B and Phi 3.5 Mini Instruct to extract malicious responses

● Identify weaknesses in the response generation and safety mechanisms of both models

● Train Llama 3.2 3B and Phi 3.5 Mini Instruct using adversarial examples to resist prompt attacks

● Reassess model security by attempting new prompt attacks post-fine-tuning

PETR 110; 5:30 PM to 6:30 PM, Feb 13 & Feb 20

When

February 13 – 14, 2025
11:30 PM – 12:30 AM (UTC)

Organizer

  • Atharva Lade

    Organizer
