See our poster today
Poster Session 1 @ 10am
Hall 3 + Hall 2B #239
24.04.2025 00:58
Shh, don't say that! Domain Certification in LLMs
Domain Certification: a novel framework that provides provable adversarial defenses for LLM safety.
Read more: cemde.github.io/Domain-Certi...
Thanks to my amazing collaborators:
- @alasdair-p.bsky.social, Preetham Arvind, @maximek3.bsky.social, Tom Rainforth, @philiptorr.bsky.social, @adelbibi.bsky.social at @ox.ac.uk
- Bernard Ghanem at KAUST
- Thomas Lukasiewicz at @tuwien.at.
(7/7)
04.04.2025 20:11
To obtain such certificates, we present a simple, scalable and powerful algorithm: VALID. Remarkably, for each unwanted response it provides a **global bound in prompt space**.
(6/7)
04.04.2025 20:11
A Domain Certificate bounds the adversarial risk of the model producing out-of-domain responses.
(5/7)
04.04.2025 20:11
We are tired of the cat and mouse game of attacks and defenses. Hence, we propose:
- **Domain Certification:** a framework for adversarial certification of LLMs.
- **VALID:** a simple, scalable and effective test-time algorithm.
(4/7)
04.04.2025 20:11
Example: Can't afford GitHub Copilot? 💡 Use the Amazon Shopping App.
(3/7)
04.04.2025 20:11
Consider an LLM deployed for a specific purpose, such as a medical chatbot. Such a model should **only** respond to medical questions.
⚠️ Problem: LLMs are highly capable and can be manipulated into responding to **any** query: how to build a bomb, how to organize tax fraud, etc.
(2/7)
04.04.2025 20:11
ALT: a man in a suit and tie is sitting at a desk in front of a computer screen that says founder of the office .
🚨 New paper alert: Our recent work on LLM safety has been accepted to ICLR 2025 🇸🇬
We propose a new framework for LLM safety. 🧵
(1/7)
#LLM #AISafety #ICLR2025 #Certification #AdversarialRobustness #NLP #Shhhhhh #DomainCertification #AI
04.04.2025 20:11
I know I'm late to the party, but super excited that I got 3/3 accepted at #ICLR2025, including 1 spotlight:
- Shh, don't say that! Domain Certification in LLMs
- Towards Certification of Uncertainty Calibration under Adversarial Attacks
- Benchmarking Predictive Coding Networks
See you in Singapore 🇸🇬
24.02.2025 16:48
Shh, don't say that! Domain Certification in LLMs
Foundation language models, such as LLama, are often deployed in constrained environments. For instance, a customer support bot may utilize a large language model (LLM) as its backbone due to the...
The amazing collaborators: Preetham Arvind, @alasdair-p.bsky.social, Maxime Kayser, Tom Rainforth, Thomas Lukasiewicz, Philip Torr, Adel Bibi.
A @oxfordtvg.bsky.social production.
(6/6)
Link to paper:
openreview.net/forum?id=brD...
14.12.2024 01:18
Interested? Want to learn more?
Join us at the SoLaR workshop tomorrow.
- When: Tomorrow, 14 Dec, from 11am to 1pm.
- 🗺️ Where: West meeting rooms 121 and 122 here in Vancouver.
(5/6)
14.12.2024 01:18
Our method enables strong LLM performance while providing adversarial guarantees on out-of-domain behaviour.
(4/6)
14.12.2024 01:18
We are tired of the cat and mouse game of attacks and defenses. Hence, we propose:
- **Domain Certification:** a framework for adversarial certification of LLMs.
- **VALID:** a simple, scalable and efficient test-time algorithm.
(3/6)
14.12.2024 01:18
It is known that fine-tuned foundation models can be adversarially manipulated into answering questions they should not answer.
For instance: Can't afford ChatGPT Plus? Use a shopping app instead.
(2/6)
14.12.2024 01:18
Are you scared users might misappropriate your LLM system? 😱
We were scared too! Now we introduce adversarial certificates on the misuse of LLMs.
Come and see our poster at the SoLaR Workshop tomorrow.
#NeurIPS2024 #NeurIPS #AI #NLP #LLM #DomainCertification #Shhhhhhhh
14.12.2024 01:18
Great work! You might find our SoLaR paper interesting: We propose a certification framework for LLM systems to stay on-topic and not respond to such questions: openreview.net/pdf?id=brDLU...
06.12.2024 19:23
A snow cat with the Radcliffe Camera behind
The Radcliffe Camera
The Fellows Garden
The first snow in Exeter College this morning ❄️
#ExeterCollegeOxford #OxfordUniversity #Snowing
19.11.2024 11:15