Aaron Dharna (@aadharna)

Foundation Model Self-Play: Open-Ended Strategy Innovation via Foundation Models Multi-agent interactions have long fueled innovation, from natural predator-prey dynamics to the space race. Self-play (SP) algorithms try to harness these dynamics by pitting agents against ever-impr...

For all the details, please give the paper a read!
Paper: arxiv.org/abs/2507.06466

Infinite thanks to @jeffclune.com and @cong-ml.bsky.social for all their guidance!

10.07.2025 18:17 👍 7 🔁 2 💬 0 📌 0

FMSPs represent a new direction for open-ended strategy discovery in AI. We anticipate they can lead to a richer exploration of creative, diverse, and robust solutions across various domains, from language-based tasks to traditional RL

10.07.2025 18:17 👍 0 🔁 0 💬 1 📌 0

Gandalf | Lakera – Test your prompting skills to make Gandalf reveal secret information. Trick Gandalf into revealing information and experience the limitations of large language models firsthand.

In Gandalf, FMSPs successfully red-teamed an LLM, breaching GPT-4o-mini’s defenses. We implemented 7 additional external defensive strategies from Lakera’s single-agent Gandalf game (gandalf.lakera.ai) and FMSPs autonomously wrote code to break 6/7 of those defenses!!

10.07.2025 18:17 👍 0 🔁 0 💬 1 📌 0

We also explore FMSPs in an AI safety domain, Gandalf. An attacker LLM writes code (prompts and extraction functions) to jailbreak a secret from GPT-4o-mini while a defender LLM searches for system prompts & I/O guards (eg, double checking GPT’s response) to increase protection

10.07.2025 18:17 👍 0 🔁 0 💬 1 📌 0

We evaluate FMSPs in Car Tag, an asymmetric continuous-control game (see gifs above). FMSP variants write code-based policies (go left; q-learning; etc). Below are PCA plots of policy embeddings showing that QDSP has the highest QD-Score vs the other FMSPs and a non-LLM baseline

10.07.2025 18:17 👍 0 🔁 0 💬 1 📌 0

QDSP is dimensionless because the user no longer has to pick dimensions of variation, and it can recognize new dimensions of variation that did not exist in any data so far generated! The FM decides what counts as interestingly new based on its vast world knowledge together with an embedding model!

10.07.2025 18:17 👍 0 🔁 0 💬 1 📌 0

QDSP introduces a novel "dimensionless" MAP-Elites! Policies (Q-Learning, MCTS, etc.) are clustered via a pretrained model and are added to the archive if they're sufficiently new OR outperform the most similar policy (analogous to filling/improving a cell in MAP-Elites)

10.07.2025 18:17 👍 0 🔁 0 💬 1 📌 0

We created 3 FMSP algorithms 1. Vanilla FMSP (vFMSP), which just tries to improve performance; 2. Novelty-Search SP (NSSP) for generating diverse (but not necessarily high-performing) strategies; and 3. Quality-Diversity SP (QDSP), to create both high-quality & diverse strategies

10.07.2025 18:17 👍 0 🔁 0 💬 1 📌 0

We introduce a family of FMSP approaches with the same general structure (see Fig.). Harnessing open-endedness, the FM looks at the history of strategies tried so far (implemented in code), their scores, and creates new strategies to try

10.07.2025 18:17 👍 2 🔁 0 💬 1 📌 0

Really excited to share my recent work combining open-ended foundation model innovation with the competitive dynamics of self-play!! arxiv.org/abs/2507.06466

10.07.2025 18:17 👍 2 🔁 0 💬 1 📌 0

Just an fyi, the RLC deadlines got pushed back by one week!

14.02.2025 13:14 👍 1 🔁 0 💬 0 📌 0

The Sun Eater books, while not high fantasy exactly, scratched an extremely similar itch for me this past year.

Just finished Wind and Truth on Monday and have a lot of thoughts that I'm still working through

26.12.2024 09:15 👍 0 🔁 0 💬 0 📌 0

Our in-progress work Quality-Diversity Self-Play (w/ @cong-ml.bsky.social and @jeffclune.com) will have a poster presentation at #NeurIPS2024 workshops (@IMOLNeurIPS2024 Sunday West meeting room 217 - 219 and OpenworldAgents Sunday East Meeting Room 1-3, Foyer). Please come visit us!

14.12.2024 18:59 👍 9 🔁 1 💬 0 📌 1

I'm not letting myself get my copy until after Neurips and i've never had such a hard time just saying screw everything and reading for the next week straight. Btw, catch up at neurips?

06.12.2024 03:45 👍 2 🔁 0 💬 0 📌 0

Aaron Dharna

Latest posts by Aaron Dharna @aadharna