
Ambroise Odonnat

@ambroiseodt

Ph.D. student in Machine Learning at Inria. Website: https://ambroiseodt.github.io/ Blog: https://logb-research.github.io

87 Followers · 128 Following · 36 Posts · Joined 18.11.2024

Latest posts by Ambroise Odonnat @ambroiseodt

SKADA-Bench: Benchmarking Unsupervised Domain Adaptation Methods... Unsupervised Domain Adaptation (DA) consists of adapting a model trained on a labeled source domain to perform well on an unlabeled target domain with some data distribution shift. While many...

SKADA-Bench: Benchmarking Unsupervised Domain Adaptation Methods with Realistic Validation On Diverse Modalities has been published in TMLR today πŸš€. It was a huge team effort to design (and publish) an open-source, fully reproducible DA benchmark 🧡1/n. openreview.net/forum?id=k9F...

29.07.2025 12:54 πŸ‘ 16 πŸ” 7 πŸ’¬ 1 πŸ“Œ 0

πŸš€ We are happy to organize the BERTΒ²S workshop @neuripsconf.bsky.social 2025 on Recent Advances in Time Series Foundation Models.
🌐 berts-workshop.github.io
πŸ“œSubmit by August 22
πŸŽ“Speakers and panelists: Chenghao Liu, Mingsheng Long, Zoe Piran, Danielle C. Maddix, Ameet Talwalkar, Qingsong Wen

22.07.2025 14:41 πŸ‘ 5 πŸ” 2 πŸ’¬ 0 πŸ“Œ 0

Here is the recording with the slides for those interested!
🎀 youtu.be/UONvP1TL0-g?...
πŸ“Š drive.google.com/file/d/14ZIo...
πŸ“‘ arxiv.org/pdf/2410.02724

@cohere.com @cohereforai.bsky.social

24.06.2025 16:07 πŸ‘ 2 πŸ” 2 πŸ’¬ 0 πŸ“Œ 0

πŸš€ Very happy to be presenting Large Language Models as Markov Chains at Cohere Labs on June 19th at 6 pm CET (Paris time)!!

Huge thanks to Andrej JovanoviΔ‡ @cohere.com @cohereforai.bsky.social for the invitation πŸ€—

Paper: arxiv.org/pdf/2410.02724
Learn more: cohere.com/events/Coher...

13.06.2025 07:54 πŸ‘ 3 πŸ” 0 πŸ’¬ 0 πŸ“Œ 1

Skada Sprint Alert: Contribute to Domain Adaptation in Python

πŸ“– Machine learning models often fail when the data distribution changes between training and testing. That’s where Domain Adaptation comes in β€” helping models stay reliable across domains.
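As a minimal illustration of this failure mode (a toy scikit-learn sketch with made-up Gaussian data, not code from the Skada library), a classifier trained on a source domain can drop to near-chance accuracy after a simple covariate shift:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Source domain: two Gaussian classes centered at (-1,-1) and (1,1).
Xs = np.concatenate([rng.normal(-1, 1, (200, 2)), rng.normal(1, 1, (200, 2))])
ys = np.array([0] * 200 + [1] * 200)

# Target domain: same labels, but all features translated by (2, 2)
# (a covariate shift that pushes class 0 across the learned boundary).
Xt = Xs + np.array([2.0, 2.0])
yt = ys

clf = LogisticRegression().fit(Xs, ys)
acc_source = clf.score(Xs, ys)  # high on the source domain
acc_target = clf.score(Xt, yt)  # close to chance on the shifted target
print(acc_source, acc_target)
```

Domain adaptation methods like those in Skada aim to close exactly this gap without target labels.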

20.05.2025 09:30 πŸ‘ 12 πŸ” 6 πŸ’¬ 1 πŸ“Œ 0

Congrats!

17.04.2025 13:19 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

πŸ“‘Paper: arxiv.org/pdf/2410.02724
πŸ“ˆSlides: drive.google.com/file/d/1JDrV... (better with Adobe Reader for nice GIFs)
🌐Website: ambroiseodt.github.io

28.02.2025 13:03 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

πŸ€—Thanks a lot @haeggee.bsky.social and @mjaggi.bsky.social for having me in the MLO group at EPFL @icepfl.bsky.social to present "Large Language Models as Markov Chains".

Slides are available on my website (link in thread).

πŸŽ‰ New experiments with Llama and Gemma models in the updated paper!

28.02.2025 13:03 πŸ‘ 4 πŸ” 2 πŸ’¬ 1 πŸ“Œ 0

πŸ€— Very happy to have (humbly) contributed to this work!

This is a collab with the usual open-source suspects from Inria, @polytechniqueparis.bsky.social and @univparissaclay.bsky.social.

Check it out if you are interested in open-source reproducible research πŸ˜‡

12.02.2025 16:09 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

πŸš€ Policy gradient methods like DeepSeek’s GRPO are great for finetuning LLMs via RLHF.

But what happens when we swap autoregressive generation for discrete diffusion, a rising architecture promising faster & more controllable LLMs?

Introducing SEPO!

πŸ“‘ arxiv.org/pdf/2502.01384

πŸ§΅πŸ‘‡

04.02.2025 15:42 πŸ‘ 6 πŸ” 2 πŸ’¬ 1 πŸ“Œ 0

Finally, I can't thank you enough Wes and @viviencabannes.bsky.social for this collab: you are a rare combination of super-smart and fun to work with!

Hopefully, more to come soon🀠

"Moi, si je devais rΓ©sumer ma vie aujourd’hui avec vous, je dirais que c’est d’abord des rencontres."

04.02.2025 11:56 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

We want to thank Elvis Dohmatob, Eshaan Nichani, @giupaolo.bsky.social , Faniriana Rakoto Endor, and Ievgen Redko for fruitful discussions during the elaboration of this work πŸ˜‡

04.02.2025 11:56 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

From the theoretical side, we show that clustering heads can be learned via gradient descent and provide theoretical insights into the two-stage learning observed in practice.
6/🧡

04.02.2025 11:56 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

We investigate loss spikes, suggesting potential strategies for mitigation, which could lead to more stable training processes. We also peek into the transferability of circuits to showcase the usefulness of curriculum learning and data curation.
5/🧡

04.02.2025 11:56 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

In the second, we unveil "π‘ͺπ’π’–π’”π’•π’†π’“π’Šπ’π’ˆ 𝑯𝒆𝒂𝒅𝒔", circuits that learn the invariance of the task. Their training dynamic is in two phases: 1) clustering of the attention embeddings according to invariance and 2) classifier fitting.
4/🧡

04.02.2025 11:56 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

In the first paper, we show how GD (gradient descent) reinforces useful circuits in transformers while pruning others to create sub-circuits that help solve complex tasks by breaking them down into intermediate reasoning steps.

3/🧡

04.02.2025 11:56 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

We consider the 𝒔𝒑𝒂𝒓𝒔𝒆 π’Žπ’π’…π’–π’π’‚π’“ π’‚π’…π’…π’Šπ’•π’Šπ’π’ problem where the inputs are sequences of L tokens in the ring of integers modulo p and the corresponding targets are the sum of the first k terms modulo p. Formally, we aim to learn the mapping: f(x_1, ..., x_L) = (x_1 + ... + x_k) mod p, with each x_i in Z/pZ.
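For concreteness, the data-generating process can be sketched in a few lines of NumPy (the function name and the values L=12, k=5, p=11 are illustrative choices, not the paper's settings):

```python
import numpy as np

def sparse_modular_addition(n_samples, L=12, k=5, p=11, seed=0):
    """Sparse modular addition task: each input is a sequence of L tokens
    in Z/pZ; the target is the sum of the first k tokens modulo p."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, p, size=(n_samples, L))  # tokens in {0, ..., p-1}
    y = X[:, :k].sum(axis=1) % p                 # only the first k tokens matter
    return X, y

X, y = sparse_modular_addition(4)
```

The sparsity comes from the fact that only k of the L positions influence the label, so the model must discover which tokens to attend to.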

2/🧡

04.02.2025 11:56 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

πŸš€Proud to share our work on the training dynamics in Transformers with Wassim Bouaziz & @viviencabannes.bsky.social @Inria @MetaAI

πŸ“Easing Optimization Paths arxiv.org/pdf/2501.02362 (accepted @ICASSP 2025 πŸ₯³)

πŸ“Clustering Heads πŸ”₯https://arxiv.org/pdf/2410.24050

πŸ–₯️ github.com/facebookrese...

1/🧡

04.02.2025 11:56 πŸ‘ 5 πŸ” 4 πŸ’¬ 1 πŸ“Œ 1
Preview: GitHub - abenechehab/dicl: Official implementation of DICL (Disentangled In-Context Learning), featured in the paper Zero-shot Model-based Reinforcement Learning using Large Language Models.

Happy to see Disentangled In-Context Learning accepted at ICLR 2025 πŸ₯³

Make zero-shot reinforcement learning with LLMs go brrr πŸš€

πŸ–₯️ github.com/abenechehab/...

πŸ“œ arxiv.org/pdf/2410.11711

Congrats to Abdelhakim (abenechehab.github.io) for leading it, always fun working with nice and strong people πŸ€—

25.01.2025 13:10 πŸ‘ 5 πŸ” 2 πŸ’¬ 0 πŸ“Œ 0

🎀Presenting our work on Unsupervised Accuracy Estimation at #NeurIPS2024 this week!

βœ‹πŸΎPoster Session 4 West - on Thu. at 4:30 pm

πŸ“ Poster #4310 - East Exhibit Hall A-C

DM me if you'd like to chat :)

10.12.2024 14:44 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Check out the new version of this awesome domain adaptation library! So nice to work with such good people πŸ€—

06.12.2024 19:25 πŸ‘ 2 πŸ” 1 πŸ’¬ 0 πŸ“Œ 0

Hi @vickiboykis.com, thanks for your interest. Don’t hesitate to reach out if you have any questions on the paper; @ozekri.bsky.social and I would be happy to help :)

04.12.2024 10:23 πŸ‘ 3 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Ahah, thanks, still a lot to learn before that πŸ˜…

03.12.2024 21:35 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

πŸ€—This is joint work with Renchunzi Xie, Vasilii Feofanov, Weijian Deng, Jianfeng Zhang, and Bo An.

Finally, I want to thank @ramealexandre.bsky.social and Youssef Attia El Hili for fruitful discussions during the elaboration of this work.

🧡/🧡

03.12.2024 16:58 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

πŸ₯³Finally the awaited surprise!
Our work includes a result akin to that of
@petar-v.bsky.social in β€œsoftmax is not enough (for sharp out-of-distribution)” (arxiv.org/pdf/2410.01104). We discuss its implications in the context of unsupervised accuracy estimation.

12/🧡

03.12.2024 16:58 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Last but not least, we discuss in great detail the limitations of our approach and how to formalize prediction bias in unsupervised settings. We believe this is a missing piece in the current literature and hope our work can be a first step toward bridging this gap.

11/🧡

03.12.2024 16:58 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

We also qualitatively demonstrate the superiority of our approach.

10/🧡

03.12.2024 16:58 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

We obtain SOTA performance for various shifts (subpopulation, synthetic, natural) and architectures (ResNet, ConvNext, and Vision Transformers).

9/🧡

03.12.2024 16:58 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Thus, we truncate the exponential when the model is not calibrated. As we cannot access test labels, we provide a criterion that automatically selects the proper normalization, softmax or Taylor. This boils down to a simple three-step recipe:
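A rough sketch of the two normalizations (an illustrative reimplementation, not the paper's exact estimator): replacing exp with its order-2 Taylor truncation 1 + z + z^2/2 keeps scores positive while flattening overconfident predictions:

```python
import math
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # stabilize exp
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def taylor_normalize(logits, order=2):
    # Truncated Taylor series of exp: 1 + z + z^2/2.
    # The order-2 truncation is strictly positive (min 0.5 at z = -1),
    # so the normalized scores remain a valid probability vector.
    v = sum(logits**i / math.factorial(i) for i in range(order + 1))
    return v / v.sum(axis=-1, keepdims=True)

logits = np.array([[5.0, 0.0, 0.0]])
p_soft = softmax(logits)            # sharply peaked on the first class
p_taylor = taylor_normalize(logits) # same ranking, but less confident
```

On these logits the softmax puts nearly all mass on the first class, while the Taylor normalization spreads it out, which is the desired behavior when the model is miscalibrated.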

8/🧡

03.12.2024 16:58 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Here’s where it gets tricky! How do you normalize the logits? Naively using the softmax is a poor choice, as it is overconfident (see arxiv.org/pdf/2310.14814 and arxiv.org/pdf/2205.09310). We even show that it accumulates prediction bias in miscalibrated scenarios.

7/🧡

03.12.2024 16:58 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0