SKADA-Bench: Benchmarking Unsupervised Domain Adaptation Methods...
Unsupervised Domain Adaptation (DA) consists of adapting a model trained on a labeled source domain so that it performs well on an unlabeled target domain under a data distribution shift. While many...
SKADA-Bench: Benchmarking Unsupervised Domain Adaptation Methods with Realistic Validation on Diverse Modalities has been published in TMLR today. It was a huge team effort to design (and publish) an open-source, fully reproducible DA benchmark 🧵 1/n. openreview.net/forum?id=k9F...
29.07.2025 12:54
We are happy to organize the BERT²S workshop @neuripsconf.bsky.social 2025 on Recent Advances in Time Series Foundation Models.
berts-workshop.github.io
Submit by August 22.
Speakers and panelists: Chenghao Liu, Mingsheng Long, Zoe Piran, Danielle C. Maddix, Ameet Talwalkar, Qingsong Wen
22.07.2025 14:41
Here is the recording with the slides for those interested!
Recording: youtu.be/UONvP1TL0-g?...
Slides: drive.google.com/file/d/14ZIo...
Paper: arxiv.org/pdf/2410.02724
@cohere.com @cohereforai.bsky.social
24.06.2025 16:07
Very happy to be presenting Large Language Models as Markov Chains at Cohere Labs on June 19th at 6 pm CET (Paris time)!
Huge thanks to Andrej Jovanović @cohere.com @cohereforai.bsky.social for the invitation!
Paper: arxiv.org/pdf/2410.02724
Learn more: cohere.com/events/Coher...
13.06.2025 07:54
Skada Sprint Alert: Contribute to Domain Adaptation in Python
Machine learning models often fail when the data distribution changes between training and testing. That's where Domain Adaptation comes in: helping models stay reliable across domains. A toy illustration of one classic adaptation recipe is sketched below.
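For concreteness, here is a minimal sketch of importance weighting under covariate shift, one classic DA recipe, using only scikit-learn. It is illustrative and is not Skada's own API (see the library docs for that); all names and the toy data are placeholders.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Toy covariate shift: source and target features come from
# shifted Gaussians; only the source is labeled.
X_src = rng.normal(0.0, 1.0, size=(500, 2))
y_src = (X_src[:, 0] + X_src[:, 1] > 0).astype(int)
X_tgt = rng.normal(0.7, 1.0, size=(500, 2))

# Step 1: train a domain classifier to tell source from target.
X_dom = np.vstack([X_src, X_tgt])
d = np.concatenate([np.zeros(500), np.ones(500)])
dom_clf = LogisticRegression().fit(X_dom, d)

# Step 2: reweight source samples by p(target|x) / p(source|x),
# so source points that look like target data count more.
p = dom_clf.predict_proba(X_src)
w = p[:, 1] / np.clip(p[:, 0], 1e-6, None)

# Step 3: fit the task model on the reweighted source data.
clf = LogisticRegression().fit(X_src, y_src, sample_weight=w)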
20.05.2025 09:30
Congrats!
17.04.2025 13:19
Paper: arxiv.org/pdf/2410.02724
Slides: drive.google.com/file/d/1JDrV... (better with Adobe Reader for nice GIFs)
Website: ambroiseodt.github.io
28.02.2025 13:03
Thanks a lot @haeggee.bsky.social and @mjaggi.bsky.social for having me in the MLO group at EPFL @icepfl.bsky.social to present "Large Language Models as Markov Chains".
Slides are available on my website (link in thread).
New experiments with Llama and Gemma models in the updated paper!
28.02.2025 13:03
Very happy to have (humbly) contributed to this work!
This is a collab with the usual open-source suspects from Inria, @polytechniqueparis.bsky.social and @univparissaclay.bsky.social.
Check it out if you are interested in open-source reproducible research!
12.02.2025 16:09
Policy gradient methods like DeepSeek's GRPO are great for finetuning LLMs via RLHF.
But what happens when we swap autoregressive generation for discrete diffusion, a rising architecture promising faster and more controllable LLMs?
Introducing SEPO!
Paper: arxiv.org/pdf/2502.01384
🧵
04.02.2025 15:42
Finally, I can't thank you enough, Wes and @viviencabannes.bsky.social, for this collab: you are a rare combination of super-smart and fun to work with!
Hopefully, more to come soon!
"As for me, if I had to sum up my life today with you, I would say that it is first of all about encounters."
04.02.2025 11:56
We want to thank Elvis Dohmatob, Eshaan Nichani, @giupaolo.bsky.social, Faniriana Rakoto Endor, and Ievgen Redko for fruitful discussions during the elaboration of this work.
04.02.2025 11:56
On the theoretical side, we show that clustering heads can be learned via gradient descent and provide theoretical insights into the two-stage learning observed in practice.
6/🧵
04.02.2025 11:56
We investigate loss spikes and suggest potential mitigation strategies that could lead to more stable training. We also examine the transferability of circuits, showcasing the usefulness of curriculum learning and data curation.
5/🧵
04.02.2025 11:56
In the second, we unveil "Clustering Heads", circuits that learn the invariance of the task. Their training dynamics unfold in two phases: 1) clustering of the attention embeddings according to the invariance, and 2) classifier fitting. A toy sketch of this two-stage picture is below.
4/🧵
04.02.2025 11:56
In the first paper, we show how gradient descent (GD) reinforces useful circuits in transformers while pruning others, creating sub-circuits that help solve complex tasks by breaking them down into intermediate reasoning steps.
3/🧵
04.02.2025 11:56
We consider the sparse modular addition problem, where the inputs are sequences of L tokens in the ring of integers modulo p and the corresponding targets are the sum of the first k terms modulo p. Formally, we aim to learn the mapping:
2/🧵
04.02.2025 11:56
Proud to share our work on the training dynamics in Transformers with Wassim Bouaziz & @viviencabannes.bsky.social @Inria @MetaAI.
Easing Optimization Paths: arxiv.org/pdf/2501.02362 (accepted at ICASSP 2025 🥳)
Clustering Heads: arxiv.org/pdf/2410.24050
Code: github.com/facebookrese...
1/🧵
04.02.2025 11:56
Presenting our work on Unsupervised Accuracy Estimation at #NeurIPS2024 this week!
Poster Session 4 West, Thursday at 4:30 pm
Poster #4310, East Exhibit Hall A-C
DM me if you'd like to chat :)
10.12.2024 14:44
Check out the new version of this awesome domain adaptation library! So nice to work with such good people.
06.12.2024 19:25
Hi @vickiboykis.com, thanks for your interest. Don't hesitate if you have any questions on the paper; @ozekri.bsky.social and I would be happy to help :)
04.12.2024 10:23
Haha, thanks, still a lot to learn before that!
03.12.2024 21:35
This is joint work with Renchunzi Xie, Vasilii Feofanov, Weijian Deng, Jianfeng Zhang, and Bo An.
Finally, I want to thank @ramealexandre.bsky.social and Youssef Attia El Hili for fruitful discussions during the elaboration of this work.
🧵/🧵
03.12.2024 16:58
🥳 Finally, the awaited surprise!
Our work includes a result akin to the one of
@petar-v.bsky.social in "softmax is not enough (for sharp out-of-distribution)" (arxiv.org/pdf/2410.01104). We discuss its implications in the context of unsupervised accuracy estimation.
12/🧵
03.12.2024 16:58
Last but not least, we discuss in great detail the limitations of our approach and how to formalize prediction bias in unsupervised settings. We believe this is a missing piece in the current literature and hope our work can be a first step toward bridging this gap.
11/🧵
03.12.2024 16:58
We also qualitatively demonstrate the superiority of our approach.
10/🧵
03.12.2024 16:58
We obtain SOTA performance across various shifts (subpopulation, synthetic, natural) and architectures (ResNet, ConvNeXt, and Vision Transformers).
9/🧵
03.12.2024 16:58
Thus, we truncate the exponential when the model is not calibrated. As we cannot access test labels, we provide a criterion to automatically select the proper normalization: softmax or Taylor. This boils down to a simple three-step recipe, sketched below:
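A minimal sketch of the normalization choice, assuming a second-order Taylor truncation of the exponential for concreteness; the function names and the 'looks_calibrated' flag are illustrative placeholders, not the paper's exact recipe.

import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def taylor_norm(z):
    # Truncated exp(z) ~ 1 + z + z^2 / 2 grows polynomially,
    # so it does not saturate like the full exponential.
    v = 1.0 + z + 0.5 * z**2
    v = np.clip(v, 1e-12, None)  # defensive: keep scores positive
    return v / v.sum(axis=1, keepdims=True)

def normalize(logits, looks_calibrated):
    # 'looks_calibrated' stands in for the paper's label-free
    # criterion; trust softmax only when calibration holds.
    return softmax(logits) if looks_calibrated else taylor_norm(logits)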
8/🧵
03.12.2024 16:58
Here's where it gets tricky! How do you normalize the logits? Simply using the softmax is bad, as it is overconfident (see arxiv.org/pdf/2310.14814 and arxiv.org/pdf/2205.09310). We even show that it accumulates prediction bias in miscalibrated scenarios.
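To see the overconfidence concretely: with logits (5, 0, 0), softmax assigns exp(5)/(exp(5) + 2) ≈ 0.987 to the first class, so even a modest logit gap is read as near-certainty.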
7/🧵
03.12.2024 16:58