Indeed, but we also show the other side of the coin: personalized generation and its evaluation remain extremely challenging, and in our view professional human translators are still essential for producing a truly original, publication-ready final work today.
05.01.2026 12:41
Want models to translate in the style you actually like?
Our paper is here! See you in Morocco! 🇲🇦
04.01.2026 18:11
EAGER: Entropy-Aware GEneRation for Adaptive Inference-Time Scaling
With the rise of reasoning language models and test-time scaling methods as a paradigm for improving model performance, substantial computation is often required to generate multiple candidate sequences…
Takeaway: EAGer shows we can be MORE efficient & MORE effective by letting models focus compute where it matters most.
📄 Paper: arxiv.org/abs/2510.11170
💻 Code: github.com/DanielSc4/EA...
✨ Huge thanks to my mentors and collaborators @leozotos.bsky.social E. Fersini @malvinanissim.bsky.social A. Üstün
16.10.2025 12:07
Results: Across 3B-20B models, EAGer cuts the token budget by up to 80% and boosts performance by 13% without labels and 37% with labels on AIME.
As M scales, EAGer consistently:
📈 achieves HIGHER Pass@k,
⚙️ uses FEWER tokens than the baseline,
🔺 shifts the Pareto frontier favorably across all tasks.
🧵 5/
16.10.2025 12:07
The fun part: EAGer-adapt reallocates the saved budget to "saturating" prompts that hit the M cap, with no labels needed! ✅ Training- & verification-free 🎉
Full EAGer uses labels to catch failing prompts, lowering the threshold to branch or add sequences. Great for verifiable pipelines!
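A minimal sketch of the reallocation idea (the function name and the simple even-split accounting are my own illustration, not the paper's implementation):

```python
def eager_adapt_reallocate(tokens_used, budget_per_prompt, saturating):
    """Pool the tokens unspent on easy prompts and split them evenly
    across 'saturating' prompts that hit the M-sequence cap."""
    saved = sum(max(0, budget_per_prompt - u) for u in tokens_used.values())
    if not saturating:
        return {}
    return {p: saved // len(saturating) for p in saturating}
```

Easy prompts that finish under budget fund extra exploration for the hard ones, without any verifier or labels.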
🧵 4/
16.10.2025 12:07
EAGer works by monitoring token entropy during generation. High-entropy token → it branches to explore new paths (reusing prefixes). Low-entropy token → it continues along a single path.
We cap at M sequences per prompt, saving budget on easy ones without regeneration. Training-free!
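For intuition, the branching rule can be sketched roughly like this (the entropy threshold and cap values are my own illustrative choices, not the paper's):

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def eager_step(probs, n_paths, max_paths, threshold=0.5):
    """Branch on a high-entropy token (reusing the shared prefix),
    continue a single path otherwise; never exceed M = max_paths."""
    if token_entropy(probs) > threshold and n_paths < max_paths:
        return n_paths + 1  # high entropy: explore an extra path
    return n_paths          # low entropy: keep decoding one path
```

A confident (peaked) distribution keeps one path; an uncertain (flat) one triggers a branch, up to the cap.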
🧵 3/
16.10.2025 12:07
Why? Reasoning LLMs shine with CoTs, but full parallel sampling (generating multiple paths per prompt) is inefficient 🤔.
It wastes compute on redundant, predictable tokens, especially for easy prompts. Hard prompts need more exploration but get the same budget. Enter EAGer 🧠!
🧵 2/
16.10.2025 12:07
You can easily save up to 65% of compute while improving performance on reasoning tasks 🤯 👇
Meet EAGer: we show that monitoring token-level uncertainty lets LLMs allocate compute dynamically, spending MORE on hard problems and LESS on easy ones.
🧵 👇
16.10.2025 12:07
I'll be attending the NEMI 2025 workshop this Friday and presenting a poster 📊.
Happy to chat about cool interpretability stuff there!
20.08.2025 22:42
🔍 What's happening in the model?
We find that SAE steering and multi-shot prompting impact internal representations similarly, suggesting that the insights from user examples are being summarized, with extra interpretability potential (look at the latents) and better efficiency (no long context) 6/
23.05.2025 12:23
🌍 Across 7 languages, our SAE-based method matches or outperforms traditional prompting methods! It obtains better human-likeness (H) and personalization accuracy (P) while maintaining translation quality (COMET ✔️ @nunonmg.bsky.social), especially for smaller LLMs. 5/
23.05.2025 12:23
💡 We compare prompting (zero- and multi-shot + explanations) and inference-time interventions (ActAdd, ReFT, and SAEs).
Following SpARE (@yuzhaouoe.bsky.social @alessiodevoto.bsky.social), we propose ✨ contrastive SAE steering ✨ with mutual information to personalize literary MT by tuning latent features 4/
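As a rough sketch of the contrastive-steering idea (selection by mean-activation difference here stands in for the paper's mutual-information criterion; all names and shapes are illustrative):

```python
import numpy as np

def select_latents(style_acts, neutral_acts, top_k=3):
    """Pick the SAE latents whose mean activation differs most between
    style exemplars and neutral translations (a simple stand-in for
    mutual-information-based selection)."""
    diff = style_acts.mean(axis=0) - neutral_acts.mean(axis=0)
    return np.argsort(-np.abs(diff))[:top_k]

def steer(hidden, sae_decoder, latents, scale=1.0):
    """Nudge a residual-stream vector along the chosen decoder directions."""
    return hidden + scale * sae_decoder[latents].sum(axis=0)
```

Applied at inference time, this steers generations toward a translator's style with no fine-tuning and no long few-shot context.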
23.05.2025 12:23
🔍 But can models recognize and replicate individual translator styles?
✅ Classifiers can identify styles with high accuracy (humans largely can't)
✅ Multi-shot prompting boosts style a lot
✅ We can detect strong style traces in activations (especially mid layers) 3/
23.05.2025 12:23
🌍 Literary translation isn't just about accuracy; it's also about creatively conveying meaning across languages. But LLMs prompted for MT are very literal. Prompting & steering to the rescue!
Can we personalize LLMs' MT when few examples are available, without further tuning? 👇 2/
23.05.2025 12:23
📢 New paper: applied interpretability 🤝 MT personalization!
We steer LLM generations to mimic human translator styles on literary novels in 7 languages. 🌍
SAE steering can beat few-shot prompting, leading to better personalization while maintaining quality.
🧵 1/
23.05.2025 12:23
Hellooo 👋
04.12.2024 13:45
Hey hello! 👋
28.11.2024 11:05
Now on 🦋!
21.11.2024 14:51
Hello!
19.11.2024 19:39
👋
19.11.2024 19:24
It was great, I'm starting to get tickets for next year!
17.11.2024 20:30
🙋‍♂️
16.11.2024 23:58