Excited to share our latest work, "Isolating Culture Neurons in Multilingual Large Language Models".
Data & code: github.com/namazifard/C...
Preprint: arxiv.org/abs/2508.02241
RAG is a powerful way to improve LLMs' answering abilities across many languages. But how do LLMs deal with multilingual contexts? Do they answer consistently when the retrieved info is provided to them in different languages?
Joint work w/ @jiruiqi.bsky.social & Raquel_FernΓ‘ndez
See thread!
A simple trick improves embedding retrieval performance, even without further training:
ZCA whitening increases the isotropy of the embedding space and thereby helps retrieval.
Paper by Andor Diera with @lukasgalke.bsky.social at ESANN 2025.
Preprint: arxiv.org/abs/2411.17538
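The trick can be sketched in a few lines. This is a minimal NumPy illustration of ZCA whitening applied to embedding vectors, not the paper's implementation; the function name and the `eps` regularizer are my own assumptions:

```python
import numpy as np

def zca_whiten(X, eps=1e-5):
    """ZCA-whiten row vectors so their covariance is ~identity (isotropic)."""
    Xc = X - X.mean(axis=0)                        # center the embeddings
    cov = Xc.T @ Xc / (len(Xc) - 1)                # empirical covariance
    U, S, _ = np.linalg.svd(cov)                   # eigendecomposition (cov is symmetric PSD)
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T  # ZCA transform: stays close to original basis
    return Xc @ W

# Retrieval then proceeds as usual, e.g. cosine similarity on whitened vectors:
emb = np.random.randn(1000, 64)                    # stand-in for sentence embeddings
white = zca_whiten(emb)
query, docs = white[0], white[1:]
scores = docs @ query / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))
```

Unlike PCA whitening, the ZCA variant rotates back into the original coordinate system (the trailing `U.T`), so the whitened vectors stay as close as possible to the input embeddings while their covariance becomes the identity.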
Thrilled to share our #ICLR2025 work on Meta-Causal States! Causal graphs evolve with dynamic systems & agent actions. We show how to cluster causal models by qualitative behavior, revealing hidden dynamics & emergent relationships. #Causality #ML
https://arxiv.org/abs/2410.13054
Deep neural networks and humans both benefit from compositional language structure. New paper by @lukasgalke.bsky.social, Yoav Ram, and @limorraviv.bsky.social. doi.org/10.1038/s414....
What can we conclude? Humans and deep nets are not so different after all when learning a new language. The simplicity bias of overparameterized models seems to guide them towards learning compositional structures, even though they could easily memorize all different combinations.
When analyzing the learning trajectory of RNNs throughout training, we make several other interesting observations: medium-structured languages have a learnability advantage early in training (likely because the same word is used for multiple meanings) but fall behind high-structured languages later.
We find a similar effect when looking at memorization errors. In the memorization test, the task for in-context LLMs boils down to copying a word that is present earlier in the prompt. But even here, we can see an advantage of language structure.
All these learning systems (small RNNs, pre-trained LLMs, and humans) show *very* similar memorization and generalization behavior, with more structured languages leading to generalizations that are more systematic and more similar to those of human participants.
Investigating the relationship between language learning and language structure, we find striking similarities between humans and language models: small recurrent neural networks trained from scratch and large pre-trained language models via in-context learning.
Now finally out in Nature Communications:
Deep neural networks and humans both benefit from compositional structure
with Yoav Ram and @limorraviv.bsky.social
Paper link right away: rdcu.be/d5f2e
Two more days left to apply for PhD positions on training multilingual language models at the Centre for Machine Learning in the Department of Mathematics and Computer Science (IMADA), University of Southern Denmark (SDU).
tinyurl.com/dfm2025phd
Application deadline: Dec 19, 2024
tell me about LLMs tool use best practices. I know the high level, and want to learn about implementation/prompting details, e.g.:
- how do you best feed in the tool specs or DSL to the LLM?
- how do you ask it to indicate a tool use (which wrapper / indicator)
- how do you ask for nested calls
etc
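A hedged sketch of one common pattern for the first two points, not a definitive answer: tool specs as JSON schemas in the system prompt, and a dedicated wrapper tag the model must use to indicate a call. The tool name `get_weather` and the `<tool_call>` tag are illustrative assumptions, not a standard:

```python
import json

# Illustrative tool spec, described as a JSON schema.
tools = [{
    "name": "get_weather",  # hypothetical example tool
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

system_prompt = (
    "You can call the following tools, specified as JSON schemas:\n"
    + json.dumps(tools, indent=2)
    + "\nTo call a tool, reply with exactly one block of the form:\n"
    + '<tool_call>{"name": ..., "arguments": {...}}</tool_call>\n'
    + "For nested calls, emit one call, wait for its result, "
    + "then decide on the next call."
)

def parse_tool_call(reply: str):
    """Extract a tool call from the model's reply, if present."""
    start, end = reply.find("<tool_call>"), reply.find("</tool_call>")
    if start == -1 or end == -1:
        return None
    return json.loads(reply[start + len("<tool_call>"):end])
```

For nested calls, the usual design choice is to avoid letting the model emit call-within-call syntax at all: it makes one call per turn, the runtime appends the result to the conversation, and the model chains calls across turns.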
We have some openings for PhD/Postdoc positions on multilingual language modeling at SDU's Centre for Machine Learning, Denmark. Topics range from the core of pre-training and instruction tuning to adjacent areas such as efficient language modeling. Please consider applying :)
Research positions on LLMs are available at the SDU Centre for ML:
tinyurl.com/dfm2025phd
tinyurl.com/dfm2025postdoc
Thanks!
I'm Lukas, working on machine learning and natural language processing. I'm particularly interested in interpretability of language models, efficient language models, continual learning, out-of-distribution generalization, and machine communication.
I hope to find a community like the ex-twitter ML community here.