
Ruizhe Li

@ruizheli

Assistant Professor at University of Aberdeen | Postdoc at UCL | PhD at University of Sheffield | mechanistic interpretability & multimodal LLMs | https://www.ruizhe.space

152 Followers
259 Following
8 Posts
Joined 01.12.2024

Latest posts by Ruizhe Li @ruizheli


Time to go to Vienna again! I'll present one mechinterp work on the 28th, 17:00-18:30, in Hall X4 X5.

We also have another work, a multilingual instruction-following benchmark, on the 28th at 14:00 in 1.15-16. Very honored to be involved in this oral work! Feel free to reach out & chat about mechinterp!

24.07.2025 08:05 👍 0 🔁 0 💬 0 📌 0
Paper page - Attributing Response to Context: A Jensen-Shannon Divergence Driven Mechanistic Study of Context Attribution in Retrieval-Augmented Generation

This work is a collaboration with Chen Chen, Yuchen Hu, Yanjun Gao, Xi Wang and Prof. Emine Yilmaz.
Our paper: huggingface.co/papers/2505....
Our code: github.com/ruizheliUOA/...

03.06.2025 17:27 👍 0 🔁 0 💬 0 📌 0

This finding confirms the contribution of the MLP layers located using ARC-JSD above, and it is reasonable because Chinese is one of the main language resources used in Qwen2 pre- and post-training.

03.06.2025 17:25 👍 0 🔁 0 💬 1 📌 0

In our case study of the MLP layers located in Qwen2 models, we find that several correctly decoded tokens gradually shift from their Chinese form to the English version, such as 一只 (A), 拥有 (has) and 翅膀 (wings) in the figure.

03.06.2025 17:25 👍 0 🔁 0 💬 1 📌 0

In addition, we go a step further and locate the relevant attention heads and MLP layers using JSD from a mechinterp view. We found that JSD-based mechinterp can identify context attribution-related attention heads and MLPs, which are mainly distributed around the intermediate or higher layers.

03.06.2025 17:24 👍 0 🔁 0 💬 1 📌 0

We evaluate ARC-JSD on the TyDi QA, Hotpot QA and MuSiQue datasets using Qwen2-1.5B/7B-IT and Gemma2-2B/9B-IT, where it achieves higher attribution accuracy than the baseline.

03.06.2025 17:23 👍 0 🔁 0 💬 1 📌 0

🤔 Is it possible to accurately and effectively attribute a RAG response to its relevant context without finetuning or further training a surrogate model?

💡 We propose an inference-time method called ARC-JSD using JSD for RAG context attribution, which only needs O(sent_num + 1) 🚀

03.06.2025 17:18 👍 2 🔁 0 💬 1 📌 0
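
For readers unfamiliar with JSD-based attribution, here is a minimal sketch of the general idea, not the authors' implementation. It assumes (my reading, not stated in the post) that each context sentence is scored by removing it and measuring the Jensen-Shannon divergence between the response distribution with the full context and the one with the ablated context. That takes one call for the full context plus one per sentence, which matches the O(sent_num + 1) count. `attribute`, `toy_response_dist` and the three-token distributions are hypothetical stand-ins for a real LLM.

```python
import math
from typing import Callable, List, Sequence, Tuple


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL divergence KL(p || q) in nats; terms with p_i == 0 contribute 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def jsd(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: average KL of p and q to their mixture m."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def attribute(
    sentences: List[str],
    response_dist: Callable[[List[str]], List[float]],
) -> Tuple[int, List[float]]:
    """Score every sentence by how much removing it shifts the response distribution.

    Assumed leave-one-sentence-out scheme: 1 call with the full context
    plus 1 call per sentence, i.e. len(sentences) + 1 calls in total.
    """
    full = response_dist(sentences)
    scores = []
    for i in range(len(sentences)):
        ablated = sentences[:i] + sentences[i + 1:]
        scores.append(jsd(full, response_dist(ablated)))
    top = max(range(len(scores)), key=scores.__getitem__)
    return top, scores


calls = 0  # counts how often the toy model is queried


def toy_response_dist(context: List[str]) -> List[float]:
    """Hypothetical stand-in for an LLM: a distribution over three tokens."""
    global calls
    calls += 1
    if "A penguin has wings." in context:
        return [0.8, 0.1, 0.1]  # confident answer when the evidence is present
    return [0.2, 0.4, 0.4]      # much less confident without it


context = [
    "Vienna is in Austria.",
    "A penguin has wings.",
    "Qwen2 is a language model.",
]
top, scores = attribute(context, toy_response_dist)
print("most influential sentence:", context[top])
print("model calls:", calls)
```

In a real setting `response_dist` would wrap a forward pass of the model over the response tokens. The toy version only illustrates that the sentence whose removal changes the distribution most gets the highest score.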

Very glad to share some good news! One main conference paper and one Findings paper have been accepted at #ACL2025! See you again in Vienna!

16.05.2025 22:48 👍 1 🔁 0 💬 0 📌 0