
Arvind Nagaraj

@narvind

Deep Learning | ML research | Ex-Robotics at Invento | πŸ”— https://narvind2003.github.io Here strictly to talk about ML, NNs and related ideas. Casual stuff on x.com/nagaraj_arvind

237
Followers
950
Following
117
Posts
19.11.2024
Joined

Latest posts by Arvind Nagaraj @narvind

The Loop is Back: Why HRM is the Most Exciting AI Architecture in Years Years ago, I sat in Jeremy Howard’s FastAI class, right at the dawn of a new era. He was teaching us ULMFiT, a method he (& Sebastian…

It's a story about why QKV is magic, my love for the loop, and why HRM might be the blueprint for the next generation of AI reasoning.
My post, written with the help of an LLM (the irony!), is here. I poured my heart into this one:
medium.com/@gedanken.th...

#AI #DeepLearning #RNN #Transformer #HRM

07.08.2025 08:49 πŸ‘ 1 πŸ” 1 πŸ’¬ 0 πŸ“Œ 0

The Hierarchical Reasoning Model (HRM) isn't just another model. It's a deep synthesis. It marries the iterative soul of an RNN (minus the BPTT nightmare) with the raw power of modern Attention.
I wrote a deep dive on why this is a full-circle moment for me, going back to the RNN finetuning days.

07.08.2025 08:49 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

What makes HRM truly special is its ability to "think fast and slow." Its ACT module isn't just a stop signal; it's a cognitive engine that learns to allocate effort.
It's the closest we've come yet to embodying Prof. Kahneman's vision of a System 1/2 mind in code.

07.08.2025 08:49 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

But how does it fix mistakes buried deep in the past? By not letting them stay in the past.
Each new "Thinking Session" (the M-loop) starts with the flawed result of the last one. It forces the model to confront its own errors until the logic is perfect.

07.08.2025 08:49 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

So how does HRM work? Imagine a tiny, 2-person company.
🧠 A strategic CEO (H-module) who thinks slow, sees the big picture, and sets the overall strategy.
⚑️ A diligent Worker (L-module) who thinks fast, executing the details of the CEO's plan.
This separation allows for truly deep, iterative thought.

07.08.2025 08:49 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image
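The CEO/Worker split above can be sketched as two nested loops running at different timescales. A toy sketch, not the paper's implementation; `hrm_step`, `f_L`, `f_H`, and the toy update rules are illustrative stand-ins:

```python
def hrm_step(zH, zL, x, f_L, f_H, T=4):
    """One high-level cycle: the fast L-module ('Worker') takes T
    steps under the slow H-module's ('CEO') current plan, then the
    H-module updates its plan once from the result."""
    for _ in range(T):
        zL = f_L(zL, zH, x)  # fast, detailed work
    zH = f_H(zH, zL)         # one slow, strategic update
    return zH, zL

# Toy usage: count how often each module runs in a single cycle.
calls = {"L": 0, "H": 0}
def f_L(zL, zH, x):
    calls["L"] += 1
    return zL + zH + x  # stand-in for a real attention block
def f_H(zH, zL):
    calls["H"] += 1
    return zH + zL
zH, zL = hrm_step(0, 0, 1, f_L, f_H, T=4)
print(calls)  # the Worker ran 4 times for a single CEO update
```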


Then, last month, a paper dropped that changes everything.
This is the architecture I've been waiting for since 2018. A thread on HRM. 🧡

07.08.2025 08:49 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

For years, I died a little inside every time I taught the Transformer model, grudgingly accepting that the elegant loop of the RNN was dead.

07.08.2025 08:49 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

You're supposed to what? Swallow the toothpaste?

30.03.2025 04:52 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

πŸ”₯πŸ”₯
MCTS rollout pruning, python interpreter verifier and iterative self improvement of intermediate steps during each round of training.
Brilliant stuff thisπŸ’ͺ
rStar-Math is the kind of paper I wish to see more of!

09.01.2025 23:45 πŸ‘ 3 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Post image Post image

(1/7) For a while we've been working on an ambitious problem: The National Archive of Mexico #AGN holds 58 linear km of documents. Only a drop of this β€˜ocean’ has been studied due to many challenges. But great news: we are now unlocking this information! A thread 🧡 (1/8) #HTR #AI #CulturalHeritage

17.12.2024 14:15 πŸ‘ 140 πŸ” 60 πŸ’¬ 5 πŸ“Œ 13

Computer Vision: Fact & Fiction is now available on YouTube πŸ™ŒπŸΌ I made a playlist for it with the seven chapters. Enjoy this time capsule from two decades ago!

19.12.2024 16:50 πŸ‘ 58 πŸ” 16 πŸ’¬ 4 πŸ“Œ 4

I like how the new gemini 2.0 thinking model insists like a child...lol

19.12.2024 18:38 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0


Taking a time machine within a time machine... stealing someone's consciousness...the ideas were next level!
The guy is a beast.
It's a shame Shane Carruth couldn't carry on making more amazing films.

07.12.2024 20:01 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Yooo...a primer fan?
There are so many incredible moments in this film.
Wow...have you seen 'Upstream color' as well?

07.12.2024 19:57 πŸ‘ 2 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Wow!
I should read this!

05.12.2024 15:11 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Ah...

03.12.2024 18:22 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

What does "fuch" mean?

03.12.2024 14:56 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Diffusion transformer (DiT) ftw!!

03.12.2024 08:32 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

6. V is not rotated. Only Q and K are rotated relative to each other. Farther tokens now have a larger angle between them.
7. The encoding signal is not going to die out: it's preserved because the rotation happens inside the softmax dot-product attention itself.
8. What a gorgeous 😍 idea...

03.12.2024 06:32 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

4. RoPE takes this operation from the beginning of the input to inside the attention operation itself.
5. There are two benefits. First, the semantic meaning of the token is not corrupted: we only rotate the vector, preserving its magnitude.

03.12.2024 06:32 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

TL;DR:
1. We need a way to encode token positions when feeding them as input into the transformer
2. We could just concat 1, 2, 3, etc., but this doesn't scale to variable lengths
3. Noam Shazeer showed how sin and cos waves can produce a beautiful pattern that encodes relative positions between tokens.

03.12.2024 06:32 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
fleetwood.dev

Link: fleetwood.dev/posts/you-co...

03.12.2024 06:32 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

RoPE has been the one πŸ’― genuine upgrade to the vanilla Vaswani transformer.

This beautiful blogpost by Chris Fleetwood explains the significance, and how rotating Q & K preserves meaning (magnitude) while encoding relative positions (angle shift) πŸ”₯πŸ”₯

03.12.2024 06:32 πŸ‘ 13 πŸ” 2 πŸ’¬ 1 πŸ“Œ 0
Post image

Why does ChatGPT refuse to say "David Mayer" ?? πŸ€”
I have tried a bunch of ways and it refuses to!! 😭

01.12.2024 06:38 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

πŸ‘ŒπŸ™

30.11.2024 03:04 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Post image Post image Post image

πŸ€” Can you turn your vision-language model from a great zero-shot model into a great-at-any-shot generalist?

Turns out you can, and here is how: arxiv.org/abs/2411.15099

Really excited to share this work on multimodal pretraining for my first bluesky entry!

🧡 A short and hopefully informative thread:

28.11.2024 14:32 πŸ‘ 134 πŸ” 24 πŸ’¬ 2 πŸ“Œ 7

πŸ˜„

29.11.2024 10:14 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

SIGGRAPH'25 (form): 48 days.
RSS'25 (abs): 49 days.
SIGGRAPH'25 (paper-md5): 55 days.
RSS'25 (paper): 56 days.
ICML'25: 62 days.
RLC'25 (abs): 77 days.
RLC'25 (paper): 84 days.
ICCV'25: 97 days.

29.11.2024 10:00 πŸ‘ 12 πŸ” 1 πŸ’¬ 0 πŸ“Œ 2

We should give this place a serious try...
It may work πŸ™

29.11.2024 10:07 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0