NEW: OpenAI is releasing two free open models today, ahead of the GPT-5 launch. One of the open-weight "GPT-OSS" models is small enough to run on a laptop. More from @alexeheath.com: www.theverge.com/openai/71878...
According to new research by Waymo, self-driving cars' neural nets follow power-law scaling: more data and compute = better performance. waymo.com/blog/2025/06...
Major reasoning models trained w RL so far with technical reports:
2025-01-22 – DeepSeek R1 – arxiv.org/abs/2501.12948
2025-01-22 – Kimi 1.5 – arxiv.org/abs/2501.12599
2025-03-31 – Open-Reasoner-Zero – arxiv.org/abs/2503.24290
2025-04-10 – Seed 1.5-Thinking – arxiv.org/abs/2504.13914
...
True, but looking for data to back up what you believe is already a good sign, right? It seems better than claiming things without anything to back them up. And at least you can then argue with people who disagree on a scientific basis, grounded in the data.
If you're interested in more insights about the progress of AI, check out these two sources:
www.bondcap.com/report/tai
www.ben-evans.com/presentations
State of AI in 4 plots.
The 200-point Elo difference between recent models and a two-year-old model means that a human rater has a ~75% chance of preferring an answer from the recent model.
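The ~75% figure above follows directly from the standard Elo expected-score formula; a quick sketch (the function name is mine):

```python
def win_probability(elo_gap: float) -> float:
    """Expected probability that the higher-rated model's answer is
    preferred, given the Elo rating gap (standard Elo formula)."""
    return 1.0 / (1.0 + 10 ** (-elo_gap / 400.0))

print(round(win_probability(200), 3))  # ~0.76, i.e. roughly 75%
```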
Based on available data, all indicators about the progress of AI (in particular LLMs) remain strong.
Not long ago, people laughed at the idea of AI generating minutes-long realistic videos. Now it's reality with tools like Sora and Veo 3 leading the way. Full movies in cinemas soon, generated from just a few prompts...
Shoutout to the creators of PQN and of the CleanRL baselines.
My co-authors: Jacob Kooi and Zhao Yang
Paper: arxiv.org/abs/2505.15345
Codebase: github.com/Jacobkooi/Ha...
Directly implementing the Hadamax encoder in other algorithms such as C51 also shows over 60% improvements.
The Hadamax architecture can be implemented in any pixel-based encoder. The most important design choices are:
1. Convolutional Hadamard Representations.
2. Max-pooling instead of convolutional down-sampling.
3. Gaussian Error Linear Unit activations.
Without changing any algorithmic hyperparameters, this encoder substitution places Hadamax-PQN among state-of-the-art model-free reinforcement learning methods, while remaining an order of magnitude faster than Rainbow.
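The three design choices above can be sketched as a single encoder stage in PyTorch. This is a minimal illustration under my own assumptions (layer sizes, kernel sizes, and the class name are mine), not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class HadamaxBlock(nn.Module):
    """One pixel-encoder stage following the three listed design choices:
    (1) a Hadamard (element-wise) product of two parallel conv branches,
    (2) max-pooling for down-sampling instead of strided convolutions,
    (3) GELU activations. Sizes are illustrative."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.branch_a = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.branch_b = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.act = nn.GELU()
        self.pool = nn.MaxPool2d(kernel_size=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Hadamard representation: multiply the two activated branches,
        # then down-sample with max-pooling rather than a strided conv.
        h = self.act(self.branch_a(x)) * self.act(self.branch_b(x))
        return self.pool(h)

# e.g. a stack of these blocks replaces the usual Nature-DQN conv encoder:
block = HadamaxBlock(4, 32)
out = block(torch.zeros(1, 4, 84, 84))  # Atari-style frame stack
print(out.shape)  # halved spatial resolution: (1, 32, 42, 42)
```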
New paper on arXiv: Hadamax Encoding: Elevating Performance in Model-Free Atari. (arxiv.org/abs/2505.15345)
Our Hadamax (Hadamard max-pooling) encoder architecture improves the recent PQN algorithm's Atari performance by 80%, allowing it to significantly surpass Rainbow-DQN!
Making stock market predictions (especially short/medium term) is tempting but unless you have privileged information, you might as well try predicting random noise. Financial markets are self-adapting systems where any predictable pattern tends to be exploited and arbitraged away by participants.
Just shared a new article on "The State of Reinforcement Learning for LLM Reasoning"!
If you are new to reinforcement learning, this article has a generous intro section (PPO, GRPO, etc)
Also, I cover 15 recent articles focused on RL & Reasoning.
magazine.sebastianraschka.com/p/the-state-...
This is indeed a great position paper, I like it a lot:
- pre-training with next-token prediction creates local minima in reasoning we can't escape => pre-training should also be done with RL
- long context windows lead to exploitation of spurious correlations
- disentangle reasoning and knowledge
The funny thing about multimodal image generation as released in the last week by Google and OpenAI is that now LLM image generation works like how most people using LLMs for the past two years always thought LLM image generation works.
www.youtube.com/watch?v=9_Pe... An interview with Rich. The humility of Rich is truly inspiring: "There are no authorities in science". I wish people would listen and live by this.
Congrats Andrew and Rich, well deserved!! apnews.com/article/turi...
Check this out: a new postdoc program for AI-related research in Catalunya!
Our group is looking to hire within this program, ideally to work on topics related to RL theory. If you're interested, please DM or email me.
(retweets appreciated!)
ramonllull-aira.eu/application
How DeepSeek R1's Multi-round Conversation works.
api-docs.deepseek.com/guides/reaso...
Bombshell from DeepSeek: the R1 family of models. Incredibly, it's MIT licensed and they encourage us to distill from it.
The core of the approach is reinforcement learning from verifiable rewards. No PRMs / MCTS. R1-zero doesn't even use SFT to start.
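A "verifiable reward" in this R1-style setup just means the reward is computed by checking the final answer against ground truth, with no learned reward model. A minimal sketch, assuming the common `\boxed{...}` answer-extraction convention (the function name and format are my assumptions, not DeepSeek's exact implementation):

```python
import re

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Return 1.0 if the completion's boxed final answer matches the
    reference answer exactly, else 0.0 (binary, rule-checked reward)."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match is None:
        return 0.0  # no parseable final answer
    return 1.0 if match.group(1).strip() == ground_truth else 0.0

print(verifiable_reward(r"... so the answer is \boxed{42}", "42"))  # 1.0
print(verifiable_reward("I refuse to commit to an answer", "42"))   # 0.0
```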
I probably don't need to tell you that 2024 was a huge year for robotics. As a long-time robotics researcher, it's been amazing to watch; some of the things that I always dreamed about actually seem to be happening.
For me, there are three big stories: itcanthink.substack.com/p/2024-robot...
Super happy to reveal our new paper!
We trained a model to play four games; performance in each improves both with "external search" (MCTS using a learned world model) and with "internal search", where the model outputs the whole plan on its own!
RLDM will be held next year in Dublin!
A reminder that the call for workshops is out: rldm.org/call-for-wor...
The workshops are one of my favourite parts of the conference :) please get in touch if you have any questions!
Hello, world! You seem a bit wilder than I expected, but here we are.