NEW: OpenAI is releasing two free open models today, ahead of the GPT-5 launch. One of the open-weight "GPT-OSS" models is small enough to run on a laptop. More from @alexeheath.com: www.theverge.com/openai/71878...
According to new research by Waymo, self-driving cars' neural nets follow power-law scaling: more data and compute = better performance. waymo.com/blog/2025/06...
Major reasoning models trained w RL so far with technical reports:
2025-01-22 – DeepSeek R1 – arxiv.org/abs/2501.12948
2025-01-22 – Kimi 1.5 – arxiv.org/abs/2501.12599
2025-03-31 – Open-Reasoner-Zero – arxiv.org/abs/2503.24290
2025-04-10 – Seed 1.5-Thinking – arxiv.org/abs/2504.13914
...
True, but looking for data to back up what you believe is already a good sign, right? It seems better than claiming things without anything to back them up. And at least you can then argue with people who disagree on a scientific basis, grounded in the data.
If you're interested in more insights about the progress of AI, check out these two sources:
www.bondcap.com/report/tai
www.ben-evans.com/presentations
State of AI in 4 plots.
The 200-point Elo difference between recent models and a two-year-old model means that a human rater has a ~75% chance of preferring an answer from the recent model.
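The ~75% figure above follows directly from the standard Elo expected-score formula; a quick sketch (the function name is mine):

```python
def win_probability(elo_gap: float) -> float:
    """Expected probability that the higher-rated model's answer is
    preferred, given the Elo rating gap (standard Elo formula)."""
    return 1.0 / (1.0 + 10 ** (-elo_gap / 400.0))

print(round(win_probability(200), 3))  # ~0.76, i.e. roughly 75%
```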
Based on available data, all indicators about the progress of AI (in particular LLMs) remain strong.
Not long ago, people laughed at the idea of AI generating minutes-long realistic videos. Now it's reality with tools like Sora and Veo 3 leading the way. Full movies in cinemas soon, generated from just a few prompts...
Shoutout to the creators of PQN and of the CleanRL baselines.
My co-authors: Jacob Kooi and Zhao Yang
Paper: arxiv.org/abs/2505.15345
Codebase: github.com/Jacobkooi/Ha...
Directly implementing the Hadamax encoder in other algorithms such as C51 also shows over 60% improvements.
The Hadamax architecture can be implemented in any pixel-based encoder. The most important design choices are:
1. Convolutional Hadamard Representations.
2. Max-pooling instead of convolutional down-sampling.
3. Gaussian Error Linear Unit activations.
Without changing any algorithmic hyperparameters, this encoder substitution places Hadamax-PQN among state-of-the-art model-free reinforcement learning methods, while remaining an order of magnitude faster than Rainbow.
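The three design choices above can be sketched as a single encoder stage in PyTorch. This is a minimal illustration under my own assumptions (layer sizes, kernel sizes, and the class name are mine), not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class HadamaxBlock(nn.Module):
    """One pixel-encoder stage following the three listed design choices:
    (1) a Hadamard (element-wise) product of two parallel conv branches,
    (2) max-pooling for down-sampling instead of strided convolutions,
    (3) GELU activations. Sizes are illustrative."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.branch_a = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.branch_b = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.act = nn.GELU()
        self.pool = nn.MaxPool2d(kernel_size=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Hadamard representation: multiply the two activated branches,
        # then down-sample with max-pooling rather than a strided conv.
        h = self.act(self.branch_a(x)) * self.act(self.branch_b(x))
        return self.pool(h)

# e.g. a stack of these blocks replaces the usual Nature-DQN conv encoder:
block = HadamaxBlock(4, 32)
out = block(torch.zeros(1, 4, 84, 84))  # Atari-style frame stack
print(out.shape)  # halved spatial resolution: (1, 32, 42, 42)
```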
New paper on arXiv: Hadamax Encoding: Elevating Performance in Model-Free Atari. (arxiv.org/abs/2505.15345)
Our Hadamax (Hadamard max-pooling) encoder architecture improves the recent PQN algorithm's Atari performance by 80%, allowing it to significantly surpass Rainbow-DQN!
Making stock market predictions (especially short/medium term) is tempting but unless you have privileged information, you might as well try predicting random noise. Financial markets are self-adapting systems where any predictable pattern tends to be exploited and arbitraged away by participants.
Just shared a new article on "The State of Reinforcement Learning for LLM Reasoning"!
If you are new to reinforcement learning, this article has a generous intro section (PPO, GRPO, etc)
Also, I cover 15 recent articles focused on RL & Reasoning.
magazine.sebastianraschka.com/p/the-state-...
This is indeed a great position paper, I like it a lot:
- pre-training with next-token prediction creates local minima in reasoning we can't escape => pre-training should also be done with RL
- long context windows lead to exploitation of spurious correlations
- disentangle reasoning and knowledge
The funny thing about multimodal image generation as released in the last week by Google and OpenAI is that now LLM image generation works like how most people using LLMs for the past two years always thought LLM image generation works.
www.youtube.com/watch?v=9_Pe... An interview with Rich. The humility of Rich is truly inspiring: "There are no authorities in science". I wish people would listen and live by this.
Congrats Andrew and Rich, well deserved!! apnews.com/article/turi...
Check this out: a new postdoc program for AI-related research in Catalunya!
Our group is looking to hire within this program, ideally to work on topics related to RL theory. If you're interested, please DM or email me.
(retweets appreciated!)
ramonllull-aira.eu/application
How DeepSeek R1's Multi-round Conversation works.
api-docs.deepseek.com/guides/reaso...
Bombshell from DeepSeek: the R1 family of models. Incredibly, it's MIT licensed and they encourage us to distill from it.
The core of the approach is reinforcement learning from verifiable rewards. No PRMs / MCTS. R1-zero doesn't even use SFT to start.
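A "verifiable reward" in this R1-style setup just means the reward is computed by checking the final answer against ground truth, with no learned reward model. A minimal sketch, assuming the common `\boxed{...}` answer-extraction convention (the function name and format are my assumptions, not DeepSeek's exact implementation):

```python
import re

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Return 1.0 if the completion's boxed final answer matches the
    reference answer exactly, else 0.0 (binary, rule-checked reward)."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match is None:
        return 0.0  # no parseable final answer
    return 1.0 if match.group(1).strip() == ground_truth else 0.0

print(verifiable_reward(r"... so the answer is \boxed{42}", "42"))  # 1.0
print(verifiable_reward("I refuse to commit to an answer", "42"))   # 0.0
```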
I probably don't need to tell you that 2024 was a huge year for robotics. As a long-time robotics researcher, it's been amazing to watch; some of the things that I always dreamed about actually seem to be happening.
For me, there are three big stories: itcanthink.substack.com/p/2024-robot...
Super happy to reveal our new paper!
We trained a model to play four games; performance in each improves both with "external search" (MCTS using a learned world model) and with "internal search", where the model outputs the whole plan on its own!
RLDM will be held next year in Dublin!
A reminder that the call for workshops is out: rldm.org/call-for-wor...
The workshops are one of my favourite parts of the conference :) please get in touch if you have any questions!
Hello, world! You seem a bit wilder than I expected, but here we are.