Will you be at #NeurIPS2025? Come talk TMLR and collect swag!
EiCs Gautam Kamath (@gautamkamath.com) and Nihar Shah will be there -- if you are an AE or an Expert Reviewer, or have a Featured or Outstanding Certification, you can get a free TMLR laptop sticker! Locations ⬇️
27.11.2025 17:16
👍 14
🔁 2
💬 1
📌 2
10.11.2025 20:47
👍 6
🔁 2
💬 0
📌 1
Our discussion period just started. Authors, please read our instructions carefully. We require responses by June 2.
But what you really want to hear about is stats... right? -> 🧵
27.05.2025 17:41
👍 17
🔁 5
💬 2
📌 0
o3’s weird hallucinations could indicate they used LLM-as-a-judge (or other softer verifiers) in high volume, in addition to math/code correctness checks.
This addition lets OpenAI scale RL by making more data available to train on, but introduces new downstream problems to solve.
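The distinction above can be made concrete with a minimal sketch: a hard verifier (exact math/code correctness) versus a softer judge-based reward. Everything here is hypothetical — `soft_judge` is a stub standing in for an LLM-as-a-judge call, not any actual OpenAI interface.

```python
# Minimal sketch of hard vs. soft verification for RL reward signals.
# `soft_judge` is a hypothetical stand-in for an LLM-as-a-judge call;
# here it just returns a fixed placeholder score.

def exact_match_reward(answer: str, reference: str) -> float:
    """Hard verifier: reward 1.0 only on exact match with the reference."""
    return 1.0 if answer.strip() == reference.strip() else 0.0

def soft_judge(answer: str, reference: str) -> float:
    """Hypothetical LLM judge returning a score in [0, 1].
    Stubbed here; in practice this would be a model call."""
    return 0.7  # placeholder score

def combined_reward(answer: str, reference: str, has_exact_verifier: bool) -> float:
    """Use the hard verifier when one exists, fall back to the soft judge.
    The soft path expands the pool of trainable prompts, but a gameable
    judge can leak reward-hacking artifacts into the policy."""
    if has_exact_verifier:
        return exact_match_reward(answer, reference)
    return soft_judge(answer, reference)
```

The trade-off in the post is visible in `combined_reward`: the soft branch makes far more data usable for RL, at the cost of a reward signal the model can drift away from ground truth on.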
20.04.2025 14:06
👍 19
🔁 2
💬 1
📌 0
One of the first papers I've seen with RLVR / reinforcement finetuning of vision-language models.
Looks about as simple as we would expect, with lots of details still to uncover.
Liu et al. Visual-RFT: Visual Reinforcement Fine-Tuning
buff.ly/DbGuYve
(posted a week ago, oops)
10.03.2025 15:44
👍 16
🔁 2
💬 1
📌 1
Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation
cdn.openai.com/pdf/34f2ada6...
11.03.2025 04:49
👍 0
🔁 0
💬 0
📌 0
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
github.com/Open-Reasone...
20.02.2025 11:16
👍 0
🔁 0
💬 0
📌 0
Rare that a paper these days uses the original "outcome reward model" literature rather than just fitting a Bradley-Terry model on right/wrong labels.
Nature is healing.
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
Lyu et al.
arxiv.org/abs/2502.06781
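The contrast in the post — pairwise Bradley-Terry training versus a direct outcome reward — can be sketched in a few lines. This is a toy illustration of the two loss shapes on scalar reward scores, not the paper's method.

```python
import math

# Toy contrast between two ways of using right/wrong labels to train a
# reward signal: (1) a Bradley-Terry pairwise loss that shoehorns the
# labels into preference pairs, and (2) a direct outcome-reward objective
# (binary cross-entropy against the verifiable label itself).

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def bradley_terry_loss(score_correct: float, score_wrong: float) -> float:
    """Pairwise loss: push the correct answer's score above the wrong one's.
    Treats (correct, wrong) as a preference pair."""
    return -math.log(sigmoid(score_correct - score_wrong))

def outcome_reward_loss(score: float, is_correct: bool) -> float:
    """Direct outcome reward: score each answer independently against its
    own right/wrong label; no pairing required."""
    p = sigmoid(score)
    return -math.log(p) if is_correct else -math.log(1.0 - p)

# A model that already ranks the correct answer higher incurs a lower
# Bradley-Terry loss than one that ranks it lower.
assert bradley_terry_loss(2.0, -1.0) < bradley_terry_loss(-1.0, 2.0)
```

The outcome-reward form keeps the label's absolute meaning (this answer is verifiably right), whereas the pairwise form only learns relative orderings — which is the distinction the post is cheering about.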
15.02.2025 15:50
👍 14
🔁 4
💬 2
📌 0
Examining False Positives under Inference Scaling for Mathematical Reasoning arxiv.org/pdf/2502.06217
11.02.2025 08:26
👍 0
🔁 0
💬 0
📌 0
This is a potentially counterintuitive result. We actually want reasoning models to generate more tokens for wrong answers: eventually, models should "know" when they're not right and keep spending more compute on the problem.
Regardless, it's a great plot.
arxiv.org/abs/2501.18585
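The statistic behind a plot like this is just response length conditioned on correctness. A minimal sketch of how one might compute it — the sample data here is made up for illustration, not taken from the paper.

```python
# Made-up (token_count, is_correct) pairs for illustration only.
samples = [(312, True), (540, False), (295, True), (880, False), (410, True)]

def mean_tokens(samples: list[tuple[int, bool]], correct: bool) -> float:
    """Average token count over answers with the given correctness label."""
    lengths = [n for n, ok in samples if ok == correct]
    return sum(lengths) / len(lengths)

mean_right = mean_tokens(samples, True)   # avg tokens on correct answers
mean_wrong = mean_tokens(samples, False)  # avg tokens on wrong answers
# The post's point: ideally mean_wrong > mean_right, i.e. the model
# spends more compute when it is (implicitly) unsure.
```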
31.01.2025 13:36
👍 34
🔁 3
💬 3
📌 2
✍️ Reminder to reviewers: check author responses to your reviews, and ask follow-up questions if needed.
50% of papers have discussion - let’s bring this number up!
25.11.2024 12:45
👍 38
🔁 8
💬 1
📌 3