
Tom Silver

@tomssilver

Assistant Professor @Princeton. Developing robots that plan and learn to help people. https://tomsilver.github.io/

391
Followers
81
Following
84
Posts
16.11.2024
Joined

Latest posts by Tom Silver @tomssilver


This week's #PaperILike is "Learning Montezuma’s Revenge from a Single Demonstration" (Salimans & Chen, 2018).

1 demo + known world model = very natural and still under-explored problem setting.

PDF: arxiv.org/abs/1812.03381
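The core trick in the paper is to run RL starting from states along the demonstration, moving the reset point backward as the policy becomes reliable. Below is a minimal sketch of that reset-to-demo-state curriculum, not the authors' exact algorithm (they use PPO and a success-rate annealing schedule); here it is tabular Q-learning on a toy chain world, and all names (`Chain`, `demo_reset_rl`, `greedy_success`) are mine.

```python
import random
from collections import defaultdict

class Chain:
    """Toy resettable environment: walk on 0..n, reward on reaching n."""
    def __init__(self, n=6):
        self.n, self.s = n, 0
    def reset_to(self, state):          # the "known world model" assumption:
        self.s = state                  # we can reset the simulator anywhere
        return self.s
    def step(self, a):                  # a in {0: left, 1: right}
        self.s = max(0, min(self.n, self.s + (1 if a == 1 else -1)))
        return self.s, float(self.s == self.n), self.s == self.n

def greedy_success(env, Q, start, horizon=20):
    """Does the greedy policy reach the goal from `start`?"""
    s = env.reset_to(start)
    for _ in range(horizon):
        s, _, done = env.step(max((0, 1), key=lambda a: Q[(s, a)]))
        if done:
            return True
    return False

def demo_reset_rl(env, demo, episodes=200, eps=0.2, alpha=0.5, gamma=0.99):
    """Learn from the end of the demo backward: reset to a demo state, run
    RL until the policy is reliable from there, then move the reset earlier."""
    Q = defaultdict(float)
    t = len(demo) - 2                    # start just before the goal
    while t >= 0:
        for _ in range(episodes):
            s = env.reset_to(demo[t])
            for _ in range(30):
                if random.random() < eps:
                    a = random.randrange(2)
                else:
                    a = max((0, 1), key=lambda a: Q[(s, a)])
                s2, r, done = env.step(a)
                target = r + gamma * max(Q[(s2, 0)], Q[(s2, 1)]) * (not done)
                Q[(s, a)] += alpha * (target - Q[(s, a)])
                s = s2
                if done:
                    break
        if greedy_success(env, Q, demo[t]):
            t -= 1                       # reliable from here; reset earlier
    return Q

random.seed(0)
env = Chain(6)
Q = demo_reset_rl(env, list(range(7)))   # demo = the straight path 0..6
```

Because early training always starts near the goal, the sparse reward is found immediately; the curriculum then propagates value backward without ever imitating the demo's actions.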

08.03.2026 15:28 πŸ‘ 18 πŸ” 4 πŸ’¬ 0 πŸ“Œ 0

This week's #PaperILike is "Partially Observable Task and Motion Planning with Uncertainty and Risk Awareness" (Curtis et al., RSS 2024).

State of the art for TAMP + POMDPs. I learn more every time I read this paper.

PDF: arxiv.org/abs/2403.10454

01.03.2026 13:45 πŸ‘ 5 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

This week's #PaperILike is "Empowerment: A Universal Agent-Centric Measure of Control" (Klyubin et al., 2005).

An important idea in RL, and a fun read -- mentions bacteria, chimpanzees, Newtonian mechanics, and Othello all within a few sentences.

PDF: uhra.herts.ac.uk/id/eprint/28...
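Klyubin et al. define empowerment as the channel capacity from an agent's n-step action sequences to its resulting state. The general case needs Blahut-Arimoto, but for deterministic dynamics capacity reduces to the log of the reachable-set size, which makes for a tiny illustrative sketch (function and toy world are mine):

```python
from itertools import product
from math import log2

def empowerment_deterministic(state, step, actions, n):
    """n-step empowerment for deterministic dynamics: log2 of the number of
    distinct states reachable via n-step action sequences (the capacity of a
    deterministic channel from action sequences to final states)."""
    reachable = set()
    for seq in product(actions, repeat=n):
        s = state
        for a in seq:
            s = step(s, a)
        reachable.add(s)
    return log2(len(reachable))

# Toy 1-D world with walls at 0 and 4: states in the middle are more
# "empowered" than states pinned against a wall.
def step(s, a):
    return max(0, min(4, s + a))

print(empowerment_deterministic(2, step, (-1, 0, 1), 2))  # log2(5): reaches 0..4
print(empowerment_deterministic(0, step, (-1, 0, 1), 2))  # log2(3): reaches 0..2
```

The agent-centric appeal is visible even here: no reward is specified, yet the measure prefers states where the agent's actions matter more.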

22.02.2026 14:26 πŸ‘ 11 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0

This week's #PaperILike is "Integrating Planning and Learning: The PRODIGY Architecture" (Veloso et al., 1995).

A foundational project in the history of robot planning + learning, and a good place to look for old ideas that are worth resurfacing.

PDF: www.cs.cmu.edu/~jgc/publica...

15.02.2026 16:30 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

This week's #PaperILike is "Continuous Deep Q-Learning with Model-based Acceleration" (Gu et al., 2016).

Got swept away by other deep RL, but I always liked the idea of parameterizing Q in a form where the optimal policy can be derived analytically.

PDF: arxiv.org/abs/1603.00748
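The paper's Normalized Advantage Function restricts Q to a quadratic in the action, so the greedy action falls out analytically. A minimal sketch of that parameterization (in the paper, mu, P, and V are neural-network outputs; here they are fixed hypothetical values at one state):

```python
def naf_q(a, mu, P, V):
    """NAF: Q(s,a) = V(s) - 0.5 * (a - mu)^T P (a - mu), with P positive
    definite, so argmax_a Q(s,a) = mu(s) in closed form -- no inner
    optimization over continuous actions is ever needed."""
    d = [ai - mi for ai, mi in zip(a, mu)]
    n = len(d)
    quad = sum(d[i] * P[i][j] * d[j] for i in range(n) for j in range(n))
    return V - 0.5 * quad

# Hypothetical network outputs at a single state:
mu = [0.3, -0.7]                  # the analytically optimal action
P = [[2.0, 0.5], [0.5, 1.0]]      # positive definite precision matrix
V = 1.5                           # state value
assert naf_q(mu, mu, P, V) == V              # Q is maximized exactly at mu
assert naf_q([0.4, -0.6], mu, P, V) < V      # any other action scores lower
```

The design trade is expressiveness for tractability: Q is unimodal in the action, but both the greedy policy and the max in the Bellman backup come for free.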

08.02.2026 16:44 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

This week's #PaperILike is "Rapid trial-and-error learning with simulation supports flexible tool use and physical reasoning" (Allen et al., PNAS 2020).

Their "Virtual Tools Game" is one I revisit often when brainstorming open challenges.

PDF & game: sites.google.com/view/virtual...

01.02.2026 17:20 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

This week's #PaperILike is "Robot Task Planning Under Local Observability" (Merlin et al., 2024).

LOMDPs are a very natural middle ground between MDPs and POMDPs with enough structure for interesting planning and learning.

PDF: maxmerl.in/papers/lomdp...

25.01.2026 16:55 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

This week's #PaperILike is "Bayesian Residual Policy Optimization" (Lee et al., 2020).

I like this POMDP approach because it reduces the problem to figuring out a good set of "clairvoyant experts".

PDF: arxiv.org/abs/2002.03042

18.01.2026 13:54 πŸ‘ 5 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

This week's #PaperILike is "Differentiable GPU-Parallelized Task and Motion Planning" (Shen et al., RSS 2025).

As always, meticulous work from @WillShenSaysHi and team. TAMP + GPU is long overdue!

PDF: arxiv.org/abs/2411.11833

11.01.2026 14:38 πŸ‘ 3 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Continuing my pace of writing a new blog post every 2 years or so, here's the latest: "Prompt Fiddling Considered Harmful"

tomsilver.github.io/blog/2026/pr...

07.01.2026 13:27 πŸ‘ 2 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

This week's #PaperILike is "Kinodynamic Task and Motion Planning using VLM-guided and Interleaved Sampling" (Kwon & Kim, 2025).

I particularly like using VLMs to guide backtracking in TAMP. Outperforms PDDLStream and LLM3.

PDF: arxiv.org/abs/2510.26139

04.01.2026 15:09 πŸ‘ 1 πŸ” 1 πŸ’¬ 0 πŸ“Œ 0

This week's #PaperILike is "Learning Exploration Strategies to Solve Real-World Marble Runs" (Allaire & Atkeson, ICRA 2023).

A very fun and creative challenge for robot physical reasoning.

Video: sites.google.com/view/learnin...
PDF: arxiv.org/abs/2303.04928

28.12.2025 16:09 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

This week's #PaperILike is "Elephants Don't Pack Groceries: Robot Task Planning for Low Entropy Belief States" (Adu-Bredu, RAL 2022).

Love the focus on planning with "low entropy beliefs" -- not full-fledged POMDPs, but also not full observability.

PDF: arxiv.org/abs/2011.09105

21.12.2025 13:59 πŸ‘ 18 πŸ” 4 πŸ’¬ 0 πŸ“Œ 0

This week's #PaperILike is "Sloppy Programming" (Little et al., 2010).

Vibe coding before it was cool.

PDF: dspace.mit.edu/bitstream/ha...

14.12.2025 13:11 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

This week's #PaperILike is "Robot Programming" (Tomas Lozano-Perez, 1983).

A prescient paper that asks how we might generally program robots like we program computers. Much remains true 42 years later.

PDF: homes.cs.washington.edu/~ztatlock/59...

07.12.2025 12:51 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

This week's #PaperILike is "Learning Proofs of Motion Planning Infeasibility" (Li & Dantam, RSS 2021).

I like using learning to "fail fast", with guarantees. Important for TAMP, where there are other MP problems to try next.

PDF: www.roboticsproceedings.org/rss17/p064.pdf

30.11.2025 14:53 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

This week's #PaperILike is "Interleaving Monte Carlo Tree Search and Self-Supervised Learning for Object Retrieval in Clutter" (Huang et al., ICRA 2022).

Impressive results on a difficult and subtle problem, with a nice combo of planning + learning.

PDF: arxiv.org/abs/2202.01426

23.11.2025 16:20 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

This week's #PaperILike is "Goal-Oriented End-User Programming of Robots" (Porfirio et al., HRI 2024).

I like this use of planning to fill in the gaps between subgoals that are directly programmed by end users.

PDF: arxiv.org/abs/2403.13988

16.11.2025 14:14 πŸ‘ 3 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

This week's #PaperILike is "Lifelong Robot Library Learning: Bootstrapping Composable and Generalizable Skills for Embodied Control with Language Models" (Tziafas & Kasaei, ICRA 2024).

DreamCoder-like robot skill learning. Refactoring helps!

PDF: arxiv.org/abs/2406.18746

09.11.2025 13:52 πŸ‘ 3 πŸ” 1 πŸ’¬ 0 πŸ“Œ 0

This project was led by a truly exceptional Princeton undergrad @yijieisabelliu.bsky.social, who is looking for PhD opportunities this year! Her website: isabelliu0.github.io

05.11.2025 15:22 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Happy to share some of the first work from my new lab! This project has shaped my thinking about how we can effectively combine planning and RL. Key idea: start with a planner that is slow and "robotic", then use RL to discover shortcuts that are fast and dynamic. (1/2)

05.11.2025 15:22 πŸ‘ 8 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0

This week's #PaperILike is "Monte Carlo Tree Search with Spectral Expansion for Planning with Dynamical Systems" (Riviere et al., Science Robotics 2024).

A creative synthesis of control theory and search. I like using the Gramian to branch.

PDF: arxiv.org/abs/2412.11270

02.11.2025 13:27 πŸ‘ 3 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

This week's #PaperILike is "Reality Promises: Virtual-Physical Decoupling Illusions in Mixed Reality via Invisible Mobile Robots" (Kari & Abtahi, UIST 2025).

This is some Tony Stark level stuff! XR + robots = future.

Website: mkari.de/reality-prom...
PDF: mkari.de/reality-prom...

26.10.2025 15:02 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

This week's #PaperILike is "Learning to Guide Task and Motion Planning Using Score-Space Representation" (Kim et al., IJRR 2019).

This is one of those papers that I return to over the years and appreciate more every time. Chock full of ideas.

PDF: arxiv.org/abs/1807.09962

19.10.2025 15:02 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Good question (and sorry I missed your reply!). Random ideas:
1. Revisit PSRs in the context of (neural) representation learning for RL, e.g., arxiv.org/abs/2508.13113
2. PSRs for learning abstractions for planning under partial observability, e.g., your work with Yixuan (arxiv.org/abs/2408.14769)

12.10.2025 12:47 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

This week's #PaperILike is "On the Utility of Koopman Operator Theory in Learning Dexterous Manipulation Skills" (Han et al., CoRL 2023).

This and others have convinced me that I need to learn Koopman! Another perspective on abstraction learning.

PDF: arxiv.org/abs/2303.13446

12.10.2025 12:38 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

This week's #PaperILike is "Predictive Representations of State" (Littman et al., 2001).

A lesser known classic that is overdue for a revival. Fans of POMDPs will enjoy.

PDF: web.eecs.umich.edu/~baveja/Pape...

05.10.2025 14:45 πŸ‘ 3 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

This week's #PaperILike is "Influence-Augmented Local Simulators: A Scalable Solution for Fast Deep RL in Large Networked Systems" (Suau et al., ICML 2022).

Nice work on using fast local simulators to plan & learn in large partially observed worlds.

PDF: arxiv.org/abs/2202.01534

28.09.2025 12:18 πŸ‘ 3 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

I’m doing this in my course right now. So far so good! One finding: if students try to install and use uv while already in a conda env, bad things happen. Make sure to deactivate conda first.
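A sketch of the setup order that avoids the clash (the install URL and uv subcommands are the official ones; the example package is arbitrary):

```shell
# Deactivate conda first, so uv manages its own environment instead of
# installing into whatever conda env happens to be active:
conda deactivate                                   # leave the active conda env
curl -LsSf https://astral.sh/uv/install.sh | sh    # official uv installer
uv venv                                            # create a uv-managed .venv
source .venv/bin/activate
uv pip install numpy                               # lands in .venv, not conda
```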

21.09.2025 22:35 πŸ‘ 3 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

This week's #PaperILike is "Optimal Interactive Learning on the Job via Facility Location Planning" (Vats et al., RSS 2025).

I always enjoy a surprising connection between one problem (COIL) and another (UFL). And I always like work by Shivam Vats!

PDF: arxiv.org/abs/2505.00490

21.09.2025 15:44 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0