This work was a great collaboration; special shout-out to @jana-z.bsky.social for leading this project and submitting the first paper of her PhD!
Whether self-generated visuals can at some point serve a function similar to mental imagery in human thought remains to be seen.
For now, MentisOculi provides a small suite of tasks to study this topic.
The fact that we don't see strong benefits even from ground-truth visuals suggests that information in the visual and textual domains is somewhat misaligned, potentially because models are not trained on similar tasks.
This work is motivated by the same intuition as my work on video models last fall: Can media generation capabilities be useful beyond just generating nice visuals?
For real-world, embodied applications, being able to visualize the outcome of an action seems useful.
How useful are self-generated 'mental images' (visual aids) in MLLM/UMM reasoning?
Turns out: currently not very. Visualizations have small errors that compound in multi-step problems, and models often ignore correct visual aids in their decision making.
🚀 We're hiring! The @ellisinsttue.bsky.social leads the AI development for Germany’s new open-source nationwide Adaptive Intelligent System learning platform for schools (as part of a consortium led by Assecor & KI macht Schule, and mandated by the FWU).
👉 Apply now: forms.gle/XmLkwEDD45fY...
🎉 Excited to present our paper VGGSounder: Audio‑Visual Evaluations for Foundation Models today at #ICCV2025!
🕦 Poster Session 1 | 11:30–13:30
📍 Poster #88
Come by if you're into audio-visual learning and want to know whether multiple modalities actually help or hurt.
I'm truly honored to have worked on this at Google DeepMind with my amazing collaborators!
With 2 months left in my internship, I'm excited about our next steps in this direction!
And as with other 'zero-shot' works, it's clear that Veo has been exposed to samples of many of our tasks in the training data. The promise lies in its ability to be quickly adapted to general tasks with just a prompt, no fine-tuning required!
Of course, performance is not perfect yet and lags behind SotA. Video models are also expensive to train and run, so they won't replace all vision models just yet. But the rapid progress from Veo 2 to Veo 3 illustrates their potential to become vision foundation models.
Intuitively, some tasks are easier to directly solve in the vision domain, and we also observe this in maze solving tasks. This makes me super excited about a future where generalist vision and language models could be integrated for reasoning in the real world by 'imagining' possible outcomes.
On the reasoning side, videos as 'chain-of-frames' parallel chain-of-thought in LLMs. Complex visual tasks that an image editing model like Nano Banana would have to solve in one go can be broken down into smaller steps.
Specifically, Veo 3 can perceive (segment, localize, detect edges, ...), model (physics, abstract relations, memory), manipulate (edit images, simulate robotics), and reason about the visual world.
Video models might well become vision foundation models.
Are we experiencing a 'GPT moment' in vision?
In our new preprint, we show that generative video models can solve a wide range of tasks across the entire vision stack without being explicitly trained for it.
🌐 video-zero-shot.github.io
1/n
Check out our newest paper!
As always, it was super fun working on this with @prasannamayil.bsky.social
CuratedThoughts: Data Curation for RL Datasets 🚀
Since DeepSeek-R1 introduced reasoning-based RL, datasets like Open-R1 & OpenThoughts emerged for fine-tuning & GRPO. Our deep dive found major flaws — 25% of OpenThoughts needed elimination by data curation.
Here's why 👇🧵
Hiring announcement: ELLIS Institute Tübingen is looking for ML Researchers & Engineers for Open-Source AI Tutoring (m/f/d).
🚀 We’re hiring! Join Bernhard Schölkopf & me at @ellisinsttue.bsky.social to push the frontier of #AI in education!
We’re building cutting-edge, open-source AI tutoring models for high-quality, adaptive learning for all pupils with support from the Hector Foundation.
👉 forms.gle/sxvXbJhZSccr...