Tobias Weyand's Avatar

Tobias Weyand

@tobw.net

Researcher at Google DeepMind working towards human-level video understanding 🔗 tobw.net

404 Followers · 59 Following · 14 Posts · Joined 22.11.2024

Latest posts by Tobias Weyand @tobw.net

Arxiv paper – MINERVA: Evaluating Complex Video Reasoning In this episode, we discuss MINERVA: Evaluating Complex Video Reasoning by Arsha Nagrani, Sachit Menon, Ahmet Iscen, Shyamal Buch, Ramin Mehran, Nilpa Jha, Anja Hauth, Yukun Zhu, Carl Vondrick, Mikhai...

Listen to the AGI Breakdown podcast on Minerva here: aibreakdown.org/arxiv-paper-...

13.05.2025 00:06 👍 0 🔁 0 💬 0 📌 0
Advancing the frontier of video understanding with Gemini 2.5 - Google Developers Blog Explore Gemini 2.5, enhancing video understanding and combining audio-visual data and code for new interactive applications

The newly released Gemini 2.5 Pro (Preview 05/06) sets the state of the art on Minerva with 63.5% accuracy. Human accuracy is 92.5%.

developers.googleblog.com/en/gemini-2-...

13.05.2025 00:06 👍 0 🔁 0 💬 1 📌 0
MINERVA: Evaluating Complex Video Reasoning Multimodal LLMs are turning their focus to video benchmarks; however, most video benchmarks only provide outcome supervision, with no intermediate or interpretable reasoning steps. This makes it challe...

📜 Paper: arxiv.org/abs/2505.006...
📊 Dataset: github.com/google-deepm...

This is work with my amazing colleagues and collaborators Arsha Nagrani, Sachit Menon, Ahmet Iscen, Shyamal Buch, Ramin Mehran, Nilpa Jha, Anja Hauth, Yukun Zhu, Carl Vondrick, Mikhail Sirotenko, and Cordelia Schmid.

13.05.2025 00:06 👍 1 🔁 0 💬 1 📌 0

We're excited to release Minerva 🕵️‍♀️, a benchmark to evaluate if AI can truly reason about videos, from spotting game-changing moments in sports 🏀 to understanding character motivations in short films 🍿. We provide the "why" behind the answers! Pointers below 👇

13.05.2025 00:06 👍 0 🔁 0 💬 1 📌 0

And the ICLR decisions

22.01.2025 21:51 👍 1 🔁 0 💬 1 📌 0

6yo daughter: Papa, are you the boss of Google?
Me: No
6yo daughter: Why?

08.01.2025 04:05 👍 4 🔁 0 💬 0 📌 0

Excited to share Long-Video Masked Autoencoder (LVMAE), which our team just published at NeurIPS'24! We boost the context length of video models using an adaptive decoder and a dual-masking strategy, and achieve SotA on several video benchmarks.

Paper: arxiv.org/abs/2411.13683

05.12.2024 22:56 👍 2 🔁 0 💬 0 📌 0

Whoa, massive news! Excited for you and looking forward to seeing what you'll build there!

05.12.2024 07:08 👍 1 🔁 0 💬 1 📌 0

Another nice way to get an ETA is

from tqdm import tqdm
for i in tqdm(range(len(dataset))):
    ...
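A self-contained version of the snippet above (a minimal sketch; the `dataset` list here is a stand-in for whatever you are iterating over):

```python
# tqdm wraps any iterable and prints a live progress bar with an
# ETA estimate derived from the observed iteration rate.
from tqdm import tqdm

dataset = list(range(1000))  # stand-in for a real dataset

total = 0
for item in tqdm(dataset):
    total += item
```

Wrapping the iterable directly (`tqdm(dataset)`) is usually nicer than `tqdm(range(len(dataset)))`, since tqdm picks up the length via `len()` automatically.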

28.11.2024 00:22 👍 1 🔁 0 💬 0 📌 0

Professor knocks - "Hey, I have a 'research' project for you"

27.11.2024 03:19 👍 1 🔁 0 💬 0 📌 0

Thanks, looks promising!

27.11.2024 03:15 👍 0 🔁 0 💬 0 📌 0

Oh nice, seems to work for the first few papers I tried. Thank you!

27.11.2024 03:14 👍 2 🔁 0 💬 0 📌 0

Is there a better way to find the publication venue of an ArXiv paper than searching for the title on Google / Google Scholar / OpenReview and checking authors' websites?
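One programmatic option: the arXiv export API (`export.arxiv.org/api/query`) returns an Atom feed that includes an `arxiv:journal_ref` element when the authors have filled it in, which often names the venue. A sketch that parses a trimmed sample response offline (the feed contents here are made up for illustration; real entries frequently omit `journal_ref`):

```python
# Extract the journal_ref (publication venue, if the authors set it)
# from an arXiv API Atom response.
import xml.etree.ElementTree as ET

SAMPLE_FEED = """<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:arxiv="http://arxiv.org/schemas/atom">
  <entry>
    <title>Example Paper Title</title>
    <arxiv:journal_ref>NeurIPS 2024</arxiv:journal_ref>
  </entry>
</feed>"""

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}

def venue_from_feed(xml_text):
    """Return the journal_ref of the first entry, or None if absent."""
    root = ET.fromstring(xml_text)
    ref = root.find("atom:entry/arxiv:journal_ref", NS)
    return ref.text if ref is not None else None

print(venue_from_feed(SAMPLE_FEED))  # prints: NeurIPS 2024
```

Since `journal_ref` is self-reported and optional, falling back to a Google Scholar / OpenReview search is still needed when it is missing.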

24.11.2024 23:16 👍 1 🔁 0 💬 3 📌 0

Tap, tap. Is this thing on?

24.11.2024 23:13 👍 5 🔁 0 💬 0 📌 0