Oncel Tuzel

@onceltuzel

AI researcher at Apple

18 Followers, 13 Following, 1 Post, Joined 23.11.2024

Latest posts by Oncel Tuzel @onceltuzel

Check out the code, models, and demo iOS/macOS app using MLX for our fast vision-language models, FastVLM:
github.com/apple/ml-fas...

Paper: "FastVLM: Efficient Vision Encoding for Vision Language Models", Anasosalu et al., CVPR 2025
arxiv.org/abs/2412.13303

#CVPR2025 #Apple #research

07.05.2025 12:20 👍 2 🔁 0 💬 0 📌 0

Today is a great day for optimal transport 🎉! Lots of gratitude 🙏 for all folks who contributed to ott-jax.readthedocs.io and pushed for the MOSCOT (now @ nature!) paper, from visionaries @dominik1klein.bsky.social, G. Palla, Z. Piran to the magician, Michal Klein! ❤️

www.nature.com/articles/s41...

22.01.2025 22:17 👍 22 🔁 7 💬 0 📌 1
FastVLM: Efficient Vision Encoding for Vision Language Models Scaling the input image resolution is essential for enhancing the performance of Vision Language Models (VLMs), particularly in text-rich image understanding tasks. However, popular visual encoders su...

For more, check out our paper on arxiv: arxiv.org/abs/2412.13303

With the amazing people: @pavankumarvasu.bsky.social, Fartash Faghri, Chun-Liang Li, Hadi Pouransari, Nate True, Albert Antony, Gokul Santhanam, James Gabriel, Peter Grasch, and @onceltuzel.bsky.social

19.12.2024 19:22 👍 1 🔁 1 💬 0 📌 0
WVD Pipeline

🤔 Image-to-3D, monocular depth estimation, camera pose estimation, …, can we achieve all of this with just ONE model easily?

🚀 Our answer is Yes -- Excited to introduce our latest work: World-consistent Video Diffusion (WVD) with Explicit 3D Modeling!

arxiv.org/abs/2412.01821

04.12.2024 13:41 👍 14 🔁 6 💬 1 📌 0