Telling your students about research before the ImageNet moment
@tarekbouamer
Research Specialist @ATRC, 3D Computer Vision, Machine Learning & Robotics. Previously ICG @TU_Graz, Paris-Sud & CentraleSupélec. Looking for innovative research opportunities in AI, robotics, and 3D vision.
Introducing StereoSpace -- our new end-to-end method for turning photos into stereo images without explicit geometry or depth maps. This makes it especially robust to thin structures and transparencies. Try the demo below.
ACE-SLAM: Scene Coordinate Regression for Neural Implicit Real-Time SLAM
Ignacio Alzugaray, @marwantaher.bsky.social, @ajdavison.bsky.social
tl;dr: in title; ACE+SLAM
arxiv.org/abs/2512.14032
SAM 3D: 3Dfy Anything in Images
SAM 3D Team et al.
tl;dr: in title. 8-stage training, dataset, human labeling. Don't stop at the tl;dr; read the whole paper.
arxiv.org/abs/2511.16624
Last week we launched IMC2025-Ongoing on
@kaggle.com
The dataset is exactly the one from IMC2025, but the competition runs for a full year, making it better suited as a persistent academic leaderboard.
kaggle.com/competitions...
1/2
MASt3R-Fusion: Integrating Feed-Forward Visual Model with IMU, GNSS for High-Functionality SLAM
Yuxuan Zhou, Xingxing Li, Shengyu Li, Zhuohao Yan, Chunxi Xia, Shaoquan Feng
tl;dr: MASt3R-SLAM+IMU+GNSS
arxiv.org/abs/2509.20757
LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos
web linjohnss.github.io/longsplat/
code github.com/NVlabs/LongS...
MapAnything, a simple, end-to-end trained transformer model that directly regresses the factored metric 3D geometry of a scene given various types of inputs (images, calibration, poses, or depth).
code: github.com/facebookrese...
web: map-anything.github.io
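As a rough illustration of the "factored" metric geometry mentioned above, here is a small sketch of how such factors could compose into a metric point map. The factor names and shapes (per-pixel unit rays, up-to-scale depth, camera-to-world pose, global metric scale) are my assumptions for illustration, not MapAnything's actual interface:

```python
import numpy as np

def compose_pointmap(rays, depth, pose, scale):
    """Compose a hypothetical factored geometry output into world-space points.

    rays:  (H, W, 3) per-pixel unit ray directions in the camera frame
    depth: (H, W)    up-to-scale depth along each ray
    pose:  (4, 4)    camera-to-world transform
    scale: float     global metric scale factor
    """
    pts_cam = rays * depth[..., None] * scale      # camera-frame metric points
    R, t = pose[:3, :3], pose[:3, 3]
    return pts_cam @ R.T + t                       # world-frame points, (H, W, 3)
```

The appeal of a factored output is that each factor can be supplied as input instead (known calibration, poses, or depth), which matches the post's "various types of inputs" framing.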
3D and 4D World Modeling: A Survey
tl;dr: in title
arxiv.org/abs/2509.07996
OmniMap: A General Mapping Framework Integrating Optics, Geometry, and Semantics
Yinan Deng, Yufeng Yue, Jianyu Dou, Jingyu Zhao, Jiahui Wang, Yujie Tang, Yi Yang, Mengyin Fu
tl;dr: optics, geometry, and semantics->3DGS-Voxel hybrid representation
arxiv.org/abs/2509.07500
Faster VGGT with Block-Sparse Global Attention
Chung-Shien Brian Wang, Christian Schmidt, Jens Piekenbrinck, Bastian Leibe
tl;dr: block-sparse attention replaces global attention
another work to improve scalability of VGGT
arxiv.org/abs/2509.07120
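To illustrate the idea in the tl;dr, here is a toy block-sparse attention in NumPy: scores are computed densely, but each query block is only allowed to attend to its top-k key blocks. This is a sketch of the general technique only; the paper's actual block selection and kernels will differ:

```python
import numpy as np

def block_sparse_attention(Q, K, V, block=2, keep=2):
    """Toy block-sparse attention: each query block attends only to the
    `keep` key blocks with the highest mean similarity, instead of all
    key blocks as in global attention. Q, K, V: (n, d), n divisible by block."""
    n, d = Q.shape
    nb = n // block
    scores = Q @ K.T / np.sqrt(d)                                  # dense (n, n)
    blk = scores.reshape(nb, block, nb, block).mean(axis=(1, 3))   # (nb, nb) block scores
    mask = np.full((n, n), -np.inf)
    topk = np.argsort(blk, axis=1)[:, -keep:]                      # kept key blocks per query block
    for i in range(nb):
        for j in topk[i]:
            mask[i * block:(i + 1) * block, j * block:(j + 1) * block] = 0.0
    logits = scores + mask
    w = np.exp(logits - logits.max(axis=1, keepdims=True))         # masked softmax
    w /= w.sum(axis=1, keepdims=True)
    return w @ V
```

A real implementation would skip the masked blocks entirely rather than computing dense scores first; that skipping is where the speedup over global attention comes from.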
Life is hard without the fast internet we're used to.
I could not even join my Google Meet this morning.
CausNVS: Autoregressive Multi-view Diffusion for Flexible 3D Novel View Synthesis
Xin Kong, Daniel Watson, Yannick Strümpler, @miniemeyer.bsky.social, Federico Tombari
tl;dr: a framewise attention layer with causal masking on top of a pretrained 2D diffusion backbone
arxiv.org/abs/2509.06579
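The framewise causal masking from the tl;dr can be sketched as an additive attention mask in which every token of frame t may attend to frames 0..t but never to future frames (a toy construction, not the paper's code):

```python
import numpy as np

def framewise_causal_mask(n_frames, tokens_per_frame):
    """Additive attention mask: 0.0 where attention is allowed, -inf where
    it is blocked. Tokens attend freely within their own and earlier frames."""
    f = np.repeat(np.arange(n_frames), tokens_per_frame)  # frame id of each token
    allowed = f[:, None] >= f[None, :]                    # query frame >= key frame
    return np.where(allowed, 0.0, -np.inf)
```

Adding such a mask to the attention logits of a pretrained 2D diffusion backbone is one way to make multi-view generation autoregressive across frames.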
Stages of the eclipse, captured by a friend.
#mooneclipse
Lunar eclipse over Abu Dhabi tonight at 22:20
Apply for the AITHYRA-CeMM International PhD Program!
15-20 fully funded PhD fellowships available in Vienna, AT
in AI/ML and Life Sciences
Deadline for applications:
10 September 2025 apply.cemm.at
Franca official code and pretrained models are up on github and pytorch hub! github.com/valeoai/franca
Eager to learn how it will be used.
Reconstruct, Inpaint, Finetune: Dynamic Novel-view Synthesis from Monocular Videos
Kaihua Chen, @tarashakhurana.bsky.social, Deva Ramanan
tl;dr: in title; fine-tune CogVideoX->train 2D video-inpainter
arxiv.org/abs/2507.12646
St4RTrack: Simultaneous 4D Reconstruction and Tracking in the World
tl;dr: feed-forward; reconstructs and tracks dynamic video content; DUSt3R-like pointmaps for a pair of frames captured at different moments (1/2)
www.liruilong.cn/prope/
Want to explore universal visual features? Check out our interactive demo of concepts learned from our #ICML2025 paper "Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment".
Come see our poster at 4pm on Tuesday in East Exhibition hall A-B, E-1208!
Mind the Gap: Aligning Vision Foundation Models to Image Feature Matching
Yuhan Liu, Jingwen Fu, Yang Wu, Kangyi Wu, Pengna Li, Jiayi Wu, Sanping Zhou, Jingmin Xin
tl;dr: Stable Diffusion+attention-based prompt in LoFTR-type framework
no eval. on IMC
arxiv.org/abs/2507.10318
Leveraging Automatic CAD Annotations for Supervised Learning in 3D Scene Understanding
Yuchen Rao, Stefan Ainetter, Sinisa Stekovic, @vincentlepetit.bsky.social , Friedrich Fraundorfer
tl;dr: in title
arxiv.org/abs/2504.13580
A Guide to Structureless Visual Localization
Vojtech Panek, Qunjie Zhou, Yaqing Ding, Sérgio Agostinho, Zuzana Kukelova, @sattlertorsten.bsky.social, @lealtaixe.bsky.social
tl;dr: RoMa > MASt3R outdoors with a 5pt solver; indoors MASt3R is king. M3Dv2 depth comparable to MASt3R
arxiv.org/abs/2504.17636
Never miss a beat in science again!
Scholar Inbox is your personal assistant for staying up to date with your literature. It includes visual summaries, collections, search, and a conference planner.
Check out our white paper: arxiv.org/abs/2504.08385
#OpenScience #AI #RecommenderSystems
Super excited to share Visual Chronicles! Huge kudos to @boyangdeng.bsky.social on his fantastic internship work with us at Google DeepMind. It was one of the coolest and most fun projects I've ever been a part of!
Tell us what trends we discovered surprise you: boyangdeng.com/visual-chron...
#EidMubarak to all my friends and colleagues celebrating!
May these blessed days bring joy, peace, and prosperity to you and your families.
(1/3) Happy to share LUDVIG: Learning-free Uplifting of 2D Visual features to Gaussian Splatting scenes, that uplifts visual features from models such as DINOv2 (left) & CLIP (mid) to 3DGS scenes. Joint work w. @dlarlus.bsky.social @jmairal.bsky.social
Webpage & code: juliettemarrie.github.io/ludvig
Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
Jianing Yang, Alexander Sax, Kevin J. Liang, Mikael Henaff, Hao Tang, Ang Cao, Joyce Chai, Franziska Meier, Matt Feiszli
tl;dr: one more multiview-transformer decoder on top of the DUSt3R encoder, optimized for scale (1000+ images in one pass)
arxiv.org/abs/2501.13928