One thing that I am excited about is unlocking the power of function space models for domain-agnostic learning. I am positive that a single architecture can achieve SOTA generation results for images, videos, 3D point clouds, and graphs.
When we started getting these results on ImageNet-256, I was impressed that a model that predicts each pixel independently (through a cross-attention block) can generate such high-frequency details.
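To make "predicts each pixel independently" concrete, here is a minimal PyTorch sketch of what a per-pixel cross-attention decoder can look like. The class, dimensions, and layer choices are my own assumptions for illustration, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class PixelCrossAttentionDecoder(nn.Module):
    """Decodes each query coordinate into a pixel value independently.

    Hypothetical sketch: each coordinate embedding attends to a shared set
    of latent tokens via cross-attention; pixels never attend to one
    another, so any set of coordinates can be queried.
    """
    def __init__(self, dim: int = 256, num_heads: int = 8, out_channels: int = 3):
        super().__init__()
        self.coord_embed = nn.Linear(2, dim)          # (x, y) -> query token
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.to_pixel = nn.Linear(dim, out_channels)  # query token -> RGB

    def forward(self, coords: torch.Tensor, latents: torch.Tensor) -> torch.Tensor:
        # coords:  (B, N, 2)  normalized pixel coordinates
        # latents: (B, L, D)  tokens produced by the transformer backbone
        q = self.coord_embed(coords)                  # one query per pixel
        attended, _ = self.cross_attn(q, latents, latents)
        return self.to_pixel(attended)                # (B, N, 3), per-pixel output

# Usage: query an arbitrary number of coordinates with the same latents.
decoder = PixelCrossAttentionDecoder()
coords = torch.rand(1, 1024, 2)       # 1024 query points
latents = torch.randn(1, 64, 256)     # 64 latent tokens
rgb = decoder(coords, latents)        # (1, 1024, 3)
```

Because each pixel is decoded from its coordinate alone, nothing ties the model to a fixed grid, which is what enables querying at any resolution.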
Here's one to read on your flight to #NeurIPS2024! A flow-matching transformer model in function space! This model has all the advantages of neural fields, namely resolution-free generation and a domain-agnostic architecture, while obtaining strong results on ImageNet-256 and Objaverse!
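For anyone new to flow matching, here is a minimal sketch of a standard conditional flow-matching training step. The model signature and all names are hypothetical, and the paper's function-space formulation adds more structure than this plain version:

```python
import torch

def flow_matching_loss(model, x1, coords):
    """One conditional flow-matching step (a sketch, not the paper's recipe).

    Assumes `model(xt, coords, t)` maps noisy function values at `coords`
    and a time `t` to a predicted velocity field of the same shape as `xt`.
    """
    b = x1.shape[0]
    x0 = torch.randn_like(x1)                  # noise sample in function space
    t = torch.rand(b, 1, 1, device=x1.device)  # one time per example
    xt = (1 - t) * x0 + t * x1                 # linear interpolation path
    v_target = x1 - x0                         # constant velocity along the path
    v_pred = model(xt, coords, t.view(b))      # hypothetical model signature
    return ((v_pred - v_target) ** 2).mean()   # regress predicted velocity
```

Since the velocity is predicted per coordinate, sampling can integrate the learned ODE at whatever coordinate set the user supplies, which is where the resolution-free generation comes from.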