Disentanglement is an intriguing phenomenon that arises in generative latent variable models for reasons that are not fully understood.
If you’re interested in learning why, I highly recommend giving Carl’s blog a read!
I am hiring for RS/RE positions! If you are interested in language-flavored multimodal learning, evaluation, or post-training, apply here 🦎 boards.greenhouse.io/deepmind/job...
I will also be at #NeurIPS2024, so come say hi! (Please email me to find time to chat.)
Our big_vision codebase is really good! And it's *the* reference for ViT, SigLIP, PaliGemma, JetFormer, ... including fine-tuning them.
However, it's criminally undocumented. I tried using it outside Google to fine-tune PaliGemma and SigLIP on GPUs, and wrote a tutorial: lb.eyer.be/a/bv_tuto.html
I think this comes down to the model behind p(x,y). If features of x cause y, e.g. aspects of a website (x) -> clicks (y), or age/health -> disease, then p(y|x) is a (regression) function of x. But if x|y is a distribution over inputs for each class (e.g. images of cats for y = "cat"), then p(y|x) is given by Bayes' rule (squint at the softmax).
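To make the "squint at softmax" point concrete, here is a minimal sketch (with made-up class means and priors) showing that under class-conditional Gaussians p(x|y) = N(mu_y, I), Bayes' rule p(y|x) ∝ p(x|y) p(y) reduces exactly to a softmax over linear logits in x:

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

# Illustrative (assumed) class means mu_y and priors p(y) for 3 classes
mus = np.array([[0.0, 0.0], [2.0, 1.0], [-1.0, 3.0]])
priors = np.array([0.5, 0.3, 0.2])

x = np.array([1.0, 1.0])

# Direct Bayes' rule: log p(x|y) for unit-covariance Gaussians (up to a constant)
log_px_given_y = -0.5 * ((x - mus) ** 2).sum(axis=1)
posterior_bayes = softmax(log_px_given_y + np.log(priors))

# Equivalent softmax over *linear* logits: w_y = mu_y, b_y = -||mu_y||^2/2 + log p(y)
# (the shared -||x||^2/2 term cancels inside the softmax)
logits = mus @ x - 0.5 * (mus ** 2).sum(axis=1) + np.log(priors)
posterior_softmax = softmax(logits)

assert np.allclose(posterior_bayes, posterior_softmax)
```

So when the generative story is "each class y emits x from its own distribution," the familiar softmax classifier is just Bayes' rule with the normalizing constants absorbed.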
Read our paper:
Context-Aware Multimodal Pretraining
Now on arXiv
Can you turn vision-language models into strong any-shot models?
Go beyond zero-shot performance with SigLixP (x for context)
Read @confusezius.bsky.social thread below…
And follow Karsten … a rising star!
We maintain strong zero-shot transfer of CLIP / SigLIP across model size and data scale, while achieving up to 4x few-shot sample efficiency and up to +16% performance gains!
Fun project with @confusezius.bsky.social, @zeynepakata.bsky.social, @dimadamen.bsky.social and
@olivierhenaff.bsky.social.
Just a heads up to everyone: @deep-mind.bsky.social is unfortunately a fake account and has been reported. Please do not follow it nor repost anything from it.