Alexander Kolesnikov

@kolesnikov.ch

799 followers · 71 following · 20 posts · joined 24.11.2024

Latest posts by Alexander Kolesnikov @kolesnikov.ch

Looking for a small or medium-sized VLM? PaliGemma 2 spans a more than 150x range of compute!

Not sure yet if you want to invest the time 🪄finetuning🪄 on your data? Give it a try with our ready-to-use "mix" checkpoints:

🤗 huggingface.co/blog/paligem...
🎤 developers.googleblog.com/en/introduci...

19.02.2025 17:47 👍 19 🔁 7 💬 0 📌 0
Link preview: Knowledge distillation: A good teacher is patient and consistent. There is a growing discrepancy in computer vision between large-scale models that achieve state-of-the-art performance and models that are affordable in practical applications. In this paper we addres...

The full answer is probably very complex.

I really like the "function matching" angle we discovered (or rediscovered) in one of our papers that partially demystifies distillation for me: arxiv.org/abs/2106.05237
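A minimal sketch of the "function matching" view, under my own assumptions: the student is trained so that its output distribution matches the teacher's on the very same input, here measured with a KL divergence. The function names, the temperature parameter, and the toy logits are illustrative and not taken from the paper's code.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) computed on the same input (hypothetical helper)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

# The loss vanishes only when the student reproduces the teacher's function.
matched = distill_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mismatched = distill_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

In this reading, both logit vectors would come from feeding the teacher and the student the identical augmented view of an image, which is the "consistent" part of the title.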

21.12.2024 17:39 👍 15 🔁 0 💬 0 📌 0

Thank you!

21.12.2024 08:21 👍 5 🔁 0 💬 0 📌 0

Also check out this concurrent work that is very similar in spirit to Jet and JetFormer, which proposes autoregressive ViT-powered normalizing flows (NFs): x.com/zhaisf/statu...

20.12.2024 14:39 👍 6 🔁 0 💬 0 📌 0

Joint work with @asusanopinto.bsky.social and @mtschannen.bsky.social, performed at Google DeepMind.

20.12.2024 14:39 👍 2 🔁 0 💬 1 📌 0

Final note: we see the Jet model as a powerful tool and a building block for advanced generative models, like JetFormer bsky.app/profile/mtsc..., and not as a standalone competitive generative model.

20.12.2024 14:39 👍 1 🔁 0 💬 1 📌 0
Link preview: Jet: A Modern Transformer-Based Normalizing Flow. In the past, normalizing generative flows have emerged as a promising class of generative models for natural images. This type of model has many modeling advantages: the ability to efficiently compute...

Check out the paper for more juicy details: arxiv.org/abs/2412.15129.

My favorite mini-insight is how implicit half-precision matrix multiplications (with float32 accumulation) can 'eat' entropy and lead to an overly optimistic, flawed objective and evaluations.
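One possible reading of this effect, as a toy sketch and not the paper's analysis: if the inputs of a multiplication are silently rounded to half precision, distinct inputs can collapse to the same output, while the analytic log-determinant still credits the layer as if it were exactly invertible. The example uses IEEE float16 via the standard `struct` module as a stand-in for whatever half-precision format the hardware applies; the scalar "layer" and the chosen values are assumptions.

```python
import math
import struct

def to_half(x):
    """Round a Python float to the nearest IEEE float16 value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def scale_exact(x, w):
    """The layer as assumed by the likelihood computation: y = w * x."""
    return x * w

def scale_half_inputs(x, w):
    """The layer as computed: inputs rounded to half, product kept in high precision."""
    return to_half(x) * to_half(w)

w = 2.0
a, b = 1.0001, 1.0002          # distinct inputs closer than float16 spacing near 1.0

claimed_logdet = math.log(abs(w))              # what the objective accounts for
collapsed = scale_half_inputs(a, w) == scale_half_inputs(b, w)
distinct_in_exact = scale_exact(a, w) != scale_exact(b, w)
```

Here `collapsed` is true: the computed map merges `a` and `b`, so information is lost that the claimed log-determinant never pays for.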

20.12.2024 14:39 👍 3 🔁 1 💬 1 📌 0
Link preview: Add "Jet: A Modern Transformer-Based Normalizing Flow" by andresusanopinto · Pull Request #143 · google-research/big_vision. Implementation used in https://arxiv.org/abs/2412.15129 There are a few other small fixes in big_vision codebase.

We release the full Jet code (including training) in the big_vision repo: github.com/google-resea....

20.12.2024 14:39 👍 4 🔁 1 💬 1 📌 0

When trained on 'small' data, such as ImageNet-1k, overfitting occurs.

Another contribution is a demonstration that transfer learning is effective in mitigating overfitting. The recipe is: pretrain on a large image database and then fine-tune to a small dataset, e.g., CIFAR-10.

20.12.2024 14:39 👍 2 🔁 1 💬 1 📌 0

We observe robust performance improvements with compute scaling, showing behavior similar to classical scaling laws.

These are the results of varying the Jet model size when training on ImageNet-21k images:

20.12.2024 14:39 👍 2 🔁 0 💬 1 📌 0

Our main contribution is a very straightforward design: Jet is just repeated affine coupling layers with ViT inside. We show that many standard components are not needed with our simple design:
❌ invertible dense layer
❌ ActNorm layer
❌ multiscale latents
❌ dequant. noise
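
A minimal sketch of a single affine coupling layer, assuming the standard formulation: one half of the input passes through unchanged and predicts a scale and shift for the other half. `toy_net` is a hypothetical stand-in for the ViT block, and the tanh bound on the log-scale is my assumption, not a detail from the post.

```python
import math

def toy_net(x_a):
    """Hypothetical stand-in for the ViT: maps one half to (log_scale, shift)."""
    log_scale = [math.tanh(0.5 * v) for v in x_a]   # bounded log-scales (assumption)
    shift = [0.1 * v + 1.0 for v in x_a]
    return log_scale, shift

def coupling_forward(x):
    """y_a = x_a, y_b = x_b * exp(s(x_a)) + t(x_a). Returns (y, log|det J|)."""
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    log_scale, shift = toy_net(x_a)
    y_b = [b * math.exp(s) + t for b, s, t in zip(x_b, log_scale, shift)]
    return x_a + y_b, sum(log_scale)

def coupling_inverse(y):
    """Exact inverse: the untouched half lets us recompute scale and shift."""
    half = len(y) // 2
    y_a, y_b = y[:half], y[half:]
    log_scale, shift = toy_net(y_a)
    x_b = [(b - t) * math.exp(-s) for b, s, t in zip(y_b, log_scale, shift)]
    return y_a + x_b

x = [0.3, -1.2, 2.0, 0.7]
y, logdet = coupling_forward(x)
x_rec = coupling_inverse(y)    # recovers x up to floating-point error
```

Stacking such layers, with the roles of the two halves changing between layers, gives an invertible model whose log-determinant is just the sum of the predicted log-scales.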

20.12.2024 14:39 👍 3 🔁 1 💬 1 📌 0

With some delay, JetFormer's *prequel* paper is finally out on arXiv: a radically simple ViT-based normalizing flow (NF) model that achieves SOTA results in its class.

Jet is one of the key components of JetFormer, deserving a standalone report. Let's unpack: 🧵⬇️

20.12.2024 14:39 👍 42 🔁 7 💬 2 📌 1

Here it is: arxiv.org/abs/2412.15129

20.12.2024 14:32 👍 1 🔁 1 💬 1 📌 0

PaliGemma 2 is out! Bigger models, better results. For the best experience, do not forget to finetune.

Congrats, PaliGemma 2 team!

05.12.2024 18:28 👍 13 🔁 1 💬 0 📌 0

OK, it is yesterday's news already, but a good night's sleep is important.

After 7 amazing years at Google Brain/DM, I am joining OpenAI. Together with @xzhai.bsky.social and @giffmana.ai, we will establish the OpenAI Zurich office. Proud of our past work and looking forward to the future.

04.12.2024 09:14 👍 116 🔁 11 💬 8 📌 5

In arxiv.org/abs/2303.00848, @dpkingma.bsky.social and @ruiqigao.bsky.social had suggested that noise augmentation could be used to make other likelihood-based models optimise perceptually weighted losses, like diffusion models do. So cool to see this working well in practice!

02.12.2024 18:36 👍 53 🔁 11 💬 0 📌 0

The answer has just dropped: bsky.app/profile/kole...

02.12.2024 19:00 👍 15 🔁 2 💬 2 📌 0

JetFormer is the product of endless and heated (but friendly) arguments and discussions with @mtschannen.bsky.social and @asusanopinto.bsky.social.

Very excited about this model due to its potential to unify multimodal learning with a simple and universal end-to-end approach.

02.12.2024 17:19 👍 1 🔁 0 💬 0 📌 0

We evaluate JetFormer's potential to model large-scale multimodal image+text data and to perform image-to-text, text-to-image, and VQA tasks, and get rather encouraging results.

02.12.2024 17:19 👍 1 🔁 0 💬 1 📌 0

We also present a novel data augmentation: a "noise curriculum". It helps a pure NLL model to focus on high-level image details.

Even though it is inspired by diffusion, it is very different: it only affects training and does not require iterative denoising during inference.
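
A rough sketch of what such a curriculum could look like, assuming Gaussian noise whose standard deviation decays to zero over training. The cosine shape, the starting value of 0.25, and the helper names are my assumptions for illustration, not the schedule from the paper.

```python
import math
import random

def noise_std(step, total_steps, sigma_start=0.25, sigma_end=0.0):
    """Cosine decay from sigma_start to sigma_end (hypothetical schedule and values)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return sigma_end + 0.5 * (sigma_start - sigma_end) * (1.0 + math.cos(math.pi * progress))

def augment(pixels, step, total_steps, rng):
    """Add Gaussian noise to training pixels; the noise level depends on the step."""
    sigma = noise_std(step, total_steps)
    return [p + rng.gauss(0.0, sigma) for p in pixels]

rng = random.Random(0)
early = augment([0.1, 0.5, 0.9], step=0, total_steps=100, rng=rng)    # heavily noised
late = augment([0.1, 0.5, 0.9], step=100, total_steps=100, rng=rng)   # noise-free
```

Because the noise is applied only to training inputs and reaches zero by the end, sampling needs no denoising loop.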

02.12.2024 17:19 👍 2 🔁 0 💬 1 📌 0

JetFormer is just an autoregressive transformer, trained end-to-end in one go, with no pretrained image encoders/quantizers.

There is a small twist though. An image input is re-encoded with a normalizing flow model, which is trained jointly with the main transformer model.
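
In generic change-of-variables notation (the symbols are mine, not the paper's), re-encoding an image $x$ with an invertible flow $f_{\phi}$ and modeling the result with the autoregressive transformer density $p_{\theta}$ gives an exact log-likelihood on raw pixels:

```latex
\log p(x) \;=\; \log p_{\theta}\bigl(f_{\phi}(x)\bigr)
\;+\; \log\left|\det \frac{\partial f_{\phi}(x)}{\partial x}\right|
```

Training jointly would then mean maximizing this quantity over both $\theta$ and $\phi$ at once, which is why no separately pretrained encoder or quantizer is needed.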

02.12.2024 17:19 👍 2 🔁 0 💬 1 📌 0

I always dreamed of a model that simultaneously

1. optimizes NLL of raw pixel data,
2. generates competitive high-res. natural images,
3. is practical.

But it seemed too good to be true. Until today!

Our new JetFormer model (arxiv.org/abs/2411.19722) ticks all of these boxes.

🧵

02.12.2024 17:19 👍 37 🔁 5 💬 2 📌 0