
Nick Stracke

@rmsnorm

PhD Student at Ommer Lab (Stable Diffusion) Trying to understand motion... 🌐 https://nickstracke.dev

659 Followers · 280 Following · 9 Posts · Joined 18.11.2024

Latest posts by Nick Stracke @rmsnorm

Two great works by PiMa on manipulating style in generative modeling!

18.10.2025 08:37 πŸ‘ 4 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

πŸ€” What happens when you poke a scene β€” and your model has to predict how the world moves in response?

We built the Flow Poke Transformer (FPT) to model multi-modal scene dynamics from sparse interactions.

It learns to predict the π˜₯π˜ͺ𝘴𝘡𝘳π˜ͺ𝘣𝘢𝘡π˜ͺ𝘰𝘯 of motion itself πŸ§΅πŸ‘‡
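"Predicting the distribution of motion" can be made concrete with a generic mixture-density sketch: instead of regressing a single flow vector per query point, the model outputs the parameters of a distribution over 2D flow vectors, which can represent several plausible motions at once. A minimal numpy sketch (the names, the Gaussian-mixture parameterization, and all shapes are illustrative assumptions, not the paper's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_distribution_head(features, W, k=4):
    """Hypothetical sketch: project per-point features to the parameters of a
    k-component Gaussian mixture over 2D flow vectors (dx, dy)."""
    p = features @ W                           # (num_points, k * 5)
    p = p.reshape(*features.shape[:-1], k, 5)  # per component: logit, mean(2), log-std(2)
    logits = p[..., 0]
    mean = p[..., 1:3]
    std = np.exp(p[..., 3:5])                  # exp keeps std-devs positive
    # softmax over components gives mixture weights
    weights = np.exp(logits - logits.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return weights, mean, std

feats = rng.standard_normal((16, 128))         # features for 16 query points
W = rng.standard_normal((128, 4 * 5)) * 0.01   # illustrative projection weights
weights, mean, std = flow_distribution_head(feats, W)
```

A multi-modal head like this lets a poked object be predicted to, say, tip left or right with separate probabilities, rather than averaging the two into an implausible mean motion.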

15.10.2025 01:56 πŸ‘ 24 πŸ” 8 πŸ’¬ 1 πŸ“Œ 1
Our method pipeline

πŸ€” When combining vision-language models (VLMs) with large language models (LLMs), do VLMs benefit from genuine additional semantics, or from artificial augmentations of the text, on downstream tasks?

🀨 Interested? Check out our latest work at #AAAI25:

πŸ’» Code and πŸ“ Paper at: github.com/CompVis/DisCLIP

πŸ§΅πŸ‘‡

08.01.2025 15:54 πŸ‘ 15 πŸ” 8 πŸ’¬ 1 πŸ“Œ 0

And thanks for the kind words! :)

09.12.2024 11:29 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

It was due to a compute constraint at that time. We will update it with numbers run on the complete test set once we release a new version of the paper.

09.12.2024 11:29 πŸ‘ 2 πŸ” 0 πŸ’¬ 2 πŸ“Œ 0

We make code and cleaned 🧹 weights available for SD 1.5 and SD 2.1.

Have a look now!
πŸ“ Paper: compvis.github.io/cleandift/st...
πŸ’» Code: github.com/CompVis/clea...
πŸ€— Hugging Face: huggingface.co/CompVis/clea...

04.12.2024 23:31 πŸ‘ 5 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

We show you can, with just 30 minutes of task-agnostic finetuning on a single GPU. 🀯

No noise. Better features. Better performance. Across many tasks.

And no timestep searching headaches! πŸ‘‡

04.12.2024 23:31 πŸ‘ 4 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

They need noisy images as input - and the right noise level for each task.
So we have to find the right timestep for every downstream task? 🀯
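The "right noise level" refers to the diffusion forward process: the feature extractor only ever sees the noised image x_t, so the timestep t controls how much image information survives. A minimal numpy sketch of that noising step (the linear beta schedule uses common DDPM default values, t=261 is arbitrary, and `diffusion_unet` is a hypothetical stand-in for the feature extractor):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear beta schedule over 1000 steps (common DDPM defaults)
betas = np.linspace(1e-4, 0.02, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)

def add_noise(x0, t):
    """DDPM forward process: x_t = sqrt(a_t) * x_0 + sqrt(1 - a_t) * eps."""
    a_t = alphas_cumprod[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(a_t) * x0 + np.sqrt(1.0 - a_t) * eps

image = rng.standard_normal((3, 64, 64))  # stand-in for a real image
noisy = add_noise(image, t=261)           # larger t destroys more information
# features = diffusion_unet(noisy, t)     # hypothetical: features depend on t
```

Since alphas_cumprod shrinks monotonically with t, every downstream task in principle has its own sweet spot for t, which is exactly the per-task search the thread is complaining about.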

What if you could ditch all of that? πŸ‘‡

04.12.2024 23:31 πŸ‘ 4 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

This work was co-led by @stefanabaumann.bsky.social and @koljabauer.bsky.social.

✨ Diffusion models are amazing at learning world representations. Their features power many tasks:
β€’ Semantic correspondence
β€’ Depth estimation
β€’ Semantic segmentation
… and more!

But here’s the catch βš‘οΈπŸ‘‡

04.12.2024 23:31 πŸ‘ 4 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

πŸ€” Why do we extract diffusion features from noisy images? Isn’t that destroying information?

Yes, it is - but we found a way to do better. πŸš€

Here’s how we unlock better features, no noise, no hassle.

πŸ“ Project Page: compvis.github.io/cleandift
πŸ’» Code: github.com/CompVis/clea...

πŸ§΅πŸ‘‡

04.12.2024 23:31 πŸ‘ 42 πŸ” 10 πŸ’¬ 2 πŸ“Œ 5

me right now..

20.11.2024 14:22 πŸ‘ 48 πŸ” 3 πŸ’¬ 4 πŸ“Œ 0

Hi, just sharing an updated version of the PyTorch 2 Internals slides: drive.google.com/file/d/18YZV.... Content: basics, JIT, Dynamo, Inductor, the export path, and ExecuTorch. This is focused on internals, so you will need a bit of C/C++. I also show how you can export and run a model on a Pixel Watch.

19.11.2024 11:05 πŸ‘ 87 πŸ” 17 πŸ’¬ 2 πŸ“Œ 1
[EEML'24] Sander Dieleman - Generative modelling through iterative refinement (YouTube video by EEML Community)

While we're starting up over here, I suppose it's okay to reshare some old content, right?

Here's my lecture from the EEML 2024 summer school in Novi Sad πŸ‡·πŸ‡Έ, where I tried to give an intuitive introduction to diffusion models: youtu.be/9BHQvQlsVdE

Check out other lectures on their channel as well!

19.11.2024 09:57 πŸ‘ 114 πŸ” 11 πŸ’¬ 3 πŸ“Œ 0