
Felix Sarnthein

@flxsa

PhD student in machine learning at ELLIS Institute Tübingen, MPI-IS and ETH. Prev: MSc in CS at ETH

189
Followers
285
Following
9
Posts
12.11.2024
Joined

Latest posts by Felix Sarnthein @flxsa

I've recently been wondering about that as well! At the moment, simple-parsing seems to be the closest, but datargs or the configs used in olmo-core seem cleaner... and then there's jsonargparse, which aims to do everything but is also really big. Did you find anything else?

29.01.2026 12:53 👍 0 🔁 0 💬 0 📌 0
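The dataclass-driven approach these libraries share can be sketched with the standard library alone. This is an editorial illustration, not code from the thread: `TrainConfig` and `parser_from_dataclass` are made-up names, and the real libraries additionally handle nesting, subgroups, and help text.

```python
import argparse
from dataclasses import dataclass, fields

# Hypothetical training config; libraries like simple-parsing generate
# the parser from a dataclass like this automatically.
@dataclass
class TrainConfig:
    lr: float = 3e-4
    batch_size: int = 32
    run_name: str = "baseline"

def parser_from_dataclass(cls) -> argparse.ArgumentParser:
    """Build an argparse parser with one flag per dataclass field."""
    parser = argparse.ArgumentParser()
    for f in fields(cls):
        # f.type is the annotation (e.g. float), reused as the converter.
        parser.add_argument(f"--{f.name}", type=f.type, default=f.default)
    return parser

# With no CLI arguments, the dataclass defaults are used.
cfg = TrainConfig(**vars(parser_from_dataclass(TrainConfig).parse_args([])))
print(cfg)
```

Overriding a field then works the usual argparse way, e.g. `parse_args(["--lr", "0.01"])`.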

Why you should probe more than just the final layer of your Vision Transformer to maximize performance. 🧵👇

19.01.2026 09:44 👍 16 🔁 5 💬 1 📌 2

Happy to present our NeurIPS Spotlight paper with @flxsa.bsky.social, Nicola Muça Cirone, and Antonio Orvieto:
Fixed-Point RNNs: Interpolating from Diagonal to Dense

Here's a summary of the paper.

04.12.2025 19:12 👍 0 🔁 1 💬 1 📌 0

At the opening ceremony in Singapore they said that the city is still unclear.

29.04.2025 06:04 👍 1 🔁 0 💬 0 📌 0

Agreed that it's important to consider depth in addition to work. But for work to be negligible, the hardware would need to scale infinitely with input size, because the asymptotic runtime is O(work/workers + depth). I doubt this will change for future hardware, due to physical limitations.

22.03.2025 18:08 👍 4 🔁 0 💬 0 📌 0
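The runtime bound in the post can be checked with a toy calculation (an editorial sketch; the reduction-tree example and all numbers are illustrative):

```python
import math

def parallel_time(work: int, depth: int, workers: int) -> float:
    """Brent-style bound: T_p ~ work/workers + depth."""
    return work / workers + depth

# Example: summing n numbers with a balanced reduction tree.
n = 1 << 20                # work is O(n)
depth = int(math.log2(n))  # tree depth is log2(n) = 20

# Even with as many workers as elements, depth remains as a floor:
assert parallel_time(n, depth, workers=n) == 1 + depth
# With few workers, the work term dominates:
assert parallel_time(n, depth, workers=4) == n / 4 + depth
print(parallel_time(n, depth, workers=1024))
```

The point of the post is the first assertion: no finite number of workers removes the depth term.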

You could have a look at litmaps. It lets you explore new papers from a set of starting nodes, and has nice ways of sorting by time and relevance, clustering, etc.

28.02.2025 11:06 👍 1 🔁 0 💬 0 📌 0

It's not fake:
bsky.app/profile/yann...

29.12.2024 21:45 👍 3 🔁 0 💬 0 📌 0

Congrats Anastasia!!

12.12.2024 23:53 👍 1 🔁 0 💬 1 📌 0

Just a heads up to everyone: @deep-mind.bsky.social is unfortunately a fake account and has been reported. Please do not follow it nor repost anything from it.

25.11.2024 23:24 👍 82 🔁 34 💬 9 📌 3

You can use @bookmarks.bluecanary.dev: messages you send to it (or posts you report to it) are stored in a feed of your bookmarks.

26.11.2024 07:30 👍 2 🔁 0 💬 1 📌 1

🍏 New preprint alert! 🍏
PoM: Efficient Image and Video Generation with the Polynomial Mixer
arxiv.org/abs/2411.12663
This is my latest "summer project" and it was so big I had to call in reinforcements (Thanks @nicolasdufour.bsky.social)

TL;DR Transformers are for boomers, welcome to the future
🧵👇

20.11.2024 08:07 👍 93 🔁 23 💬 1 📌 5

I also wonder how one would go about normalizing the sum in a variable-sequence-length setting. How are you handling that at the moment for video? One potential solution could be a geometric decay on top of the mask, but that would lead to somewhat of an SSM formulation. Interesting food for thought!

20.11.2024 14:22 👍 1 🔁 0 💬 1 📌 0
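One way to read the geometric-decay suggestion, as a minimal editorial sketch (the function name and the exact weighting scheme are assumptions, not from the thread):

```python
# Sketch: normalize a masked sum with a geometric decay, so the
# normalizer is well-defined for any sequence length.
def decayed_masked_average(values, mask, gamma=0.9):
    """Average of masked values, weighting position i by gamma**(L-1-i).

    The recency weighting makes the running sum a linear recurrence
    s_t = gamma * s_{t-1} + x_t, which is why it resembles an SSM.
    """
    L = len(values)
    weights = [gamma ** (L - 1 - i) * m for i, m in enumerate(mask)]
    norm = sum(weights)
    if norm == 0:
        return 0.0  # fully masked sequence
    return sum(w * v for w, v in zip(weights, values)) / norm

print(decayed_masked_average([1.0, 2.0, 3.0], [1, 1, 0]))
```

With `gamma=1.0` this reduces to a plain masked mean, so the decay is a strict generalization of the usual normalization.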

Interesting idea! This reminds me of the Symmetric Power Transformer (manifestai.com/articles/symmetric-power-transformers), but in your case the values are subject to the polynomial kernel instead of the queries/keys, while the selection resembles a Gated Linear Unit, which is used in many backbones :)

20.11.2024 14:20 👍 1 🔁 0 💬 2 📌 0