Funnily if you ask Intel, then since 2024 L1 isn’t the “top” level cache either as there’s L0 :) en.wikipedia.org/wiki/Lion_Cove
(of course that’s mostly naming; could have named the new middle level cache something else instead)
Strongly depends on the specifics of the algorithm. Source: I've replaced this sort of use case with dedicated data structures at least twice, with significant performance benefits.
If you’re in SF the week of GDC and would like to get coffee and chat about game technology, email me! I probably won’t be at GDC proper but will be around.
Yesterday on stream we sketched out most of the support code for descriptor heaps. If you want to see the rest of the owl, I've committed the remaining pieces after the stream. For now, all of the previous descriptor code is still required to support production drivers.
github.com/zeux/niagara...
Starting in 30 minutes!
("syscalls" to be understood broadly, e.g. GPU command stream submissions; and for the app-side CPU work you expect the code to deterministically reproduce the same command stream data)
This is very cool!
Do I understand it right that it records & replays all syscalls, that to function as a QA tool it has to be running from the beginning of execution, and that replaying that execution would take about as much time as it took IRL for complex games that stress HW to the full extent?
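In case a sketch helps: here's a toy illustration of the record/replay idea (my own, not how the actual tool works), where a nondeterministic call is logged on the first run and served back verbatim on replay, so everything computed from it reproduces bit-identically:

```python
import time

class Interceptor:
    """Minimal record/replay shim for a nondeterministic call
    (a stand-in for intercepting syscalls or GPU submissions)."""
    def __init__(self, fn, mode, log=None):
        self.fn, self.mode = fn, mode
        self.log = log if log is not None else []

    def __call__(self, *args):
        if self.mode == "record":
            result = self.fn(*args)  # hit the real source of nondeterminism
            self.log.append(result)  # ...and remember what it returned
            return result
        return self.log.pop(0)       # replay: serve the recorded result

# Record: the call hits the real clock and logs the result.
rec = Interceptor(time.time, "record")
t1 = rec()

# Replay: the program sees the recorded value, not the real clock, so any
# deterministic computation built on top reproduces the original run.
rep = Interceptor(time.time, "replay", list(rec.log))
t2 = rep()
assert t1 == t2
```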
Upcoming Niagara stream! This *Sunday* (Feb 15), at 11 AM PST (7 PM GMT), we will embark on a journey to replace all descriptor uses within the renderer with descriptor heaps. www.youtube.com/live/VXN4Gew...
Tint sees a similar performance gap for scalar versions. It does support subgroups but the various restrictions around uniform control flow make the conversion pretty painful. Also the compiler takes a while for some reason. Will not pursue WGSL further.
I tried it briefly via naga. I have two shader versions, one scalar one using subgroups. In Slang, scalar shader runs as fast, subgroup shader is slower. In WGSL+naga, scalar shader is slower, subgroup doesn’t run because naga doesn’t support subgroups. To try Tint I need to compile it myself :(
Tried using Slang. It can output working MSL but regresses Metal decoding performance by a few %. The output shader isn’t very readable so not sure where the regression is atm.
Might just add CUDA code as an example and whoever needs this can get their LLM of choice to translate directly :-/
Just having slang as an example might be a reasonable compromise. My understanding was that its translation expects a complete shader, not bits of shader code, so integration into non-Slang based pipelines might require manual edits (eg pulling bits of translated code into a header)
I wish there was a clean way to provide shader code that works across the multitude of shading languages; this works in Metal and should work in GLSL/Slang/et al but having to write this code five times would not be ideal…
and just in case you have a spare GPU
Still some bits to sand off but getting closer.
I believe this is primarily for docs.vulkan.org/refpages/lat... & docs.vulkan.org/refpages/lat...
You don't get a playable experience if you do frame gen on top of unplayable stutter.
why? it's the opposite. hallucination is cheaper than rendering frames so we can hallucinate 10 frames for each real frame and call it a day!
The fact that the entire industry is now wasting time working on, integrating, analyzing, reviewing, and commenting on frame generation is so sad. Surely there are actually productive things we all can be doing instead.
This is almost certainly LLM generated. It's not just a copy-edit either; some of the text seems semi-accurate, some things read like half-truths, and some is just nonsensical.
Maybe this helps someone understand the cache mechanics, but I'd rather see expert human written content on the subject.
Also the stream from yesterday is up on YouTube if you missed it.
And with that, I plan to do absolutely nothing productive during what little remains of this year. See you in 2026!
One more thing! A few weeks ago I wrote a post that aggregates various articles I have written on meshoptimizer over the years:
zeux.io/2025/12/09/m...
If you’re looking for some reading to do over the break, consider revisiting these.
Upcoming niagara stream!
Tomorrow (Sunday, Dec 21) at 11 AM PST (7 PM GMT), we will work on reducing the startup time by "cooking" the geometry and, if that gets done quickly enough, optimizing texture loading.
www.youtube.com/live/d04h0sZ...
One fun feature that meshoptimizer got in the last two years that I forgot to mention in my v1 post is support for provoking vertex optimization for primitive ID rendering.
Implementation is based on exploration by John Hable (thanks!) and is used by Wicked Engine
meshoptimizer.org#visibility-b...
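A toy Python sketch of the property this optimization establishes (not meshoptimizer's actual algorithm): rotate each triangle's indices so its first, provoking vertex is unique, which lets a shader recover the primitive ID from a flat-interpolated vertex attribute instead of a dedicated built-in. A real implementation duplicates vertices when no rotation works; the toy just falls through in that case:

```python
def make_provoking(indices):
    """Rotate each triangle so its first (provoking) vertex is unique.
    Toy version: falls back to the original order when every rotation
    starts with an already-used provoking vertex (a real implementation
    would duplicate a vertex there instead)."""
    used = set()
    out = []
    for i in range(0, len(indices), 3):
        tri = indices[i:i + 3]
        for r in range(3):
            rot = tri[r:] + tri[:r]  # rotation preserves triangle winding
            if rot[0] not in used:
                used.add(rot[0])
                out.extend(rot)
                break
        else:
            out.extend(tri)  # would need a duplicated vertex here
    return out

# Two triangles sharing vertex 0: the second one rotates so that its
# provoking vertex differs from the first triangle's.
idx = make_provoking([0, 1, 2, 0, 2, 3])
provoking = idx[0::3]
assert len(set(provoking)) == len(provoking)
```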
Thanks for the heads-up! Looks like my "v1 custom" test uses 16-bit deltas only on the channel where the input data happens to be 0, so this bug would indeed not trigger a test failure. I'll adjust this.
Tomorrow comes.
Half resolution is when you cut width and height in half and get, at best, half of your rendering time back, since things rarely if ever scale perfectly with pixel count.
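The arithmetic behind the quip, spelled out: halving both dimensions leaves a quarter of the pixels, so perfect scaling with pixel count would return 75% of the frame time; the point is that real workloads return far less.

```python
# Halving both width and height leaves a quarter of the pixels.
full = 1920 * 1080
half = (1920 // 2) * (1080 // 2)
assert half * 4 == full

# Perfect scaling with pixel count would give back 75% of the frame time;
# fixed per-frame costs (geometry, CPU submission, bandwidth effects)
# mean you recover much less in practice - often half at best.
ideal_saving = 1 - half / full
assert ideal_saving == 0.75
```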
Thanks! Great writeup.
It also illustrates an imbalance that people who are new to development often haven't internalized: it took what looks like a lot of investigation, experiments, and emails to produce a one-line change that improved upload throughput 30x.