Funnily if you ask Intel, then since 2024 L1 isn’t the “top” level cache either as there’s L0 :) en.wikipedia.org/wiki/Lion_Cove
(of course that’s mostly naming; could have named the new middle level cache something else instead)
Strongly depends on the specifics of the algorithm. Source: I've replaced this sort of use case with dedicated data structures at least twice, with significant performance benefits.
If you’re in SF the week of GDC and would like to get coffee and chat about game technology, email me! I probably won’t be at GDC proper but will be around.
Yesterday on stream we sketched out most of the support code for descriptor heaps. If you want to see the rest of the owl, I've committed the remaining pieces after the stream. For now, all of the previous descriptor code is still required to support production drivers.
github.com/zeux/niagara...
Starting in 30 minutes!
("syscalls" to be understood broadly, e.g. GPU command stream submissions; and for the app-side CPU work you expect the code to deterministically reproduce the same command stream data)
This is very cool!
Do I understand it right that it records & replays all syscalls, that to function as a QA tool it has to be running from the beginning of execution, and that replaying that execution would take about as much time as it took IRL for complex games that stress HW to the full extent?
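In case a sketch helps: here's a toy illustration of the record/replay idea (my own, not how the actual tool works), where a nondeterministic call is logged on the first run and served back verbatim on replay, so everything computed from it reproduces bit-identically:

```python
import time

class Interceptor:
    """Minimal record/replay shim for a nondeterministic call
    (a stand-in for intercepting syscalls or GPU submissions)."""
    def __init__(self, fn, mode, log=None):
        self.fn, self.mode = fn, mode
        self.log = log if log is not None else []

    def __call__(self, *args):
        if self.mode == "record":
            result = self.fn(*args)  # hit the real source of nondeterminism
            self.log.append(result)  # ...and remember what it returned
            return result
        return self.log.pop(0)       # replay: serve the recorded result

# Record: the call hits the real clock and logs the result.
rec = Interceptor(time.time, "record")
t1 = rec()

# Replay: the program sees the recorded value, not the real clock, so any
# deterministic computation built on top reproduces the original run.
rep = Interceptor(time.time, "replay", list(rec.log))
t2 = rep()
assert t1 == t2
```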
Upcoming Niagara stream! This *Sunday* (Feb 15), at 11 AM PST (7 PM GMT), we will embark on a journey to replace all descriptor uses within the renderer with descriptor heaps. www.youtube.com/live/VXN4Gew...
Tint sees a similar performance gap for scalar versions. It does support subgroups but the various restrictions around uniform control flow make the conversion pretty painful. Also the compiler takes a while for some reason. Will not pursue WGSL further.
I tried it briefly via naga. I have two shader versions, one scalar one using subgroups. In Slang, scalar shader runs as fast, subgroup shader is slower. In WGSL+naga, scalar shader is slower, subgroup doesn’t run because naga doesn’t support subgroups. To try Tint I need to compile it myself :(
Tried using Slang. It can output working MSL but regresses Metal decoding performance by a few %. The output shader isn’t very readable so not sure where the regression is atm.
Might just add CUDA code as an example and whoever needs this can get their LLM of choice to translate directly :-/
Just having slang as an example might be a reasonable compromise. My understanding was that its translation expects a complete shader, not bits of shader code, so integration into non-Slang based pipelines might require manual edits (eg pulling bits of translated code into a header)
I wish there was a clean way to provide shader code that works across the multitude of shading languages; this works in Metal and should work in GLSL/Slang/et al but having to write this code five times would not be ideal…
and just in case you have a spare GPU
Still some bits to sand off but getting closer.
I believe this is primarily for docs.vulkan.org/refpages/lat... & docs.vulkan.org/refpages/lat...
You don't get a playable experience if you do frame gen on top of unplayable stutter.
why? it's the opposite. hallucination is cheaper than rendering frames so we can hallucinate 10 frames for each real frame and call it a day!
The fact that the entire industry is now wasting time working on, integrating, analyzing, reviewing, and commenting on frame generation is so sad. Surely there are actually productive things we all can be doing instead.
This is almost certainly LLM generated. It's not just a copy-edit either; some of the text seems semi-accurate, some things read like half-truths, and some is just nonsensical.
Maybe this helps someone understand the cache mechanics, but I'd rather see expert human written content on the subject.
Also the stream from yesterday is up on YouTube if you missed it.
And with that, I plan to do absolutely nothing productive during what little remains of this year. See you in 2026!
One more thing! A few weeks ago I wrote a post that aggregates various articles I have written on meshoptimizer over the years:
zeux.io/2025/12/09/m...
If you’re looking for some reading to do over the break, consider revisiting these.
Upcoming niagara stream!
Tomorrow (Sunday, Dec 21) at 11 AM PST (7 PM GMT), we will work on reducing the startup time by "cooking" the geometry and, if that gets done quickly enough, optimizing texture loading.
www.youtube.com/live/d04h0sZ...
One fun feature that meshoptimizer got in the last two years that I forgot to mention in my v1 post is support for provoking vertex optimization for primitive ID rendering.
Implementation is based on exploration by John Hable (thanks!) and is used by Wicked Engine
meshoptimizer.org#visibility-b...
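A toy Python sketch of the property this optimization establishes (not meshoptimizer's actual algorithm): rotate each triangle's indices so its first, provoking vertex is unique, which lets a shader recover the primitive ID from a flat-interpolated vertex attribute instead of a dedicated built-in. A real implementation duplicates vertices when no rotation works; the toy just falls through in that case:

```python
def make_provoking(indices):
    """Rotate each triangle so its first (provoking) vertex is unique.
    Toy version: falls back to the original order when every rotation
    starts with an already-used provoking vertex (a real implementation
    would duplicate a vertex there instead)."""
    used = set()
    out = []
    for i in range(0, len(indices), 3):
        tri = indices[i:i + 3]
        for r in range(3):
            rot = tri[r:] + tri[:r]  # rotation preserves triangle winding
            if rot[0] not in used:
                used.add(rot[0])
                out.extend(rot)
                break
        else:
            out.extend(tri)  # would need a duplicated vertex here
    return out

# Two triangles sharing vertex 0: the second one rotates so that its
# provoking vertex differs from the first triangle's.
idx = make_provoking([0, 1, 2, 0, 2, 3])
provoking = idx[0::3]
assert len(set(provoking)) == len(provoking)
```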
Thanks for the heads-up! Looks like my "v1 custom" test uses 16-bit deltas only on the channel where the input data happens to be 0, so this bug would indeed not trigger a test failure. I'll adjust this.
Tomorrow comes.
Half resolution is when you cut width and height in half and get, at best, half of your rendering time back, since things rarely if ever scale perfectly with pixel count.
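The arithmetic behind the quip, spelled out: halving both dimensions leaves a quarter of the pixels, so perfect scaling with pixel count would return 75% of the frame time; the point is that real workloads return far less.

```python
# Halving both width and height leaves a quarter of the pixels.
full = 1920 * 1080
half = (1920 // 2) * (1080 // 2)
assert half * 4 == full

# Perfect scaling with pixel count would give back 75% of the frame time;
# fixed per-frame costs (geometry, CPU submission, bandwidth effects)
# mean you recover much less in practice - often half at best.
ideal_saving = 1 - half / full
assert ideal_saving == 0.75
```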
Thanks! Great writeup.
It also illustrates an imbalance that people who are new to development often haven't internalized: it took what looks like a lot of investigation, experiments, and emails to produce a one-line change that improved upload throughput 30x.