yep, hence why I don't care about those sizes
@joshjers.drilian.com
I've been programming for way too long. Also sometimes I write or perform music. Mostly I talk about random bullshit. (he/him)
https://drilian.com/ https://cathoderetro.com/ https://procyongame.com/ https://www.youtube.com/@Drilian
Yep, that's about right then. I think it starts to fall out of L1 cache at 128k size, and then once it hits 512k it can't fit nicely in the 8-way L2 either
The good news is that it still scales better than the 3rd party ones I'm testing against, it's just that it's worse - only on arm - at those sizes than my previous implementation
I also don't personally care about sizes that large since this is mostly for realtime audio, so it's just a curiosity
While it can work in contiguous blocks, some passes require 16 of them at once, and I'd be willing to bet that on the Pi 5, up around 128-512k, the addresses get to the point where all of them map to the same 4- or 8-way associative sets, and that's why it falls off
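A rough way to sanity-check that theory is to compute which cache set each of the 16 block start addresses lands in. This is only a sketch: `CacheGeometry`, `maxStreamsPerSet`, and `thrashes` are made-up names, the 64-byte line size and the example cache sizes are assumptions, and it ignores virtual-to-physical address translation, so it can suggest a conflict pattern but not prove one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical description of one cache level (all values are assumptions).
struct CacheGeometry {
    std::size_t sizeBytes;
    std::size_t ways;
    std::size_t lineBytes;

    std::size_t numSets() const { return sizeBytes / (ways * lineBytes); }

    // Set that a given byte address maps to in a simple modulo-indexed cache.
    std::size_t setIndex(std::uint64_t addr) const {
        return static_cast<std::size_t>((addr / lineBytes) % numSets());
    }
};

// For `streams` blocks whose start addresses are `strideBytes` apart,
// returns the largest number of them that share a single cache set.
std::size_t maxStreamsPerSet(const CacheGeometry& cache, std::size_t streams,
                             std::uint64_t strideBytes) {
    assert(cache.numSets() > 0);
    std::size_t worst = 0;
    for (std::size_t i = 0; i < streams; ++i) {
        std::size_t sharing = 0;
        for (std::size_t j = 0; j < streams; ++j) {
            if (cache.setIndex(i * strideBytes) == cache.setIndex(j * strideBytes)) {
                ++sharing;
            }
        }
        if (sharing > worst) worst = sharing;
    }
    return worst;
}

// More streams in one set than there are ways means they evict each other.
bool thrashes(const CacheGeometry& cache, std::size_t streams,
              std::uint64_t strideBytes) {
    return maxStreamsPerSet(cache, streams, strideBytes) > cache.ways;
}
```

With an assumed 64 KiB 4-way L1 (`CacheGeometry{64 * 1024, 4, 64}`), 16 blocks spaced 4 KiB apart still fit (4 per set), while 8 KiB spacing already puts 8 in each set. With an assumed 512 KiB 8-way L2, the same thing happens once the spacing reaches 64 KiB.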
Yeah the nice thing about the self-sorting version (at least, as I've designed it - no idea if this is a standard thing) is that it not only eliminates the initial (or final) bit reverse pass, but it always gets to work in contiguous blocks
I actually think I figured out what's going on...
The thing that makes just recursing into smaller chunks tricky is that I'm doing a self sorting FFT, which changes the memory access patterns in a way that is harder (for me) to reason about - it's more spread out in general
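For anyone following along, the textbook "self-sorting" formulation is the Stockham autosort FFT. Here's a minimal radix-2 sketch of that general idea, not the implementation discussed in this thread (which may be structured quite differently). It ping-pongs between two buffers, so there's no bit-reverse pass, and the name `stockhamFft` is just for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using Complex = std::complex<double>;

// Radix-2 Stockham autosort FFT (forward transform, unnormalized).
// Illustrative only; input length must be a power of two.
std::vector<Complex> stockhamFft(std::vector<Complex> x) {
    const std::size_t total = x.size();
    assert(total != 0 && (total & (total - 1)) == 0);

    const double pi = 3.14159265358979323846;
    std::vector<Complex> y(total);  // scratch buffer for ping-ponging

    // n = length of each remaining sub-transform, s = how many are interleaved.
    for (std::size_t n = total, s = 1; n > 1; n /= 2, s *= 2) {
        const std::size_t m = n / 2;
        for (std::size_t p = 0; p < m; ++p) {
            const Complex w = std::polar(
                1.0, -2.0 * pi * static_cast<double>(p) / static_cast<double>(n));
            // For a fixed p, the q loop reads and writes runs of s adjacent elements.
            for (std::size_t q = 0; q < s; ++q) {
                const Complex a = x[q + s * p];
                const Complex b = x[q + s * (p + m)];
                y[q + s * (2 * p)] = a + b;
                y[q + s * (2 * p + 1)] = (a - b) * w;
            }
        }
        std::swap(x, y);  // the output of this pass is the input of the next
    }
    return x;  // already in natural order, no reordering step needed
}
```

Note how each pass touches two input streams that are half a sub-transform apart, which is the kind of spread-out access pattern that makes the cache behavior harder to reason about than plain recursion.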
Thanks!
I understand the basics - in my case the perf cliff happens at about a 256k FFT, which probably means my blocking structure finally hits the limits of cache associativity or something, I'm just not exactly sure why
Yeah I needed effectively an array of arrays (sub arrays of varying lengths) which is where it fell apart
I couldn't figure out a great way to do that with any sort of nesting that didn't require writing a ton of boilerplate
If I ever get my hands on C++26 reflection I have some ideas though
In the meantime I'm pretty liberal with "= delete" on copy construct/assign
But for custom math types, SIMD wrappers, etc, it's necessary for code readability to be able to overload even assignment
My main gripe is that C++ makes it too easy to copy most things where a copy is expensive, but there's no world for me in which disallowing a custom assignment is the correct solution.
I'd rather have an explicit "yes I want to do a copy here" operation (like "a = copy(b)" or something)
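Something like this is roughly what that could look like in today's C++. It's a sketch under assumptions: `SampleBuffer`, `clone()`, and the free function `copy()` are hypothetical names, not from any library. Implicit copies are deleted, moves stay cheap, and the only way to duplicate the data is spelled out at the call site.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical expensive-to-copy type: implicit copies are a compile error.
class SampleBuffer {
public:
    explicit SampleBuffer(std::size_t n) : samples_(n, 0.0f) {}

    SampleBuffer(const SampleBuffer&) = delete;
    SampleBuffer& operator=(const SampleBuffer&) = delete;
    SampleBuffer(SampleBuffer&&) = default;
    SampleBuffer& operator=(SampleBuffer&&) = default;

    // The only way to duplicate the data; the cost is visible where it's called.
    SampleBuffer clone() const {
        SampleBuffer out(0);
        out.samples_ = samples_;
        return out;  // returned by move (or elided), never by the deleted copy
    }

    float& operator[](std::size_t i) { return samples_[i]; }
    float operator[](std::size_t i) const { return samples_[i]; }
    std::size_t size() const { return samples_.size(); }

private:
    std::vector<float> samples_;
};

// Free-function spelling so call sites read `a = copy(b)`.
template <typename T>
auto copy(const T& value) -> decltype(value.clone()) {
    return value.clone();
}

static_assert(!std::is_copy_constructible<SampleBuffer>::value,
              "implicit copies should not compile");
static_assert(std::is_move_assignable<SampleBuffer>::value,
              "moves should stay available");
```

With this, `a = b;` fails to compile, while `a = copy(b);` move-assigns a fresh duplicate. Cheap value types like math vectors or SIMD wrappers would simply keep their normal (or custom) assignment operators.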
I hope he's okay that looked dangerous
I have ADHD that went untreated until well into my 40s and anyone who uses it as an excuse to use AI can fuck off into the sea
Not for the next 3 years at least
crayon napkin drawing of USS MOTHER Fucker (enterprise d) BoLDLy Fucking your Shit up
idk where i found this or what the context is but i like it
Yeah there are some neat instructions (I like the load deinterleave/store interleaved ones) but also some weird omissions (32-bit arm not even having float divide???)
NEON does have a negate instruction though so that's one point up on SSE/AVX
I can't imagine being like "I need to sit on this thing, so lemme just get this *seat* out of the way"
The other surprising thing is I get almost zero performance improvement on arm using the fused multiply add instructions (in fact, a trick I did that leverages fmas to do a little more "work" in a way that's more efficient on x64 is *much* slower on NEON, which is a bummer)
Been noodling on a from-first-principles FFT implementation, it outperforms some fairly well-optimized implementations, but for some reason (cache) on the Raspberry Pi it *massively* falls off a perf cliff in a way I thought I'd accounted for (same isn't true on x64)
I'm not sure wtf, exactly
First toilet I've seen that bidets itself
The (intentional) conflation of all these things makes me so mad
I'm reclaiming "vibe coding" to refer to writing code while listening to good music.
Choosing 9 was hard
Just off the list but almost made it:
- Thief II
- Illusion of Gaia
- Super Metroid
my9games.com
I was like "oh yeah you could definitely do that, but you may have to manually adjust the matrix (or post-multiply with one that scales/biases z)" but then I was like "wait shit is that gonna affect the perspective correction on the textures" and now I don't know lol
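One way to poke at that worry is to apply a z-only scale/bias matrix to a clip-space position and see which components move. This sketch assumes a column-vector convention (`out = M * v`) and that the extra matrix is applied after the projection; `Vec4`, `Mat4`, `mul`, and `zScaleBias` are made-up helpers. Since only the z row differs from identity, x, y, and w come out untouched, and perspective-correct interpolation is normally driven by w.

```cpp
#include <array>
#include <cstddef>

using Vec4 = std::array<double, 4>;                 // x, y, z, w
using Mat4 = std::array<std::array<double, 4>, 4>;  // m[row][col], out = M * v

// Plain 4x4 matrix times column vector.
Vec4 mul(const Mat4& m, const Vec4& v) {
    Vec4 out{};
    for (std::size_t r = 0; r < 4; ++r) {
        for (std::size_t c = 0; c < 4; ++c) {
            out[r] += m[r][c] * v[c];
        }
    }
    return out;
}

// Identity except for the z row: z' = scale * z + bias * w.
// After the perspective divide that becomes z'/w = scale * (z/w) + bias.
Mat4 zScaleBias(double scale, double bias) {
    Mat4 m{};
    for (std::size_t i = 0; i < 4; ++i) m[i][i] = 1.0;
    m[2][2] = scale;
    m[2][3] = bias;
    return m;
}
```

For example, running the clip-space point `{2, -1, 3, 4}` through `zScaleBias(0.5, 0.5)` changes only z (3 becomes 3.5). What this doesn't cover is clipping against the near/far planes, which does depend on z, so that part would still need checking against the actual API's depth range.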
They teach World History*
*it's actually just watered-down European history
Look, I know what you're asking me to do, but I'm sorry I can't imagine dragons I have aphantasia
If you're in the game industry and use LinkedIn, please do me a favor and share my post about Sentona Games screwing me out of money I'm owed. Thank you
*looking at a giant detective board of all the terrible shit the US government is doing with various threads connecting them via pins, trying to figure out which things are distractions from which other things*
Circumstances have changed, and I'm trying to raise 6000 USD.
I have an ongoing 20% off sale in my shop and a ko-fi.
I currently have emergency commissions up on ko-fi
I know times are tough for everyone, so all I can ask for is shares.
ko-fi.com/megglesart
www.etsy.com/shop/Meggles...
Hadn't thought about it, but this absolutely makes sense. Neat