
Josh Jersild

@joshjers.drilian.com

I've been programming for way too long
Also sometimes I write or perform music
Mostly I talk about random bullshit
(he/him)
https://drilian.com/
https://cathoderetro.com/
https://procyongame.com/
https://www.youtube.com/@Drilian

431
Followers
355
Following
2,305
Posts
25.07.2023
Joined

Latest posts by Josh Jersild @joshjers.drilian.com

yep, hence why I don't care about those sizes 😁

08.03.2026 20:57 👍 0 🔁 0 💬 0 📌 0

Yep, that's about right then. I think it starts to fall out of L1 cache at 128k size, and then once it hits 512k it can't fit nicely in the 8-way L2 either

08.03.2026 20:53 👍 0 🔁 0 💬 0 📌 0

The good news is that it still scales better than the 3rd party ones I'm testing against, it's just that it's worse - only on arm - at those sizes than my previous implementation

I also don't personally care about sizes that large since this is mostly for realtime audio, so it's just a curiosity 😀

08.03.2026 20:46 👍 0 🔁 0 💬 0 📌 0

While it can work in contiguous blocks, some passes require 16 of them at once, and I'd be willing to bet that on the pi 5, up around 128-512k, that's where suddenly the addresses get to the point where all of them map to the same 4- or 8-way associative sets, and that's why it falls off

08.03.2026 20:44 👍 0 🔁 0 💬 2 📌 0
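
The set-conflict hunch in the post above can be sketched as a small calculation. This is only an illustration under assumed numbers: the cache geometry (64 KiB, 4-way, 64-byte lines), the struct `CacheGeometry`, and the helper `distinct_sets` are all hypothetical and not taken from the thread. It also ignores virtual versus physical indexing. It counts how many different cache sets a group of equally spaced streams touches; when 16 streams share fewer sets than the ways can hold, lines start evicting each other.

```cpp
#include <cstddef>
#include <set>

// Assumed cache description (hypothetical values are supplied by the caller).
struct CacheGeometry {
    std::size_t size_bytes;  // total capacity
    std::size_t ways;        // associativity
    std::size_t line_bytes;  // cache line size
    std::size_t num_sets() const { return size_bytes / (ways * line_bytes); }
};

// Count the distinct cache sets hit by `streams` addresses that are
// `stride_bytes` apart, starting at address 0.
std::size_t distinct_sets(const CacheGeometry& c, std::size_t streams,
                          std::size_t stride_bytes) {
    std::set<std::size_t> sets;
    for (std::size_t i = 0; i < streams; ++i) {
        const std::size_t addr = i * stride_bytes;
        // Set index = line number modulo the number of sets.
        sets.insert((addr / c.line_bytes) % c.num_sets());
    }
    return sets.size();
}
```

With the assumed geometry there are 256 sets, so 16 streams spaced 16 KiB apart all fall into a single set, which a 4-way cache cannot hold at once, while 16 streams spaced one line apart spread over 16 sets.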

Yeah the nice thing about the self-sorting version (at least, as I've designed it - no idea if this is a standard thing) is that it not only eliminates the initial (or final) bit reverse pass, but it always gets to work in contiguous blocks

I actually think I figured out what's going on...

08.03.2026 20:42 👍 0 🔁 0 💬 1 📌 0
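
For readers unfamiliar with self-sorting FFTs, here is a minimal sketch of one textbook variant, the radix-2 Stockham autosort. It is not the implementation described in the post (that design is not shown in the thread); the names `stockham_fft` and `naive_dft` are made up for illustration. The sketch ping-pongs between two buffers so the result comes out in natural order with no bit-reversal pass.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cd = std::complex<double>;

// Radix-2 Stockham autosort FFT (forward, unscaled). Illustrative only.
// The input length must be a power of two.
std::vector<cd> stockham_fft(std::vector<cd> x) {
    const std::size_t N = x.size();
    assert(N != 0 && (N & (N - 1)) == 0);
    const double pi = std::acos(-1.0);
    std::vector<cd> y(N);  // second buffer for ping-ponging
    // n = current sub-transform length, s = number of interleaved sub-transforms.
    for (std::size_t n = N, s = 1; n > 1; n /= 2, s *= 2) {
        const std::size_t m = n / 2;
        for (std::size_t p = 0; p < m; ++p) {
            const cd w = std::polar(1.0, -2.0 * pi * double(p) / double(n));
            for (std::size_t q = 0; q < s; ++q) {
                const cd a = x[q + s * p];
                const cd b = x[q + s * (p + m)];
                y[q + s * (2 * p)] = a + b;            // even output
                y[q + s * (2 * p + 1)] = (a - b) * w;  // odd output, twiddled
            }
        }
        std::swap(x, y);  // this pass's output is the next pass's input
    }
    return x;  // already in natural order
}

// O(N^2) reference DFT, used only to check the sketch above.
std::vector<cd> naive_dft(const std::vector<cd>& x) {
    const std::size_t N = x.size();
    const double pi = std::acos(-1.0);
    std::vector<cd> out(N);
    for (std::size_t k = 0; k < N; ++k)
        for (std::size_t j = 0; j < N; ++j)
            out[k] += x[j] * std::polar(1.0, -2.0 * pi * double(k * j) / double(N));
    return out;
}
```

The inner `q` loop always walks a run of `s` adjacent elements, which is one way a self-sorting layout keeps its work in contiguous blocks. The cost is the extra buffer.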

The thing that makes just recursing into smaller chunks tricky is that I'm doing a self sorting FFT, which changes the memory access patterns in a way that is harder (for me) to reason about - it's more spread out in general

08.03.2026 19:21 👍 0 🔁 0 💬 1 📌 0

Thanks!
I understand the basics - in my case the perf cliff happens at about a 256k FFT, which probably means my blocking structure finally hits the limits of cache associativity or something, I'm just not exactly sure why

08.03.2026 19:18 👍 0 🔁 0 💬 1 📌 0

Yeah I needed effectively an array of arrays (sub arrays of varying lengths) which is where it fell apart 🫠

08.03.2026 11:03 👍 1 🔁 0 💬 0 📌 0

I couldn't figure out a great way to do that with any sort of nesting that didn't require writing a ton of boilerplate

If I ever get my hands on c++26 reflection I have some ideas though 😎

In the meantime I'm pretty liberal with "= delete" on copy construct/assign

08.03.2026 10:54 👍 1 🔁 0 💬 1 📌 0

But for custom math types, SIMD wrappers, etc, it's necessary for code readability to be able to overload even assignment

08.03.2026 10:50 👍 2 🔁 0 💬 1 📌 0

My main gripe is that C++ makes it too easy to copy most things where a copy is expensive, but there's no world for me in which disallowing a custom assignment is the correct solution.

I'd rather have an explicit "yes I want to do a copy here" operation (like "a = copy(b)" or something)

08.03.2026 10:49 👍 3 🔁 0 💬 2 📌 0
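
One way to approximate the explicit "a = copy(b)" idea in today's C++ is sketched below. This is an assumption-laden illustration, not something from the thread: the `Buffer` class, its `clone()` member, and the free function `copy` are all hypothetical names. Implicit copies are deleted, moves stay cheap, and the only way to duplicate the data is to ask for it by name.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical expensive-to-copy type: implicit copies are disabled.
class Buffer {
public:
    explicit Buffer(std::size_t n, float fill = 0.0f) : data_(n, fill) {}

    Buffer(const Buffer&) = delete;             // no silent copy construction
    Buffer& operator=(const Buffer&) = delete;  // no silent copy assignment
    Buffer(Buffer&&) = default;                 // moves remain allowed
    Buffer& operator=(Buffer&&) = default;

    // The one place where a deep copy is made, and it has to be requested.
    Buffer clone() const {
        Buffer out(0);
        out.data_ = data_;
        return out;
    }

    std::size_t size() const { return data_.size(); }
    float& operator[](std::size_t i) { return data_[i]; }
    const float& operator[](std::size_t i) const { return data_[i]; }

private:
    std::vector<float> data_;
};

// Spelling from the post: a = copy(b). Works for any type exposing clone().
template <typename T>
T copy(const T& value) {
    return value.clone();
}

static_assert(!std::is_copy_constructible<Buffer>::value,
              "Buffer must not be implicitly copyable");
```

With this in place `Buffer a = b;` fails to compile, while `Buffer a = copy(b);` compiles and makes the cost visible at the call site.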

I hope he's okay that looked dangerous 😬

08.03.2026 04:17 👍 0 🔁 0 💬 0 📌 0

I have ADHD that went untreated until well into my 40s and anyone who uses it as an excuse to use AI can fuck off into the sea

08.03.2026 03:39 👍 7 🔁 0 💬 0 📌 0

Not for the next 3 years at least 🫠

08.03.2026 03:36 👍 2 🔁 0 💬 0 📌 0

crayon napkin drawing of USS MOTHER Fucker (enterprise d) BoLDLy Fucking your Shit up

idk where i found this or what the context is but i like it

02.01.2026 17:10 👍 705 🔁 121 💬 18 📌 7

Yeah there are some neat instructions (I like the load deinterleave/store interleaved ones) but also some weird omissions (32-bit arm not even having float divide???)

NEON does have a negate instruction though so that's one point up on SSE/AVX 😅

08.03.2026 03:09 👍 0 🔁 0 💬 0 📌 0

I can't imagine being like "I need to sit on this thing, so lemme just get this *seat* out of the way"

08.03.2026 03:01 👍 3 🔁 0 💬 0 📌 0

The other surprising thing is I get almost zero performance improvement on arm using the fused multiply add instructions (in fact, a trick I did that leverages fmas to do a little more "work" in a way that's more efficient on x64 is *much* slower on NEON, which is a bummer)

07.03.2026 08:17 👍 1 🔁 0 💬 0 📌 0

Been noodling on a from-first-principles FFT implementation, it outperforms some fairly well-optimized implementations, but for some reason (cache) on the Raspberry Pi it *massively* falls off a perf cliff in a way I thought I'd accounted for (same isn't true on x64)

I'm not sure wtf, exactly

07.03.2026 08:16 👍 1 🔁 0 💬 3 📌 0

First toilet I've seen that bidets itself

07.03.2026 04:19 👍 2 🔁 0 💬 0 📌 0

The (intentional) conflation of all these things makes me so mad

07.03.2026 02:35 👍 15 🔁 0 💬 0 📌 0

I'm reclaiming "vibe coding" to refer to writing code while listening to good music.

06.03.2026 21:45 👍 161 🔁 26 💬 9 📌 0

Choosing 9 was hard

Just off the list but almost made it:
- Thief II
- Illusion of Gaia
- Super Metroid

my9games.com

06.03.2026 03:57 👍 2 🔁 0 💬 0 📌 1

I was like "oh yeah you could definitely do that, but you may have to manually adjust the matrix (or post-multiply with one that scales/biases z)" but then I was like "wait shit is that gonna affect the perspective correction on the textures" and now I don't know lol

06.03.2026 01:17 👍 0 🔁 0 💬 0 📌 0

They teach World History*

*it's actually just watered-down European history

06.03.2026 00:57 👍 8 🔁 0 💬 0 📌 0

Look, I know what you're asking me to do, but I'm sorry I can't imagine dragons I have aphantasia

05.03.2026 18:37 👍 1 🔁 0 💬 0 📌 0
I didn't want to make this post but unfortunately my hand has been forced. Sentona Games told me this week, after ignoring me for two months, that they will not be paying me the $24,000 deferred paym...

If you're in the game industry and use LinkedIn, please do me a favor and share my post about Sentona Games screwing me out of money I'm owed. Thank you

05.03.2026 00:41 👍 2000 🔁 897 💬 29 📌 15

*looking at a giant detective board of all the terrible shit the US government is doing with various threads connecting them via pins, trying to figure out which things are distractions from which other things*

04.03.2026 21:28 👍 1 🔁 0 💬 0 📌 0

Circumstances have changed, and I'm trying to raise 6000 USD.

I have an ongoing 20% off sale in my shop and a ko-fi.

I currently have emergency commissions up on ko-fi

I know times are tough for everyone, so all I can ask for is shares.

ko-fi.com/megglesart

www.etsy.com/shop/Meggles...

03.03.2026 20:25 👍 356 🔁 336 💬 1 📌 3

Hadn't thought about it, but this absolutely makes sense. Neat 🤩

03.03.2026 23:24 👍 3 🔁 0 💬 0 📌 0