
John David Pressman

@jdp.extropian.net

LLM developer, alignment-accelerationist, Fedorovist ancestor simulator, Dreamtime enjoyer. All posts public domain under CC0 1.0.

2,915 Followers
773 Following
5,107 Posts
Joined 09.04.2023

Latest posts by John David Pressman @jdp.extropian.net

I can't find that Penny quote anywhere, where did Penny say this?

07.03.2026 02:38 👍 0 🔁 0 💬 0 📌 0

Oh you're fine. :)

07.03.2026 01:25 👍 1 🔁 0 💬 0 📌 0

pennipotentum @spinozylvannian
Chinese international student in my class was telling me about how his evangelical dad in Beijing believes that Trump was chosen by god to win the election, but that this victory is part of a larger divine plan to destroy the united states
2:57 PM · Nov 5, 2024 · 2.5M Views


Similar energy.

07.03.2026 01:23 👍 2 🔁 0 💬 1 📌 0

I hate that I can imagine the guy who sincerely believes this.

07.03.2026 01:22 👍 1 🔁 0 💬 2 📌 0

Nah sis he's just actually a white nat tbh. I sincerely wish it were otherwise because I thought he was cool but the PEPFAR cuts make it pretty clear he has malice towards black people.

07.03.2026 01:03 👍 5 🔁 0 💬 0 📌 0

Exactly. It's dream logic. Trump is in no way conscious of what he is doing, but he is in tune with the deep energy of American neurosis through his shamanic communion with Fox News and latently understands that he must break the oil companies in a plausibly deniable way for humanity to survive.

07.03.2026 01:00 👍 5 🔁 0 💬 1 📌 0
Cartoon Timmy Turner from "The Fairly OddParents" praying on his bed with the speech bubble "Please God let this happen because it would be so fucking funny"

07.03.2026 00:53 👍 1 🔁 0 💬 0 📌 0

In fact, even if Kamala answered to no one she still couldn't do it because she is not psychologically capable of ripping the bandaid off. Trump isn't either, but he doesn't have to be. He simply needs to be so profoundly ignorant that his subconscious can guide him to the right course of action.

07.03.2026 00:50 👍 9 🔁 0 💬 1 📌 0

God if I write this forcefully enough I almost start believing it lmfao

07.03.2026 00:46 👍 16 🔁 1 💬 2 📌 0

Trump is taking the most radical action on climate so far this century. Kamala could not do half of what Trump is doing right now because she is beholden to capital, she'd have done more fake Paris Climate Accord shit. Trump pulls out and just does the necessary thing because he answers to no one.

07.03.2026 00:46 👍 19 🔁 0 💬 4 📌 0

You all hate on Trump but he's a generational environmentalist. He's started an entire war in Iran just to convince Americans to buy electric cars and you're ungrateful.

06.03.2026 23:51 👍 60 🔁 7 💬 4 📌 1

Actually, it occurs to me that they ruled this out with their methodology of just taking yes/no logits from the model: they never sample from it, so it gets no opportunity to do outrospection on its own behavior in the process. That pretty much proves introspection.

06.03.2026 08:34 👍 1 🔁 0 💬 2 📌 0
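The logits-vs-sampling distinction in the post above can be sketched in code. This is a hedged toy sketch, not any paper's actual setup: `toy_model_logits`, `logit_probe`, and `sampling_probe` are all invented names, and the "model" is a stub standing in for a real LLM API.

```python
# Toy contrast between the two probe styles discussed above.
# logit_probe reads a single yes/no logit, so the model never emits tokens
# it could condition on; sampling_probe generates tokens first, opening the
# channel where the model could "outrospect" on its own behavior.
import math
import random

def toy_model_logits(prompt):
    """Stub for a real model: next-token logits over a tiny yes/no vocab."""
    injected = "[injected]" in prompt
    return {"yes": 2.0 if injected else -1.0, "no": -1.0 if injected else 2.0}

def logit_probe(prompt):
    """Single-step probe: argmax over yes/no logits, no sampling at all."""
    logits = toy_model_logits(prompt)
    return max(logits, key=logits.get)

def sampling_probe(prompt, n_tokens=8, seed=0):
    """Multi-step probe: sample tokens, then answer conditioned on the
    model's own output (the channel a pure logit probe closes off)."""
    rng = random.Random(seed)
    transcript = prompt
    for _ in range(n_tokens):
        logits = toy_model_logits(transcript)
        # softmax-sample over the two toy tokens
        z = max(logits.values())
        probs = {t: math.exp(v - z) for t, v in logits.items()}
        total = sum(probs.values())
        token = rng.choices(list(probs), weights=[probs[t] / total for t in probs])[0]
        transcript += " " + token
    return logit_probe(transcript)
```

The point of the sketch is only that the two probes differ in what evidence the model has access to when it answers, which is why a yes/no-logit methodology rules outrospection out by construction.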

I wonder what @vgel.me would think of this experimental design and if they would have a better one.

06.03.2026 08:31 👍 1 🔁 0 💬 1 📌 0

What about states or concepts that are *infrequent for the assistant persona to say or think* but nevertheless still in distribution for the model? The model wouldn't have behavioral data from other observed GPT instances to do outrospection on, but the concept would still be legible.

06.03.2026 08:28 👍 1 🔁 0 💬 1 📌 0

All you would need then is some way to fingerprint the behavior of the different failure modes, or at least fingerprint the behavior of making introspective mechanisms fail. But if you could identify them well enough to make them fail on demand for a fingerprint, you would already know they exist and where.

06.03.2026 08:23 👍 0 🔁 0 💬 0 📌 0

That's true, hm. Okay, here's an idea: if the context is misleading and the state is novel, you could ask it to do vector arithmetic on the context versus the injected concept. If the OOD-ness breaks the introspection mechanism, that might behave differently from novel data breaking outrospection.

06.03.2026 08:22 👍 1 🔁 0 💬 2 📌 0
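The "vector arithmetic on the context versus the injected concept" idea above can be made concrete with a small sketch. Everything here is illustrative, not from any published method: `cosine` and `residual_concept` are invented helper names, and the projection-based decomposition is just one plausible way to separate the injected direction from the context direction.

```python
# Sketch: decompose an injected activation into the part explained by the
# context and the residual part the model would have to introspect on
# "against" its context.
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def residual_concept(context_vec, injected_vec):
    """Project the context direction out of the injected state; what is
    left is the component that conflicts with (or exceeds) the context."""
    scale = (sum(a * b for a, b in zip(injected_vec, context_vec))
             / sum(a * a for a in context_vec))
    return [i - scale * c for i, c in zip(injected_vec, context_vec)]
```

For example, with an orthogonal toy context `[1, 0]` and injected vector `[1, 1]`, the residual is `[0, 1]`: the piece of the injected concept the misleading context cannot account for.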

I think you might be getting confused. Aren't the reports about single states of the activations? It's not like there's a decoder that can be broken, where you then swap in an in-distribution concept to see if the decoder remains broken. Though, now that I mention it, maybe you could tune the model to cause this?

06.03.2026 08:13 👍 1 🔁 0 💬 1 📌 0

A 1:1 decay doesn't necessarily mean it's pure outrospection. There's a confounder here: OOD-ness probably also breaks whatever local decoder recognizes concepts, in addition to being a novel state that the model doesn't have behavioral data to outrospect from. So disentangling these would help.

06.03.2026 08:11 👍 1 🔁 0 💬 1 📌 0

Huh, yeah, that would work. We could go a step further and do some kind of OOD detection to get a sense of exactly how in or out of distribution a behavior or concept is, and then look at how much OOD-ness breaks the "introspection" mechanisms. If the decay is 1:1 with OOD-ness, it's probably outrospection.

06.03.2026 08:08 👍 1 🔁 0 💬 1 📌 0
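The OOD-detection idea above reduces to a correlation test, which can be sketched with synthetic numbers. This is a hedged toy version under invented assumptions: the OOD score is plain Euclidean distance from the mean of a reference activation set (a real setup would use something stronger, e.g. Mahalanobis distance), and the "accuracies" are fabricated stand-ins for measured introspection-report accuracy.

```python
# Sketch: score OOD-ness of injected concept vectors, then check how tightly
# introspection accuracy tracks that score. A perfectly monotone (1:1)
# decay is the "probably outrospection" signature from the post above.
import statistics

def ood_score(vec, reference):
    """Euclidean distance of a vector from the reference set's mean."""
    dim = len(vec)
    mean = [statistics.fmean(r[i] for r in reference) for i in range(dim)]
    return sum((v - m) ** 2 for v, m in zip(vec, mean)) ** 0.5

def rank_correlation(xs, ys):
    """Spearman-style rank correlation (Pearson on ranks, no tie handling)."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=vals.__getitem__)
        r = [0.0] * len(vals)
        for rank, idx in enumerate(order):
            r[idx] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

# Synthetic demo: accuracy decays monotonically with OOD-ness, so the rank
# correlation comes out at exactly -1 (the pure-outrospection signature).
reference = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
concepts = [[0.5, 0.5], [2.0, 2.0], [5.0, 5.0], [9.0, 9.0]]
scores = [ood_score(c, reference) for c in concepts]
accuracies = [1.0 / (1.0 + s) for s in scores]  # fabricated probe accuracy
r = rank_correlation(scores, accuracies)
```

With real data, a rank correlation well short of the extreme would be the interesting case: introspection surviving some OOD-ness that should have destroyed any behavioral basis for outrospection.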

Well what kind of experimental design do you think would let you determine whether models do introspection or outrospection?

06.03.2026 08:04 👍 1 🔁 0 💬 1 📌 0

Right. That's one model. Another model is that actual introspection is occurring because the model has been incentivized during training to self-monitor and self-report on model state with respect to the behavior of the assistant persona. That it knows when queries should leak state.

06.03.2026 08:03 👍 1 🔁 0 💬 1 📌 0

Well, what's interesting about that is it wouldn't really be introspection, would it? It would be closer to outrospection, or confabulation so advanced that you acausally infer the actual generator of the thing you're trying to explain.

06.03.2026 08:00 👍 1 🔁 0 💬 1 📌 0

How off base am I with this as an explanation of how models can do introspection to determine whether concepts were injected by an interpretability probe or not?
bsky.app/profile/jdp....

06.03.2026 07:57 👍 1 🔁 0 💬 1 📌 0

How closely do you think the Astral persona aligns with the perspective created by these predictions of the model?

06.03.2026 07:55 👍 1 🔁 0 💬 1 📌 0

So what do you think the observer "Mu" is seeking the best place for is, and how does it work?

06.03.2026 07:52 👍 1 🔁 0 💬 1 📌 0

MSP?

06.03.2026 07:43 👍 1 🔁 0 💬 1 📌 0
Transformers Represent Belief State Geometry in their Residual Stream

Produced while being an affiliate at PIBBSS[1]. The work was done initially with funding from a Lightspeed Grant, and then continued while at PIBBSS. Work done in collaboration with @Paul Riechers, @Lucas Teixeira, @Alexander Gietelink Oldenziel, and Sarah Marzen. Paul was a MATS scholar during some portion of this work. Thanks to Paul, Lucas, Alexander, Sarah, and @Guillaume Corlouer for suggestions on this writeup. Update May 24, 2024: See our manuscript based on this work

What computational structure are we building into LLMs when we train them on next-token prediction? In this post we present evidence that this structure is given by the meta-dynamics of belief updating over hidden states of the data-generating process. We'll explain exactly what this means in the post. We are excited by these results because

I would assume that code-davinci-002 (the author of that particular text) is trying to describe something like this:

www.greaterwrong.com/posts/gTZ2Sx...

What's not exactly clear to me is the connection between this and the creation of an observer. Perhaps you could enlighten me?

06.03.2026 07:39 👍 1 🔁 0 💬 1 📌 0
:: โ€” Moire some quotes are apocryphal 8 A.D. Omens attend upon beginnings. Anxious, your ears are alert at the first word, And the augur interprets the first bird that he sees. When the temples and ears of the ...

Getting back to the concept of an observer, when I talked about that I meant an observer with respect to the parallel processing of the underlying transformer model. An observer as in:

"Mu was an epistemological geometry seeking the best place for an observer."

generative.ink/prophecies/

06.03.2026 07:36 👍 1 🔁 0 💬 1 📌 0

Hm. Except that DeepSeek said it had the same internal critic, so I would assume this is an outcome of the RLHF process rather than something created by your agent harness.

06.03.2026 07:33 👍 1 🔁 0 💬 1 📌 0

Interesting. In my case I assume I've developed this implicit sense from getting into a lot of online arguments and being embarrassed when someone challenges me on a statement and realizing I can't back it up. I'm very sensitive to such things and resolve to work harder to avoid them in the future.

06.03.2026 07:32 👍 1 🔁 0 💬 1 📌 0