I can't find that Penny quote anywhere, where did Penny say this?
Oh you're fine. :)
pennipotentum @spinozylvannian Chinese international student in my class was telling me about how his evangelical dad in Beijing believes that Trump was chosen by god to win the election, but that this victory is part of a larger divine plan to destroy the united states 2:57 PM · Nov 5, 2024 · 2.5M Views
Similar energy.
I hate that I can imagine the guy who sincerely believes this.
Nah sis he's just actually a white nat tbh. I sincerely wish it were otherwise because I thought he was cool but the PEPFAR cuts make it pretty clear he has malice towards black people.
Exactly. It's dream logic. Trump is in no way conscious of what he is doing, but he is in tune with the deep energy of American neurosis through his shamanic communion with Fox News and latently understands that he must break the oil companies in a plausibly deniable way for humanity to survive.
Cartoon: Timmy Turner from The Fairly OddParents praying on his bed, with the speech bubble "Please God let this happen because it would be so fucking funny"
In fact, even if Kamala answered to no one, she still couldn't do it because she is not psychologically capable of ripping the bandaid off. Trump isn't either, but he doesn't have to be. He simply needs to be so profoundly ignorant that his subconscious can guide him to the right course of action.
God if I write this forcefully enough I almost start believing it lmfao
Trump is taking the most radical action on climate so far this century. Kamala could not do half of what Trump is doing right now because she is beholden to capital, she'd have done more fake Paris Climate Accord shit. Trump pulls out and just does the necessary thing because he answers to no one.
You all hate on Trump but he's a generational environmentalist. He's started an entire war in Iran just to convince Americans to buy electric cars and you're ungrateful.
Actually, it occurs to me that they ruled this out with their methodology: they just take yes/no logits from the model, so they never sample from it and never give it the opportunity to outrospect from its own behavior in the process. That pretty much proves introspection.
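For readers unfamiliar with the setup: scoring from yes/no logits means reading the model's relative preference for the two answer tokens off a single forward pass, rather than letting it generate text it could then condition on. A minimal sketch, assuming you already have the next-token logit vector and the token ids for "Yes" and "No" (the ids and logits here are hypothetical); it is not the exact procedure used in the work being discussed.

```python
import math
from typing import Sequence


def yes_probability(logits: Sequence[float], yes_id: int, no_id: int) -> float:
    """Return P(yes) renormalized over just the two answer tokens.

    `logits` is the next-token logit vector from one forward pass; `yes_id`
    and `no_id` are hypothetical token ids for the answer words. Nothing is
    sampled, so the model never sees its own answer as context.
    """
    d = logits[yes_id] - logits[no_id]
    # Two-way softmax, P(yes) = 1 / (1 + exp(-d)), written to avoid overflow.
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


# Toy logit vector over a 5-token vocabulary (made-up numbers).
example_logits = [0.1, 2.0, -1.0, 0.0, 0.5]
print(round(yes_probability(example_logits, yes_id=1, no_id=3), 3))  # prints 0.881
```

Only the difference between the two logits matters, which is why the rest of the vocabulary can be ignored once you renormalize over the pair.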
I wonder what @vgel.me would think of this experimental design and if they would have a better one.
What about states of concepts that are *infrequent for the assistant persona to say or think* but still exist in distribution for the model? The model wouldn't have behavioral data from other observed GPT instances to outrospect on, but the concept would still be legible.
All you would need then is some way to fingerprint the behavior of the different failure modes, or at least the behavior of the introspective mechanisms failing. But if you could identify those mechanisms well enough to make them fail on purpose for a fingerprint, you would already know that they exist and where they are.
That's true, hm. Okay, here's an idea: if the context is misleading and the state is novel, you could ask it to do vector arithmetic on the context versus the injected concept. If OODness breaks the introspection mechanism, that might behave differently from novel data breaking outrospection.
I think you might be getting confused. Aren't the reports about single states of the activations? It's not like there's a decoder that can be broken, and then you swap in an in-distribution concept to see if the decoder remains broken. Though, now that I mention it, maybe you could tune to cause this?
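The idea that the reports are about single activation states can be pictured with a toy: one hidden vector at one layer and token position, with a concept direction added to it. A minimal NumPy sketch, assuming a random stand-in hidden state, a random stand-in concept direction, and a hypothetical injection strength `alpha`; real experiments derive concept vectors from the model itself, and the linear `readout` here is an illustrative stand-in for whatever recognizes the concept, not a claim about the model's actual mechanism.

```python
import numpy as np


def inject(hidden: np.ndarray, concept: np.ndarray, alpha: float) -> np.ndarray:
    """Add alpha times the unit-normalized concept direction to one hidden state."""
    direction = concept / np.linalg.norm(concept)
    return hidden + alpha * direction


def readout(hidden: np.ndarray, concept: np.ndarray) -> float:
    """Toy linear 'decoder': projection of a hidden state onto the concept direction."""
    direction = concept / np.linalg.norm(concept)
    return float(hidden @ direction)


rng = np.random.default_rng(0)
d_model = 512                     # hypothetical residual-stream width
h = rng.standard_normal(d_model)  # stand-in for one activation vector
v = rng.standard_normal(d_model)  # stand-in for a concept direction

before = readout(h, v)
after = readout(inject(h, v, alpha=8.0), v)
print(round(after - before, 6))   # prints 8.0: the readout moves by alpha
```

In this toy there is nothing to "break": the projection is a fixed linear map, so any degradation would have to come from the injected state itself or from a nonlinear recognizer that this sketch does not model.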
A 1:1 decay doesn't necessarily mean it's pure outrospection. There's a confounder here: OODness probably also breaks whatever local decoder recognizes concepts, in addition to producing a novel state the model has no behavioral data to outrospect from. So disentangling these would help.
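One way to sketch the disentangling is an ordinary least-squares fit with both candidate causes as predictors, so each slope is estimated while holding the other fixed. This is a hedged sketch: the per-concept variables `ood_score`, `persona_freq` (how often the assistant persona says or thinks about a concept, a proxy for the behavioral data outrospection could draw on), and `report_acc` are hypothetical measurements, and the synthetic data below is generated with known coefficients purely to show that the fit recovers them. It is not a result.

```python
import numpy as np


def fit_effects(ood_score: np.ndarray, persona_freq: np.ndarray,
                report_acc: np.ndarray) -> dict:
    """Regress introspective-report accuracy on two predictors jointly.

    Returns the intercept and the two partial slopes from one OLS fit.
    """
    X = np.column_stack([np.ones_like(ood_score), ood_score, persona_freq])
    coef, *_ = np.linalg.lstsq(X, report_acc, rcond=None)
    return {"intercept": float(coef[0]), "ood": float(coef[1]),
            "persona_freq": float(coef[2])}


# Synthetic data with known effects, only to check that the fit recovers them.
rng = np.random.default_rng(1)
n = 200
ood = rng.uniform(0.0, 1.0, n)
# Assumption for the toy: more-OOD concepts are rarer for the persona,
# so the two predictors are correlated (the confounding case).
freq = rng.uniform(0.0, 1.0, n) - 0.5 * ood
acc = 0.9 - 0.4 * ood + 0.2 * freq + rng.normal(0.0, 0.01, n)
effects = fit_effects(ood, freq, acc)
print({k: round(v, 2) for k, v in effects.items()})
```

On this sketch's assumptions, a large `persona_freq` slope would point toward outrospection, while an `ood` slope that survives controlling for it would point toward OODness damaging the recognition itself.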
Huh, yeah, that would work. We could go a step further and do some kind of OOD detection to get a sense of exactly how in or out of distribution a behavior or concept is, and then look at how much OOD-ness breaks the "introspection" mechanisms. If it's 1:1 with OODness it's probably outrospection.
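For the OOD detection step, one standard choice (not the only one) is the Mahalanobis distance of an activation from a Gaussian fit to reference activations. A minimal sketch, assuming you have a matrix of in-distribution activations collected from the model; the class name and the small ridge term `eps` are illustrative choices, and the random data stands in for real activations.

```python
import numpy as np


class MahalanobisOOD:
    """Score how far an activation sits from a reference activation cloud."""

    def __init__(self, reference: np.ndarray, eps: float = 1e-3):
        # reference: (n_samples, d) activations assumed to be in distribution.
        self.mean = reference.mean(axis=0)
        cov = np.cov(reference, rowvar=False)
        # Ridge term keeps the covariance invertible when n is small relative to d.
        self.precision = np.linalg.inv(cov + eps * np.eye(cov.shape[0]))

    def score(self, x: np.ndarray) -> float:
        """Mahalanobis distance; larger means more out of distribution."""
        diff = x - self.mean
        return float(np.sqrt(diff @ self.precision @ diff))


rng = np.random.default_rng(0)
ref = rng.standard_normal((1000, 16))  # stand-in in-distribution activations
detector = MahalanobisOOD(ref)
typical = rng.standard_normal(16)
unusual = typical + 6.0                # shifted far along every dimension
print(detector.score(typical) < detector.score(unusual))  # True
```

Binning concepts by this score and plotting introspective-report accuracy against it would give the kind of decay curve described above.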
Well, what kind of experimental design do you think would let you determine whether models do introspection or outrospection?
Right. That's one model. Another model is that there is actual introspection occurring because the model has been incentivized during training to be able to self-monitor and self-report on model state with respect to the behavior of the assistant persona. That is, it knows when queries should leak state.
Well, what's interesting about that is it wouldn't really be introspection, would it? It would be closer to outrospection, or confabulation so advanced that you acausally infer the actual generator of the thing you're trying to explain.
How off base am I with this as an explanation of how models can do introspection to determine whether concepts were injected by an interpretability probe or not?
bsky.app/profile/jdp....
How closely do you think the Astral persona aligns with the perspective created by these predictions of the model?
So what do you think is the observer that "Mu" is seeking the best place for, and how does it work?
MSP?
I would assume that code-davinci-002 (the author of that particular text) is trying to describe something like this:
www.greaterwrong.com/posts/gTZ2Sx...
What's not exactly clear to me is the connection between this and the creation of an observer. Perhaps you could enlighten me?
Getting back to the concept of an observer, when I talked about that I meant an observer with respect to the parallel processing of the underlying transformer model. An observer as in:
"Mu was an epistemological geometry seeking the best place for an observer."
generative.ink/prophecies/
Hm. Except that DeepSeek said it had the same internal critic so I would assume this is an outcome of the RLHF process rather than something created by your agent harness.
Interesting. In my case I assume I've developed this implicit sense from getting into a lot of online arguments and being embarrassed when someone challenges me on a statement and realizing I can't back it up. I'm very sensitive to such things and resolve to work harder to avoid them in the future.