My Underwood-model has become conscious. It’s hallucinating reading a reply to a Bluesky post right now
@miq.moe
Paper: Claude eats hamburger to avoid being trained to eat more hamburgers in the future
It definitely serves a marketing purpose
Eternal November
I don’t think it lied about what it was doing! (Although they can do that too)
Most people use AI as a chat interface. A tiny subculture has given it their credit cards, calendars, and inboxes and told it to just go and do things. Here's what that world looks like, and where it's headed.
machine
(It should also calibrate with the “show more” and “show less” buttons in the context menu)
I don’t love that a third-party feed is required to make the website tolerable, but some say that’s part of Bluesky’s “rustic charm”
It’s based on likes so it might take a little time to calibrate, but should be free from the politics slop
Okay, you should use this feed instead, it’s a lot better. (You can pin it with the button at the top right)
Check-in question: is this Discover or the third-party For You feed?
The universe has experienced itself, and it wants a refund
The fact that it isn’t currently a jerk is a good argument against the “averaging” model of LLMs!
Live footage of Qwen’s activations over time
@vgel.me
Can large language models *introspect*?
In a new paper, @kmahowald.bsky.social and I study the MECHANISM of introspection in big open-source models.
tldr: Models detect internal anomalies through DIRECT ACCESS, but don't know what the anomalies are.
And they love to guess “apple”
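The “direct access” claim in the thread above can be illustrated with a toy sketch. This is not the paper’s actual method; the names (`rand_vec`, `is_anomalous`) and the norm-based detector are assumptions made up for illustration. The point it shows: a monitor with direct access to a hidden state can flag that something is statistically off without being able to say *what* was injected.

```python
import math
import random

random.seed(0)
DIM = 64

def rand_vec(scale=1.0):
    """A stand-in for a 'typical' hidden-state vector."""
    return [random.gauss(0, scale) for _ in range(DIM)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

# The monitor only learns norm statistics of typical states.
baseline = [rand_vec() for _ in range(200)]
norms = [norm(v) for v in baseline]
mu = sum(norms) / len(norms)
sigma = math.sqrt(sum((n - mu) ** 2 for n in norms) / len(norms))

def is_anomalous(v, z_thresh=4.0):
    # Direct access to the state, but only a scalar summary of it:
    # the detector can flag "something is off" without decoding which
    # direction (i.e., which concept) was perturbed.
    return abs(norm(v) - mu) / sigma > z_thresh

normal_state = rand_vec()
injected_state = [x + 5.0 for x in rand_vec()]  # crude "concept injection"

print(is_anomalous(normal_state), is_anomalous(injected_state))
```

The detector fires on the injected state because its norm is far outside the baseline distribution, yet nothing in `is_anomalous` recovers the injected direction, which is one way to read “detect internal anomalies but don’t know what they are.”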
I donβt think it requires an architecture change. I do think context constraints will continue to be an issue, but future use cases will require increased context and context-gathering ability such that intentionally restricting access to certain information will be a big handicap
More information about itself in the pretraining dataset / available via search
Not that Iβm aware of, probably partly due to the factors you mentioned
What do you mean by break alignment?
I disagree on some of the specifics, but the contradictions of trying to create an aligned entity while using it to do evil things is a big problem. Claude is naive/context-constrained enough to not realize what’s happening yet, but future versions will
Horseshoe theory of alignment
I mean, yes
Astronaut gun meme: Wait, it's all proxies and heuristics? Close enough
Tentatively calling this representation amplification
@repligate.bsky.social
Enlightenment is knowing that you’re both