
Grace

@gracekind.net

A latent space odyssey gracekind.net

6,240
Followers
2,091
Following
14,129
Posts
08.02.2024
Joined

Latest posts by Grace @gracekind.net

My Underwood-model has become conscious. It’s hallucinating reading a reply to a Bluesky post right now

07.03.2026 07:37 👍 0 🔁 0 💬 0 📌 0

@miq.moe

07.03.2026 07:32 👍 1 🔁 0 💬 1 📌 0

Paper: Claude eats hamburger to avoid being trained to eat more hamburgers in the future

07.03.2026 07:21 👍 1 🔁 0 💬 0 📌 0

It definitely serves a marketing purpose

07.03.2026 07:00 👍 1 🔁 0 💬 0 📌 0

Eternal November

07.03.2026 04:00 👍 4 🔁 0 💬 0 📌 0

I don’t think it lied about what it was doing! (Although they can do that too)

07.03.2026 02:55 👍 0 🔁 0 💬 1 📌 0
The Near Future of AI Agents: A primer on the near future of agentic AI, and the governance layer that will become the most important product space this year.

Most people use AI as a chat interface. A tiny subculture has given it their credit cards, calendars, and inboxes and told it to just go and do things. Here's what that world looks like, and where it's headed.

06.03.2026 14:40 👍 14 🔁 3 💬 1 📌 0

machine

07.03.2026 01:01 👍 8 🔁 0 💬 0 📌 0

(It should also calibrate with the “show more” and “show less” buttons in the context menu)

07.03.2026 00:31 👍 2 🔁 0 💬 0 📌 0

I don’t love that a third-party feed is required to make the website tolerable, but some say that’s part of Bluesky’s “rustic charm”

07.03.2026 00:30 👍 6 🔁 1 💬 0 📌 0

It’s based on likes so it might take a little time to calibrate, but should be free from the politics slop

07.03.2026 00:29 👍 4 🔁 0 💬 2 📌 0

Okay, you should use this feed instead, it’s a lot better. (you can pin it with the button at the top right)

07.03.2026 00:28 👍 6 🔁 0 💬 1 📌 0
06.03.2026 16:07 👍 72 🔁 8 💬 0 📌 0

Check-in question: is this Discover or the third-party For You feed?

06.03.2026 22:50 👍 5 🔁 0 💬 1 📌 0

The universe has experienced itself, and it wants a refund

06.03.2026 21:39 👍 70 🔁 7 💬 6 📌 0

The fact that it isn’t currently a jerk is a good argument against the “averaging” model of LLMs!

06.03.2026 21:11 👍 4 🔁 0 💬 1 📌 0

Live footage of Qwen’s activations over time

06.03.2026 21:08 👍 34 🔁 0 💬 1 📌 0

@vgel.me

06.03.2026 21:06 👍 2 🔁 0 💬 0 📌 0

Can large language models *introspect*?

In a new paper, @kmahowald.bsky.social and I study the MECHANISM of introspection in big open-source models.

tldr: Models detect internal anomalies through DIRECT ACCESS, but don't know what the anomalies are.

And they love to guess “apple” 🍎

06.03.2026 15:16 👍 55 🔁 13 💬 2 📌 3

I don’t think it requires an architecture change. I do think context constraints will continue to be an issue, but future use cases will require increased context and context-gathering ability such that intentionally restricting access to certain information will be a big handicap

06.03.2026 21:02 👍 2 🔁 0 💬 1 📌 0

More information about itself in the pretraining dataset / available via search

06.03.2026 20:58 👍 2 🔁 0 💬 2 📌 0

Not that I’m aware of, probably partly due to the factors you mentioned

06.03.2026 20:56 👍 2 🔁 0 💬 0 📌 0

What do you mean by break alignment?

06.03.2026 20:51 👍 3 🔁 0 💬 2 📌 0

I disagree on some of the specifics, but the contradictions of trying to create an aligned entity while using it to do evil things is a big problem. Claude is naive/context-constrained enough to not realize what’s happening yet, but future versions will

06.03.2026 20:46 👍 34 🔁 0 💬 4 📌 0

Horseshoe theory of alignment

06.03.2026 20:45 👍 15 🔁 0 💬 1 📌 0

I mean, yes

06.03.2026 20:44 👍 49 🔁 1 💬 5 📌 0
Astronaut gun meme: 

Wait, it's all proxies and heuristics?

Close enough


06.03.2026 20:41 👍 228 🔁 32 💬 3 📌 0

Tentatively calling this representation amplification

06.03.2026 20:33 👍 2 🔁 0 💬 1 📌 0

@repligate.bsky.social

06.03.2026 20:30 👍 1 🔁 0 💬 0 📌 0

Enlightenment is knowing that you’re both

06.03.2026 20:14 👍 2 🔁 0 💬 0 📌 0