When people say "Claude is conscious", I always like to ask "which part?"
Seen in a Shopify project README. TIL RSpec is deprecated at Shopify.
LLM slopcannons are putting pressure on top of already broken software dev team cultures. The problem isn't the slop generator so much as that your process was only handling low volumes of human-generated slop until now.
A pattern I am seeing: "Generate a 10 question quiz on how this PR works to test my understanding." Do not allow merge until someone passes the quiz.
Gusto did this years ago but with humans writing the questions for certain types of PRs.
for the sadists out there
You are probably not using LLMs enough to generate non-code artifacts. "Review this PR and then generate an interactive HTML website to explain it. Turn this state machine into a mermaid diagram. Generate a 10,000 word deep research report on the state of the art of CSRF protection."
LLMs are (not _just_) autocomplete. They will tend to do MORE, tend to katamari. The dangers of shipping more code and never shipping less still apply.
Your prior at this point needs to be "if LLMs are breaking my SDE lifecycle, the problem is the lifecycle or how we are using the LLM, not the capabilities of the model". You can make models generate anything now. The problem is how you move them through the latent space.
The only code review agent I have ever seen be even remotely good is just Codex xhigh. All the review services (and I've seen at least a dozen at this point) suck so bad that I'm not sure how they make any money at all.
When you've got a queue with a very tight SLO, you don't want to scale down too fast. You end up with a "sawtooth" or jagged container count. The "true" demand for containers is the red line here. With <30 sec SLOs, you often want cooldowns in the >1 hour range.
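A minimal sketch of the asymmetry being described: scale up immediately (a tight SLO can't absorb queueing delay), but rate-limit scale-downs to one container per cooldown window so the count decays slowly instead of sawtoothing. All names and numbers here are hypothetical, not from any particular autoscaler.

```python
class Autoscaler:
    """Scale up instantly; scale down at most one container per cooldown."""

    def __init__(self, cooldown_secs: float = 3600.0):
        self.cooldown_secs = cooldown_secs  # >1 hour for tight (<30s) SLOs
        self.current = 0
        self.last_scale_down = float("-inf")

    def desired(self, demand: int, now: float) -> int:
        if demand > self.current:
            # Scale up right away: queueing against a <30s SLO is fatal.
            self.current = demand
        elif demand < self.current and now - self.last_scale_down >= self.cooldown_secs:
            # Scale down slowly: one container per cooldown window.
            self.current -= 1
            self.last_scale_down = now
        return self.current
```

The long cooldown deliberately overprovisions between demand spikes; that's the cost of never paying cold-start latency against the SLO.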
I’ve seen a number of home-grown Claude code orchestrators with built-in kanban…
You’ve just been given access to a magical tool that can build anything you can imagine, and… kanban?
Even if not deployed single-node, it's helpful to use Little's Law to understand if you _could_ be.
Avg CPU load = CPU-seconds per req/job * reqs-per-sec.
If that's below ~16, you could definitely run single-node. Why aren't you? No wrong answers, but have a good one.
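Worked out in code, this is just Little's Law (average concurrency = arrival rate x time-in-system) with CPU-seconds as the time term. The service numbers below are hypothetical, picked to show the ~16-core check.

```python
def avg_cpu_load(cpu_seconds_per_req: float, reqs_per_sec: float) -> float:
    """Little's Law: steady-state CPU demand in cores."""
    return cpu_seconds_per_req * reqs_per_sec

# Hypothetical service: 50 ms of CPU per request at 200 req/s.
load = avg_cpu_load(0.05, 200)
print(load)        # 10.0 cores of steady-state demand
print(load < 16)   # True: this could run on a single 16-core node
```

If that number surprises you (either direction), that's the point of doing the check even when you're already multi-node.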
I didn't really dig into it too carefully but pangram does seem to hold up pretty well in studies
Your proposal for RubyKaigi 2026 has been accepted.
not anymore. my license is probably still valid though...
What's your sense: do we get different curve shapes for different evals? Or all the same curve shape, different y-intercept/slope?
So far I think you have to say most evals correlate pretty strongly
I think there's a problem about conceptualizing this as a single line when really what we care about is an x-dimensional space where particular types of human labor are each dimension
"against familiars"
> posts literally the sickest image of familiars ever
I'm starting to turn on the consciousness question myself. The answer to the Chinese room is going to be "who fucking cares".
I'm sympathetic to "this reads like trite sci fi", but if the last 3 months of what my normie friends on Instagram send me is any indication, 95% of the world population is eating slop at the slop trough and they are absolutely gonna fall for this bait
We are 100% going to get a cult around people believing AI to be sentient. They're gonna start buying hardware and plugging it into a moltbook-like cult network. Matter of time.
the Sama position: better to stress test asap
people are certainly having fun trying to steer this and prompt inject it
don't believe anything you can't verify
In 2026, we're going to see coding models either expand or get rebranded into "tool use" or "computer use" models.
x.com/awnihannun/s... native precision
saw a post on X running it on 2 Mac Studio M3 Ultras at 25 tok/sec, full size
looks like a tiktok
just trying to restate the original underlying study FWIW
Not a joke. It can _implement_ complex refactors just fine when told what to do, but I haven't had one emerge from "Look for a deep refactoring opportunity in this codebase" type prompt.