Isn’t “stopping reward hacking” just another way to say “alignment”? Seems like something I would expect @thezvi.bsky.social to dunk on at every opportunity
(I have not been following the technical details of frontier research particularly closely; so far I’ve just caught up on recent posts via Zvi’s blog over the long weekend. Similarly, I have never paid for a premium LLM because I don’t know what I would do with one.)
Ok so on the one hand, I am a very good software engineer and I have not yet figured out how or why I would want to use LLMs. On the other hand, if @thezvi.bsky.social’s blog is to be believed, the AGI fire alarm is blaring and humanity is by default not going to survive AGI, let alone ASI. I am confused.