Thread
I find it kind of bizarre that people are saying that the real AI safety argument was never whether the system could understand you, it was getting it to care.

Like, maybe these people have always been saying that Concrete Problems in AI Safety was a bad paper.
But if you were convinced of the AI safety case by the concrete problems paper, it now seems very unlikely that a roboGPT wouldn't be able to clean a room without killing a baby.
It's completely reasonable to argue that we still have inner misalignment problems to be worried about, or scalable oversight problems, but it seems frankly disingenuous to say LLMs shouldn't update you away from an important class of problems.
The concrete problems paper: arxiv.org/pdf/1606.06565.pdf
Obviously this doesn't apply if you were always saying that this would never be the problem, but when this post www.deepmind.com/blog/specification-gaming-the-flip-side-of-ai-ingenuity came out, I don't think the reaction of the AIS community was "DeepMind clearly doesn't understand the real problem".