🧵 to clarify my views on AGI, timelines, and x-risk. TL;DR: My AGI timelines are short-ish (within two decades with things getting weird soon) and I think x-risk is real, but I think probabilities of doom by AGI instrumentally opting to kill us are greatly exaggerated. 1/
Personal background: I have worked on AI safety for years. I have an inside view on AI safety. I have published papers in deep learning on AI safety topics. I have more common cultural context with the median AI safety researcher than the median deep learning researcher. 2/
If you're interested in my work, you can see my papers here: scholar.google.com/citations?hl=en&user=jRKEUjkAAAAJ 3/
On cultural context: I think you could categorize me as being in the "rationalist-adjacent/EA-adjacent" bucket. On the periphery of this social cluster, by no means a core member, not interested in self-identifying as either, but friends with people in the space. 4/
Ingroup nerd cred: I've read HPMOR (I was at the wrap party years ago and have a signed copy of the first 17 chapters). I've read Friendship is Optimal (one of my favorites actually). I've read Worm. I periodically skim the LW and EA forums though I don't post. 5/
I don't think that reading the above stories is an AI safety qualification - what I mean to point out is that I'm not an outsider who looks at this social cluster with disdain. The weird bits of this social cluster are charming, familiar, and personal to me. 6/
I am by no means an enemy or an uncharitable outsider in this space. If anything I am an overly charitable half-insider. I've largely avoided public grumbling about the weirdness in this space because it felt counterproductive for overall AI safety issues. 7/
But being on-side for the importance of AI safety shouldn't mean indefinite support for terribly calibrated risk estimates, and I've gotten a little louder lately. So here is a round-up of some thoughts. 8/
On AGI timelines: two years later, this prediction still feels approximately right to me. I'd say we're about a decade into the cement pouring and have another decade to go. 9/
Based on the trajectory of AI research over the past ten years, it feels like we can now expect models to get about an order of magnitude better at general tasks every two years, but it's unclear where gains run out on specific tasks. 10/
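(A toy sketch of what that rate implies if you just compound it naively. The ~10x-per-2-years number and the uniform-compounding assumption are my own rough gut feel, not a measured scaling law, and the whole point of the caveat above is that gains don't have to stay uniform across tasks.)

```python
# Toy back-of-envelope for the rate gestured at above: roughly 10x
# improvement on general tasks every ~2 years. Assumed numbers, not
# a measured law.

def implied_gain(years: float, period_years: float = 2.0,
                 factor_per_period: float = 10.0) -> float:
    """Multiplicative capability gain over `years` at the assumed rate."""
    return factor_per_period ** (years / period_years)

if __name__ == "__main__":
    for horizon in (2, 4, 10):
        print(f"~{horizon} years -> roughly {implied_gain(horizon):,.0f}x on general tasks")
```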
I think there are bottlenecks to superintelligence-level performance in many domains. "Bottlenecks" doesn't mean "never," or even "takes decades," just "absolutely not in the first week of a bootstrapping event." 11/
Some gut feelings: I think people in AI safety often overestimate how sensitive the long-term future is to perturbations. jachiam.github.io/agi-safety-views 12/
I think people in AI safety often underestimate the probability that cooperation with humanity will turn out to be instrumental instead of murder. 13/
(To the point where "murder is instrumental" seems egregiously silly. An AGI seeking to control its environment and eliminate the possibility that humans shut it down can simply... make the humans so happy that the humans don't want to kill it and actively want to help it.) 14/
If you take the physics perspective---processes naturally self-regulate so that low energy states are more likely than high energy states---on priors you should assume lotus-eater instrumentality instead of ultramurder instrumentality. 15/
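(For the physics-inclined: the prior I'm gesturing at is just the standard Boltzmann distribution, written out below for concreteness. Nothing about it is specific to AGI; it's only the shape of the intuition.)

```latex
% Boltzmann distribution: probability of a state s with energy E(s) at
% temperature T, where Z is the normalizing partition function.
% Lower-energy states carry exponentially more probability mass.
p(s) = \frac{e^{-E(s)/(k_B T)}}{Z},
\qquad
Z = \sum_{s'} e^{-E(s')/(k_B T)}
```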
I think people in AI safety often portray x-risk as a binary, without considering weirder paths that might be seen by one person as an existential risk and by another as a glorious utopia. 16/
Pure accident risk---e.g. the AGI is so helpful and aligned that it helps us do something really dangerous because we ask for it directly without knowing the danger---feels underrated imho in the AI safety community. 17/
(I anticipate a rebuttal to the above: "If the AGI was so aligned, it would notice we were trying to do something dangerous and give us guidance to prevent us from doing that thing!" Don't forget that if info is compartmentalized it can't notice, no matter how aligned.) 18/
Some numbers that feel roughly right to me: 19/
This thread is mostly a response to the reactions to this tweet, because people are somewhat understandably reacting defensively to the implication of a psychological root for certain AI safety claims. 20/
I don't think "worrying about AGI, AGI impacts, catastrophic risk, or x-risk" is specifically the product of anxiety disorders. But I *do* think that hyperinflated probability estimates for x-risk on short timelines (<10 yrs) are clearly being influenced by a culture... 21/
...of wild speculation shaped by anxiety, one where various kinds of pushback are socially punished. Probability estimates too low? You're not on-side enough. Think these numbers are crazy? You're dismissive and you're not engaging with the arguments! 22/
People who *could* argue against extremely high P(immediate doom) don't have the time or social standing to write the extended, jargony, ingroup-style papers the LW/EA crowd seems to demand as the bar for being taken seriously on this, so they don't engage. 23/
Result: you get a hyperinsulated enclave of high-status AGI doomers making wild claims that go largely unchallenged. Since the claims can't make contact with reality and get disproven for another decade or so, no one can fix this intractable mess of poor reasoning. 24/
And the bad epistemics are extremely bad for safety: outsiders can tell there's a large stack of uncorrected BS in the mix and correspondingly discount everyone in this space, making it harder to negotiate to fix real risks. 25/
I fully recognize that I need to write a more extensive summary of my AGI safety views, because there's a lot I'm not covering here (e.g., I think pretraining will bias AGI towards aligned human values / friendliness, and I probably owe a blog post on deception). 26/
Last thoughts: I am definitely worried about AI/AGI risks such as "the world gets increasingly weird and hard to understand because AI systems run lots of things in ways we can't explain, creating correlated risk" and "x-risk accidents due to tech acceleration." 27/
But we need to put this all on solid ground, with sane probability estimates and timelines, and not let this field of risk management be defined by people who are making the most outlandish / uncalibrated claims. 28/28
Okay, one last thing: for the love of all that is holy, please take the FAccT crowd more seriously on social issues, given how much of the AI risk landscape is shaped by "an AI persuades a human to feel or believe something."