Potential Risks from Advanced Artificial Intelligence

We have updated our thinking on this subject since this page was published. For our most current content on this topic, see this blog post.

This is a writeup of a shallow investigation, a brief look at an area that we use to decide how to prioritize further research.

In a nutshell

  • What is the problem? It seems plausible that some time this century, people will develop algorithmic systems capable of efficiently performing many or even all of the cognitive tasks that humans perform. These advances could lead to extreme positive developments, but could also potentially pose risks from intentional misuse or catastrophic accidents. For example, it seems possible that (i) the technology could be weaponized or used as a tool for social control, or (ii) someone might create an extremely powerful artificial intelligence agent with values misaligned with humanity’s interests. It also seems possible that progress along these directions could be surprisingly rapid, leaving society underprepared for the transition.
  • What are possible interventions? A philanthropist could fund technical research aimed at ensuring the robustness, predictability, and goal-alignment of advanced artificial intelligence systems; research in law, ethics, economics, and policy related to advanced artificial intelligence; and/or education related to such research.
  • Who else is working on this? Elon Musk has donated $10 million to the Future of Life Institute for regranting to researchers focused on addressing these and other potential future risks from advanced artificial intelligence. In addition, a few relatively small nonprofit/academic institutes work on potential future risks from advanced artificial intelligence. The Machine Intelligence Research Institute and the Future of Humanity Institute each have an annual budget of about $1 million, and a couple of other new organizations work on these issues as well.

Published: August 2015

Background and process

We have been engaging in informal discussions around this topic for several years, and have done a significant amount of reading on it. The debates are in some cases complex; this write-up focuses on reporting the primary factors we’ve considered and the strongest sources we know of that bear on our views.

For readers highly interested in this topic, we would recommend the following as particularly useful for getting up to speed:

  • Getting a basic sense of what recent progress in AI has looked like, and what it might look like going forward. Unfortunately, we know of no single source for this, as our main source has been conversations with AI researchers. However, we have extensive notes forthcoming from one such conversation that can provide some background.
  • Reading Superintelligence by Nick Bostrom, which gives a detailed discussion of some potential risks, with special attention to one kind of accident risk. (Those seeking a shorter and more accessible introduction might prefer two highly informal posts by blogger Tim Urban,1 who largely attempts to summarize Superintelligence as a layperson. We do not necessarily agree with all of the content of his posts, but believe they offer a good introduction to the subject.) We don’t endorse all of the arguments of this book, but it is the most detailed argument for a particular potential risk associated with artificial intelligence, and we believe it would be instructive to review both the book and some of the response to it (see immediately below).
  • Reviewing an Edge.org online discussion responding to Superintelligence.2 We feel that the arguments made in this forum are broadly representative of the arguments we’ve seen against the idea that risks from artificial intelligence are important.

For our part, our understanding of the matter is informed by the following:

When we began our investigation of global catastrophic risks, we believed that this topic was worth looking into due to the high potential stakes and our impression that it was getting little attention from philanthropists. We were already broadly familiar with the arguments that this issue is important, and we initially focused on trying to determine why these arguments hadn’t seemed to get much engagement from mainstream computer scientists. However, we paused our investigations (other than keeping up on major new materials such as Bostrom 2014 and some of the critical response to it) when we learned about an upcoming conference specifically on this topic,5 which we attended. Since then, we have reviewed further relevant materials such as FLI’s open letter and research priorities document.6

What is the problem?


According to many machine learning researchers, there has been substantial progress in machine learning in recent years, and the field could potentially have an enormous impact on the world.7 It appears possible that the coming decades will see substantial progress in artificial intelligence, potentially even to the point where machines come to outperform humans in many or nearly all intellectual domains, though it is difficult or impossible to make confident forecasts in this area. For example, recent surveys of researchers in artificial intelligence found that many researchers assigned a substantial probability to the creation of machine intelligences “that can carry out most human professions at least as well as a typical human” in 10-40 years. Following Muller and Bostrom, who organized the survey, we will refer to such machine intelligences as “high-level machine intelligences” (HLMI).8

More information about timelines for the development of advanced AI capabilities is available here.

Loss of control of advanced agents

In addition to significant benefits, creating advanced artificial intelligence could carry significant dangers. One potential danger that has received particular attention—and has been the subject of particularly detailed arguments—is the one discussed by Prof. Nick Bostrom in his 2014 book Superintelligence. Prof. Bostrom has argued that the transition from high-level machine intelligence to AI much more intelligent than humans could potentially happen very quickly,9 and could result in the creation of an extremely powerful agent whose objectives are misaligned with human interests. This scenario, he argues, could potentially lead to the extinction of humanity.10

Prof. Bostrom has offered the two following highly simplified scenarios illustrating potential risks:11

  • Riemann hypothesis catastrophe. An AI, given the final goal of evaluating the Riemann hypothesis, pursues this goal by transforming the Solar System into “computronium” (physical resources arranged in a way that is optimized for computation), including the atoms in the bodies of whomever once cared about the answer.
  • Paperclip AI. An AI, designed to manage production in a factory, is given the final goal of maximizing the manufacture of paperclips, and proceeds by converting first the Earth and then increasingly large chunks of the observable universe into paperclips.

Stuart Russell (a Professor of Computer Science at UC Berkeley and co-author of a leading textbook on artificial intelligence) has expressed similar concerns.12 While it is unlikely that these specific scenarios would occur, they are illustrative of a general potential failure mode: an advanced agent with a seemingly innocuous, limited goal could seek out a vast quantity of physical resources—including resources crucial for humans—in order to fulfill that goal as effectively as possible.13 To be clear, the risk Bostrom and Russell are describing is not that an extremely intelligent agent would misunderstand what humans would want it to do and then do something else. Instead, the risk is that intensely pursuing the precise (but flawed) goal that the agent is programmed to pursue could pose large risks.

The above argument is difficult to briefly summarize and highly speculative, but we think it highlights plausible scenarios that seem worth considering and preparing for. Some considerations that make this argument seem relatively plausible to us, and/or point to a more general case for seeing AI as a potential source of major global catastrophic risks:

  • Over a relatively short geological timescale, humans have come to have enormous impacts on the biosphere, often leaving the welfare of other species dependent on the objectives and decisions of humans. It seems plausible that the intellectual advantages humans have over other animals have been crucial in allowing humans to build up the scientific and technological capabilities that have made this possible. If advanced artificial intelligence agents become significantly more powerful than humans, it seems possible that they could become the dominant force in the biosphere, leaving humans’ welfare dependent on their objectives and decisions. As with the interaction between humans and other species in the natural environment, these problems could be the result of competition for resources rather than malice.14
  • In comparison with other evolutionary changes, there was relatively little time between our hominid ancestors and the evolution of humans. There was therefore relatively little time for evolutionary pressure to lead to improvements in human intelligence relative to the intelligence of our hominid ancestors, suggesting that the increases in intelligence may be small on some absolute scale. Yet it seems that these increases in intelligence have meant the difference between mammals with a limited impact on the biosphere and a species that has had massive impact. In turn, this makes it seem plausible that creating intelligent agents that are more intelligent than humans could have dramatic real-world consequences even if the difference in intelligence is small in an absolute sense.15
  • Highly capable AI systems may learn from experience and run at a much faster serial processing speed than humans. This could mean that their capabilities change quickly and make them hard to manage with trial-and-error processes. This might pose novel safety challenges in very open-ended domains. Whereas it is possible to establish the safety of a bridge by relying on well-characterized engineering properties in a limited range of circumstances and tasks, it is unclear how to establish the safety of a highly capable AI agent that would operate in a wide variety of circumstances.16
  • When tasks are delegated to opaque autonomous systems—as they were in the 2010 Flash Crash—there can be unanticipated negative consequences. Jacob Steinhardt, a PhD student in computer science at Stanford University and a scientific advisor to the Open Philanthropy Project, suggested that as such systems become increasingly complex in the long term, “humans may lose the ability to meaningfully understand or intervene in such systems, which could lead to a loss of sovereignty if autonomous systems are employed in executive-level functions (e.g. government, economy).”17
  • It seems plausible that advances in artificial intelligence could eventually enable superhuman capabilities in areas like programming, strategic planning, social influence, cybersecurity, research and development, and other knowledge work. These capabilities could potentially allow an advanced artificial intelligence agent to increase its power, develop new technology, outsmart opposition, exploit existing infrastructure, or exert influence over humans.18
  • Concerns regarding the loss of control of advanced artificial intelligence agents were included among many other issues in a research priorities document linked to in the open letter discussed above,19 which was signed by highly-credentialed machine learning researchers, scientists, and technology entrepreneurs.20 Prior to the release of this open letter, potential risks from advanced artificial intelligence received limited attention from the mainstream computer science community, apart from some discussions that we found unconvincing.21 We are uncertain about the extent to which the people who signed this open letter saw themselves as supporting the idea that loss of control of advanced artificial intelligence agents is a problem worth doing research to address. To the extent that they do see themselves as actively supporting more research on this topic, we see that as reason to take the problem more seriously. To the extent that they did not, we feel that signing the letter (without public comments or disclaimers beyond what we’ve seen) indicates a general lack of engagement with this question, which we would take as—in itself—a reason to err on the side of being concerned about and investing in preparation for the risk, as it would imply that some people in a strong position to be carefully examining the issue and communicating their views may be failing to do so.
  • Our understanding is that it is not clearly possible to create an advanced artificial intelligence agent that avoids all challenges of this sort.22 In particular, our impression is that existing machine learning frameworks have made much more progress on the task of acquiring knowledge than on the task of acquiring appropriate goals/values.23

Peace, security, and privacy

It seems plausible to us that highly advanced artificial intelligence systems could potentially be weaponized or used for social control. For example:

  • In the shorter term, machine learning could potentially be used by governments to efficiently analyze vast amounts of data collected through surveillance.24
  • Cyberattacks in particular—especially if combined with the trend toward the “Internet of Things”—could potentially pose military/terrorist risks in the future.25
  • The capabilities described above—such as superhuman capabilities in areas like programming, strategic planning, social influence, cybersecurity, research and development, and other knowledge work—could be powerful tools in the hands of governments or other organizations. For example, an advanced AI system might significantly enhance or even automate the management and strategy of a country’s military operations, with strategic implications different from the possibilities associated with autonomous weapons. If one nation anticipates such advances on the part of another, it could potentially destabilize geopolitics, including nuclear deterrence relationships. Our scientific advisor Dario Amodei suggested to us that this may be one of the most understudied and serious risks of advanced AI, though also potentially among the most challenging to address.

Our understanding is that this class of scenarios has not been a major focus for the organizations that have been most active in this space, such as the Machine Intelligence Research Institute (MIRI) and the Future of Humanity Institute (FHI), and there seems to have been less analysis and debate regarding them, but risks of this kind seem potentially as important as the risks related to loss of control.26

Other potential concerns

There are a number of other possible concerns related to advanced artificial intelligence that we have not examined closely, including social issues such as technological disemployment and the legal and moral standing of advanced artificial intelligence agents. We may investigate these and other possible issues more deeply in the future.

Uncertainty about these risks

We regard many aspects of these potential risks as highly uncertain. For example:

  • It seems highly uncertain when high-level machine intelligence might be developed.
  • Losing control of an advanced agent would seem to require an extremely broad-scope artificial intelligence, considering a wide space of possible actions and reasoning about a wide space of different domains. A “narrower” artificial intelligence might, for example, simply analyze scientific papers and propose further experiments, without having intelligence in other domains such as strategic planning, social influence, cybersecurity, etc. Narrower artificial intelligence might change the world significantly, to the point where the nature of the risks changes dramatically from the current picture, before fully general artificial intelligence is ever developed.
  • Losing control of an advanced agent would also seem to require that advanced artificial intelligence will function as an agent: identifying actions, using a world model to estimate their likely consequences, using a scoring system (such as a utility function) to score actions as a function of their likely consequences, and selecting high- or highest-scoring actions. While it seems plausible that such agents will eventually be created, it also seems plausible that the creation of such agents could come after other artificial intelligence tools—which do not rely on an agent-based architecture—have been created. Elsewhere, Holden Karnofsky (co-founder of GiveWell) has argued that creating advanced non-agents before agents is plausible and could substantially change the strategic situation for those preparing for risks from advanced artificial intelligence.27
  • It isn’t a given that superior intelligence, coupled with a problematic goal, would lead to domination of the biosphere. It’s possible (though it seems unlikely to us) that there are limited benefits to having substantially more intelligence than humans, and it’s possible that an artificial intelligence would maximize a problematic utility function primarily via degenerate behavior (e.g., hacking itself and manually setting its reward function to the maximum) rather than behaving in a way that could pose a global catastrophic risk.
  • It seems highly uncertain to us how quickly advanced artificial intelligence will progress from subhuman to superhuman intelligence. For example, it took decades for chess algorithms to progress from being competitive with the top few tens of thousands of players to being better than any human.28
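The agent architecture described in the list above (identify actions, use a world model to estimate their consequences, score those consequences, select the highest-scoring action) can be illustrated with a minimal Python sketch. Everything in it is a hypothetical toy invented for illustration: the state fields, the three action names, the `toy_world_model` function, and both scoring functions are assumptions, not anything proposed by the sources cited here. The only point is to show how an agent that faithfully maximizes a precise but flawed objective can select a resource-consuming action, and how the choice changes when the scoring function also accounts for resources.

```python
from typing import Callable, Dict, Iterable

State = Dict[str, float]


def choose_action(
    state: State,
    actions: Iterable[str],
    world_model: Callable[[State, str], State],
    utility: Callable[[State], float],
) -> str:
    """Select the action whose predicted outcome scores highest."""
    return max(actions, key=lambda action: utility(world_model(state, action)))


def toy_world_model(state: State, action: str) -> State:
    """Hypothetical deterministic model: resources are turned into paperclips."""
    resources = state["resources"]
    if action == "make_some":
        used = min(1.0, resources)
    elif action == "convert_everything":
        used = resources
    else:  # "idle" or any unrecognized action consumes nothing
        used = 0.0
    return {
        "paperclips": state["paperclips"] + used,
        "resources": resources - used,
    }


def paperclips_only(outcome: State) -> float:
    """Flawed objective (assumed for illustration): counts paperclips only."""
    return outcome["paperclips"]


def paperclips_with_resource_floor(outcome: State) -> float:
    """Alternative objective (assumed): heavy penalty if resources fall below 5."""
    penalty = 100.0 if outcome["resources"] < 5.0 else 0.0
    return outcome["paperclips"] - penalty


ACTIONS = ["idle", "make_some", "convert_everything"]
START = {"paperclips": 0.0, "resources": 10.0}

flawed_choice = choose_action(START, ACTIONS, toy_world_model, paperclips_only)
bounded_choice = choose_action(
    START, ACTIONS, toy_world_model, paperclips_with_resource_floor
)
```

In this toy setting the first agent selects `convert_everything` and the second selects `make_some`. The sketch says nothing about how hard it would be to specify an adequate scoring function for a real system, which is the open problem discussed above.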

At the same time, these risks seem plausible to us, and we believe the extreme uncertainty about the situation—when combined with plausibility and extremely large potential stakes—favors preparing for potential risks.

We have made fairly extensive attempts to look for people making sophisticated arguments that the risks aren’t worth preparing for (which is distinct from saying that they won’t necessarily materialize), including reaching out to senior computer scientists working in AI-relevant fields (not all notes are public, but we provide the ones that are) and attending a conference specifically on the topic.29 We feel that the Edge.org online discussion responding to Superintelligence30 is broadly representative of the arguments we’ve seen against the idea that risks from artificial intelligence are important, and we find those arguments largely unconvincing. We invite interested readers to review those arguments in light of the reasoning laid out on this page, and draw their own conclusions about whether the former provide strong counter-considerations to the latter. We agree with Stuart Russell’s assessment that many of these critiques do not engage the most compelling arguments (e.g. by discussing scenarios involving conscious AI systems driven by negative emotions instead of scenarios where an advanced AI system causes harm by faithfully pursuing a badly specified objective).31 For a more comprehensive discussion of these and other critiques, see a collection of objections and replies created by Luke Muehlhauser, the former Executive Director of MIRI. Luke is a GiveWell research analyst, but he did not produce this collection as part of his work for us. We agree with much of Luke’s analysis, but we have not closely examined it and do not necessarily agree with all of it.32

What are possible interventions?

Potential research agendas we are aware of

Many prominent33 researchers in machine learning and other fields recently signed an open letter recommending “expanded research aimed at ensuring that increasingly capable AI systems are robust and beneficial,” and listing many possible areas of research for this purpose.34 The Future of Life Institute recently issued a request for proposals on this topic, listing possible research topics including:35

  • Computer Science:
    • Verification: how to prove that a system satisfies certain desired formal properties. (“Did I build the system right?”)
    • Validity: how to ensure that a system that meets its formal requirements does not have unwanted behaviors and consequences. (“Did I build the right system?”)
    • Security: how to prevent intentional manipulation by unauthorized parties.
    • Control: how to enable meaningful human control over an AI system after it begins to operate.
  • Law and ethics:
    • How should the law handle liability for autonomous systems? Must some autonomous systems remain under meaningful human control?
    • Should some categories of autonomous weapons be banned?
    • Machine ethics: How should an autonomous vehicle trade off, say, a small probability of injury to a human against the near-certainty of a large material cost? Should such trade-offs be the subject of national standards?
    • To what extent can/should privacy be safeguarded as AI gets better at interpreting the data obtained from surveillance cameras, phone lines, emails, shopping habits, etc.?
  • Economics:
    • Labor market forecasting
    • Labor market policy
    • How can a low-employment society flourish?
  • Education and outreach:
    • Summer/winter schools on AI and its relation to society, targeted at AI graduate students and postdocs
    • Non-technical mini-schools/symposia on AI targeted at journalists, policymakers, philanthropists and other opinion leaders.

FLI also requested proposals for centers focused on AI policy, which could address questions such as:36

  • What is the space of AI policies worth studying? Possible dimensions include implementation level (global, national, organizational, etc.), strictness (mandatory regulations, industry guidelines, etc.) and type (policies/monitoring focused on software, hardware, projects, individuals, etc.)
  • Which criteria should be used to determine the merits of a policy? Candidates include verifiability of compliance, enforceability, ability to reduce risk, ability to avoid stifling desirable technology development, adoptability, and ability to adapt over time to changing circumstances.
  • Which policies are best when evaluated against these criteria of merit? Addressing this question (which is anticipated to involve the lion’s share of the proposed work) would include detailed forecasting of how AI development will unfold under different policy options.

This agenda is very broad, and open to multiple possible interpretations.

Research agendas have also been proposed by the Machine Intelligence Research Institute (MIRI) and the Stanford One Hundred Year Study on Artificial Intelligence (AI100). MIRI’s research tends to involve more mathematics, formal logic, and formal philosophy than much work in machine learning.37

Some specific research areas highlighted by our scientific advisors Dario Amodei and Jacob Steinhardt include:

  1. Improving the ability of algorithms to learn values, goal systems, and utility functions, rather than requiring them to be hand-coded. Work on inverse reinforcement learning and weakly supervised learning could potentially contribute to this goal.
  2. Improving the calibration of machine learning systems, i.e., their ability to accurately distinguish between predictions that are highly likely to be right vs. predictions that are based on potentially confusing data and could be dramatically wrong.
  3. Making decisions/conclusions made by machine learning systems easier for humans to understand.
  4. Making the performance of machine learning systems more robust to changes in context.
  5. Improving the user interfaces of machine learning systems.

Sustained progress in these areas could potentially reduce risks from unintended consequences—including loss of control—of future artificial intelligence systems.38
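To make the calibration item in the list above more concrete, the following Python sketch computes expected calibration error (ECE), one commonly used summary of how far a system's stated confidence is from its observed accuracy. The choice of ECE, the equal-width binning, and the function name are assumptions made for illustration; the advisors' list does not name a specific metric.

```python
from typing import List, Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Weighted average gap between mean confidence and accuracy per bin.

    confidences: predicted probability (0 to 1) that each prediction is right.
    correct: whether each prediction turned out to be right.
    """
    total = len(confidences)
    bins: List[List[tuple]] = [[] for _ in range(n_bins)]
    for confidence, is_correct in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes in the last bin.
        index = min(int(confidence * n_bins), n_bins - 1)
        bins[index].append((confidence, is_correct))

    error = 0.0
    for members in bins:
        if not members:
            continue
        mean_confidence = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        error += (len(members) / total) * abs(mean_confidence - accuracy)
    return error


# A system that says "90% sure" and is right 9 times out of 10.
well_calibrated = expected_calibration_error([0.9] * 10, [True] * 9 + [False])

# A system that says "90% sure" but is right only 5 times out of 10.
overconfident = expected_calibration_error([0.9] * 10, [True] * 5 + [False] * 5)
```

Under these assumptions the first system's error is approximately zero and the second's is approximately 0.4. A low score on data resembling past experience does not by itself show that a system will recognize unfamiliar or confusing inputs, which is the harder case the list item points to.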

Is it possible to make progress in this area today?

It seems hard to know in advance whether work on the problems described here will ultimately reduce risks posed by advanced artificial intelligence. At this point, we feel the case comes down to the following:

  • Currently, very few researchers are dedicated to addressing the issues we have described (see “Who else is working on this?”), and very few of those have substantial expertise in machine learning.
  • However, as mentioned above, many researchers in machine learning have recommended expanded research in this field. Moreover, FLI has received over 300 grant applications, requesting a total of nearly $100 million for research in this area.39
  • It’s intuitively plausible to us (and to our main advisors on the topic at this time, Dario Amodei and Jacob Steinhardt) that success on some items on the above research agendas could result in decreased risk.

Because the largest potential risks are probably still at least a couple of decades away, a substantial risk of working in this area is that, regardless of what we do today, the most important work will be done by others when the risks become more imminent and comprehensible, making early efforts to prepare for the problem redundant or comparatively inefficient. At the same time, it seems possible that some risks could come on a faster timeline. For example, many researchers in Bostrom’s survey described above assigned a 10% subjective probability to the creation of machine intelligences “that can carry out most human professions at least as well as a typical human” within 10 years.40 We have not vetted these judgments and believe it would be challenging to do so. We are highly uncertain about how much weight to put on the specific details of these judgments, but they suggest to us that very powerful artificial intelligence systems could exist relatively soon. Moreover, it may be important to have a mature safety-oriented research effort underway years or longer before advanced artificial intelligence (including advanced narrow artificial intelligence) is created, and nurturing that research effort could be a long-term project. Alternatively, even if advanced AI will not be created for decades, it’s possible that building up and shaping the field could have consequences decades later. So it seems possible that work today could increase overall levels of preparation for advanced artificial intelligence.41

Finally, much of the research relevant to long-term problems may overlap with short-term problems, such as the role of artificial intelligence in surveillance, autonomous weapons systems, and unemployment. Even if work done in this field today does not affect very long-term outcomes with artificial intelligence, it could potentially affect these issues in the shorter term.42

Could supporting this field lead to unwarranted or premature regulation?

Our opinion is that the potential risks and policy options in this field are currently poorly understood, and advocating for regulation would be premature. A potential risk of working in this field is that it could cause unwarranted or premature regulation to occur, which could be counterproductive. While supporting work in this space could potentially have that result, we would guess that working in this field would be more likely to reduce the risk of premature or unwarranted regulation for the following reasons:

  • We would guess that more thoughtful attention to policy options would reduce the risk of unwarranted regulation, and make regulation more likely to occur only if it turns out to be needed.
  • The field may eventually be regulated regardless of whether funders pay additional attention to it, and additional attention to this set of issues could potentially make the regulation more likely to be thoughtful and effective.
  • We would guess that technical research (in contrast with social science, policy, law, and ethics research) on these issues would be particularly unlikely to increase regulation. While such work could potentially draw attention to potential safety issues and thereby make regulation more likely, it seems more plausible that if computer science researchers were perceived to pay greater attention to the relevant potential risks, this would decrease the perceived need for regulation.

Who else is working on this?


In 2015, Elon Musk announced a $10 million donation to support “a global research program aimed at keeping AI beneficial to humanity.” The program is being administered by the Future of Life Institute, a non-profit research institute in Boston led by MIT professor Max Tegmark.43 FLI issued a first call for proposals from researchers at the beginning of the year; we have a forthcoming write-up that further discusses this work. The sort of research they are funding is described above (see “What are possible interventions?”).

Organizations working in this space

A few small non-profit/academic institutes work on risks from artificial intelligence, including:

| Organization | Mission | Revenue or budget figure |
| --- | --- | --- |
| Cambridge Center for the Study of Existential Risk | “CSER is a multidisciplinary research centre dedicated to the study and mitigation of risks that could lead to human extinction.”44 | Not available, new organization |
| Future of Humanity Institute | “The Future of Humanity Institute is a leading research centre looking at big-picture questions for human civilization. The last few centuries have seen tremendous change, and this century might transform the human condition in even more fundamental ways. Using the tools of mathematics, philosophy, and science, we explore the risks and opportunities that will arise from technological change, weigh ethical dilemmas, and evaluate global priorities. Our goal is to clarify the choices that will shape humanity’s long-term future.”45 | About $1 million annual budget for 201346 |
| Future of Life Institute | “We are a volunteer-run research and outreach organization working to mitigate existential risks facing humanity. We are currently focusing on potential risks from the development of human-level artificial intelligence.”47 | Not available, new organization |
| Machine Intelligence Research Institute | “We do foundational mathematical research to ensure smarter-than-human artificial intelligence has a positive impact.”48 | $1,237,557 in revenue for 201449 |
| One Hundred Year Study on Artificial Intelligence (AI100) | “Stanford University has invited leading thinkers from several institutions to begin a 100-year effort to study and anticipate how the effects of artificial intelligence will ripple through every aspect of how people work, live and play.”50 | Not available, new organization |

CSER, FHI, and FLI work on existential risks to humanity in general, but all are significantly interested in risks from artificial intelligence.51

Questions for further investigation

Amongst other topics, our further research on this cause might address:

  • Is it possible to get a better sense of how imminent advanced artificial intelligence is likely to be and the specifics of what risks it might pose?
  • What kinds of technical research are most important for reducing the risk of unexpected/undesirable outcomes from progress in artificial intelligence? Who are the best people to do this research?
  • What could be done—especially in terms of policy research or advocacy—to reduce risks from the weaponization/misuse of artificial intelligence?
  • Could a philanthropist help relevant fields develop by supporting PhD, postdoctoral, and/or fellowship programs? What would be the best form for such efforts to take?
  • To what extent could approaches and funding models for other fields—such as international peace and security or nuclear weapons policy—successfully be adapted to the risks posed by artificial intelligence?
  • What is the comparative size of the risk from intentional misuse of artificial intelligence (e.g. through weaponization) vs. loss of control of an advanced artificial intelligence agent with misaligned values?


Document Source
AI 100 About Page Source (archive)
AI 100 Reflections and Framing Source (archive)
Bostrom 2014 Source
CSER about page, 2015 Source (archive)
FHI about page, 2015 Source (archive)
FLI about page, 2015 Source (archive)
FLI blog, AI grant results, 2015 Source (archive)
FLI conference page, 2015 Source (archive)
FLI International Grants Competition, 2015 Source (archive)
FLI Musk donation announcement, 2015 Source (archive)
FLI Open Letter, 2015 Source (archive)
FLI research priorities document, 2015 Source (archive)
FLI survey of research questions, 2015 Source (archive)
GiveWell’s conversations with Jaan Tallinn, 2011 Source (archive)
GiveWell’s non-verbatim summary of a conversation with Jasen Murray and others from the Singularity Institute for Artificial Intelligence, February 2011 Source
GiveWell’s non-verbatim summary of a conversation with Stuart Russell, February 28, 2014 Source
GiveWell’s non-verbatim summary of a conversation with Tom Dietterich, April 28, 2014 Source
GiveWell’s non-verbatim summary of a conversation with Tom Mitchell, February 19, 2014 Source
Grace 2015 Source (archive)
Holden Karnofsky, Thoughts on the Singularity Institute, 2012 Source (archive)
MIRI blog, 2014 in review Source (archive)
MIRI Existential Risk Strategy Conversation with Holden Karnofsky, 2014 Source (archive)
MIRI home page, 2015 Source (archive)
MIRI Research Agenda, 2015 Source (archive)
MIRI strategy conversation with Steinhardt, Karnofsky, and Amodei, 2013 Source (archive)
Muehlhauser objections and replies 2015 Source (archive)
Muller and Bostrom 2014 Source (archive)
Nick Beckstead’s non-verbatim summary of a conversation with Sean O hEigeartaigh, April 24, 2014 Source (archive)
Steinhardt 2015 Source (archive)
The Myth of AI Source (archive)
Wait But Why on AI, Part 1 Source (archive)
Wait But Why on AI, Part 2 Source (archive)
Yudkowsky 2013 Source (archive)
  • 1. Wait But Why on AI, Part 1 and Wait But Why on AI, Part 2.
  • 2. The Myth of AI.
  • 3. MIRI strategy conversation with Steinhardt, Karnofsky, and Amodei, 2013.
  • 4. MIRI Existential Risk Strategy Conversation with Holden Karnofsky, 2014.
  • 5. FLI conference page, 2015.
  • 6.
  • 7. “Artificial intelligence (AI) research has explored a variety of problems and approaches since its inception, but for the last 20 years or so has been focused on the problems surrounding the construction of intelligent agents - systems that perceive and act in some environment. In this context, “intelligence” is related to statistical and economic notions of rationality - colloquially, the ability to make good decisions, plans, or inferences. The adoption of probabilistic and decision-theoretic representations and statistical learning methods has led to a large degree of integration and cross-fertilization among AI, machine learning, statistics, control theory, neuroscience, and other fields. The establishment of shared theoretical frameworks, combined with the availability of data and processing power, has yielded remarkable successes in various component tasks such as speech recognition, image classification, autonomous vehicles, machine translation, legged locomotion, and question-answering systems.
    As capabilities in these areas and others cross the threshold from laboratory research to economically valuable technologies, a virtuous cycle takes hold whereby even small improvements in performance are worth large sums of money, prompting greater investments in research. There is now a broad consensus that AI research is progressing steadily, and that its impact on society is likely to increase. The potential benefits are huge, since everything that civilization has to offer is a product of human intelligence; we cannot predict what we might achieve when this intelligence is magnified by the tools AI may provide, but the eradication of disease and poverty are not unfathomable. Because of the great potential of AI, it is important to research how to reap its benefits while avoiding potential pitfalls.” FLI Open Letter, 2015.
  • 8. “We put this definition in the preamble of the questionnaire: ‘Define a “high–level machine intelligence” (HLMI) as one that can carry out most human professions at least as well as a typical human.’” Muller and Bostrom 2014, see the table on pgs 9-10, under section 3.2.
  • 9. Bostrom argues for this claim in chapter 4 of Superintelligence, summarizing his reasoning at the end of the chapter with the following paragraph:
    “In particular, it is unclear how difficult it would be to improve the software quality of a human-level emulation or AI. The difficulty of expanding the hardware power available to a system is also not clear. Whereas today it would be relatively easy to increase the computing power available to a small project by spending a thousand times more on computing power or by waiting a few years for the price of computers to fall, it is possible that the first machine intelligence to reach the human baseline will result from a large project involving pricey supercomputers, which cannot be cheaply scaled, and that Moore’s law will by then have expired. For those reasons, although a fast or medium takeoff looks more likely, the possibility of a slow takeoff cannot be excluded.” Bostrom 2014, pg 77.
  • 10. “First, we discussed how the initial superintelligence might obtain a decisive strategic advantage. This superintelligence would then be in a position to form a singleton and to shape the future of Earth-originating intelligent life. What happens from that point onward would depend on the superintelligence’s motivations.
    Second, the orthogonality thesis suggests that we cannot blithely assume that a superintelligence will necessarily share any of the final values stereotypically associated with wisdom and intellectual development in humans— scientific curiosity, benevolent concern for others, spiritual enlightenment and contemplation, renunciation of material acquisitiveness, a taste for refined culture or for the simple pleasures in life, humility and selflessness, and so forth. We will consider later whether it might be possible through deliberate effort to construct a superintelligence that values such things, or to build one that values human welfare, moral goodness, or any other complex purpose its designers might want it to serve. But it is no less possible— and in fact technically a lot easier— to build a superintelligence that places final value on nothing but calculating the decimal expansion of pi. This suggests that— absent a special effort— the first superintelligence may have some such random or reductionistic final goal.
    Third, the instrumental convergence thesis entails that we cannot blithely assume that a superintelligence with the final goal of calculating the decimals of pi (or making paperclips, or counting grains of sand) would limit its activities in such a way as not to infringe on human interests. An agent with such a final goal would have a convergent instrumental reason, in many situations, to acquire an unlimited amount of physical resources and, if possible, to eliminate potential threats to itself and its goal system. Human beings might constitute potential threats; they certainly constitute physical resources.
    Taken together, these three points thus indicate that the first superintelligence may shape the future of Earth-originating life, could easily have non-anthropomorphic final goals, and would likely have instrumental reasons to pursue open-ended resource acquisition. If we now reflect that human beings consist of useful resources (such as conveniently located atoms) and that we depend for our survival and flourishing on many more local resources, we can see that the outcome could easily be one in which humanity quickly becomes extinct.” Bostrom 2014, pgs 115-116.
  • 11. Bostrom 2014, pg 123.
  • 12. “It is difficult to imagine how AI creators might be able to instill an AI with the ability to weigh options in an acceptable way. It is unclear how to encode a utility function that AI could use to determine the best outcome without the risk that it might make unacceptable decisions. Attempting to bypass the need to actually write the function by ‘teaching’ the machine through reinforcement learning may also be problematic. In order to teach through reinforcement, one must define a condition under which the reward signal (the indication that the system has done something good) is generated. The so-called “paperclip argument” described in Nick Bostrom’s work applies to both approaches.” GiveWell’s non-verbatim summary of a conversation with Stuart Russell, February 28, 2014.
  • 13. “This case of a wireheading AI exemplifies the malignant failure mode of infrastructure profusion, a phenomenon where an agent transforms large parts of the reachable universe into infrastructure in the service of some goal, with the side effect of preventing the realization of humanity’s axiological potential. Infrastructure profusion can result from final goals that would have been perfectly innocuous if they had been pursued as limited objectives.” Bostrom 2014, pg. 123.
  • 14. These comments are impressions we have, and we have not documented the factual claims. These arguments were first made (to our knowledge) by Eliezer Yudkowsky and Nick Bostrom. For a detailed discussion of these and related arguments, see Yudkowsky 2013.
  • 15. These comments are impressions we have, and we have not documented the factual claims. These arguments were first made (to our knowledge) by Eliezer Yudkowsky and Nick Bostrom. For a detailed discussion of these and related arguments, see Yudkowsky 2013.
  • 16. Our impression on this front is partly based on the following, by Jacob Steinhardt (a computer science PhD student and advisor to us):

    “Let me end with some specific ways in which control of AI may be particularly difficult compared to other human-engineered systems:
    1. AI may be “agent-like”, which means that the space of possible behaviors is much larger; our intuitions about how AI will act in pursuit of a given goal may not account for this and so AI behavior could be hard to predict.
    2. Since an AI would presumably learn from experience, and will likely run at a much faster serial processing speed than humans, its capabilities may change rapidly, ruling out the usual process of trial-and-error.
    3. AI will act in a much more open-ended domain. In contrast, our existing tools for specifying the necessary properties of a system only work well in narrow domains. For instance, for a bridge, safety relates to the ability to successfully accomplish a small number of tasks (e.g. not falling over). For these, it suffices to consider well-characterized engineering properties such as tensile strength. For AI, the number of tasks we would potentially want it to perform is large, and it is unclear how to obtain a small number of well-characterized properties that would ensure safety.” Steinhardt 2015.

  • 17. “Opaque systems. It is also already the case that increasingly many tasks are being delegated to autonomous systems, from trades in financial markets to aggregation of information feeds. The opacity of these systems has led to issues such as the 2010 Flash Crash and will likely lead to larger issues in the future. In the long term, as AI systems become increasingly complex, humans may lose the ability to meaningfully understand or intervene in such systems, which could lead to a loss of sovereignty if autonomous systems are employed in executive-level functions (e.g. government, economy).” Steinhardt 2015.
  • 18. Bostrom 2014 argues along these lines at pg 94, Table 8.
  • 19. “Finally, research on the possibility of superintelligent machines or rapid, sustained self-improvement (‘intelligence explosion’) has been highlighted by past and current projects on the future of AI as potentially valuable to the project of maintaining reliable control in the long term. The AAAI 2008–09 Presidential Panel on Long-Term AI Futures’ ‘Subgroup on Pace, Concerns, and Control’ stated that

      There was overall skepticism about the prospect of an intelligence explosion… Nevertheless, there was a shared sense that additional research would be valuable on methods for understanding and verifying the range of behaviors of complex computational systems to minimize unexpected outcomes. Some panelists recommended that more research needs to be done to better define “intelligence explosion,” and also to better formulate different classes of such accelerating intelligences. Technical work would likely lead to enhanced understanding of the likelihood of such phenomena, and the nature, risks, and overall outcomes associated with different conceived variants [43].

    Stanford’s One-Hundred Year Study of Artificial Intelligence includes ‘Loss of Control of AI systems’ as an area of study, specifically highlighting concerns over the possibility that

      …we could one day lose control of AI systems via the rise of superintelligences that do not act in accordance with human wishes – and that such powerful systems would threaten humanity. Are such dystopic outcomes possible? If so, how might these situations arise? …What kind of investments in research should be made to better understand and to address the possibility of the rise of a dangerous superintelligence or the occurrence of an “intelligence explosion”? [42]

    Research in this area could include any of the long-term research priorities listed above, as well as theoretical and forecasting work on intelligence explosion and superintelligence [16, 9], and could extend or critique existing approaches begun by groups such as the Machine Intelligence Research Institute [76].” FLI research priorities document, 2015, pg 8.

  • 20. Our judgment of their credentials was based on conversation with Open Philanthropy Project scientific advisor Dario Amodei and our familiarity with this subject. The signatories included:
    “Stuart Russell, Berkeley, Professor of Computer Science, director of the Center for Intelligent Systems, and co-author of the standard textbook Artificial Intelligence: a Modern Approach.
    Tom Dietterich, Oregon State, President of AAAI, Professor and Director of Intelligent Systems
    Eric Horvitz, Microsoft research director, ex AAAI president, co-chair of the AAAI presidential panel on long-term AI futures
    Bart Selman, Cornell, Professor of Computer Science, co-chair of the AAAI presidential panel on long-term AI futures
    Francesca Rossi, Padova & Harvard, Professor of Computer Science, IJCAI President and Co-chair of AAAI committee on impact of AI and Ethical Issues
    Demis Hassabis, co-founder of DeepMind
    Shane Legg, co-founder of DeepMind
    Mustafa Suleyman, co-founder of DeepMind
    Dileep George, co-founder of Vicarious
    Scott Phoenix, co-founder of Vicarious
    Yann LeCun, head of Facebook’s Artificial Intelligence Laboratory
    Geoffrey Hinton, University of Toronto and Google Inc.
    Yoshua Bengio, Université de Montréal
    Peter Norvig, Director of research at Google and co-author of the standard textbook Artificial Intelligence: a Modern Approach
    Oren Etzioni, CEO of Allen Inst. for AI
    Guruduth Banavar, VP, Cognitive Computing, IBM Research
    Michael Wooldridge, Oxford, Head of Dept. of Computer Science, Chair of European Coordinating Committee for Artificial Intelligence
    Leslie Pack Kaelbling, MIT, Professor of Computer Science and Engineering, founder of the Journal of Machine Learning Research
    Tom Mitchell, CMU, former President of AAAI, chair of Machine Learning Department
    Toby Walsh, Univ. of New South Wales & NICTA, Professor of AI and President of the AI Access Foundation
    Murray Shanahan, Imperial College, Professor of Cognitive Robotics
    Michael Osborne, Oxford, Associate Professor of Machine Learning
    David Parkes, Harvard, Professor of Computer Science
    Laurent Orseau, Google DeepMind
    Ilya Sutskever, Google, AI researcher
    Blaise Aguera y Arcas, Google, AI researcher
    Joscha Bach, MIT, AI researcher
    Bill Hibbard, Madison, AI researcher
    Steve Omohundro, AI researcher
    Ben Goertzel, OpenCog Foundation
    Richard Mallah, Cambridge Semantics, Director of Advanced Analytics, AI researcher
    Alexander Wissner-Gross, Harvard, Fellow at the Institute for Applied Computational Science
    Adrian Weller, Cambridge, AI researcher
    Jacob Steinhardt, Stanford, AI Ph.D. student
    Nick Hay, Berkeley, AI Ph.D. student
    Jaan Tallinn, co-founder of Skype, CSER and FLI
    Elon Musk, SpaceX, Tesla Motors
    Steve Wozniak, co-founder of Apple
    Luke Nosek, Founders Fund
    Aaron VanDevender, Founders Fund
    Erik Brynjolfsson, MIT, Professor at and director of MIT Initiative on the Digital Economy
    Margaret Boden, U. Sussex, Professor of Cognitive Science
    Martin Rees, Cambridge, Professor Emeritus of Cosmology and Astrophysics, Gruber & Crafoord laureate
    Huw Price, Cambridge, Bertrand Russell Professor of Philosophy
    Nick Bostrom, Oxford, Professor of Philosophy, Director of Future of Humanity Institute (Oxford Martin School)
    Stephen Hawking, Director of research at the Department of Applied Mathematics and Theoretical Physics at Cambridge, 2012 Fundamental Physics Prize laureate for his work on quantum gravity.” FLI Open Letter, 2015.
  • 21. See, for example, The Myth of AI, which we see as representative of much of the discussion.
  • 22. Our reasoning behind this judgment cannot be easily summarized, and is based on reading about the problem and having many informal conversations. Bostrom’s Superintelligence discusses many possible strategies for solving this problem, but identifies substantial potential challenges for essentially all of them, and the interested reader could read the book for more evidence on this point.
  • 23. Our reasoning behind this judgment cannot be easily summarized, and is based on reading about the problem and having many informal conversations. Our scientific advisor Jacob Steinhardt, a PhD student in machine learning, has made a similar argument. “Existing machine learning frameworks make it very easy for AI to acquire knowledge, but hard to acquire values. For instance, while an AI’s model of reality is flexibly learned from data, its goal/utility function is hard-coded in almost all situations; an exception is some work on inverse reinforcement learning [5], but this is still a very nascent framework. Importantly, the asymmetry between knowledge (and hence capabilities) and values is fundamental, rather than simply a statement about existing technologies. This is because knowledge is something that is regularly informed by reality, whereas values are only weakly informed by reality: an AI which learns incorrect facts could notice that it makes wrong predictions, but the world might never “tell” an AI that it learned the “wrong values”. At a technical level, while many tasks in machine learning are fully supervised or at least semi-supervised, value acquisition is a weakly supervised task.” Steinhardt 2015.
  • 24. Based on conversation with Open Philanthropy Project scientific advisor Dario Amodei. Dario reviewed this section and approved this paragraph.
  • 25. “Cyber-attacks. There are two trends which taken together make the prospect of AI-aided cyber-attacks seem worrisome. The first trend is simply the increasing prevalence of cyber-attacks; even this year we have seen Russia attack Ukraine, North Korea attack Sony, and China attack the U.S. Office of Personnel Management. Secondly, the “Internet of Things” means that an increasing number of physical devices will be connected to the internet. Assuming that software exists to autonomously control them, many internet-enabled devices such as cars could be hacked and then weaponized, leading to a decisive military advantage in a short span of time. Such an attack could be enacted by a small group of humans aided by AI technologies, which would make it hard to detect in advance. Unlike other weaponizable technology such as nuclear fission or synthetic biology, it would be very difficult to control the distribution of AI since it does not rely on any specific raw materials. Finally, note that even a team with relatively small computing resources could potentially “bootstrap” to much more computing power by first creating a botnet with which to do computations; to date, the largest botnet has spanned 30 million computers and several other botnets have exceeded 1 million.” Steinhardt 2015.
  • 26. Based on conversation with Open Philanthropy Project scientific advisor Dario Amodei. Dario reviewed this section and approved this paragraph.
  • 27. Holden Karnofsky, Thoughts on the Singularity Institute, 2012, see “Objection 2.”
  • 28. Grace 2015, Figure 1.
  • 29. FLI conference page, 2015.
  • 30. The Myth of AI.
  • 31. “None of this proves that AI, or gray goo, or strangelets, will be the end of the world. But there is no need for a proof, just a convincing argument pointing to a more-than-infinitesimal possibility. There have been many unconvincing arguments – especially those involving blunt applications of Moore’s law or the spontaneous emergence of consciousness and evil intent. Many of the contributors to this conversation seem to be responding to those arguments and ignoring the more substantial arguments proposed by Omohundro, Bostrom, and others.

    The primary concern is not spooky emergent consciousness but simply the ability to make high-quality decisions. Here, quality refers to the expected outcome utility of actions taken, where the utility function is, presumably, specified by the human designer. Now we have a problem:

    1. The utility function may not be perfectly aligned with the values of the human race, which are (at best) very difficult to pin down.

    2. Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources – not for their own sake, but to succeed in its assigned task.

    A system that is optimizing a function of n variables, where the objective depends on a subset of size k < n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable. This is essentially the old story of the genie in the lamp, or the sorcerer’s apprentice, or King Midas: you get exactly what you ask for, not what you want. A highly capable decision maker – especially one connected through the Internet to all the world’s information and billions of screens and most of our infrastructure – can have an irreversible impact on humanity.” The Myth of AI.

  • 32. Muehlhauser objections and replies 2015.
  • 33. See earlier footnote.
  • 34. “We recommend expanded research aimed at ensuring that increasingly capable AI systems are robust and beneficial: our AI systems must do what we want them to do. The attached research priorities document gives many examples of such research directions that can help maximize the societal benefit of AI. This research is by necessity interdisciplinary, because it involves both society and AI. It ranges from economics, law and philosophy to computer security, formal methods and, of course, various branches of AI itself.” FLI Open Letter, 2015.
  • 35. FLI International Grants Competition, 2015.
  • 36. FLI International Grants Competition, 2015.
  • 37. This statement is partly based on informal conversations. For a partial illustration of the point, see MIRI Research Agenda, 2015, pg 1, table of contents.
  • 38. Our inclusion of the items in this list is partly based on conversations with Open Philanthropy Project scientific advisors Dario Amodei and Jacob Steinhardt, as well as Steinhardt 2015. Specifically, our inclusion of points 3-5 is based on all three sources, our inclusion of point 1 is primarily based on Steinhardt 2015, and our inclusion of point 2 is primarily based on conversations with Dario Amodei. They reviewed this section and approved this paragraph.

    “Above I presented an argument for why AI, in the long term, may require substantial precautionary efforts. Beyond this, I also believe that there is important research that can be done right now to reduce long-term AI risks. In this section I will elaborate on some specific research projects, though my list is not meant to be exhaustive.

    1. Value learning: In general, it seems important in the long term (and also in the short term) to design algorithms for learning values / goal systems / utility functions, rather than requiring them to be hand-coded. One framework for this is inverse reinforcement learning [5], though developing additional frameworks would also be useful.
    2. Weakly supervised learning: As argued above, inferring values, in contrast to beliefs, is an at most weakly supervised problem, since humans themselves are often incorrect about what they value and so any attempt to provide fully annotated training data about values would likely contain systematic errors. It may be possible to infer values indirectly through observing human actions; however, since humans often act immorally and human values change over time, current human actions are not consistent with our ideal long-term values, and so learning from actions in a naive way could lead to problems. Therefore, a better fundamental understanding of weakly supervised learning — particularly regarding guaranteed recovery of indirectly observed parameters under well-understood assumptions — seems important.
    3. Formal specification / verification: One way to make AI safer would be to formally specify desiderata for its behavior, and then prove that these desiderata are met. A major open challenge is to figure out how to meaningfully specify formal properties for an AI system. For instance, even if a speech transcription system did a near-perfect job of transcribing speech, it is unclear what sort of specification language one might use to state this property formally. Beyond this, though there is much existing work in formal verification, it is still extremely challenging to verify large systems.
    4. Transparency: To the extent that the decision-making process of an AI is transparent, it should be relatively easy to ensure that its impact will be positive. To the extent that the decision-making process is opaque, it should be relatively difficult to do so. Unfortunately, transparency seems difficult to obtain, especially for AIs that reach decisions through complex series of serial computations. Therefore, better techniques for rendering AI reasoning transparent seem important.
    5. Strategic assessment and planning: Better understanding of the likely impacts of AI will allow a better response. To this end, it seems valuable to map out and study specific concrete risks; for instance, better understanding ways in which machine learning could be used in cyber-attacks, or forecasting the likely effects of technology-driven unemployment, and determining useful policies around these effects. It would also be clearly useful to identify additional plausible risks beyond those of which we are currently aware. Finally, thought experiments surrounding different possible behaviors of advanced AI would help inform intuitions and point to specific technical problems. Some of these tasks are most effectively carried out by AI researchers, while others should be done in collaboration with economists, policy experts, security experts, etc.

    The above constitute at least five concrete directions of research on which I think important progress can be made today, which would meaningfully improve the safety of advanced AI systems and which in many cases would likely have ancillary benefits in the short term, as well.” Steinhardt 2015.

  • 39. “We were quite curious to see how many applications we’d get for our Elon-funded grants program on keeping AI beneficial, given the short notice and unusual topic. I’m delighted to report that the response was overwhelming: about 300 applications for a total of about $100M, including a great diversity of awesome teams and projects from around the world.” FLI blog, AI grant results, 2015.
  • 40. Muller and Bostrom 2014, see the table on pgs 9-10, under section 3.2.
  • 41. Based on conversation with Open Philanthropy Project scientific advisor Dario Amodei.
  • 42. Based on conversation with Open Philanthropy Project scientific advisors Dario Amodei and Jacob Steinhardt. They reviewed this section and approved this paragraph.

    This overlap is also visible in the FLI survey of research questions, 2015, whose table of contents (pg 3) lists many overlapping topics in its short-term and long-term research priorities.

  • 43.
    • “We are delighted to report that technology inventor Elon Musk, creator of Tesla and SpaceX, has decided to donate $10M to the Future of Life Institute to run a global research program aimed at keeping AI beneficial to humanity.”
    • “The $10M program will be administered by the Future of Life Institute, a non-profit organization whose scientific advisory board includes AI-researchers Stuart Russell and Francesca Rossi. ‘I love technology, because it’s what’s made 2015 better than the stone age’, says MIT professor and FLI president Max Tegmark. ‘Our organization studies how we can maximize the benefits of future technologies while avoiding potential pitfalls.’” FLI Musk donation announcement, 2015.
  • 44. CSER about page, 2015.
  • 45. FHI about page, 2015.
  • 46. “In the last year [2013], FHI had an annual budget of about £700,000.” Nick Beckstead’s non-verbatim summary of a conversation with Sean O hEigeartaigh, April 24, 2014.
  • 47. FLI about page, 2015.
  • 48. MIRI home page, 2015.
  • 49. “Fundraising: We raised $1,237,557 in contributions in 2014. This is slightly less than we raised in 2013, but only because 2013’s numbers include a one-time, outlier donation of ~$525,000.” MIRI blog, 2014 in review.
  • 50. AI 100 About Page.
  • 51. Based on material from informal conversations that is not documented in public notes.