About two years ago, I wrote that “it’s difficult to know which ‘intermediate goals’ [e.g. policy goals] we could pursue that, if achieved, would clearly increase the odds of eventual good outcomes from transformative AI.” Much has changed since then, and in this post I give an update on 12 ideas for US policy goals[1] that I tentatively think would increase the odds of good outcomes from transformative AI.[2]
I think the US generally over-regulates, and that most people underrate the enormous benefits of rapid innovation. However, when 50% of the (survey-responding) experts on a specific technology think there is a reasonable chance it will result in outcomes that are “extremely bad (e.g. human extinction),” I think ambitious and thoughtful regulation is warranted.[3]
First, some caveats:
- These are my own tentative opinions, not Open Philanthropy’s.[4] I might easily change my opinions in response to further analysis or further developments.
- My opinions are premised on a strategic picture similar to the one outlined in my colleague Holden Karnofsky’s Most Important Century and Implications of… posts. In other words, I think transformative AI could bring enormous benefits, but I also take full-blown existential risk from transformative AI as a plausible and urgent concern, and I am more agnostic about this risk’s likelihood, shape, and tractability than e.g. a recent TIME op-ed.
- None of the policy options below have gotten sufficient scrutiny (though they have received far more scrutiny than is presented here), and there are many ways their impact could turn out — upon further analysis or upon implementation — to be net-negative, even if my basic picture of the strategic situation is right.
- To my knowledge, none of these policy ideas have been worked out in enough detail to allow for immediate implementation, but experts have begun to draft the potential details for most of them (not included here). None of these ideas are original to me.
- This post doesn’t explain much of my reasoning for tentatively favoring these policy options. All the options below have complicated mixtures of pros and cons, and many experts oppose (or support) each one. This post isn’t intended to (and shouldn’t) convince anyone. However, in the wake of recent AI advances and discussion, many people have been asking me for these kinds of policy ideas, so I am sharing my opinions here.
- Some of these policy options are more politically tractable than others, but, as I think we’ve seen recently, the political landscape sometimes shifts rapidly and unexpectedly.
Those caveats in hand, below are some of my current personal guesses about US policy options that would reduce existential risk from AI in expectation (in no order).[5]
- Software export controls. Control the export (to anyone) of “frontier AI models,” i.e. models with highly general capabilities over some threshold, or (more simply) models trained with a compute budget over some threshold (e.g. as much compute as $1 billion can buy today). This will help limit the proliferation of the models which probably pose the greatest risk. Also restrict API access in some ways, as API access can potentially be used to generate an optimized dataset sufficient to train a smaller model to reach performance similar to that of the larger model.
- Require hardware security features on cutting-edge chips. Security features on chips can be leveraged for many useful compute governance purposes, e.g. to verify compliance with export controls and domestic regulations, monitor chip activity without leaking sensitive IP, limit usage (e.g. via interconnect limits), or even intervene in an emergency (e.g. remote shutdown). These functions can be achieved via firmware updates to already-deployed chips, though some features would be more tamper-resistant if implemented on the silicon itself in future chips.
- Track stocks and flows of cutting-edge chips, and license big clusters. Chips over a certain capability threshold (e.g. the one used for the October 2022 export controls) should be tracked, and a license should be required to bring together large masses of them (as required to cost-effectively train frontier models). This would improve government visibility into potentially dangerous clusters of compute. And without this, other aspects of an effective compute governance regime can be rendered moot via the use of undeclared compute.
- Track and require a license to develop frontier AI models. This would improve government visibility into potentially dangerous AI model development, and allow more control over their proliferation. Without this, other policies like the information security requirements below are hard to implement.
- Information security requirements. Require that frontier AI models be subject to extra-stringent information security protections (including cyber, physical, and personnel security), including during model training, to limit unintended proliferation of dangerous models.
- Testing and evaluation requirements. Require that frontier AI models be subject to extra-stringent safety testing and evaluation, including some evaluation by an independent auditor meeting certain criteria.[6]
- Fund specific genres of alignment, interpretability, and model evaluation R&D. Note that if the genres are not specified well enough, such funding can effectively widen (rather than shrink) the gap between cutting-edge AI capabilities and available methods for alignment, interpretability, and evaluation. See e.g. here for one possible model.
- Fund defensive information security R&D, again to help limit unintended proliferation of dangerous models. Even the broadest funding strategy would help, but there are many ways to target this funding to the development and deployment pipeline for frontier AI models.
- Create a narrow antitrust safe harbor for AI safety & security collaboration. Frontier-model developers would be more likely to collaborate usefully on AI safety and security work if such collaboration were more clearly allowed under antitrust rules. Careful scoping of the policy would be needed to retain the basic goals of antitrust policy.
- Require certain kinds of AI incident reporting, similar to incident reporting requirements in other industries (e.g. aviation) or to data breach reporting requirements, and similar to some vulnerability disclosure regimes. Many incidents wouldn’t need to be reported publicly, but could be kept confidential within a regulatory body. The goal of this is to allow regulators and perhaps others to track certain kinds of harms and close-calls from AI systems, to keep track of where the dangers are and rapidly evolve mitigation mechanisms.
- Clarify the liability of AI developers for concrete AI harms, especially clear physical or financial harms, including those resulting from negligent security practices. A new framework for AI liability should in particular address the risks from frontier models carrying out actions. The goal of clear liability is to incentivize greater investment in safety, security, etc. by AI developers.
- Create means for rapid shutdown of large compute clusters and training runs. One kind of “off switch” that may be useful in an emergency is a non-networked power cutoff switch for large compute clusters. As far as I know, most datacenters don’t have this.[7] Remote shutdown mechanisms on chips (mentioned above) could also help, though they are vulnerable to interruption by cyberattack. Various additional options could be required for compute clusters and training runs beyond particular thresholds.
Of course, even if one agrees with some of these high-level opinions, I haven’t provided enough detail in this short post for readers to know what, exactly, to advocate for, or how to do it. If you have useful skills, networks, funding, or other resources that you might like to direct toward further developing or advocating for one or more of these policy ideas, please indicate your interest in this short Google Form. (The information you share in this form will be available to me [Luke Muehlhauser] and some other Open Philanthropy employees, but we won’t share your information beyond that without your permission.)

[1] Many of these policy options would plausibly also be good to implement in other jurisdictions, but for most of them the US is a good place to start (the US is plausibly the most important jurisdiction anyway, given the location of leading companies, and many other countries sometimes follow the US), and I know much less about politics and policymaking in other countries.
[2] For more on intermediate goals, see Survey on intermediate goals in AI governance.
[3] This paragraph was added on April 18, 2023.
[4] Besides my day job at Open Philanthropy, I am also a Board member at Anthropic, though I have no shares in the company and am not compensated by it. Again, these opinions are my own, not Anthropic’s.
[5] There are many other policy options I have purposely not mentioned here. These include: Hardware export controls. The US has already implemented major export controls on semiconductor manufacturing equipment and high-end chips. These controls have both pros and cons from my perspective, though …
[6] See e.g. p. 15-16 of the GPT-4 system card report for an illustration.
[7] E.g. the lack of an off switch exacerbated the fire that destroyed a datacenter in Strasbourg; see section VI.2.1 – iv of this report.