We recommended over $400 million worth of grants in 2021. The bulk of this came from recommendations to support GiveWell’s top charities and from our major current focus areas.
We launched several new program areas — South Asian air quality, global aid policy, and effective altruism community building with a focus on global health and wellbeing — and have either hired or are currently hiring program officers to lead each of those areas.
We revisited the case for our US policy causes — spinning out our criminal justice reform program as an independent organization, making exit grants in US macroeconomic stabilization policy, and updating our approaches to land use reform and immigration policy.
We shared our latest framework for evaluating global health and wellbeing interventions, as well as several reports on key topics in potential risk from advanced AI.
In biosecurity and pandemic preparedness, major grants included the MIT Media Lab, as well as Californians Against Pandemics to support a ballot initiative that would fund research and development on pathogen genomics. We also provided scholarship support to a number of early-career people pursuing work and study related to global catastrophic biological risks.
We aim to roughly double the amount we recommend this year relative to last year, and triple it by 2025.
New program areas
We launched two new program areas this year: South Asian Air Quality and Global Aid Policy. We’re thrilled to have hired Santosh Harish and Norma Altshuler to lead these programs, and we look forward to sharing some of the grants they’re making in next year’s annual review.
This year, our global health and wellbeing cause prioritization team aims to launch three more new program areas where we can find scalable opportunities above our bar, and to continue laying the groundwork for more growth in future years.
Revisiting our older US policy causes
We made our initial selection of US policy causes in 2014 and 2015. We chose criminal justice reform, macroeconomic stabilization policy, immigration policy, and land use reform in order to try to get experience across a variety of causes that stood out on different criteria (immigration on importance, CJR on tractability, land use reform and macro on neglectedness).
We’ve learned a lot from our experience funding in these fields, but over time have updated towards a more unified ROI framework that lets us more explicitly compare across causes. (Also, the world has changed a lot over the last 7 years.) We gave an initial update on our revised thinking back in 2019, and we are still evaluating our performance and plans for the future as of 2022.
On our new website, we are no longer referring to these causes as full “focus areas” because we do not have full-time staff leading any of them. But our particular plans for the future vary across the four causes:
After the rapid recovery of the U.S. from the most recent recession, we decided to wind down our giving to U.S. grantees in macroeconomic policy. (We made some exit grants, as we often do when we decide not to renew support to organizations we’ve supported for a long time.) We currently expect to continue to support regranting on this issue within Europe via Dezernat Zukunft, but will revisit depending on how economic and policy conditions evolve. We hope to write more about our thinking on and lessons learned in this area in the future.
On land use reform: we recently completed an updated review on the performance of our grantees and the valuation of a marginal housing unit in key supply-constrained regions, which made us think that our returns to date have been well above our bar and that there is room for expansion. We’ve commissioned an outside review of our analysis; pending the results of that review, we’re considering hiring someone to lead a bigger portfolio in this space.
On immigration policy:
We have never had a clear theory of how to change the political economy to be supportive of substantially larger immigration flows, which is what would be necessary to achieve the global poverty improvements that motivate our interest in this issue. Accordingly, our recent spending has been lower than in macro or land use reform.
Over the last few years, the US political climate for immigration reform has come to look even less promising than when we initially explored this space.
We’re currently planning to continue supporting Michael Clemens’ work (which is what motivated our interest in this cause), make occasional opportunistic grants that fit with our overall ROI framework, and explore whether we should have a program around STEM immigration. But we are not planning to do more on US immigration policy per se.
New approaches to funding
This year, we created a number of new programs to openly solicit funding requests from individuals, groups, and organizations. This represents a different approach from the proactive searching and networking we use to find most of our grants, and we are excited by the potential for these programs to unearth strong opportunities we wouldn’t have found otherwise.
The largest such program is our Regranting Challenge, which will allocate $150 million in funding to the grantmaking budgets for one to five outstanding programs at other foundations. That program is closed to new submissions, but we’ve listed many programs that are open to submissions on our “How to Apply for Funding” page.
Groups eligible for at least one open program include:
High school students who hope to start undergraduate degrees at a top university in the US/UK, who are not domestic students at those universities, and who are interested in using their careers to do as much good as possible. (Undergraduate scholarships)
Our worldview investigations team is now working on:
More thoroughly assessing and writing up what sorts of risks transformative AI might pose and what that means for today’s priorities.
Updating our internal views of certain key values, such as the estimated economic value of a disability-adjusted life year (DALY) and the possible spillover benefits from economic growth, that inform what we have thus far referred to as our “near-termist” cause prioritization work.
In June, Alexander was promoted to co-CEO alongside Holden. Alexander currently oversees the grantmaking in our Global Health and Wellbeing portfolio, while Holden oversees the grantmaking in our Longtermism portfolio.
These portfolios represent a new way of describing our work:
“Global Health and Wellbeing” (GHW) describes our work to increase health and/or wellbeing worldwide
“Longtermism” describes our work to raise the probability of a very long-lasting and positive future.
This represents massive growth compared to past years, which is an exciting opportunity and an immense challenge. If you’d like to help us achieve our hiring goals — and thus, all of our other goals — we are hiring recruiters and senior recruiters!
The growth also means that if you want to work at Open Phil, you’ll have more chances to apply this year than ever before. We will continue to post open positions on our Working at Open Phil page, and we encourage you to check it out! If you don’t see something you want to apply for, you can fill out our General Application, and we’ll reach out if we post a position we think might be a good fit.
Finally, we’re always looking for referrals. If you refer someone and we hire them, we’ll pay you $5,000!
Open Philanthropy is seeking proposals from those interested in contributing to a research project informing and estimating biosecurity-relevant numbers and ‘base rates’. We welcome proposals from both research organizations and individuals (at any career stage, including undergraduate and postgraduate students). The work can be structured via contract or grant.
The application deadline is June 5th.
How likely is a biological catastrophe? Do the biggest risks come from states or terrorists? Accidents or intentional attacks?
These parameters are directly decision-relevant for Open Philanthropy’s biosecurity and pandemic preparedness strategy. They determine the degree to which we prioritize biosecurity compared to other causes, and inform how we prioritize different biosecurity interventions (e.g. do we focus on lab safety to reduce accidents, or push for better DNA synthesis screening to impede terrorists?).
One way of estimating biological risk that we do not recommend is ‘threat assessment’—investigating various ways that one could cause a biological catastrophe. This approach may be valuable in certain situations, but the information hazards involved make it inherently risky. In our view, the harms outweigh the benefits in most cases.
A second, less risky approach is to abstract away most biological details and instead consider general ‘base rates’. The aim is to estimate the likelihood of a biological attack or accident using historical data and base rates of analogous scenarios, and of risk factors such as warfare or terrorism. A few examples include:
Estimating the rate at which states or terrorist groups have historically sought biological, chemical, nuclear, or radiological weapons.
Forecasting the risk of great power war over the next 50 years (combining historical data with current geopolitical trends).
Estimating the rate at which lab leaks have occurred in state programs.
Enumerating possible ‘phase transitions’ that would cause a radical departure from relevant historical base rates, e.g. total collapse of the taboo on biological weapons, such that they become a normal part of military doctrine.
This information allows us to better estimate the probability of biological catastrophe in a variety of scenarios, with some hypothetical concrete examples described in the next section.
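As a minimal sketch of the base-rate idea, the snippet below turns a count of historical events into an annual rate under a simple Poisson assumption. The function name `estimate_annual_rate`, the half-count adjustment (a Jeffreys-style prior), and the example numbers are all illustrative assumptions, not figures or methods taken from Open Philanthropy.

```python
def estimate_annual_rate(event_count: int, observation_years: float) -> float:
    """Posterior-mean annual rate for a constant-rate Poisson process.

    Assumes a Jeffreys prior on the rate, which adds half an event to the
    observed count so that zero observed events still gives a nonzero rate.
    """
    if event_count < 0:
        raise ValueError("event_count must be non-negative")
    if observation_years <= 0:
        raise ValueError("observation_years must be positive")
    return (event_count + 0.5) / observation_years


# Hypothetical placeholder inputs, NOT real historical data.
rate_with_events = estimate_annual_rate(event_count=3, observation_years=70)
rate_without_events = estimate_annual_rate(event_count=0, observation_years=70)

print(f"3 events in 70 years: {rate_with_events:.4f} per year")
print(f"0 events in 70 years: {rate_without_events:.4f} per year")
```

Pooling observation time across actors (for example, program-years rather than calendar years) would change only the denominator.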
The biosecurity team at Open Philanthropy (primarily Damon) is currently working on developing such models. However, given the broad scope of the work, we would be keen to see additional research on this question. While we are interested in independent models that attempt to estimate the overall risk of a global biological catastrophe (a global catastrophic biological risk, or GCBR), we are particularly keen on projects that simply collect huge amounts of relevant historical data, or thoroughly explore one or more sub-questions.
One aspect of this approach we find particularly exciting is that it is threat-agnostic and thus relevant across a wide range of scenarios, foreseen and unforeseen. It also helps us to think about the misuse of other dangerous transformative technologies, such as atomically precise manufacturing or AI.
We are therefore calling for applications from anyone who would be interested in spending up to the next four months (or possibly longer) helping us better understand these aspects of biorisk. We could imagine successful proposals from many different backgrounds—for example, a history undergraduate looking for a summer research project, a postdoc or superforecaster looking for part-time work, or a quantitative research organization with an entire full-time team.
What is our goal?
We are interested in supporting work that will help us better quantitatively estimate the risk of a GCBR without creating information hazards. To do this, we can imagine treating the biological details of a threat as a ‘black box’ and instead quantifying risk within hypothetical scenarios like the following:
Scenario 1 (states or big terrorist groups): In January 2025, scientists publish a paper that inadvertently provides the outlines for a biological agent that, if created and released, would kill hundreds of millions of people. Creating the pathogen would require a team of 10 PhD-level biologists working full time for 2 years, and a budget of $10 million.
Scenario 2 (small terrorist groups or lone wolf): Same as Scenario 1, but creating the pathogen requires a single PhD-level biologist working full time for 2 years, and a budget of $1 million.
For each of these scenarios, what is the ‘annual rate’ at which the biological agent is created and released (either accidentally or intentionally)?
We are interested in research proposals that aim to estimate this rate or estimate rates in similar scenarios. Proposals don’t need to do this directly, but could instead aim to quantitatively understand ‘upstream’ aspects of the world that could affect the risk of catastrophe.
The scenarios serve as a concrete litmus test for the kinds of proposals we are interested in—if the proposal wouldn’t directly help a researcher estimate the annual risk rate in these toy scenarios, it is unlikely to be of interest to us. In particular, purely qualitative approaches, such as trying to understand specific case studies in detail, or trying to understand the ideological drivers of terrorism, are unlikely to be a good fit. Similarly, very theoretical approaches, such as those with empirically unverifiable parameters or unfalsifiable elements, are also unlikely to be successful.
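To make the ‘annual rate’ in these toy scenarios concrete, here is a minimal sketch of how such a rate would translate into the probability of at least one release over a multi-year horizon. It assumes a constant-rate Poisson process; the function name `prob_at_least_one` and the rates shown are hypothetical placeholders, not estimates for either scenario.

```python
import math


def prob_at_least_one(annual_rate: float, years: float) -> float:
    """Probability of one or more events within `years`.

    Assumes events follow a Poisson process with a constant annual rate.
    """
    if annual_rate < 0 or years < 0:
        raise ValueError("annual_rate and years must be non-negative")
    # 1 - exp(-rate * years), computed with expm1 for accuracy at small rates.
    return -math.expm1(-annual_rate * years)


# Hypothetical placeholder rates (events per year), NOT estimates.
placeholder_rates = {"lower placeholder": 0.001, "higher placeholder": 0.01}

for label, rate in placeholder_rates.items():
    ten_year_risk = prob_at_least_one(rate, years=10)
    print(f"{label}: {ten_year_risk:.4f} chance of at least one event in 10 years")
```

For small rates the result is close to `annual_rate * years`, which is why an annual rate is a convenient common unit for comparing scenarios.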
What a successful proposal might look like
An ideal proposal might do one or more of the following:
Create a database of (a well-scoped class of) terrorist attacks, particularly those that required a significant degree of planning, expertise, and/or resources
Forecast the number of wars over the next few decades
Estimate the likelihood that WMDs are used in those wars
Estimate the likelihood that biological weapons specifically are used in those wars
Create a database of (a well-scoped class of) dangerous actors:
This might include terrorist groups, cults, paramilitary groups, or rebel factions
The database would include, for each group, estimates of their size, budget, ideological commitments, and other similar information
Doing ‘all’ of them could be overly ambitious and it may make more sense to more narrowly scope—a complete database for a limited time period and geographic region (for instance) may be more useful than an incomplete one with greater scope
Estimate the amount of resources, such as money, person-hours, equipment, and expertise, that have historically been used by state bioweapon programs. Could include further breakdowns based on:
Purpose of the spending (e.g. offensive vs defensive, strategic vs tactical vs assassination)
Technical focus of the work (e.g. on pathogens themselves vs on delivery systems)
Nature of the pathogens (e.g. contagious vs non-contagious, targeting humans vs agriculture).
Estimate the future fraction of resources spent in bioweapons programs devoted to contagious vs non-contagious weapons.
Perhaps one could analyze cyberweapon development, comparing the fraction of targeted weapons to those that are designed to create widespread economic havoc.
Estimating the fraction of military resources that get spent on ‘absolutely insane stuff’, e.g. mind control, slowing down the Earth’s rotation, or weapons that could be catastrophic and have very unclear military use (even if they were possible).
Create a database of historical biological accidents
Conduct large expert surveys asking which countries worldwide would seek nuclear weapons if they didn’t require rare materials, only cost $10 million (for the whole program), and required only 10 scientists
A version of this, but asking how many countries would pursue an omnicidal ‘cobalt bomb’ for similar costs (both with and without the assumption that the regular $10 million nuke option is available)
Perhaps also repeated for historical eras to get a larger ‘sample size’
Quantitatively scope ‘fads in terrorism’ both in ideology and methodology. For example, analyzing the extent to which tactics like suicide bombing, vehicle ramming, plane hijacking, etc. ‘took off’ after one or two successful demonstrations, or the extent to which ISIS inspired lone wolves.
Create a database of the most impressive technical feats accomplished by terrorist groups, or non-state actors such as criminal gangs (e.g. bank heists, drug smuggling with submarines, etc.)
Quantitatively estimate how likely the taboo on biological weapons is to totally collapse, perhaps based on past taboos collapsing
Estimate the rate at which terrorist groups become compromised by government surveillance, destroyed, or disbanded
Survey experts to assess the likelihood that a military will follow omnicidal orders or other catastrophic actions in various situations, such as a nuclear first strike, or in response to a nuclear attack
Estimate the rate at which state secrets, such as information on biological or nuclear weaponry, are leaked. This includes both information about the existence of programs, and also leaks of technical information, research, or blueprints.
Create a database of actions committed by countries or para-state groups that strongly violate international norms and treaties (e.g. genocide, seeking of WMDs, sponsoring of terrorist attacks, violating arms control)
Estimate the fraction of individuals who, if given the opportunity, would choose to commit very destructive acts
Estimate the number of biologists worldwide with various different technical skills, and their level of access to funds and equipment
Estimate the fraction of motivated people who could spend years of their time doing something “impressive” by themselves (e.g. building a complicated technical item like a nuclear reactor, or hacking a secure target without being traced).
Forecast the size and budgets of major biotech industry players, by country, company, and/or specific R&D focus
Model the probability that a regime develops and/or deploys biological weapons. This might entail:
Making a database of countries under strong “existential pressure” (real or perceived), and investigating which did and did not seek deterrence of a similar nature.
Enumerating historical dictatorships to categorize their decision-making, particularly with regard to acquiring WMDs or committing atrocities.
Creating a historical database getting at the question of what fraction of wars have at least one faction that would ‘take the world with them’ if given the opportunity (e.g. Hitler in bunker scenarios).
Note that negative examples may be very informative, in which there may have been strong pressure to develop or use WMDs or other deplorable strategies, but warfare stayed conventional (Saddam Hussein in 2004, or Ukraine in 2022).
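Several items in the list above chain together, for example forecasting the number of wars, then the likelihood that WMDs are used in them, then the likelihood that biological weapons specifically are used. A minimal sketch of that chain follows. It assumes the conditional probabilities are the same for every war and that wars are independent of one another; the function names and all input values are hypothetical placeholders.

```python
def expected_bioweapon_wars(
    expected_wars: float, p_wmd_given_war: float, p_bio_given_wmd: float
) -> float:
    """Expected number of wars in which biological weapons are used."""
    for p in (p_wmd_given_war, p_bio_given_wmd):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    if expected_wars < 0:
        raise ValueError("expected_wars must be non-negative")
    return expected_wars * p_wmd_given_war * p_bio_given_wmd


def prob_any_bioweapon_war(
    number_of_wars: int, p_wmd_given_war: float, p_bio_given_wmd: float
) -> float:
    """Chance that at least one of `number_of_wars` independent wars involves bioweapons."""
    p_per_war = p_wmd_given_war * p_bio_given_wmd
    return 1.0 - (1.0 - p_per_war) ** number_of_wars


# Hypothetical placeholder inputs, NOT forecasts.
expected_count = expected_bioweapon_wars(20, 0.05, 0.2)
chance_of_any = prob_any_bioweapon_war(20, 0.05, 0.2)

print(f"Expected wars with bioweapon use: {expected_count:.3f}")
print(f"Chance of at least one such war: {chance_of_any:.3f}")
```

Each factor maps to a separate research question, so a proposal could improve one input without having to address the others.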
We are interested in proposals of any length or scope, ranging from a full time 4-6+ month commitment to a small 10-hour project. In some instances, we might respond to a proposal by suggesting a closer ongoing collaboration.
How do I apply?
Applications are via this Google form, and are due on Sunday, June 5th, at 11:59 pm PDT. You’ll be required to submit:
CVs of any project team members
A research proposal, up to two pages, outlining what you would like to investigate and why. This should include a rough estimate of the project timeline and a budget proposal to account for your time along with any project costs.
If applying as an organization, information about your research organization. Organizations can submit multiple separate proposals if desired; please use one application but keep the budgets separate for each project in the budget document.
We expect to fund between $500,000 and $2 million worth of proposals, depending on the quality and scope of proposals. In exceptional circumstances, we could expand this amount substantially.
You should hear back from us by June 26th. Please contact us at [email protected] if you have any further queries.
If you would like to provide anonymous or non-anonymous feedback to Open Philanthropy’s Biosecurity & Pandemic Preparedness team relevant to this project, please use this form.
Thank you to Carl Shulman for initially suggesting this research approach and providing comments. We appreciate additional comments/advice from many others, particularly Chris Bakerlee, Rocco Casagrande, and Gregory Lewis.
We have an existing program in effective altruism community growth. Like our new program, the existing program is focused on community building — increasing the number of people working to do as much good as possible with their time and resources, and helping them to work more effectively. However, the existing program evaluates grants through the lens of longtermism, focusing on projects that aim to raise the chance of a very long-lasting and positive future (including by reducing risks from existential catastrophes).
We view the EA community as pursuing a similar endeavor to Open Philanthropy; people in the community aim to do as much good as possible, and consider a broad range of approaches. We think there are many projects within this community that don’t fit the longtermist focus of our existing program, but do have an expected impact above the “bar” we use to evaluate GHW grants.
We’ve already made some grants to EA projects that were highly promising from a GHW perspective, including Charity Entrepreneurship (supporting the creation of new animal welfare charities) and Founders Pledge (increasing donations from entrepreneurs to outstanding charities).
However, hiring a program officer to focus on this category will give us the capacity to significantly expand our grantmaking, develop our knowledge and strategy in this area, and evaluate the progress of our grantees.
We’re looking to hire someone who is very familiar with the EA community, has ideas about how to grow and develop it, and is passionate about supporting projects in global health and wellbeing. To see more details and apply, visit our job description.
How the program will operate
Our founding program officer will have at least $10 million in available funds to allocate in their first year, and funding could grow significantly from there depending on the volume of good opportunities they find.
The new program will not impact our funding for EA community projects with a longtermist focus, and we don’t expect to reduce our grantmaking in that area.
While both EA programs will operate independently, they may co-fund grants when there is a strong case for impact as assessed through both our GHW and longtermist frameworks.
Open Philanthropy is running a $150 million Regranting Challenge, aiming to add funding to the grantmaking budgets of one to five outstanding programs at other foundations. We believe there are some excellent individual programs and whole foundations out there and we want to experiment with giving them more money to allocate rather than trying to copy their approaches.
We are looking to support high-impact programs that improve human health, facilitate economic development, and/or address climate change. By default, we will roughly aim to double a selected program’s annual grantmaking budget for three years, subject to the overall size of the Regranting Challenge and allocating funding as effectively as we can across one to five total recipients.
To learn more about the Regranting Challenge, or to apply for funding, go here.
Gathering insights from a wide range of grantmakers, with different approaches, focused on different problems. By creating an open call, we hope to identify highly effective foundations and program areas to support that we would not have known about otherwise.
Adding funding to high-impact work that is already underway, rather than reinventing the wheel ourselves. We are confident there are some highly effective grantmakers out there doing better work than we could in their respective spaces. Our goal is to help others as much as we can, so we want to try allocating funding to the best funders we can find rather than just trying to copy their approaches.
Piloting a mechanism that enables the most impactful programs to grow. We see the lack of feedback mechanisms that ensure effective grantmakers get more money to allocate as a major shortcoming in the existing philanthropic ecosystem. We’re excited about the Regranting Challenge as a chance to experiment with changing that.
Last year, I wrote that Open Philanthropy was expanding and we were recruiting to help us direct philanthropic funding in new causes:
We’re hiring two new Program Officers, in South Asian air quality and global aid policy. Each of these Program Officers will identify specific grants and grantees that we believe can beat our 1,000x social return on investment bar. We expect these positions to be filled by grantmakers who combine deep expertise in their area, strategic vision, and a quantitative mindset. We’re looking for people who already know many potential grantee organizations and can make reasoned and balanced arguments about why their approach is likely to clear our high bar for giving. We think finding the right grantmaker is a key ingredient to our potential impact in these causes, so we may not end up going into them if we can’t find the right people.
Today, I’m excited to announce two new hires who we believe combine these qualities, and that we will be launching South Asian Air Quality and Global Aid Policy as our first two new causes in more than five years when these new hires join Open Philanthropy early this year.
South Asian Air Quality
Our new South Asian Air Quality program will be led by Santosh Harish. Santosh was until recently a Fellow at the Centre for Policy Research in New Delhi, where he was a leading voice on the governance of air quality. He previously worked at the India Center of the Energy Policy Institute at the University of Chicago (EPIC-India). Before that, he was a Post-Doctoral Fellow with Evidence for Policy Design India and J-PAL South Asia and received a B. Tech from IIT Madras and a PhD in Engineering & Public Policy from Carnegie Mellon.
As described in this cause report, we think that South Asian Air Quality is an unusually promising space for philanthropy aimed at improving global health.
In short, despite the significant health impacts of air pollution — the Institute for Health Metrics and Evaluation, for example, suggests that air pollution in South Asia is responsible for almost 3% of all DALYs lost worldwide — philanthropic efforts to improve air quality in South Asia appear limited. We hope this new program will significantly grow the field and help improve the health of millions of people over the coming decades.
Areas of potential interest include:
Strengthening public goods like air quality monitoring data, emission inventories, source apportionment, and information resources in Indian languages.
Providing technical assistance to South Asian governmental actors on policy development and implementation.
Increasing awareness of health impacts and support for pollution mitigation actions.
Piloting interventions to change incentives and reduce pollution (e.g., around cookstove use or crop burning).
Growing the ecosystem of research, practitioner, and policy groups engaged in air quality.
Engaging academics within South Asia to increase the evidence base on the health impacts of air pollution.
Global Aid Policy
Our new Global Aid Policy program will be led by Norma Altshuler. Norma is currently a program officer in Gender Equity and Governance at the William and Flora Hewlett Foundation. She manages two portfolios of grants. The first is designed to increase the use of data and evidence to improve public policies in low and middle income countries, particularly sub-Saharan Africa; the second to improve women’s economic empowerment. Her previous work experience includes time at the Global Innovation Fund, USAID, and GiveDirectly. Norma received a BA from Bryn Mawr and a Master of Public Policy from UC Berkeley.
This program aims to create a future in which wealthy countries increase the welfare impact of their foreign aid, by increasing levels and/or by allocating existing aid more cost-effectively. To illustrate the types of aid we can imagine receiving more funding, the PEPFAR program represents only ~15% of the U.S. aid budget, excluding military aid, and has plausibly saved tens of millions of life-years since it was created in 2003.
Areas of potential interest include:
Political and policy advocacy for new, cost-effective global health programs (e.g., PEPFAR for X).
Advocacy within OECD countries other than the U.S. that may not have the same degree of policy infrastructure already developed, or may be more ripe for policy change.
Supporting expansion of high-return programs and investments within existing aid institutions.
Supporting investments in improving the cost-effectiveness or quality of existing aid programs.
Supporting research on the comparative cost-effectiveness of different aid programs and strategies.
Developing new strategies for increasing high-level political support for aid investments in the U.S. and elsewhere.
Funding demand-driven technical assistance to select departments in aid agencies, when that has the potential to result in more cost-effective spending.
Working to reduce low and middle income debt burdens, e.g. by supporting governments in negotiating more favorable terms from development finance loans.
We expect to work in South Asian Air Quality and Global Aid Policy for at least five years. At that point we will conduct reviews of our progress that could result in continuing the programs, significantly expanding them, or winding down our support to these areas (with a careful transition).
We still have a lot of growth ahead of us and will be expanding to start more programs in the coming months and years — check out our jobs page if you’re interested in helping drive that growth!
Today we are excited to announce our largest-to-date support for GiveWell’s recommendations: $300 million for 2021, up from $100 million last year, with tentative plans to donate an additional $500 million per year in 2022 and 2023. (Note: We would not increase or decrease relative to the planned $500M/year level just because of modest fluctuations in GiveWell’s fundraising from other sources, but we could revisit either direction if there were material updates to our available assets, our “bar”, or other opportunities we …)
Like last year, some of this new $300 million has already been allocated to specific recommended charities, and some will be held in reserve until GiveWell has more information about which charities have the greatest remaining need for funds. (Note: Last year, $70 million was allocated ~immediately, and $30 million allocated in ~January. This year’s planned rollover is both larger in scale and also longer in duration. Additionally, we expect to defer to GiveWell more fully in terms of which organizations the rolled-over funds are distributed …) Grants we’ve already made or are planning to make imminently include:
$27 million to Malaria Consortium for insecticide-treated bednet distribution campaigns in two Nigerian states. We also contributed an additional $2 million for more extensive research on the use and durability of the nets and an attempt to collect better data on local malaria incidence and local mosquitoes’ susceptibility to the insecticide used in the bednets, to inform continual improvement in GiveWell’s cost-effectiveness estimates.
$20 million to the International Rescue Committee (IRC) for malnutrition treatment in Chad, Niger, Somalia, Burkina Faso, and the Democratic Republic of the Congo and $7 million to The Alliance for International Medical Action (ALIMA) for malnutrition treatment in Chad. This is a new intervention for GiveWell, and they wrote about the scale of the need, evidence base, and open questions here.
Up to $25 million (best guess $16 million; the total amount granted will be a function of the amount needed for the incentives as the program is implemented) to IRD Global to support an electronic immunization registry and associated mobile phone-based conditional cash transfer program for immunizations in Sindh Province, Pakistan.
This list reflects GiveWell’s impressive progress in identifying more cost-effective giving opportunities: none of these individual programs at these organizations had received funding from GiveWell prior to this year, and four of the five organizations and three of the four interventions are new to GiveWell overall.
As funds are distributed from this year’s $300 million allocation, they will be listed in our grants database.
The rest of this post gives more context on our reasoning for the decisions above, structured as a Q&A.
Why are you giving more than in previous years?
Based on the framework we explained last week, we’re giving more going forward because of:
Dramatic growth in GiveWell’s identified cost-effective opportunities this year relative to last year, and a further increase in their expectation of giving opportunities they’ll find in the future.
Significant growth in the assets we expect to eventually distribute, and in how much giving similar to ours we expect others to do in the future.
An increase in how we value saving lives relative to increasing income.
More thorough work on how we expect these and other factors to interact, leading us to lower our “bar” (how cost-effective grants need to be for us to spend now rather than save to donate later) by ~20% for GiveWell recommendations.
Much more technical detail on the interplay of these factors is available in our previous post.
Will GiveWell distribute all $300M this year?
No, we don’t expect them to. When combined with projected growth in donations from other donors, our increased allocation means that GiveWell expects to roll over roughly $100 million of funds raised this year into next year to grant in 2022. From GiveWell’s blog post:
We aim to find and fund around $1 billion of highly cost-effective giving opportunities annually by 2025. There’s an enormous amount of good that we can accomplish with our team and donors, and we’re excited to take on this ambitious challenge.
Our confidence in our ability to increase the amount of cost-effective funding we find—and our rapid growth—is driving some changes this year. While we could spend all of the funding we expect to raise in 2021 on opportunities now, we don’t think we should. We plan to roll over about $110 million (~20% of our forecasted funds raised) into 2022 because we expect the opportunities this funding would be spent on now are much less cost-effective than those we expect to find over the next few years.
There are tradeoffs to using funds later, but we think they strongly favor waiting for better opportunities. Our core mission is to help donors maximize their impact, not to get funding out the door as quickly as possible.
We expect GiveWell will likely be in a similar situation of rolling forward a meaningful amount of funding next year into 2023, but we nonetheless plan to increase our funding further.
Why are you committing so much in advance?
In addition to GiveWell’s reasoning above, we see several practical arguments for exceeding GiveWell’s already-identified opportunity set and publicly articulating our tentative expected giving to their recommendations in future years:
This is a vote of confidence in GiveWell as a mature organization, and will give them more autonomy and predictability. Previously, we roughly planned around giving GiveWell recommendations 10% of our eventual giving, but also in principle wanted to fund all opportunities GiveWell would discover above our “bar”. That was administratively difficult because it made it hard for GiveWell to plan and could require that GiveWell and Open Philanthropy keep our different systems for assessing cost-effectiveness constantly in sync. With funding level plans, we can be a bit more loosely coupled, which is practically simpler, while still trying to optimize by keeping our bars roughly in sync over time.
We have increasingly explicit goals for long-term spending levels in Global Health and Wellbeing. This has two implications: (a) it helps us manage our workload to directly delegate responsibility for some of the goals to GiveWell; and (b) setting explicit spending levels (rather than having annual spending respond to a fixed “bar”) better reflects the likelihood of correlated updates (e.g., in a world where GiveWell discovers tons of great opportunities, it seems more likely that our other work will too, and that the bar should rise, rather than us just filling everything GiveWell discovers at a fixed bar).
We want GiveWell to be able to communicate clearly to their donors that marginal contributions are likely to effectively be disbursed in future years rather than immediately. We think these other donors should continue to support GiveWell if they seek to maximize the impact of their giving, but want them to have full context on our plans. (For what it’s worth, I still expect the bulk of my personal giving this year to go to the GiveWell Maximum Impact Fund, so I don’t think Open Philanthropy’s plans should be answer-changing for other individual donors.)
If this level of funding means that GiveWell should lower their cost-effectiveness bar, we want them to be empowered to do so proactively. Lead times for developing and investigating future funding opportunities can be long, which is why GiveWell continues to hire more researchers.
We would not increase or decrease relative to the planned $500M/year level just because of modest fluctuations in GiveWell’s fundraising from other sources, but we could revisit it in either direction if there were material updates to our available assets, our “bar”, or other opportunities we discovered.
Last year, $70 million was allocated ~immediately, and $30 million allocated in ~January. This year’s planned rollover is both larger in scale and also longer in duration. Additionally, we expect to defer to GiveWell more fully in terms of which organizations the rolled-over funds are distributed to. This includes a commitment to distribute the allocated funds to GiveWell’s recommendations.
We’re also changing the internal approval process for these grants to have a stronger presumption of deference to the GiveWell recommendations, and aiming to move to more of an annual review (rather than grant-specific) process for deep engagement.
For example, consider a scenario in which GiveWell discovered that they had $5B rather than $500M of annual room for more funding (RFMF) at our bar. We would interpret that as suggestive evidence that we should expect to find lots of cost-effective opportunities in our non-GiveWell Global Health and Wellbeing (GHW) grantmaking. If we just filled everything GiveWell discovered at a fixed bar, then our annual GiveWell support would increase so much that we wouldn’t have much funding available for other GHW grantmaking. Under our new approach, we would instead just support $500M of these GW recommendations, and then have resources left over for the highly-effective opportunities we’d expect to find elsewhere, while also revisiting the bar in the future. If, on the other hand, GiveWell could only find $200M above our bar, it’d be more likely that we would need to lower the bar rather than only wanting to give $200M, and the near-term funding plan would give us time to adjust. We think it’s valuable to allow the bar, more than funding amounts, to move flexibly in the near term, though we would expect to shift allocations in the long term in response to major updates.
While this new model means that marginal contributions this year are likely to effectively be disbursed in future years, we do not think that effective delay mechanically reduces the total nearterm support for GiveWell charities (whereas it could plausibly do so if we hadn’t made plans for future funding levels, since we could have just given less in response to GiveWell having more savings; see discussion of funging here). As noted above, our planned near-term support of GiveWell (i.e. $500M/year) will not change mechanically in reaction to modest positive or negative fluctuations in GiveWell’s fundraising from other sources, and part of our goal with articulating plans fairly far out in time is to mitigate concerns about mechanical funging for GiveWell overall. Our proposed structure limits how much other donors should expect their marginal funding to GiveWell top charities to cause a reduction in ours, though not necessarily to zero (since over the long run GiveWell’s expected funding from other sources could affect our calculation of our bar).
In 2019, we wrote a blog post about how we think about the “bar” for our giving and how we compare different kinds of interventions to each other using back-of-the-envelope calculations, all within the realm of what we now call Global Health and Wellbeing (GHW). This post updates that one and:
Explains how we previously compared health and income gains in comparable units. In short, we use a logarithmic model of the utility of income, so a 1% change in income is worth the same to everyone, and a dollar of income is worth 100x more to someone who has 100x less. We measure philanthropic impact in units of the welfare gained by giving a dollar to someone with an annual income of $50,000, which was roughly US GDP per capita when we adopted this framework. Under the logarithmic model, this means we value increasing 100 people’s income by 1% (i.e. a total of 1 natural log unit increase in income) at $50,000. We have previously also valued averting a disability-adjusted life year (DALY; roughly, a year of healthy life lost) at $50,000, so we valued increasing income by one natural-log unit as equal to averting 1 DALY. This would imply that a charity that could avert a DALY for $50 would have a “1,000x” return because the benefits would be $50,000 relative to the costs of $50. (More)
Reviews our previous “bar” for what level of cost-effectiveness a grant needed to hit to be worth making. Overall, having a single “bar” across multiple very different programs and outcome measures is an attractive feature because equalizing marginal returns across different programs is a requirement for optimizing the overall allocation of resources1, and we are devoted to doing the most good possible with our giving. Prior to 2019, we used a “100x” bar based on the units above, the scalability of direct cash transfers to the global poor, and the roughly 100x ratio of high-income country income to GiveDirectly recipient income. As of 2019, we tentatively switched to thinking of “roughly 1,000x” as our bar for new programs, because that was roughly our estimate of the unfunded margin of the top charities recommended by GiveWell (which we used to be part of and remain closely affiliated with), and we thought we would be able to find enough other opportunities at that cost-effectiveness to hit our overall spending targets. (More)
Updates our ethical framework to increase the weight on life expectancy gains relative to income gains. We’re continuing to use the log income utility model, but, after reviewing several lines of evidence, we’re doubling the weight on health relative to income in low-income settings, so we will now value a DALY at 2 natural log units of income or $100,000. We’re also updating how we measure the DALY burden of a death; our new approach will accord with GiveWell’s moral weights, which value preventing deaths at very young ages differently than implied by a DALY framework. (More)
Articulates our tentative “bar” for giving going forward, of roughly 1,000x (which is ~20% lower than our old bar given new units – explanation in footnote2). The (offsetting) changes in the bar come from new units, our available assets growing, more sophisticated modeling of how we expect cost-effectiveness and asset returns to interact over time, growth in GiveWell’s other funding sources, and slightly increased skepticism about our ability to spend as much as needed at much higher levels of cost-effectiveness. Due to the increased assets and lower bar, we’re planning to substantially increase our funding for GiveWell’s recommended charities, which we will write more about next week. However, we still expect most of our medium-term growth in GHW to be in new causes that can take advantage of the leveraged returns to research and advocacy, and could imagine that we’ll eventually find enough room for more funding in those interventions that we will need to raise the bar again. (More)
This post focuses exclusively on how we value different outcomes for humans within Global Health and Wellbeing; when it comes to other outcomes like farm animal welfare or the far future, we practice worldview diversification instead of trying to have a single unified framework for cost-effectiveness analysis. We think it’s an open question whether we should have more internal “worldviews” that are diversified over within the broad Global Health and Wellbeing remit (vs everything being slotted into a unified framework as in this post).
This post is unusually technical relative to our others, and we expect it may make sense for most of our usual blog readers to skip it.
1. How we previously compared health and income
We often use “marginal value of a dollar of income to someone with baseline income of $50K” as our unified outcome variable, so by definition giving a dollar to someone with $50k of annual income has a cost-effectiveness of 1x.3 We value income using a logarithmic utility function, so $100 in extra income for a rich person generates as much utility as $1 in extra income for a person with 1/100th the income of that rich person. In order for a grant’s cost-effectiveness to be, say, 1000x, it must be 1000 times more cost-effective than giving a dollar to someone with $50k of annual income.
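The unit definition above can be sketched in a few lines of Python. This is only an illustration of the logarithmic model as described here, not our internal tooling; the function names and the $500 recipient income are hypothetical choices made for the example.

```python
import math

REFERENCE_INCOME = 50_000  # USD per year; the benchmark income used in the post


def log_utility_gain(income, transfer):
    """Utility gain, in natural-log units, from adding `transfer` to `income`."""
    return math.log((income + transfer) / income)


def cash_multiple(recipient_income, reference_income=REFERENCE_INCOME):
    """Value of a marginal dollar to the recipient, relative to a marginal
    dollar given to someone at the reference income.

    Under log utility, marginal utility is 1/income, so the ratio is
    simply reference_income / recipient_income.
    """
    return reference_income / recipient_income


# By definition, a dollar to someone with $50K of income is "1x".
print(cash_multiple(50_000))  # 1.0

# Hypothetical recipient with 1/100th of the reference income.
print(cash_multiple(500))  # 100.0

# $100 to the richer person yields the same log-utility gain as $1 to the poorer one.
print(math.isclose(log_utility_gain(50_000, 100), log_utility_gain(500, 1)))  # True
```

The ratio form only holds for marginal transfers; for large transfers, the `log_utility_gain` function is the relevant calculation.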
We quantify health outcomes using disability-adjusted life years (DALYs). The DALY burden of a disease is the sum of the years of life lost (YLL) due to the disease, and the weighted years lived with disability (YLD) due to the disease. If you save the life of someone who goes on to live for 10 years, your intervention averted 10 YLLs. If you prevent an illness that would have caused someone to live in a condition 20% as bad as death for 10 years, your intervention averted 20%*10=2 YLDs. (We don’t necessarily endorse the disability weights used to measure YLDs, and in principle we might prefer other methodologies, such as quality-adjusted life years (QALYs) or just focusing on YLLs. We use DALYs because global-health data is much more widely available in DALYs than in other frameworks, especially from the Global Burden of Disease project.)
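As a rough illustration of the DALY arithmetic in the two examples above, here is a minimal sketch. The function name and argument names are our own labels for this example, and it ignores refinements (such as age weighting or discounting) that some DALY methodologies apply.

```python
def dalys_averted(years_of_life_saved, disability_weight=0.0, years_with_disability=0.0):
    """DALYs averted = years of life lost averted (YLL)
    + weighted years lived with disability averted (YLD)."""
    yll = years_of_life_saved
    yld = disability_weight * years_with_disability
    return yll + yld


# Saving the life of someone who goes on to live 10 more years: 10 YLLs.
print(dalys_averted(years_of_life_saved=10))  # 10.0

# Preventing 10 years in a condition 20% as bad as death: 2 YLDs.
print(dalys_averted(0, disability_weight=0.2, years_with_disability=10))  # 2.0
```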
The health interventions that we support are primarily lifesaving interventions (with the exception of deworming, where the modeled impacts run entirely through economic outcomes) – so although we talk in terms of DALYs, most of our health impact is in YLLs. (For instance, according to the GBD, ~80% of all DALYs in Sub-Saharan Africa are from YLLs, and amongst under-5 children the same figure is 97%. For malaria DALYs the share from YLLs is even higher, at 94% overall and 98% amongst under-5 children.) We also sometimes use quasi-disability-weights for valuing other outcomes (e.g., the harm of a year of prison).
There are different approaches to calculating the YLLs foregone with a death at a given age. For instance, the life-expectancy tables in Kenya suggest that an average child malaria death shortens the child’s life by 68 years (i.e. that is the average remaining life expectancy of a 0-5 year old in Kenya).4 This is not a floor on plausible YLLs – one could imagine that those prevented from dying by some intervention are unusually sick relative to the broader population, and accordingly not likely to live as long as a population life table would suggest – and as explained below GiveWell uses moral weights for child deaths that would be consistent with assuming 51 years of foregone life in the DALY framework (though that is not how they reach the conclusion). On the other extreme, the GBD takes a normative approach to life expectancy, saying in effect that everyone’s life expectancy at birth should be 88 years.5 Therefore any death under age 5 has a DALY burden of at least 84 years under the GBD approach.6 We have previously been inconsistent in this regard – following the methodology of the GBD in much of our own analysis, while deferring to GiveWell’s approach (and moral weights) on their recommendations. (In the analysis further below of our current bar, we assume for simplicity that our status quo approach splits the difference between the GiveWell and GBD approaches, and uses expected life years remaining for someone who lives to age 5 based on national-level life tables for Kenya.) Below, we explain our plan to follow GiveWell’s approach more closely going forward.
Historically we haven’t written much about how we came to value a DALY at $50,000 (or equivalently to one unit of log-income), which is less than many other actors would suggest. We don’t have a well-documented account of this historical choice, but the recollection of one of us (Alexander) is:
We had seen numbers like $50K cited in the (older) health economics literature. It was also close to the British government’s widely-cited cost-effectiveness threshold of £20k-£30k (roughly $50k as of 2007; less now) per QALY (though that is more based on an opportunity cost frame than a valuation frame).
A $50K DALY valuation roughly reconciled our relative valuation on the life-saving GiveWell top charities against GiveDirectly (i.e., we had GiveDirectly at 100x in our units and a cost per DALY averted of ~$50 for GiveWell top charities, which at $50K/DALY, implied a 1,000x ROI, or a ~10x difference with GiveDirectly) with GiveWell’s (which thought their life-saving charities were ~10x as cost-effective as GiveDirectly at the time).7
Given that we were already prioritizing health heavily relative to income in terms of the total set of interventions we were supporting, opting for a lower valuation than some other parts of the literature seemed conservative.
Using a number that was significantly larger than GDP per capita (which was ~$50K in the US at the time we adopted this valuation) implied there was more value at stake than total resources in the economy, and that seemed wrong to me at the time. We now think that I was mistaken and there’s no in-principle reason there can’t be substantially more value at stake than the sum of the world economy.
2. Our previous “bar”
It is useful for us to have a single “bar” across multiple very different programs, years, and outcome measures, because equalizing marginal returns across different programs and across time is a requirement for optimizing the overall allocation of resources, and our mission is to give as effectively as we can. The basic idea of this “bar” is that it tells us what level of cost-effectiveness is “good enough” to justify spending on a specific grant or program, rather than saving for future years or allocating more funding to another program. If we instead set different bars across programs or years (in present value terms, i.e., after adjusting for investment returns), we could have more impact by reallocating resources (putting more into the program with the higher bar and less into the program with the lower bar), and we’d, in principle, want to do that. (However, in practice, we often see a case for practicing worldview diversification.)
Prior to 2019, we often used a “100x” bar, based on the units above and the very roughly 100x ratio of $50K to GiveDirectly recipient income (net of transaction costs). We thought “that such giving was quite cost-effective and likely extremely scalable and persistently available, so we should not generally make grants that we expected to achieve less benefit per dollar than that.”
As of 2019, we switched to tentatively thinking of “roughly 1,000x” as our bar for new programs, because that was roughly our estimate of the unfunded margin of the GiveWell top charities, and we thought we would be able to find enough other opportunities at that cost-effectiveness to hit our overall spending targets. We wrote: “Overall, given that GiveWell’s numbers imply something more like “1,000x” than “100x” for their current unfunded opportunities, that those numbers seem plausible (though by no means ironclad), and that they may find yet-more-cost-effective opportunities in the future, it looks like the relevant “bar to beat” going forward may be more like 1,000x than 100x.” However, that bar was rough, and we never made it very precise in large part because we don’t expect to be able to make back-of-the-envelope calculations that could reliably distinguish between, say, 800x and 1,000x.
3. New moral weights
We now think our previous approach to valuing health placed too little value on lifesaving interventions relative to income interventions. Our new approach values a DALY averted twice as highly, equal to a 2-unit (rather than 1-unit) increase in the natural log of any individual’s income. (This is equivalent to increasing 200 people’s incomes by 1% – i.e., in our favored units, equal to $100K in units of marginal dollars to individuals making $50K.) This updated value is more consistent with empirical estimates of beneficiary preferences, the subjective wellbeing literature, and the practice of other actors in the field.
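To make the unit conversion concrete, the sketch below translates a DALY valuation in log-income units into dollars at the $50K reference income, and shows what that implies for a hypothetical program averting a DALY for $50. The helper names are illustrative, and the comparison holds the DALY count fixed, which the measurement changes discussed later in this post do not.

```python
import math

REFERENCE_INCOME = 50_000  # USD per year; benchmark income for our units


def daly_value_in_dollars(log_units_per_daly):
    """Dollar-equivalent value of one DALY averted, in marginal dollars to
    someone making the reference income."""
    return log_units_per_daly * REFERENCE_INCOME


def multiple(cost_per_daly, log_units_per_daly):
    """Cost-effectiveness multiple of averting a DALY at a given cost."""
    return daly_value_in_dollars(log_units_per_daly) / cost_per_daly


print(daly_value_in_dollars(1))  # 50000 (old weights)
print(daly_value_in_dollars(2))  # 100000 (new weights)

# Hypothetical program averting a DALY for $50, with DALY counting held fixed.
print(multiple(50, 1))  # 1000.0
print(multiple(50, 2))  # 2000.0

# "200 people's incomes by 1%" is a first-order approximation of 2 log units.
print(round(200 * math.log(1.01), 2))  # 1.99
```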
These are all judgment calls subject to uncertainty, and we could readily imagine revising our weights again in the future based on further argument.
3.1 Beneficiary preferences
There is a large body of research on how people tend to trade off mortality risks against income gains, in particular the Value of a Statistical Life (VSL) literature. Some analyses in this literature use the stated preferences of individuals – i.e. responses to survey questions like, “would you rather reduce your risk of dying this year by 1%, or increase your income this year by $500?” Others use the revealed preferences of individuals: are the study subjects, on average, willing to take a job with a 1% higher rate of annual mortality, in exchange for $500 higher annual income?8
The research findings are clearest in high-income countries, where they tend to find that respondents value a year of life expectancy 2.5 to 4 times more than annual income.9 (Since these valuations are mostly based on marginal tradeoffs, and since we model utility as a logarithmic function of income, we can interpret these findings to say that “respondents value a year of life expectancy as much as the utility gained from increasing income by 2.5 to 4 natural-log units.”)
In low-income countries, the evidence is sparser and the findings vary widely.10 For example, in this chart we plot all the estimates found by the literature search in Robinson et al. 2019’s meta-analysis – they searched for all VSL analyses, whether stated or revealed, in any country that had been classified as low- or middle-income in the last 20 years.11 (We divide the VSLs by adult12 life expectancy to get the VSLY – value of a statistical life-year. We express this in terms of local income, which tells us how many units of log-income are worth as much to the respondents as an extra year of life expectancy. We plot stated-preference studies as circles, and revealed-preference studies as diamonds.)
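The normalization described in the parenthetical can be sketched as follows. The input figures in the usage lines are made up purely to show the arithmetic and are not taken from any study in the meta-analysis; the function name is likewise hypothetical.

```python
def vsly_in_income_units(vsl, remaining_life_expectancy_years, annual_income):
    """Convert a value of a statistical life (VSL) into a value of a
    statistical life-year (VSLY), expressed as a multiple of local annual income.

    Under our logarithmic model, this multiple can be read as the number of
    log-income units that respondents treat as equivalent to one extra
    year of life expectancy.
    """
    vsly = vsl / remaining_life_expectancy_years
    return vsly / annual_income


# Made-up inputs: VSL of $100,000, 40 remaining years, $1,000 annual income.
print(vsly_in_income_units(100_000, 40, 1_000))  # 2.5
```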
As you can see, the results vary extremely widely. One paper (Mahmud 2009) finds that subjects in rural Bangladesh traded off mortality risk against income in a way that suggests they valued a year of life expectancy at, very roughly, just 62% of annual income. Another paper (Qin et al. 2013) finds that subjects in China (who reported only $840 in annual per-capita income) valued a year of life expectancy at roughly 7x annual income.
In the face of this huge variation, some sources13 recommend estimating LMIC preferences via an “elasticity” approach: statistically estimate a function mapping VSL to income, anchoring it off the better-validated VSL figures in high-income countries. This elasticity approach is reviewed in Appendix A. Mainstream versions of it predict that individuals at the global poverty line would trade off 1 year of life expectancy for anywhere between 0.5x-4x of a year’s income14 (which we can interpret as 0.5 to 4 units of log-income).
Our beneficiaries (especially for lifesaving interventions) are mostly in low and lower middle income countries, and for now we’re focusing on this context in setting our moral weights. (As we’ll discuss in Appendix A, there are empirical and theoretical reasons to think that the exchange rate at which people trade off mortality risks against income gains differs systematically across income levels, with richer people valuing reductions in mortality risk more relative to income.)
Rather than estimate beneficiary preferences based on the academic literature, why not ask beneficiaries directly? In 2019, GiveWell commissioned a survey among individuals who are demographically similar to their beneficiaries: families in rural Kenya and Ghana, with annual consumption of roughly $300 per capita.15 IDinsight conducted this survey, and their analysis suggested that these respondents would value an extra year of life expectancy as much as 2.1-3.8 years of income, which we can interpret as 2.1-3.8 units of log-income. (See GiveWell’s post for a discussion of the limitations of this study).16
3.2 Subjective wellbeing
Research on subjective wellbeing suggests that we can improve life satisfaction by increasing incomes, but it also seems to indicate that such gains are small compared to the baseline wellbeing experienced by individuals even at low levels of income. We interpret this as suggestive evidence that a year of life expectancy is worth substantially more than 1 unit of log-income.
For example, this chart from Stevenson and Wolfers 201317 attempts to harmonize different surveys into one measure of “life satisfaction”. It suggests that you’d have to increase income by a factor of at least 64 in order to double reported life satisfaction on the harmonized scale (starting from the baseline level of life satisfaction experienced by the average individual at the global poverty line).18 Note, of course, that the units of this scale are somewhat arbitrary, so you have to add some strong assumptions to think that a doubling on this scale indicates a doubling in actual subjective wellbeing.19
We could use these “life satisfaction” units to estimate an exchange rate between increases in life expectancy and increases in income. The chart suggests that, for someone near the global poverty line, you’d have to increase income by roughly 64x in order to get twice as much life satisfaction at any moment. You could conclude that a 64x increase in income (for one year) is worth as much life satisfaction as an extra year of life. In that case, a logarithmic utility function suggests valuing a DALY at roughly 4 units of log-income.20
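The “roughly 4 units” figure follows directly from taking the natural log of the income multiple. A minimal check, assuming (as the paragraph above does) that doubling life satisfaction for a year is worth as much as one extra year of life:

```python
import math


def daly_value_in_log_units(income_multiple_to_double_satisfaction):
    """Log-income units equivalent to one extra year of life, if multiplying
    income by the given factor doubles life satisfaction for that year."""
    return math.log(income_multiple_to_double_satisfaction)


print(round(daly_value_in_log_units(64), 2))  # 4.16
```

Because the result depends only on the log of the multiple, even large errors in the 64x estimate shift the implied valuation modestly: a 32x multiple would give about 3.5 units, and a 128x multiple about 4.9.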
We don’t want to over-interpret this – especially since the measure of “life satisfaction” uses a somewhat arbitrary scale and you could get different results depending on your assumptions about how reported satisfaction translates to actual subjective wellbeing.21 But it does seem like suggestive evidence that the old weights placed too much weight on income, and too little value on saving lives.
3.3 Other actors’ values
Doubling the weight we place on health also brings us into better alignment with other actors in the field. For example, GiveWell’s implied valuation of a YLL from preventing an adult death from malaria is 1.6 units of marginal log-income.22 The Lancet Commission on Investing in Health recommended valuing DALYs in LMICs at 2.3 times per-capita GDP.23 Until recently, the World Health Organization recommended that health interventions be considered “cost-effective” if the cost of averting a DALY was less than 3x per-capita GDP.24 Other actors in high-income countries use widely-ranging methodologies; some would give numbers as high as 4x per-capita income25 or as low as 0.7x per-capita income.26
In particular, we don’t want our moral weights to stray too far from GiveWell’s. This is substantially for pragmatic reasons: most of our lifesaving interventions run through them, and a statement of our valuations that was out of line with our spending on the GiveWell recommendations doesn’t seem like it would be accurate. But it is also partially due to some epistemic deference: GiveWell’s moral weights are the result of thoughtful deliberative processes, and we share some meta-ethical and epistemic approaches with them. While we are generally aware of each other’s reasoning, if we invested more time in understanding each other’s thinking, we expect we would likely come closer to agreeing on moral weights. Thus, some preemptive deference to their moral weights makes sense to us.
3.4 Aggregating these considerations
The literature on subjective wellbeing, and our attempts to estimate beneficiary preferences, both suggest to us that we should rate lifesaving impacts significantly more highly than under our old weights. On the other hand, pushing in favor of a less aggressive increase:
We don’t want to be too far off from GiveWell’s current valuation of 1.6 units of log-income.
We don’t want to change our moral weights too much in any one year, to avoid “whipsawing” if we later determine that this change was mistaken.
Given that we already prioritize health interventions more highly than other philanthropists, erring towards a lower valuation seems “conservative.”
There is huge uncertainty/disagreement across and between lines of evidence – including between us and on the broader GHW cause prioritization research team – and any given choice of ultimate valuations seems fairly arbitrary, so we also prefer a visibly rough/round number that reflects the arbitrariness/uncertainty.
Taking all of these considerations together, we’re doubling our DALY valuation to $100,000, i.e. 2 units of log-income. We expect to continue to revisit this in future years and could readily envision major further updates.
3.5 Measurement of DALYs
We’re also switching our approach to be more consistent with GiveWell’s framework in how we translate deaths into DALYs. GiveWell assigns moral weights to deaths at various ages, rather than to DALYs. But we can use their moral weights to derive a mapping of deaths to DALYs, by dividing GiveWell’s moral weight for each death by GiveWell’s moral weight for a year lived with disability (which is defined by WHO so as to be equivalent to a DALY).27
The resulting model cares about child mortality more than adult mortality, but not by as much as remaining-population-life-expectancy would suggest. For example, GiveWell places 60% more weight on a child malaria death than on an adult death, and we can fairly straightforwardly interpret their process as counting an average of 32 DALYs per adult malaria death,28 so the GiveWell-based DALY model would implicitly count 32*160% = 51 DALYs for an under-5 malaria death. In contrast, a direct remaining-population-life-expectancy approach in Kenya would count 68 DALYs for an under-5 malaria death29, and the Global Burden of Disease approach (explained above) would count more than 84 DALYs.
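The comparison above can be laid out as a short calculation. The constants are the figures quoted in this section; the variable names are our own, and the GBD entry is a lower bound rather than a point estimate.

```python
# Figures quoted in the text above.
ADULT_MALARIA_DALYS = 32          # DALYs per adult malaria death implied by GiveWell's process
CHILD_TO_ADULT_WEIGHT = 1.6       # GiveWell places 60% more weight on a child malaria death
KENYA_LIFE_TABLE_DALYS = 68       # remaining life expectancy approach, Kenya
GBD_MINIMUM_DALYS = 84            # Global Burden of Disease approach (lower bound)

givewell_implied_dalys = ADULT_MALARIA_DALYS * CHILD_TO_ADULT_WEIGHT

approaches = {
    "GiveWell-implied": givewell_implied_dalys,
    "Kenya life table": KENYA_LIFE_TABLE_DALYS,
    "GBD (at least)": GBD_MINIMUM_DALYS,
}

for name, dalys in approaches.items():
    print(f"{name}: {dalys:.0f} DALYs per under-5 malaria death")
```

Run as written, this prints 51, 68, and 84 DALYs respectively.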
GiveWell has written about the process for reaching their current moral weights here.
We see a normative case for the Global Burden of Disease’s uniform global approach to DALY attribution, but given our commitment to maximizing (expected) counterfactual impact, we think the national life table approach represents a plausible upper bound on attributable DALYs, and even those seem aggressive as an estimate of counterfactual lifespan for children whose lives are saved on the margin (who are presumably less advantaged and less healthy than the national average). Overall, we’re not sure where precisely this consideration should leave us, but it seems to argue for lower numbers.
We also haven’t reached any settled thoughts on the impact of population ethics or the second-order consequences of saving a life (e.g., on economic or population growth) on how to translate between deaths and DALYs.
For now, in order to be more consistent in our practices, we’re going to defer to GiveWell and start to use the number of DALYs that would be implied by extrapolating their moral weights. (In practice, we already defer to GiveWell for their own recommendations, so this would mainly change how we use GBD figures in our BOTECs for other grants, especially in science. In the section immediately below comparing our new values to old values, we assume for simplicity that we were using the national life table approach, which splits the difference between the GiveWell approach and the GBD in describing our status quo practices.) This means fewer DALYs averted per child death averted, which offsets some of the apparent gains from doubling our value on health. We expect to revisit this and try to form a more confident independent view about the balance of all these considerations in the future.
4. Expanding our spending, and modestly lowering our cost-effectiveness bar
We think there are four major buckets of updates that affect our “bar” going forward:
The change to our weight on health, described just above.
Other secular changes to GiveWell’s expected cost-effectiveness.
Cross-cutting changes to our estimate of future available assets.
Updates to our estimate of the likely “last dollar” cost-effectiveness of our non-GiveWell spending.
We walk through more detail below, but overall these factors leave us with a bar of very roughly 1,000x going forward for now.
4.1 Increased value on health
Overall, our new moral weights put more emphasis on health than we did before, which in some sense increases the amount of value at stake according to our moral framework, and should raise our cost-effectiveness bar, at least expressed in terms of marginal dollars to someone making $50K.
This table shows how we would rate the impact of various interventions, according to our new moral weights. (You can see the calculations in Google Sheets here.) Here’s how to read the first column:
GiveWell estimates that HKI’s vitamin A supplementation program in Kenya averts 14 child deaths per $100k granted, and also increases incomes by 500 units of log-income.30
Per GiveWell’s moral weights, this is 7 times more cost-effective than cash transfers to the global poor.
Open Philanthropy’s old moral weights would have rated this intervention as 747 times more cost-effective than giving a dollar to someone with $50K of income.
Under our new moral weights, we would rate this intervention as 977 times more cost-effective than giving a dollar to someone making $50K.
Note that the update in cost-effectiveness depends on the mix of beneficial outcomes an intervention generates. A charity that just increases income (as in the second column of this table) will have the same cost-effectiveness under our new or old moral framework. An intervention that simply averted DALYs (say, averting 1 DALY for every $100 spent) would be twice as cost-effective under our new moral weights. Because of the change in how we measure DALYs, an intervention that averts child deaths at a fixed rate (say, averting 1 child death for every $5000 spent) would only be roughly 1.5x more cost-effective under our new moral weights than under the old framework.31 The program in column 1 gets some of its moral impact from income interventions and some from averting child deaths, so its cost-effectiveness changes by a weighted average of 1x and 1.5x.
The third and fourth columns show the mix of outcomes that could be achieved by a typical dollar to GiveWell top charities – that is, roughly 50% of its impact from income effects and 50% from mortality effects.32 Our new weights rate the cost-effectiveness of this mix of outcomes ~25% higher than our old weights did.33 (To foreshadow a bit, this means that if we lower the bar by ~20%, simultaneously with changing our moral weights, then our nominal bar will not change.)
For the past few years GiveWell’s rough margin has been interventions that they rate as 10x more cost-effective than cash transfers to the global poor – the third column shows how a dollar could be spent at this margin. This was our prior bar, and you can see that in our old units it was ~1000x. If the only updates were to our valuation on DALYs (and our framework for translating deaths into DALYs), our bar would go to roughly 1300x (our rough old bar expressed in new units).
However, this change to our valuations is not the only change here; below we address others which, taken together, lower our expected cost-effectiveness of our last dollar (assuming the GiveWell mix of outcomes) by roughly 20%. This means our nominal bar is staying roughly constant.
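The bar arithmetic in this subsection can be sketched as follows. This is an illustration built only from the round numbers above (a 50/50 income and mortality mix, a 1x re-rating for income, a roughly 1.5x re-rating for averted child deaths, and a roughly 20% reduction from other updates); the function name is hypothetical.

```python
def mix_rerating(income_share: float, health_share: float,
                 income_factor: float = 1.0, health_factor: float = 1.5) -> float:
    """Weighted-average change in rated cost-effectiveness for a mix of outcomes.

    health_factor=1.5 is the rough figure for averted child deaths;
    an intervention that purely averts DALYs would use 2.0 instead.
    """
    return income_share * income_factor + health_share * health_factor


givewell_mix = mix_rerating(0.5, 0.5)          # typical GiveWell top-charity dollar
old_bar_in_new_units = 1000 * givewell_mix     # old ~1,000x bar re-expressed
new_bar = old_bar_in_new_units * (1 - 0.20)    # after the ~20% reduction

print(givewell_mix)           # 1.25
print(old_bar_in_new_units)   # 1250.0, which the text rounds to roughly 1300x
print(round(new_bar))         # 1000
```

With these inputs, the ~25% re-rating and the ~20% reduction cancel, which is why the nominal bar stays at roughly 1,000x.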
4.2 Changes to GiveWell’s expected cost-effectiveness
Our expectation of future funding for GiveWell top charities, including both our support and that from others, has grown much faster than we would have expected in 2019. We didn’t have a precise model of expected future funding to GiveWell at that point, but very roughly we think it’s reasonable to model expected future funding for GiveWell’s recommendations as having doubled relative to our 2019 expectations. We currently model the GiveWell opportunity set as isoelastic with eta=.375,34 which implies that a doubling of expected funding should reduce marginal cost-effectiveness (and the bar) by 23% (1 – 2^-.375 ≈ 23%).
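To make the isoelastic calculation concrete, here is a short sketch. It assumes that marginal cost-effectiveness scales as funding raised to the power of minus eta, which is the usual meaning of an isoelastic opportunity set; the function name is hypothetical.

```python
def marginal_ce_multiplier(funding_ratio: float, eta: float = 0.375) -> float:
    """Factor by which marginal cost-effectiveness changes when funding is
    multiplied by funding_ratio, assuming marginal returns scale as funding ** -eta."""
    return funding_ratio ** (-eta)


reduction = 1 - marginal_ce_multiplier(2.0)   # doubling of expected funding
print(f"{reduction:.0%}")                     # 23%
```

Under the same assumption, a higher eta means that marginal cost-effectiveness declines faster as funding grows.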
On the other hand, GiveWell has found slightly more cost-effective opportunities over the last year than we would have expected them to. This year, they’re expecting roughly $400M of spending capacity at least ~8x as cost-effective as GiveDirectly according to their modeling. This is roughly as much as we would have expected if they had already explored the space of “8x” opportunities as thoroughly as they’ve explored the opportunities that are ~10x as cost-effective as GiveDirectly.35 Given that they have only recently focused on exploring this space of slightly-less-cost-effective interventions, this is a very promising amount of spending capacity, and suggests the potential for even more capacity in the near future. This should marginally raise our bar.
We’ve also done more sophisticated modeling work on how we expect the cost-effectiveness of direct global health aid and asset returns to interact over time, and how we should optimally spread spending across time to maximize expected impact while meeting the stated goal of Cari and Dustin (our main funders) of spending down within their lifetimes. We’re still hoping to share that analysis and code, but the top-level conclusion is that for opportunities like the GiveWell top charities and with a ~50 year spenddown target, it’s optimal to spend something like 9% of assets per year. That would imply a significantly faster pace of spending for the assets we expect to recommend to the GiveWell top charities than we’ve reached in the past, which would in turn imply a lowering of the bar. Two other interesting implications of the model for GiveWell spending are that: (a) we should be trying to get to our optimal spending and then spending down with a decreasing dollar amount each year (which may be more a flaw/simplification of the model than an accurate conclusion); and (b) we should expect our bar to fall by roughly 4% per year in nominal terms (largely reflecting asset returns – this helps equalize our “real”36 bar over time).
Overall, we currently expect GiveWell’s marginal cost-effectiveness to end up around 7-8x GiveDirectly (in their units), which, assuming their current distribution across health and income benefits, translates to ~900-1,100x in our new units,37 though our understanding is that GiveWell does not necessarily endorse this extrapolation. Assuming that we continue to support the GiveWell recommendations and have a correctly-implemented uniform bar, that implies a similar bar across all our other work, though it could turn out to be too low if we’re able to find many more cost-effective opportunities in other work.
One complication for extrapolating from the GiveWell bar to our other work is that GiveWell is much more thorough in their cost-effectiveness calculations than we typically are in our back-of-the-envelope calculations, which might mean that the results aren’t really comparable. We linked to some examples of our back-of-the-envelope calculations from our 2019 post, and they compare very unfavorably to the thoroughness of GiveWell’s cost-effectiveness analyses. That said, GiveWell also counts some second-order benefits (e.g., the expected income benefits of health interventions) that we typically don’t, so it isn’t totally obvious which direction this adjustment would end up pointing on net. (It’s also not clear how we would want to make the appropriate adjustment even in principle since there’s some division of labor going on where GiveWell has more conservative/skeptical epistemics, but we intentionally don’t consistently apply those epistemics across our work.) Overall, we think we should probably use a somewhat higher bar for our other BOTECs rather than just applying the same bar from GiveWell, but we’re not currently making a mechanical adjustment for this and don’t have a good sense of how big it should be.
4.3 Increases to our estimate of future available assets
Our available funding has increased significantly as a result of stock market moves over the last few years. (This is not independent of the assumption above of GiveWell’s available resources doubling, since that assumes a substantial increase in our giving to their top charities.)
We’ve also become more optimistic about future funders contributing to highly-effective opportunities of the sort we may recommend, which would also lower our current bar on the margin. Some of this is driven by the emergence of other billionaires self-identifying with effective altruism, but it also reflects GiveWell’s increased funding from other donors, increasingly concentrated wealth at the top end of the global distribution, and the cryptocurrency boom. (Yes, that is weird, we know.)
By the same logic as above, increasing expected resources should lower the bar, but we don’t have as good of a model for how cost-effectiveness scales with resources in other Global Health and Wellbeing causes as we do for GiveWell recommendations, especially not for causes that we haven’t identified yet.
If the only thing that changed were GiveWell autonomously lowering its bar and accordingly having less cost-effective marginal recommendations, we should in principle marginally reallocate away from GiveWell and to other opportunities. But GiveWell isn’t independently lowering its bar; our overall plans and assessment of our bar contribute to the update. And given the composition of GiveWell’s top charities, made up of scalable, commodities-driven global health interventions, we expect them to have a lower eta (i.e., decline less in cost-effectiveness with more funding) than opportunities like R&D or advocacy that are more people-intensive (where we have a prior that returns tend to be more like logarithmic, which is more steeply declining than our model of the GiveWell top charities). That should mean that as resources rise, a larger portion of the total should flow to GiveWell. And that is reflected in our most recent plans: we previously wrote that we expected something like 10% of Open Phil assets/spending to go to “straightforward charity” exemplified by the GiveWell top charities, but now anticipate likely giving a modestly higher proportion (which, combined with asset increases, will mean substantially increasing our support for them in dollar terms).
4.4 Updates to our estimate of the likely “last dollar” cost-effectiveness of our non-GiveWell spending
Above, we argued that the GiveWell bar should go down in terms of our old weights and stay roughly nominally flat (at ~1,000x) in our new units. While the first-order implication is that the bar should uniformly decline across all of our Global Health and Wellbeing work, there are some complicating considerations.
Our new higher valuation on health and the GiveWell bar going down make non-GiveWell global health R&D and advocacy opportunities look more promising than before, and we really don’t know how many of these opportunities we could find. If, hypothetically, we could find billions of dollars a year in global health R&D opportunities with marginal cost-effectiveness above the new GiveWell bar, then that should be the bar instead (and, by implication, we shouldn’t fund the GiveWell recommendations going forward). That said, it seems unlikely a priori that marginal cost-effectiveness for billions of dollars of global health R&D or advocacy spending would end up right in between the old and new GiveWell bars (which fall by one third for child health interventions and 50% for adult health interventions38), so the most likely implications of this are that either (less likely) our bar has always been too low and we should have always been doing this hypothetical global health R&D or advocacy instead of supporting GiveWell top charities, or (more likely) we will find many good opportunities better than the marginal GiveWell dollar but not enough that they independently drive our marginal dollar. Our new valuation on health (relative to income) is also a little higher than GiveWell’s, which in principle means that there could be things above our bar but below theirs, though in practice the valuations are close enough that we don’t think this is likely to be a big deal.
The a priori case for the existence of large scale leveraged interventions more cost-effective than the GiveWell margin continues to seem compelling to us. The Bill and Melinda Gates Foundation spends well over half a billion dollars a year on global health research and development39 and we find it plausible (though far from obvious, and we haven’t been able to get great data on the question) that the marginal dollar there is better than the GiveWell margin.
But we haven’t found anything obviously more cost-effective than the GiveWell margin and scalable to billions of dollars a year. We’re far from being done looking, and have only covered a small part of the conceptual space, but we’re also not prepared to bet that we will succeed at that scale in the future.
For a more pessimistic prior, consider that at GiveWell’s bar of 8x GiveDirectly, the cost per outcome-as-good-as-averting-an-under-5 death is about $4000-$4500, and the cost per actual under-5 life saved (ignoring other benefits) for a charity focused on saving kids’ lives, is about $6000-$7000.40 There are about 5 million under-5 deaths per year, which implies that they would all be eliminated for $20-35B/year if GiveWell’s cost-effectiveness level could be maintained to an arbitrary scale. Total development assistance for health is more than that. If there were more than $3B/year of room-for-more-funding an order of magnitude more cost-effective than GiveWell’s margin, it would need to be as effective as eliminating all child mortality. This argument isn’t decisive by any means but can help give a sense of how hard it would be to beat the GiveWell margin by a lot at massive scale: there just aren’t that many orders of magnitude to work with.
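The back-of-the-envelope argument above can be reproduced in a few lines. The inputs are the round figures from the paragraph; the $20-35B range follows from taking the low end of the outcome-equivalent cost ($4,000) and the high end of the cost per actual life saved ($7,000). Variable names are hypothetical.

```python
UNDER5_DEATHS_PER_YEAR = 5_000_000
COST_PER_DEATH_LOW = 4_000    # low end: cost per outcome as good as averting an under-5 death
COST_PER_DEATH_HIGH = 7_000   # high end: cost per actual under-5 life saved

low_total = UNDER5_DEATHS_PER_YEAR * COST_PER_DEATH_LOW
high_total = UNDER5_DEATHS_PER_YEAR * COST_PER_DEATH_HIGH
print(f"${low_total / 1e9:.0f}B to ${high_total / 1e9:.0f}B per year")  # $20B to $35B per year

# A hypothetical $3B/year opportunity that is 10x as cost-effective as the
# GiveWell margin would buy as much good as $30B/year spent at that margin.
equivalent_spend = 3_000_000_000 * 10
print(low_total <= equivalent_spend <= high_total)  # True
```

The second check shows why $3B/year at ten times the GiveWell margin is comparable in impact to eliminating all under-5 mortality at GiveWell-level costs.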
We still don’t think all of our existing grantmaking necessarily hits this bar, and are continuing to try to move flexible portfolios towards higher expected-return areas while learning more about our returns and considering reducing our spending in others.
Our current best guess is that we won’t be able to spend as much as we need to within the Global Health and Wellbeing frame at a significantly higher marginal cost-effectiveness rate (though average might be higher) than the GiveWell top charities, so the marginal cost-effectiveness of the GiveWell recommendations continues to be a relevant metric for our overall bar. However, we still expect most of our medium-term growth in GHW to be in new causes that can take advantage of the leveraged returns to research and advocacy, and could imagine that we’ll eventually find enough room for more funding in those interventions that we will need to raise the bar again.
We haven’t done a thorough analysis of the costs of over- vs under-shooting “the bar” for all of our causes, but one important takeaway from our analysis of that for the GiveWell setting, which may or may not properly extrapolate, is that saving and collecting investment returns isn’t “free” (or positive) from an impact perspective. That is because, very roughly, the world is getting better (and accordingly opportunities to improve the world are getting worse) over time, and saved funding doesn’t have that long to compound and has to be spent later at a higher rate given the “spend down within our primary donors’ lifetimes” constraint (which in turn likely means it will be spent at a lower cost-effectiveness further out the annual spending curve). We also think we will be in a much better place to raise more funding from other donors if we’re spending down our existing resources, and the expected benefits of that, along with the ex ante possibility of raising a lot more money, make it theoretically ambiguous whether the costs of under- or over-estimating the “true” bar are higher. Accordingly we’re just going with our very rough and round current best guess for the bar for now, rather than doing a full expected-value calculation, and we will revisit in the future as we learn more.
4.5 Bottom line
We are now treating our bar as “roughly 1,000x” (with our new weights on health) for the GiveWell top charities and in our new cause selection and grantmaking, though we retain considerable uncertainty and expect to continue to revisit that over the coming years. For the typical mix of GiveWell interventions, this bar is about 20% lower given our new moral weights.41
We think it’s important to note that the bar is very rough – we aren’t very confident that it, or the BOTECs we consider against it, are even within a factor of 2 of correct – and we will continue to put considerable weight on factors not included in our rough back-of-the-envelope calculations in making major decisions.
Due to this analysis and the lower forward-looking bar, we’re planning to give more to the GiveWell top charities this year and going forward – more on that next week.
5. Appendix A
The available direct service interventions in health, like the ones GiveWell recommends, are far more cost-effective in low-income countries than high-income countries, so the discussion above focuses on what value we should place on lifesaving interventions in low-income countries. If we were focused instead on saving lives in the developed world, likely via advocacy of some sort, we might trade off differently between lifesaving vs income-enhancing interventions – we are uncertain over a range of rich-world DALY valuations between 2-6 units of log-income.
There are theoretical and empirical reasons to think that the exchange rate at which people trade off mortality risks against income gains differs systematically across income levels, with richer people valuing mortality more relative to income.
5.1 VSL elasticity to income
The main empirical evidence is from the VSL literature described above. As discussed, economists often attempt to statistically estimate a function mapping VSL to income, anchoring it off the better-validated VSL figures in high-income countries.
This literature generally finds that individuals’ willingness to pay for life expectancy increases with income, which is unsurprising – a dollar matters a lot less to a rich person, so if everyone valued a DALY equally then VSLYs would increase linearly with income. Most reviews also find that this willingness to pay increases at a faster pace than income. For example, Robinson et al. 2019 review the mainstream literature, which finds that the elasticity of VSL to income is between 1.0 and 1.2 across LICs (and a bit below 1 across the developed world), though Robinson’s own analysis suggests an elasticity of 1.5 for extrapolating to LMICs.42 An elasticity of 1.2 would mean that if two individuals’ income differs by 10%, then on average the dollar value they place on a year of life expectancy will differ by 12%.
Lisa Robinson chaired a commission, sponsored by the Gates Foundation, that recommended the following ensemble approach:
VSL is anchored to US at 160x income, with an income elasticity of 1.5, and a lower bound of 20x income.
VSL is anchored to US at 160x income, with an income elasticity of 1.0
VSL is anchored to OECD at 100x income, with an income elasticity of 1.0
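As a rough sketch of how these three rules extrapolate, the snippet below computes VSL as a multiple of income for someone at 1/100th of the anchor income. It assumes the standard constant-elasticity form, in which VSL scales with income raised to the elasticity, so the VSL-to-income multiple scales with the income ratio raised to (elasticity - 1). The function name is hypothetical, and the output is VSL in multiples of annual income, not VSLY, since converting to VSLY needs further inputs that are not given here.

```python
from typing import Optional


def vsl_income_multiple(anchor_multiple: float, elasticity: float,
                        income_ratio: float, floor: Optional[float] = None) -> float:
    """VSL as a multiple of income at income_ratio times the anchor income,
    assuming VSL is proportional to income ** elasticity."""
    multiple = anchor_multiple * income_ratio ** (elasticity - 1)
    if floor is not None:
        multiple = max(multiple, floor)   # apply the lower bound, if any
    return multiple


ratio = 1 / 100
print(vsl_income_multiple(160, 1.5, ratio, floor=20))  # lower bound of 20x binds
print(vsl_income_multiple(160, 1.0, ratio))            # stays at 160x
print(vsl_income_multiple(100, 1.0, ratio))            # stays at 100x
```

With an elasticity of 1.0 the multiple does not depend on income at all, while with an elasticity of 1.5 the unfloored multiple at this income would be about 16x, so the 20x lower bound is what determines the result.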
These yield the following VSLY estimates for someone at an income 1/100th that of the US43:
Here again is the chart from our discussion of LIC VSLYs above. As noted above, in this chart we plot all the estimates found by the literature search in Robinson et al. 2019’s meta-analysis – they searched for all VSL analyses, whether stated or revealed, in any country that had been classified as low- or middle-income in the last 20 years.47
I’ve added a few lines to show various elasticities you could use to predict LMIC VSLYs based on US VSLYs. An elasticity of 1 would say that willingness to pay for a year of life expectancy varies 1:1 with income – combining that with the US VSLY of 4 years of income, we’d predict that individuals in any country are willing, on average, to trade 4 units of log-income for 1 year of life expectancy. If instead we combined an elasticity of 1.1 with the US VSLY, we’d predict that individuals at the global poverty line would be willing to trade roughly 2.5 years of income for a year of life expectancy – this is roughly in line with the VSLYs from IDInsight’s surveys of communities demographically similar to GiveWell beneficiaries.
It seems to me that any elasticity between, say, 0.9 and 1.3 is potentially compatible with this data.
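To give a feel for how much the choice of elasticity matters, here is a small sketch that predicts VSLY (in years of income) from the US VSLY of 4 years of income. The income ratio of 1/100 is an assumption made purely for illustration, borrowed from the "1/100th that of the US" example earlier in this appendix; it is not a measured figure for the global poverty line. The function name is hypothetical.

```python
def predicted_vsly_years(us_vsly_years: float, elasticity: float,
                         income_ratio: float) -> float:
    """VSLY in years of own income, assuming VSLY is proportional to
    income ** elasticity and anchoring on the US figure."""
    return us_vsly_years * income_ratio ** (elasticity - 1)


ASSUMED_INCOME_RATIO = 1 / 100   # illustrative assumption, not from the data

for elasticity in (0.9, 1.0, 1.1, 1.3):
    years = predicted_vsly_years(4, elasticity, ASSUMED_INCOME_RATIO)
    print(f"elasticity {elasticity}: {years:.1f} years of income")
```

Under this assumed ratio, an elasticity of 1.1 gives roughly 2.5 years of income, and the 0.9 to 1.3 range spans roughly 1 to 6 years of income.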
Analysts are often interested in the relationship between VSL and national-level income statistics like per-capita GDP, even though we often have measures of the respondents’ incomes, because it is often more practical to apply heuristics to national-level income statistics, especially when deriving VSL estimates to inform national policies. When we analyze VSLY measured in multiples of GNI per capita, rather than respondents’ income, we see a stronger relationship between income and VSLY – this data could be compatible with elasticities between, say, 1.0 and 1.5. This seems to be because the respondents in LMIC VSL studies report lower average incomes than their respective national average – presumably because the social scientists performing these studies are interested in somewhat poorer populations.48 We suspect that comparing LMIC VSL estimates to national-level income averages, rather than to respondents’ (lower) incomes, biases analyses like Robinson et al. toward finding higher elasticities of VSL to income.
5.2 Theoretical arguments for DALY valuations to vary by income
One theoretical argument would go like this: If you think we can improve individuals’ lives by improving their incomes, and you also think the moral impact of saving a life varies somewhat with the quality of that life (i.e. it’s better to extend a happy life than a miserable life), then it follows that it is more valuable in theory to extend the life of a typical high-income country resident than that of a typical person at the global poverty line.49 Many people (including us) find this a deeply concerning line of reasoning – and critically, this theoretical dynamic is in reality swamped by the fact that it is dramatically less expensive to extend the life of someone at the global poverty line, which is why the overwhelming majority of our GHW portfolio is focused on extending lives and increasing incomes in low and lower middle income countries.
5.3 Bringing these considerations together
We’ve been unsettled about how to aggregate these lines of argument, but ended up concluding that we didn’t need to reach a resolution on this because the expected cost of mistakes if we were wrong (i.e., if we assumed 2x and the true answer were 6x, or vice versa) was low.50
For now, we haven’t decided specifically how to weigh lives-vs-income tradeoffs in high-income countries, and when we face decisions that might depend on the specifics, will test a range of values between 2 and 6 units of log-income.
After hundreds of grants totaling more than $130 million over six years, one of our first programs — criminal justice reform (CJR) — is becoming an independent organization.
The team that had been leading our CJR program, Chloe Cockburn and Jesse Rothman, is transitioning to Just Impact, which describes itself as “a criminal justice reform advisory group and fund that is focused on building the power and influence of highly strategic, directly-impacted leaders and their allies to create transformative change from the ground up”.
We’ve had internal discussions around the possibility of a different structure for more than a year, and have spent the past few years continuing to search for new potential causes that might yield cost-effective giving opportunities. That has led to some important updates:
As we wrote in 2019, we think the top global aid charities recommended by GiveWell (which we used to be part of and remain closely affiliated with) present an opportunity to give away large amounts of money at higher cost-effectiveness than we can achieve in many programs, including CJR, that seek to benefit citizens of wealthy countries. Accordingly we’re shifting the focus of future grantmaking from our Global Health and Wellbeing portfolio (which CJR has been part of) further towards the types of opportunities outlined in that post — specifically, efforts to improve and save the lives of people internationally (including things like distributing insecticide-treated bednets to prevent the spread of malaria in Sub-Saharan Africa, and fighting air pollution in South Asia).
At the same time, we have been impressed with the CJR team’s work, which we believe has significantly influenced the field’s priorities, attracted other major donors, and contributed to some notable wins. Rather than shutting down the portfolio entirely, we are instead helping to launch Just Impact so Chloe and her team can continue to make grants that seek to safely reduce incarceration. We hope that other donors interested in criminal justice reform in the United States will join.
The $50 million seed grant to Just Impact is intended to support work for 3.5 years. We will continue to follow progress and continually revisit the right level of support in light of both Just Impact’s impact and our understanding of our alternative giving opportunities, and may continue our support beyond this initial seed grant. It is important to us to make this transition in a way that positions the CJR work to maintain its successes, navigate the transitional period smoothly, and hopefully raise enough from other funders to have even more impact in the future.
Additionally, we think there are other advantages to a spinout:
This is a natural progression for the CJR program and Chloe, who has partnered with other donors for several years, providing advice and supporting donors to make effective investments in criminal justice reform. We believe that an independent organization will allow Chloe to build on that work, and we hope Just Impact will be attractive to donors who want to support these important efforts.
Independence will also better position Just Impact to implement its vision and strategy.
More generally, we see this as a valuable experiment for Open Philanthropy. In the long run, we could imagine that the optimal structure for us is to focus on cause selection and incubation of new programs, regularly spinning out mature programs in order to give them more autonomy as we focus on our core competency of cause selection and resource allocation. We hope to learn about the costs and benefits of that approach from Just Impact’s experience.
We’re grateful for and proud of all the work Chloe and Jesse have done, and we believe criminal justice reform remains an important, valuable, and broadly underfunded cause. For donors interested in criminal justice reform in the United States, we think that the Just Impact team is a strong bet, and we hope Just Impact’s strong work will spark substantial commitments from other donors. We’re excited to see what Just Impact will be able to achieve in the coming years!
Open Philanthropy is expanding and we are recruiting a number of talented new hires to help us direct philanthropic funding in new-to-OP causes and to join the team that identifies new areas for grant-making. Our Global Health and Wellbeing team – which works to improve life through causes like global development, scientific research, and farm animal welfare – is ramping up its grantmaking. The GHW team directed more than $200M in grants in 2020 and we expect that number to rise substantially in the years to come.
We’re looking for two types of roles to help us direct billions of dollars of new giving over the coming years. First, we’re looking for experts who will lead Open Philanthropy’s giving in new cause areas we’ve identified as potential focus areas. We’re hiring two new Program Officers, in South Asian air quality and global aid policy. Each of these Program Officers will identify specific grants and grantees that we believe can beat our 1,000x social return on investment bar.1 You can see examples of our giving in other cause areas in our grants database. We expect these positions to be filled by grantmakers who combine deep expertise in their area, strategic vision, and a quantitative mindset. We’re looking for people who already know many potential grantee organizations and can make reasoned and balanced arguments about why their approach is likely to clear our high bar for giving. We think finding the right grantmaker is a key ingredient to our potential impact in these causes, so we may not end up going into them if we can’t find the right people.
Second, we’re looking for talented generalists to help us find the next focus areas Open Philanthropy should fund in. We’re hiring Research Fellows and Strategy Fellows to join the Global Health and Wellbeing cause prioritization team and help identify future cause areas, grants, and grant-makers. Research Fellows will usually have graduate social science experience and comfort going deep on key academic papers to evaluate them and extract and synthesize the most accurate and actionable estimates. Strategy Fellows may come from a variety of backgrounds, but will combine comfort with back of the envelope calculations under uncertainty with the ability to engage potential grantees or others in the field to generate new insights.
Both Research Fellows and Strategy Fellows will work on new cause investigations – our goal is to identify new cause areas that are important, neglected, and tractable. We are entering South Asian air quality and global aid policy based on the work done by people currently in these roles: check out an example of our work in the South Asian air quality cause report.2 You can also find older examples of our cause investigation work here. Over time, we see lots of room for both roles to develop and expand within Open Philanthropy – becoming Program Officers, managing other cause prioritization team members, doing deeper research on strategic priorities, or building new functions within the organization.
Overall, we are looking for people who are passionate about doing the most good possible in a rigorous way. If you care greatly about improving lives around the world and enjoy thinking deeply, talking to experts or practitioners, and/or synthesizing information in a transparent and actionable way, we hope you’ll apply for a job with us. Please reach out to [email protected] if you have questions about any of these positions that aren’t answered by the job description pages or our “Working at Open Philanthropy” general Q&A.
Since 1900, the global economy has grown by about 3% each year, meaning that it doubles in size every 20–30 years. I’ve written a report assessing whether significantly faster growth might occur this century. Specifically, I ask whether growth could be ten times faster, with the global economy growing by 30% each year. This would mean it doubled in size every 2–3 years; I call this possibility ‘explosive growth’.
The report builds on the work of my colleague, David Roodman. Although recently growth has been fairly steady, in the distant past it was much slower. David developed a mathematical model for extrapolating this pattern into the future; after calibration to data for the last 12,000 years, the model predicts that the global economy will grow ever faster over time and that explosive growth is a couple of decades away! My report assesses David’s model, and compares it to other methods for extrapolating growth into the future.
At first glance, it might seem that explosive growth is implausible — that it is somehow absurd or economically naive. Contrary to this view, I offer three considerations from economic history and growth theory that suggest advanced AI could drive explosive growth. In brief:
The pace of growth has increased significantly over the course of history. Absent a deeper understanding of the mechanics driving growth, it would be strange to rule out future increases in growth.
One important mechanism that increased growth over long-run history is the ideas feedback loop: more ideas → more people → more ideas. Sufficiently advanced AI systems could increase growth further via an analogous feedback loop: more ideas → more AI systems → more ideas.
When you plug the assumption that AI systems can replace human workers into standard growth models (designed to explain growth since 1900), they often predict explosive growth.
These arguments don’t prove that advanced AI would drive explosive growth, but I think they show that it is a plausible scenario.
For AI to drive explosive growth, AI systems would have to be capable enough to replace human workers in most jobs, including cutting-edge scientific research, starting new businesses, and running and upgrading factories.
We think it’s plausible that sufficiently capable AI systems will be developed this century. My colleague Joe Carlsmith’s report estimates the computational power needed to match the human brain. Based on this and other evidence, my colleague Ajeya Cotra’s draft report estimates when we’ll develop human-level AI; she finds we’re 80% likely to do so by 2100. In a previous report I took a different approach to the question, drawing on analogies between developing human-level AI and various historical technological developments. My central estimate was that there’s a ~20% probability of developing human-level AI by 2100. These probabilities are consistent with the predictions of AI practitioners (see Grace et al. (2017)). Note that the precise definition of ‘human-level AI’ differs slightly across the forecasting methodologies discussed in this paragraph.
Overall, I place at least 10% probability on advanced AI driving explosive growth this century. Roughly speaking, this corresponds to > 30% probability that human-level AI is developed in time for growth to ramp up to 30% by 2100, and > 1/3 that explosive growth actually happens conditional upon human-level AI being developed.
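The arithmetic behind this lower bound can be sketched in a few lines. This is only an illustration of how the two figures quoted above combine; the variable names are hypothetical and not taken from the report.

```python
# Lower bounds stated in the text (both are "greater than" figures).
p_ai_in_time = 0.30            # human-level AI arrives early enough for growth to ramp up by 2100
p_explosive_given_ai = 1 / 3   # explosive growth happens, given that such AI is developed

# Multiplying the two gives a lower bound on the overall probability.
p_explosive_lower_bound = p_ai_in_time * p_explosive_given_ai
print(f"{p_explosive_lower_bound:.2f}")  # prints 0.10
```

Because both inputs are themselves lower bounds, the product is a floor on the overall probability rather than a central estimate.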
The report also discusses reasons to think growth could slow; I place at least 25% probability on the global economy only doubling every ~50 years by 2100.
This research informs Open Phil’s thinking about what kinds of impact advanced AI systems might have on society, and when such systems might be developed. This is relevant to how much to prioritize risks from advanced AI relative to other focus areas, and also to prioritizing within this focus area.
We elicited a number of reviews of drafts of the report.
The structure of this blog post is as follows:
I clarify the question the report is answering: what would explosive growth actually look like?
I say more about why Open Phil is interested in the question of explosive growth.
I discuss the three reasons to think that explosive growth could occur this century.
I briefly discuss objections to explosive growth occurring this century.
Note, many issues discussed in the report are not included in this blog. For example, the reasons for thinking the pace of growth might slow this century, the apparent difficulty of finding a convincing theory implying 21st century growth will be exponential, discussion of many of the objections to explosive growth occurring this century, and a model I developed for extrapolating GWP into the future.
I’d recommend that readers with a background in these issues read the report instead.
Acknowledgements: My thanks to Holden Karnofsky for prompting this investigation; to Ajeya Cotra for extensive guidance and support throughout; to Ben Jones, Dietrich Vollrath, Paul Gaggl, and Chad Jones for helpful comments on the report; to Anton Korinek, Jakub Growiec, Phil Trammel, Ben Garfinkel, David Roodman, and Carl Shulman for reviewing drafts of the report in depth; to Harry Mallinson for reviewing code I wrote for this report and helpful discussion; to Joseph Carlsmith, Nick Beckstead, Alexander Berger, Peter Favaloro, Jacob Trefethen, Zachary Robinson, Luke Muehlhauser, and Luisa Rodriguez for valuable comments and suggestions; and to Eli Nathan for extensive help with citations and the website.
What exactly do you mean by ‘explosive growth’?
One popular measure of the size of the global economy is Gross World Product (GWP). It generalizes the country-specific notion of Gross Domestic Product (GDP) to apply to the whole world. GWP is equal to the global population multiplied by the average annual global income:
\( \text{GWP} = (\text{number of people in the world}) \times (\text{average income per person}) \)
To double GWP you could, for example, double the number of people while keeping the average income the same. Or you could double the average income while keeping the number of people the same. In practice, GWP growth has historically involved increases in both the number of people and their incomes.
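A minimal sketch of this decomposition follows. The population, income, and growth figures are hypothetical round numbers chosen for illustration, not data from the report.

```python
def gwp(population, income_per_person):
    """Gross World Product as population times average annual income."""
    return population * income_per_person


def gwp_growth(population_growth, income_growth):
    """Annual GWP growth implied by growth in its two components."""
    return (1 + population_growth) * (1 + income_growth) - 1


# Hypothetical round numbers, for illustration only.
people = 8e9
income = 10_000.0

baseline = gwp(people, income)
double_people = gwp(2 * people, income)   # twice the people, same income
double_income = gwp(people, 2 * income)   # same people, twice the income

# Both routes double GWP.
print(double_people / baseline, double_income / baseline)

# Growth in both components compounds: 1% more people and 2% higher incomes
# give slightly more than 3% GWP growth.
combined = gwp_growth(0.01, 0.02)
```

The last line shows why growth in both components at once, which is the historical pattern described above, adds up to a bit more than the sum of the two rates.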
There’s another way to think about GWP. It also measures the total amount of stuff that the global economy produces each year, with each thing weighted by its value. This ‘stuff’ includes goods (food, clothes, books, gadgets) and services (Spotify, haircuts, doctors’ appointments, counseling, teaching). So to calculate GWP, just add up the value of all those goods and services bought during the year. Why are these two ways of defining GWP equivalent? GWP is complicated, but the rough idea is that all the money spent on goods and services ends up contributing to someone’s income.
With this way of thinking about it, there are again a few ways to double GWP. You could, for example, make twice as many things, holding the value of those things fixed (e.g. produce more food and more clothes each year). Or you could make the same number of things, but make them twice as valuable (e.g. make higher quality phones and laptops). Again, in practice GWP growth has involved both increases in the number of goods and services and increases in their quality.
So what would explosive growth look like? I define explosive growth as 30% GWP growth each year. As mentioned above, GWP currently doubles every 20–30 years; with explosive growth it would double every 2–3 years. Growth would be ten times faster. All the economic growth that currently happens in ten years’ worth of improvements to housing, medical care, computers, and software would be crammed into just one year. Similarly, ten years’ worth of progress in physical sciences, engineering, life sciences and social sciences, agriculture, and manufacturing techniques would on average happen every year. Certain sectors might see bigger increases in growth than others, as long as total growth is ten times faster. For example, fundamental physics might not grow any faster than today, while other sectors grow fast enough that overall economic growth is ten times faster than it is today.
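The doubling times quoted above follow from compound growth. Here is a minimal sketch, assuming a constant annual growth rate; the function name is hypothetical.

```python
import math


def doubling_time(annual_growth_rate):
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth_rate)


current = doubling_time(0.03)    # roughly 23 years, inside the 20-30 year range
explosive = doubling_time(0.30)  # roughly 2.6 years, inside the 2-3 year range
print(round(current, 1), round(explosive, 1))
```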
Within ten years, we’d make as much technological and economic progress as has happened in the last 100 years.
This includes both developing new technologies and products and integrating them into the economy. For example, the process of electrification happened between 1880 and 1950 in the UK and the US. This involved moving from steam and water power to electricity in factories, and bringing electricity to individual households. If growth were ten times as fast, this process would have taken only 7 years.
Another example: the first commercial mobile phone was released in 1983; it cost about $10,000, a full charge took 10 hours, and it offered 30 minutes talk time. Today (2021) around 3.5 billion people use smartphones. If growth were 10X faster, this change would have taken only 4 years.
Essentially, with explosive growth it would look as if technological change was happening much faster.
I should be clear that explosive growth would not necessarily be a good outcome; this depends on its effects on human welfare and the planet more generally.
Why are we interested in explosive growth?
Our interest in explosive growth mostly relates to one of our focus areas: potential risks from advanced AI. As part of this area, we want to know about the size and the timing of the impact of AI on society. The larger the impact, and the sooner the impact will be felt, the stronger the case for working on this focus area.
This report has relevance for both the size and the timing of the impact from advanced AI.
The relevance to size is clear. I conclude that an AI-driven growth explosion is a plausible scenario; so the size of AI’s impact could be very large indeed.
The relevance to timing is slightly more complex.
In her draft report, my colleague Ajeya Cotra uses the phrase ‘transformative AI’ (TAI) to mean ‘AI which drives Gross World Product (GWP) to grow at ~20–30% per year’. She estimates a high probability of TAI by 2100 (~80%), and a substantial probability of TAI by 2050 (~50%).
Intuitively speaking, these are very high probabilities to assign to an ‘extraordinary claim’. Are there strong reasons to dismiss these estimates as too high? One possibility is economic forecasting. If economic extrapolations gave us strong reasons to think GWP will grow at ~3% a year until 2100, this would rule out explosive growth and so rule out TAI being developed this century.
My report suggests economic considerations of this kind don’t provide a good reason to dismiss the possibility of TAI being developed in this century. In fact, there is a plausible economic perspective from which sufficiently advanced AI systems are expected to cause explosive growth. Explaining this perspective is the focus of the next section.
Why think explosive growth could occur this century?
I discuss three reasons to take the prospect of explosive growth seriously.
Growth has become much faster
The global economy, measured as GWP, currently doubles in size roughly every 30 years. But it used to grow much more slowly. Estimates suggest that, ten thousand years ago, GWP took around 3000 years to double. See, for example, data series from De Long (1998), McEvedy and Jones (1978), and Roodman (2020). The data on global population and living standards this early is highly uncertain. What is clear, however, is that growth was significantly slower at this time.
At earlier times, growth was even slower. This means that growth has perhaps accelerated 100X over the course of human history.
If growth has already become 100X faster, perhaps it will become another 10X faster in the future.
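The rough size of this acceleration can be checked by converting doubling times into annual growth rates. This sketch uses the two doubling times quoted above (3000 years and 30 years) and assumes constant compound growth within each period; the function name is hypothetical.

```python
def annual_rate_from_doubling_time(years_to_double):
    """Constant compound annual growth rate that doubles output in the given time."""
    return 2 ** (1 / years_to_double) - 1


ancient = annual_rate_from_doubling_time(3000)  # a few hundredths of a percent per year
modern = annual_rate_from_doubling_time(30)     # a bit over 2% per year

speed_up = modern / ancient                     # roughly 100
print(f"{ancient:.5%}  {modern:.2%}  {speed_up:.0f}x")
```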
It is tempting to dismiss this possibility, and say that we cannot imagine anything that could drive such fast growth. But we should be wary of making arguments that would have led us astray in the past. Could hypothetical economists a thousand years ago, who saw the economy grow at a snail’s pace, have imagined how the processes of industrialization and technological innovation would allow the modern economy to double every 30 years? Probably not. We may be in a similar situation, unaware of mechanisms that could lead to growth becoming significantly faster.
One counter-argument is that growth of the richest countries’ economies hasn’t become faster since 1900, and in fact seems to have been slowing down over the last 20 years. This suggests that growth has reached its peak, and is now declining.
This argument has some appeal. However, 120 years of constant or slowing growth isn’t enough to confidently rule out growth increasing again in the future. For example, growth seems to have increased in the period 10,000 BCE to 1 CE, but then slowed over the next 1000 years (perhaps relating to the decline of the Roman Empire). But after this period growth picked up again, and eventually became much faster.
This first argument appeals to our humility. If we don’t understand why growth has become so much faster over long-run history, we should be open to it becoming faster still in the future. The next two arguments refer to specific theories of growth, arguing that they suggest that advanced AI could drive explosive growth.
A good explanation of long-run growth patterns implies advanced AI could drive explosive growth
The very long-run history of growth is roughly as follows. Until around 1700, GWP growth was very slow; it took hundreds of years for the economy to double in size. By 1900, growth was much faster, with the economy doubling every 20–30 years. Since then, growth has stayed pretty constant. See the following graph: Notice the y-axis is on a log-scale, and I have spaced the x-axis unevenly to show data going back to 10,000 BCE. The data is taken from the GWP series in Roodman (2020). The growth rates are calculated by assuming constant exponential growth between each pair of data points.
(Note: the graph seems to show growth increasing fairly smoothly over the last 10,000 years. However, the data is very uncertain, and it is possible that growth was roughly constant in the period 5,000 BCE to 1500 CE.)
Ten thousand years ago there was a relatively small human population that was very poor. Some of those people came up with ideas that allowed the population to grow in size. For example, one idea might be a new farming technique that allows you to feed a larger population; another might be a custom that reduces the chance of becoming unwell. To be concrete, let’s imagine that every 100 new ideas allow you to double the size of the population. Initially, the small population takes a long time, 3000 years, to come up with the 100 ideas needed to double the size of the population. But as the population increases, there are more people coming up with ideas, so it takes less time for the whole group to accumulate 100 new ideas. After a while there are enough people that it only takes 1000 years to come up with another 100 ideas, and so the population doubles in size more quickly. Later there are even more people, and so it only takes 300 years to come up with the ideas needed to double the size of the population again. And so on.
So there is a feedback loop: more ideas → better farming techniques (or other innovations) → more food → more people → more ideas → … that leads growth to speed up over time. And if the population grows more and more quickly over time, so does GWP.
Let’s call this dynamic the ideas feedback loop. Its essence is more ideas → more people → more ideas. Idea-based theories of long-run growth claim that the ideas feedback loop caused growth to speed up over the last 10,000 years.
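The toy story above can be turned into a small simulation. This is a sketch under simplifying assumptions that are not in the report: each person produces ideas at a fixed rate, the population is held constant between doublings, and the rate is calibrated so that the first doubling takes 3000 years. All names are hypothetical.

```python
def doubling_times(n_doublings, first_doubling_years=3000.0, ideas_per_doubling=100):
    """Years taken by each successive population doubling in the toy ideas loop."""
    population = 1.0  # arbitrary units
    # Assumption: a fixed idea output per person per year, calibrated so the
    # initial population needs `first_doubling_years` to find 100 ideas.
    ideas_per_person_year = ideas_per_doubling / (first_doubling_years * population)

    times = []
    for _ in range(n_doublings):
        # More people means the next batch of ideas arrives sooner.
        years = ideas_per_doubling / (ideas_per_person_year * population)
        times.append(years)
        population *= 2  # the batch of ideas lets the population double
    return times


times = doubling_times(4)
print([round(t) for t in times])  # each doubling takes about half as long as the last
```

Under these assumptions each doubling takes half as long as the one before. The illustrative figures in the text (3000, 1000, 300 years) shrink somewhat faster than that, but the qualitative point is the same: a larger population accumulates ideas faster, so growth speeds up.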
These theories are confirmed by the pattern of growth over the last 150 years.
Since around 1880, people in the richer countries have chosen to have fewer children even as they become wealthier. Since that point, “more ideas” have not led to “more people”, but instead to richer people: more ideas → richer people → more ideas. So the ideas feedback loop was broken.
Idea-based theories claim that the ideas feedback loop caused growth to speed up throughout history. The feedback loop was broken in ~1880, so these theories expect growth to stop increasing shortly after this time. Indeed, this is what happened. Since 1900, growth has been roughly constant, as idea-based theories would expect. (More precisely, growth of GDP/capita in countries on the economic frontier has been pretty constant since 1900. Growth of GWP actually increased from 1900 to 1960 due to increasing amounts of catch-up growth.) Even though population has been increasing since 1900, idea-based theories do not imply that growth should have increased over this period. These theories can incorporate diminishing returns to efforts to generate more ideas, so that an exponentially growing population generates constant growth.
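One way to see how diminishing returns can turn an exponentially growing population into constant growth is a small simulation. This sketch assumes a standard semi-endogenous idea production function, in which new ideas each year equal delta * L**lam * A**phi with phi below 1; the functional form and every parameter value are assumptions for illustration, not taken from the report.

```python
def idea_growth_path(years, n=0.01, lam=1.0, phi=0.5, delta=0.05):
    """Yearly growth rates of the idea stock A when population L grows at rate n.

    phi < 1 means each new idea is harder to find as the stock of ideas grows.
    """
    ideas, population = 1.0, 1.0
    rates = []
    for _ in range(years):
        new_ideas = delta * population ** lam * ideas ** phi
        rates.append(new_ideas / ideas)  # growth rate of ideas this year
        ideas += new_ideas
        population *= 1 + n              # exponential population growth
    return rates


rates = idea_growth_path(2000)

# Growth rate at which idea growth stops changing: (1 + n) ** (lam / (1 - phi)) - 1.
steady_rate = (1 + 0.01) ** (1.0 / (1 - 0.5)) - 1

print(f"start {rates[0]:.4f}, end {rates[-1]:.4f}, steady {steady_rate:.4f}")
```

With these hypothetical parameters the growth rate of ideas starts at 5% and settles at about 2% per year, even though the population keeps growing exponentially throughout.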
New ideas have caused GWP to increase in this period by increasing people’s wealth, but GWP growth itself has not increased as the ideas feedback loop is broken.
So idea-based theories provide a plausible explanation for why growth increased historically, and why it is now roughly constant.
Idea-based theories imply that advanced AI could cause growth to increase again.
Imagine if AIs could generate new ideas just as well as humans. They could come up with better computer designs (better hardware), and more efficient ways of running AIs on those computers (better software). As a result, more AIs could run on each computer. In addition, the AIs’ ideas could create wealth that is invested into creating more computers on which to run AIs.
The feedback loop would become: more ideas → better software, better hardware, and more wealth → more AI systems → more ideas →… The essence of this is another ideas feedback loop: more ideas → more AI systems → more ideas. This is closely analogous to the ideas feedback discussed above: more ideas → more people → more ideas. Before, the ideas feedback loop led to growth speeding up over time. It is natural to expect the same thing to happen in the case of AI.
To recap, the ideas feedback loop caused growth to speed up until ~1880, when it was broken. Since then, growth has been roughly constant. But advanced AI could cause the ideas feedback loop to apply once more. If this happens, growth should start speeding up again.
Period | Ideas feedback loop? | Pattern of growth
Until ~1880 | Yes: more ideas → more people → more ideas | GWP growth becomes faster over time
1880 – present | No: more ideas → richer people → more ideas | GWP grows at a constant rate
If human-level AI is developed | Yes: more ideas → more AI systems → more ideas | GWP growth becomes faster over time
For this new ideas feedback loop to occur, AI would have to be capable enough to replace humans in a very wide range of tasks relating to the discovery and implementation of new ideas. Examples include running start-ups, doing cutting-edge scientific research, and making factories more efficient. If there are tasks essential to the discovery and implementation of new ideas that require humans, then humans may end up bottlenecking the growth process. With plausible parameter values (including the diminishing returns to idea discovery), idea-based models predict explosive growth if you add in the assumption that AI can replace humans in all jobs. For interim cases, where AI can automate some but not all jobs, the growth outcome depends on the degree of diminishing returns to idea discovery and the importance of automated jobs to idea discovery and to economic output.
The ideas feedback loop explanation implies that growth increased continuously over hundreds and thousands of years as the feedback loop gradually gathered momentum. However, most papers on long-run growth emphasize a different story, in which a structural change around the time of the industrial revolution causes a one-off increase in growth. For example, Hansen and Prescott (2002) discuss a model in which a phase transition in the methods of production increases growth. Initially the economy faces diminishing returns to labor due to the fixed factor land. But once the level of technology is high enough, it becomes profitable for firms to use less land-intensive production processes; this phase transition increases growth. Other examples of theories featuring one-off structural changes around the industrial revolution include Goodfriend and McDermott (1995), Lucas (1998), Stokey (2001), Tamura (2002) and Hanson (2000).
The mechanisms in these papers have a lesser tendency to suggest that advanced AI would increase growth.
The pre-modern data points are highly uncertain, so it’s hard to use them to assess the importance of the ideas feedback loop relative to other mechanisms. Kremer (1993) provides some additional evidence for the ideas feedback loop. Kremer looks at the development of five isolated regions and finds that the technology levels of the regions in 1500 are perfectly rank-correlated with their initial populations in 10,000 BCE. This is just what the ideas feedback loop mechanism would predict.
The report discusses the evidence that we do have in more detail. Overall, I think that while significant structural changes did happen around the industrial revolution, the ideas feedback loop played an important role in causing growth to accelerate over the last 10,000 years. For this reason, I take idea-based theories seriously, including their implication that sufficiently advanced AI would drive explosive growth.
When you plug the assumption that AI systems can replace human workers into standard growth models, they often predict explosive growth
Imagine if I built 1 billion laptops of the highest quality, and gave them to the human workers whose work productivity would most benefit from a new laptop. The laptops would make the workers more productive, raising GWP. Then suppose I made another 1 billion laptops, again distributing them to people whose productivity would benefit the most. This would still raise productivity somewhat, as some people could still be made more productive with new laptops. But by the time I’ve already made 10 billion laptops, the next billion will make little difference to global productivity. This is because the laptops need human workers in order to be useful, and there’s a limited supply of human workers. This puts a limit to how much I can boost GWP just by making laptops. The same is true of many other physical machines and gadgets I might make to boost productivity.
But now suppose that, as well as making laptops, I can also make AI systems that can use a laptop to do any work that a human could do with it. In this scenario, there would not be a limited supply of workers. So there may be a much higher limit on how much I can boost GWP by making more laptops and making more AI systems together. In this scenario, the returns to creating more physical machines (in this case AI systems and laptops) are much higher because human workers have been removed as a bottleneck.
Economic growth models used to explain growth since 1900 reinforce this point. They typically show diminishing returns to physical equipment and machines (‘capital’), holding the number of human workers fixed. This means that each new machine adds less and less value to the economy. These diminishing returns limit the amount of growth you can have solely by producing more physical machines.
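The laptop example can be sketched with a Cobb-Douglas production function, a common textbook form. The functional form, the capital share of 0.3, and the worker count are assumptions chosen for illustration; the report's models may differ.

```python
def output(capital, workers, alpha=0.3):
    """Cobb-Douglas output: capital ** alpha * workers ** (1 - alpha)."""
    return capital ** alpha * workers ** (1 - alpha)


workers = 8.0  # billions of human workers, held fixed (hypothetical figure)

# Extra output from each additional billion laptops, from the 2nd to the 11th.
gains = [output(k + 1, workers) - output(k, workers) for k in range(1, 11)]
print([round(g, 3) for g in gains])  # each extra billion laptops adds less than the last

# If workers could be scaled up along with laptops, the squeeze would vanish:
# doubling both inputs doubles output.
scale = output(20.0, 2 * workers) / output(10.0, workers)
```

The shrinking gains come entirely from holding the number of workers fixed, which is the bottleneck the laptop example describes.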
But when you plug into these models the assumption that AI can replace human workers in a very wide range of tasks, the diminishing returns to physical machines can disappear, as machines can now play the role previously played by human workers. (In many models, the specific assumption required is that the elasticity of substitution between capital and labor rises above 1.) Many models predict that explosive growth occurs in this scenario. If AIs allow us to automate all jobs, both in goods and services production and in idea discovery, these models predict explosive growth for realistic parameter values. If we can only automate goods and services production (but not idea discovery), or only idea discovery (but not goods and services), then these models predict explosive growth for some realistic parameter values but not for others. In interim cases, where we can automate only a subset of the tasks in goods and services production and idea discovery, the pace of growth depends on the parameter values and on how quickly new tasks become automatable.
To put things another way, consider the following feedback loop: machines produce goods and services, increasing our wealth; we invest some of this wealth into making more machines; we now have even more machines with which to create even more wealth; and so on. In short: more machines → more wealth → more machines →…. At the moment, this feedback loop is fairly weak because there are diminishing returns to additional machines. But if the machines included AI systems that can replace human workers (a key assumption), then this feedback loop becomes more powerful and can drive explosive growth.
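The machines feedback loop can be sketched as a simple accumulation model. This is an illustration under stated assumptions, not the report's model: output is Cobb-Douglas, a fixed share of output is reinvested in machines, and in the AI case each unit of capital also supplies one unit of AI labor. All parameter values are hypothetical.

```python
def output_growth(years, ai_workers, alpha=0.3, saving=0.25, depreciation=0.05):
    """Yearly output growth rates in a toy more machines -> more wealth loop."""
    capital, humans = 1.0, 1.0
    growth, previous = [], None
    for _ in range(years):
        # Assumption: with AI, every unit of capital also acts as a worker.
        labor = humans + (capital if ai_workers else 0.0)
        y = capital ** alpha * labor ** (1 - alpha)
        if previous is not None:
            growth.append(y / previous - 1)
        previous = y
        # Reinvest a share of output in new machines; old machines wear out.
        capital += saving * y - depreciation * capital
    return growth


without_ai = output_growth(200, ai_workers=False)
with_ai = output_growth(200, ai_workers=True)

print(f"no AI:   {without_ai[0]:.3f} -> {without_ai[-1]:.5f}")
print(f"with AI: {with_ai[0]:.3f} -> {with_ai[-1]:.3f}")
```

With human labor fixed, growth from accumulating machines fades towards zero. When machines can also do the work, growth settles at the saving rate minus depreciation (20% a year with these made-up parameters) instead of fading. This sketch only shows the bottleneck being removed; it does not include idea discovery, which the models discussed above also rely on.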
Recap of reasons to expect explosive growth
So we’ve seen three arguments to think explosive growth is plausible:
The pace of growth has increased significantly over the course of history. Absent a deeper understanding of the mechanics driving growth, it would be strange to rule out future increases in growth.
Idea-based theories claim that growth increased in the past due to an ideas feedback loop: more ideas → more people → more ideas. Sufficiently advanced AI systems could drive explosive growth via an analogous feedback loop: more ideas → more AI systems → more ideas.
When you plug the assumption that AI systems can replace human workers into standard growth models (designed to explain the last 100 years of growth), they often predict explosive growth.
The first argument will be more convincing if you’re skeptical of the specific models of growth underlying the second and third arguments.
The second and third arguments unite in pointing to advanced AI as a possible cause of faster growth. In fact, they are closely linked. Both stem from the fact that when you change standard growth models by introducing the strong assumption that AIs can replace human workers in wide-ranging tasks, these models typically predict that growth will accelerate. Though the mathematical models underlying the second and third arguments are partially overlapping, the arguments differ in their emphasis. The second argument emphasizes very long-run data in which growth has in fact increased over time; we then draw an analogy between the mechanisms driving the growth increase and the mechanisms of advanced AI. To doubt this second argument, you can either doubt its explanation of why growth increased in the past, or the analogy with AI. The third argument emphasizes models based on recent data in which growth is constant. It augments these models with an extreme assumption (machines can replace human workers) and notes that the models then predict explosive growth. To doubt this third argument, you can either question the models’ explanation of the recent data, or whether the models’ predictions can be trusted when such an extreme assumption has been made.
How likely is explosive growth to actually happen?
Predicting the pattern of long-run growth is inevitably speculative. Though I think the above arguments are suggestive, they do not prove that advanced AI would cause explosive growth. There are many possible reasons for skepticism:
Perhaps some unanticipated bottleneck will slow down growth.
For example, economic growth might require extracting and transporting raw materials (e.g. to make new computers). If this process can’t be sped up beyond a certain point, this could bottleneck growth.
Alternatively, growth might require conducting experiments to make scientific and technological progress. If these experiments take a long time, this could bottleneck growth.
I view this as one of the strongest objections to explosive growth occurring.
Perhaps we will choose to grow slowly and sustainably, even if AI gives us the ability to grow much faster.
There is evidence that ideas are becoming harder to find. If this trend continues, perhaps it will prevent AIs finding ideas quickly enough to drive explosive growth? The models I have seen predict that, if AI systems can replace human workers, there will be explosive growth despite ideas becoming harder to find. However, if this effect becomes significantly more pronounced, it could prevent explosive growth.
Perhaps there will be some essential tasks that advanced AI never automates, and these will bottleneck the growth process.
Perhaps there are fundamental limits to how good our technology can become, and we will approach these limits before explosive growth occurs.
Perhaps the accumulation of physical or human capital has been the most important driver of historical growth, and advanced AI will not significantly accelerate this process. I argue in favor of idea-based theories over these accumulation theories in the full report, but still assign some weight to accumulation theories (10–20%). Though some accumulation theories imply that advanced AI could drive explosive growth (if it can replace human workers), some do not straightforwardly imply this (e.g. it would depend on how advanced AI affects the rate of human capital accumulation, or how quickly AIs accumulate their own equivalent to human capital).
Perhaps our understanding of the determinants of growth is very poor, and the true determinants simply will not lead to explosive growth regardless of the AI systems we develop.
I discuss these reasons for skepticism, and many others, in the full report. I find some of them partially convincing, and they reduce the probability I assign to explosive growth. However, I don’t think they justify ruling out explosive growth. I personally assign at least 10% probability to explosive growth occurring by 2100. Roughly speaking, this corresponds to > 30% probability that human-level AI is developed in time for growth to ramp up to 30% by 2100, and > 1/3 that explosive growth actually happens conditional upon human-level AI being developed. My central estimate of the probability of explosive growth is not stable, but it is currently around 25% (see Appendix G of the report).
The full report also discusses a contrasting possibility, that growth stagnates. I find this scenario to be highly plausible, and assign it at least 25% probability. Again, my central estimate is not stable, but it is currently around 40%.
There’s evidence that technological progress is becoming increasingly difficult. According to one plausible theory, we’ve only maintained steady growth since 1900 by increasing the number of researchers exponentially over time. But population projections suggest this exponential increase cannot be sustained (assuming we don’t develop AI systems to do the research for us). In this case, growth in living standards will slow.
Thus my view is that the possibilities for long-run growth are wide open. Both explosive growth and stagnation are plausible.