
[Closed] RFP on studying and forecasting the real-world impacts of systems built from LLMs

Table of contents

1. Example projects

2. Expression of interest process and other logistics

3. Acknowledgements

Update, 5/3/24: This RFP has closed after funding $2 million in grants, including a forthcoming expert panel forecasting the capabilities and economic impacts of AI.

 

There is no expert consensus about what systems built from LLMs will and won’t be capable of in the next few years. We think it’s important to change this, because the right approach to policy and safety depends crucially on what real-world impacts these systems could have in the near term.

To this end, in addition to our request for proposals to create benchmarks for LLM agents, we are also seeking proposals for a wide variety of research projects which might shed light on what real-world impacts LLM systems could have over the next few years. 

Anyone is eligible to submit a form, including those working in academia, nonprofits, or independently; we are also open to restricted grants to projects housed within for-profit companies.[1] If applicable, we would include funding for LLM API credits and other forms of compute.

We encourage you to consider the LLM agent benchmarks RFP instead of this one if you have a project idea that is an appropriate fit for it. This RFP is considerably broader than the other one — projects could span a wide range of fields and methodologies — but we also expect that proposals will vary more widely in how effectively they advance Open Philanthropy’s priorities in this space. As such, we are more likely to reject proposals coming through this RFP, and may take more time to investigate the proposals that we do fund. With that said, we can certainly imagine a number of highly impactful research projects in this space which don’t fit the mold of an LLM agent benchmark; we brainstorm some ideas below.

 

1. Example projects

Below are some examples of project ideas that could make for a strong proposal to this RFP, depending on details:

  • Conducting randomized controlled trials to measure the extent to which access to LLM products can increase human productivity on real-world tasks (see the analysis sketch after this list). For example: 
    • GitHub released a study in mid-2022 which found that having access to GitHub Copilot halved the time that programmers needed to write an HTTP server in JavaScript (from ~2 hours to ~1 hour).
    • Fabrizio Dell’Acqua and others from Harvard Business School released a working paper in September 2023 which found that consultants with access to GPT-4 completed many tasks significantly more quickly and at a higher level of quality, but completed some tasks at a lower level of quality, compared to a control group of consultants working on their own.
  • Polling members of the public about whether and how much they use LLM products, what tasks they use them for, and how useful they find them to be. We have seen informal surveys from e.g. Business.com and FishBowl, but so far haven’t seen rigorous polls with random sampling. We would be especially interested in user surveys that conduct deeper interviews about the types of tasks LLM products are helpful and unhelpful with. We’d also be interested in surveys targeted at understanding certain important use cases, e.g. the use of LLM agents to automate software development or AI research. 
  • In-depth interviews with people working on deploying LLM agents in the real world. There are multiple relevant parts of the AI value chain here, including the product teams of AI labs, organizations helping companies integrate AI in their workflows, VCs that focus on AI, and companies using AI. This ecosystem should contain a wealth of knowledge about the use cases, productivity benefits, and limitations of LLM agents. 
  • Collecting “in the wild” case studies of LLM use, for example by scraping Reddit (e.g. r/chatGPT), asking people to submit case studies to a dedicated database, or even partnering with a company to systematically collect examples from consenting customers (see the scraping sketch after this list). While there are a lot of individual case studies on the internet, we are not aware of existing work that collects and analyzes them. Even though they will not constitute a representative sample, seeing thousands of case studies of people attempting to use LLMs in the course of real jobs could be helpful for understanding qualitative patterns of language model strengths and weaknesses.
  • Estimating and collecting key numbers into one convenient place to support analysis. For example, HELM evaluates a wide variety of language models on a wide variety of existing benchmarks, and Papers with Code also provides a similar reference. Epoch similarly estimates or collects numbers such as hardware price performance, spending on large training runs, parameter count and FLOP/s of notable ML models, etc. We would be interested in similar data estimation and collection efforts for: 
    • Key AI-specific economic indicators, such as revenues of LLM products, valuations of LLM-exposed companies, number of users of LLM products, etc., as well as key statistics about the AI supply chain – R&D and CapEx spending throughout the supply chain, AI accelerator production and aggregation, data center construction, etc. 
    • A collection and analysis of broader economic indicators that would capture both the effects of AI on the wider economy and the anticipation of those effects, such as real interest rates.
    • Performance on LLM agent benchmarks such as ARA, MLAgentBench, or forthcoming projects funded by our LLM agent benchmarks RFP. 
  • Creating interactive experiences that allow people to directly make and test their guesses about what LLMs can do,[2] such as Cameron Jones’ Turing Test game, Nicholas Carlini’s forecasting game,[3] and Joel Eriksson’s Parrot Chess — or enable people to more concretely understand AI progress as models grow in scale, such as Sage + FAR AI’s comparative demos.
  • Eliciting expert forecasts about what LLM systems are likely to be able to do in the near future and what risks they might pose, either via a survey such as AI Impacts’ 2022 survey or via a forecasting competition such as the Existential Risk Persuasion Tournament by the Forecasting Research Institute. We are especially interested in conditional forecasts, which ask about real-world impacts and risks conditional on certain benchmark performance (see the aggregation sketch after this list).
  • Synthesizing, summarizing, and analyzing the various existing lines of evidence about what language model systems can and can’t do at present (benchmark evaluations, deployed commercial uses, qualitative case studies, etc.) and what they might be able to do soon (extrapolations of scaling behavior, market projections, expert surveys, etc.) to arrive at an overall judgment about what LLM systems are likely to be able to do in the near term. There are existing overviews of the AI field, such as the AI 100 report or market reports like this from McKinsey, as well as occasional news articles like this recent one from TIME. We would be most excited about a systematic, frequently updated qualitative overview that is narrowly focused on the capabilities of systems built out of LLMs (and multi-modal LLMs). For example, this article in Asterisk Magazine by forecaster Jonathan Mann reaches a bottom-line conclusion on the likelihood of LLM-based systems replacing tech jobs.
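
To make the RCT idea above concrete, here is a minimal sketch of the core analysis in Python. It assumes a hypothetical rct_results.csv with one row per participant and illustrative columns treated (1 = had access to an LLM assistant) and minutes (task completion time); all file and column names are invented for illustration.

    # Minimal sketch: estimate the effect of LLM access on task completion time.
    # The CSV file and column names are hypothetical.
    import numpy as np
    import pandas as pd
    from scipy import stats

    df = pd.read_csv("rct_results.csv")  # columns: treated, minutes
    treated = df.loc[df["treated"] == 1, "minutes"].to_numpy()
    control = df.loc[df["treated"] == 0, "minutes"].to_numpy()

    # Difference in mean completion time (negative = the LLM group was faster).
    effect = treated.mean() - control.mean()

    # Welch's t-test, which does not assume equal variances across arms.
    t_stat, p_value = stats.ttest_ind(treated, control, equal_var=False)

    # Bootstrap confidence interval; often more robust for skewed timing data.
    rng = np.random.default_rng(0)
    boot = [
        rng.choice(treated, size=len(treated)).mean()
        - rng.choice(control, size=len(control)).mean()
        for _ in range(10_000)
    ]
    lo, hi = np.percentile(boot, [2.5, 97.5])

    print(f"effect: {effect:.1f} min, p = {p_value:.3f}, 95% CI [{lo:.1f}, {hi:.1f}]")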
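For the case-study collection idea, below is a sketch of one possible scraping pipeline using PRAW, the Python Reddit API wrapper. The credentials are placeholders (you would register an app with Reddit to obtain real ones), and the collection step is deliberately simple; a real project would add filtering, deduplication, and analysis.

    # Sketch: collect recent posts from r/ChatGPT as raw material for a
    # case-study database. Credentials are placeholders, not real values.
    import csv
    import praw

    reddit = praw.Reddit(
        client_id="YOUR_CLIENT_ID",
        client_secret="YOUR_CLIENT_SECRET",
        user_agent="llm-case-study-collector/0.1",
    )

    with open("case_studies.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "created_utc", "score", "title", "text", "url"])
        # A real pipeline would also filter for posts that actually describe
        # attempting a task with an LLM, and respect Reddit's API terms.
        for post in reddit.subreddit("ChatGPT").new(limit=500):
            writer.writerow([post.id, post.created_utc, post.score,
                             post.title, post.selftext, post.url])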
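And for the expert-forecast idea, one standard way to pool elicited probabilities is the geometric mean of odds, which some forecasting research has found to outperform a simple average of probabilities. A sketch with invented numbers:

    # Sketch: pool expert probability forecasts via the geometric mean of odds.
    # The input probabilities are invented, purely for illustration.
    import numpy as np

    forecasts = np.array([0.10, 0.25, 0.40, 0.15, 0.30])  # P(event), one per expert

    odds = forecasts / (1 - forecasts)
    pooled_odds = np.exp(np.log(odds).mean())      # geometric mean of odds
    pooled_prob = pooled_odds / (1 + pooled_odds)

    print(f"simple mean: {forecasts.mean():.3f}, "
          f"geometric mean of odds: {pooled_prob:.3f}")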

The motivation section of the benchmark RFP goes into more detail on why we are interested in better understanding and forecasting the capabilities of LLM systems, with a special focus on autonomous agents built from LLMs. We strongly encourage you to read that section (and generally review the text of the other RFP) to maximize your chances of submitting a strong proposal.

 

2. Expression of interest process and other logistics

This RFP is currently closed, and we aren’t accepting new expressions of interest. If we re-open it, we will update this page with application materials.

 

3. Acknowledgements

This RFP text was largely drafted by Ajeya Cotra, in collaboration with Isabel Juniewicz and Tom Davidson; Javier Prieto also contributed to the initiative by formulating ideas and investigating potential grants. We’d like to thank Ezra Karger, Jonathan Mann, Helen Toner, and others for discussions that helped shape this RFP.[4]


Footnotes
1 We occasionally make restricted grants to research projects conducted within for-profit companies, but it is legally and logistically more challenging to make grants to for-profit organizations, and logistics processing may be delayed for such grants.
2 As a side effect, this can create datasets of interactions that researchers can later analyze.
3 Nicholas is open to sharing code for this game with someone who would extend and maintain it; you can indicate in your EOI whether you would be interested in that.
4 Names are listed alphabetically by last name.