University of Pennsylvania — Philip Tetlock on Forecasting

University of Pennsylvania staff reviewed this page prior to publication.

The Open Philanthropy Project recommended a grant of $500,000 to Professor Philip Tetlock of the University of Pennsylvania to support his work on forecasting.

It is our impression that Professor Tetlock is among the leading researchers working on improving methods used for forecasting. Several Open Philanthropy Project staff members follow his research on expert judgment and geopolitical forecasting and, based on this familiarity with his work, we believe that he has a compelling track record in this area.

Professor Tetlock wrapped up a previous project, the Good Judgment Project, in the middle of last year. Subsequently, we learned that he was seeking funding for two new projects, currently called “Adversarial Collaboration Tournaments” and the “Alpha Pundit Challenge.” Professor Tetlock believes that these projects might contribute to making forecasting a more prominent component of policy discussions and debates; he describes the projects’ goal as depolarizing unnecessarily polarized debates.

1. Background

1.1 The cause

This grant does not fall within any of our previously defined focus areas. Although we are in the early stages of investigating the social sciences as a potential cause category, at present we consider this to be a one-off grant.

1.2 About Philip Tetlock

Professor Tetlock has extensive experience running and evaluating political forecasting tournaments. Beginning in the mid-1980s and continuing through the early 2000s, he organized forecasting tournaments for experts drawn from many different areas of expertise. His analysis of the performance of these experts found that they did not consistently make accurate predictions. Their results were frequently only slightly better than what one would expect from random chance and were often worse than the performance of basic extrapolation algorithms.¹

It is our understanding that, after Professor Tetlock published these results, figures in the U.S. intelligence community learned of his research and were alarmed by its implications. In order to improve its forecasting, the Intelligence Advanced Research Projects Activity (IARPA), the intelligence community's counterpart to the better-known DARPA, created the Aggregative Contingent Estimation (ACE) program, which ran geopolitical forecasting tournaments. Professor Tetlock entered the ACE tournament with a team known as the Good Judgment Project (GJP). GJP easily outperformed all other tournament entrants, including U.S. intelligence analysts with access to classified data.²

1.3 About the Good Judgment Project

GJP’s method was relatively simple – aggregating predictions from many sources – but its process was also adaptive. During the course of the tournaments GJP ran randomized controlled trials on its own program and used the results to improve its forecasting model in future years. The principal findings included:

Creating teams of forecasters that share information among themselves and collaborate improves the accuracy of their predictions relative to simply aggregating the predictions of the same people.³
Providing forecasters with one hour of training on topics such as base rates, belief updating, and mathematical modeling improves accuracy.⁴
Forecasting contains a component of skill, and is not based entirely on luck. Some forecasters were consistently more accurate and did not regress to the mean as would be expected if their success were due only to chance.⁵
When grouped into teams with one another, the most accurate 2% of forecasters, dubbed “superforecasters,” outperformed other forecasters by an even greater margin than they had before being teamed with other top performers. One news article claims that predictions made by teams of superforecasters were 30% more accurate than predictions made by professional intelligence analysts with access to classified information.⁶
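
As a rough illustration of what “aggregating predictions from many sources” can mean in practice, the sketch below pools several forecasters' probabilities for a single question into one forecast. This page does not describe GJP's actual aggregation rule; the unweighted mean and median used here, the function name `pool_forecasts`, and the sample numbers are all assumptions chosen for simplicity.

```python
from statistics import mean, median


def pool_forecasts(probabilities, method="mean"):
    """Combine individual probability forecasts for one question.

    Hypothetical helper: an unweighted mean or median is assumed here,
    not taken from GJP's published method.
    """
    if not probabilities:
        raise ValueError("need at least one forecast")
    if any(not 0.0 <= p <= 1.0 for p in probabilities):
        raise ValueError("forecasts must be probabilities in [0, 1]")
    if method == "mean":
        return mean(probabilities)
    if method == "median":
        return median(probabilities)
    raise ValueError(f"unknown method: {method}")


# Made-up forecasts from five people for the same yes/no question.
sample = [0.55, 0.70, 0.65, 0.90, 0.60]
pooled_mean = pool_forecasts(sample)
pooled_median = pool_forecasts(sample, method="median")
```

The median is less sensitive to a single extreme forecast than the mean, which is one reason a pooling rule might be chosen with care.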

After the conclusion of the tournament, Professor Tetlock wrote a book, Superforecasting: The Art and Science of Prediction, about lessons learned from GJP.⁷ Subsequently, GJP was converted into a for-profit entity known as Good Judgment, Inc.⁸

2. About the grant

2.1 Proposed activities

This funding is structured to be unrestricted within Professor Tetlock’s work on forecasting. At this stage, the main potential uses we have discussed with him revolve around two projects that he characterized to us as “public goods” projects. Both proposed projects aim to depolarize unnecessarily polarized debates.

The first project, “Adversarial Collaboration Tournaments,” would involve experts on both sides of a polarized issue (e.g., the Patient Protection and Affordable Care Act) proposing and answering forecasting-style questions about that issue. The second project, the “Alpha Pundit Challenge,” would systematically convert vague predictions made by prominent pundits into explicit numerical forecasts. Both of these projects are in the early stages of planning, so the details have not been worked out, but they share the goal of encouraging public figures with strong positions on important issues to convert those positions into concrete forecasts.

We expect that the funds provided by our grant will be used for a variety of activities that could be described as pilot projects for the two projects described above. If these early projects go well, we may consider funding a larger and more public effort at a later date.

Professor Tetlock expects to have limited time and attention available to spend on these projects. During our conversations with him, he expressed interest in hiring superforecasters for day-to-day roles relating to these projects.

2.2 Case for the grant

We see this as a fairly speculative grant; our intention is to support someone we believe has done very good work on a potentially important topic, by providing sufficient funding to allow him some freedom to plan and develop new projects.

When describing the motivation behind his “public goods” projects, Professor Tetlock has pointed to the extremely polarized rhetoric that is often used to discuss political issues. He notes that the positions of people on both sides of a given issue usually rely on implicit predictions about the future, but that such predictions are usually phrased so vaguely that it is difficult to say, even in retrospect, whether they were correct.⁹

As we understand it, Professor Tetlock believes that if making explicit predictions were a more expected part of publicly defending a strong position, pundits with polarized views would both (a) be held accountable to the implications of their views, and (b) have more incentive to listen to others with opposite opinions, in the interest of improving their own forecasts. This underlies the design of the two projects described above.

We share Professor Tetlock’s intuition that making forecasting a more normalized part of mainstream political discourse would be valuable, and find the proposed mechanism of the projects plausible (though we are not confident they will be successful).

The basic case for this grant, as we see it, is that:

Professor Tetlock has a very strong record of studying and improving forecasting techniques, and has garnered a reasonable amount of public attention for this work.
We believe that increasing the use of forecasting in public debate on policy questions (whether by increasing how often public figures make forecasts themselves, making it more common to refer to forecasts by others, or something else) would very likely be a positive development, potentially improving the quality of both discourse and decision-making.
Based on conversations with Professor Tetlock, our impression is that he would now like to develop projects specifically targeting this ‘public discourse’ angle, and that having funds available would make it easier for him to make progress with those projects.

As an alternate framing, we think it also makes sense to describe our interest in this grant in terms of the considerations we rely on when prioritizing between cause areas. Our current (informal and preliminary) take is that this grant is supporting work in an area that is important (in that it could affect many very important decisions if successful), neglected (in that Professor Tetlock, who stands out in the field for his past work, does not appear to be “fully funded” in our estimation), and tractable (in that Professor Tetlock’s work over the last several decades seems to us to have generated substantive insights about how to improve forecasting, and to have been somewhat successful in attracting public attention).

This grant is primarily about open-ended support of someone we believe has done impressive work in the past.

2.3 Risks and reservations

We see this grant as speculative by nature, and accordingly believe there is a reasonable chance that it does not have much impact. This could happen either because the projects themselves do not amount to much, or because other funding sources would have been found in the absence of our grant.

One salient risk is that Professor Tetlock has other obligations and does not expect to personally be able to dedicate much time or attention to these projects. Another point that strikes us as relevant is that we are unaware of anyone involved with the projects with extensive experience running a large public-facing project (which these might eventually become), though Professor Tetlock does have experience fielding publicity for his work with the Good Judgment Project and his books, Expert Political Judgment and Superforecasting.¹⁰

2.4 Room for more funding

Prior to this grant from the Open Philanthropy Project, Professor Tetlock secured $200,000 from another funder and he has been in discussions with other potential sources of funding. Per the previous section, we do believe it is possible that our grant will simply displace funds that would have come from other sources. However, our impression based on conversations with Professor Tetlock is that our funding will allow him to develop the projects more quickly and potentially more ambitiously than might otherwise have been possible.

3. Plans for follow-up

3.1 Goals and expectations for this grant

We do not have settled expectations for this grant. We expect that there is a large range of possible outcomes that we would be happy with, and that would be hard to predict in advance.

We think that it is reasonable to expect that this grant enables Professor Tetlock’s future projects to launch, and perhaps to become more ambitious than they otherwise would have been. We would be pleased if these projects improve our understanding of forecasting and prediction-making in general, or if they provide a platform for Professor Tetlock to bring his existing findings to a more mainstream audience.

An example of a very positive outcome, from our perspective, would be if this grant contributes to a shift towards a world in which comparing policy claims to the best available forecasts becomes a standard component of evaluating such claims. We consider it very unlikely that this will be a direct outcome of this grant, but we would be excited if the grant laid some groundwork for further work in this direction.

An example of the kind of impact we would consider a moderate success for this type of effort in the longer term is the effect that Nate Silver’s statistical modeling at FiveThirtyEight¹¹ has had on the analysis of presidential elections. We would not characterize this impact as revolutionary, but we believe that it has been notable, and that his analyses and models are well-known and respected among people who think seriously about elections.

3.2 Internal forecasts

We’re experimenting with recording explicit numerical forecasts of events related to our decisionmaking (especially grantmaking), in part in connection with our interest in the topic of this grant. The idea behind this is to pull out the implicit predictions that are playing a role in our decisions, and make it possible for us to look back on how well-calibrated and accurate those are. For this grant, we are recording the following forecast:

The Alpha Pundit Challenge, or something like it, will have converted five or more vague predictions from pundits into numerical predictions, beyond those described in Tetlock, Alpha Pundit Challenge Proposal, by December 31, 2016: 50%
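
One common way to look back on the accuracy of recorded forecasts such as the one above is the Brier score: the mean squared difference between each stated probability and the outcome (1 if the event happened, 0 if it did not), where lower is better. The sketch below is illustrative only. This page does not specify a scoring rule, and the function name and sample numbers are hypothetical.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes.

    0.0 is a perfect score; 1.0 is the worst possible score.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if not forecasts:
        raise ValueError("need at least one forecast")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Made-up record of three forecasts and how the events resolved.
recorded_probabilities = [0.5, 0.9, 0.2]
resolved_outcomes = [1, 1, 0]
score = brier_score(recorded_probabilities, resolved_outcomes)
```

Under this rule a single 50% forecast scores 0.25 whichever way the event resolves, so judging calibration requires looking across many recorded forecasts rather than any one of them.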

3.3 Key questions for follow-up

Questions we anticipate asking Professor Tetlock as part of our follow-up on this grant include:

How was the funding spent?
What progress has been made on Adversarial Collaboration Tournaments, the Alpha Pundit Challenge, and any other relevant projects?
Did interesting or high-profile people become involved with either project?
Was he able to attract significant public attention to either project?

3.4 Follow-up expectations

We plan to speak to Professor Tetlock about these projects roughly every six months. We anticipate producing a written update on this grant after one or two years. If we decide at some point to make a follow-up grant, we expect that we would produce a written update at that time.

4. Our process

Professor Tetlock posted on Twitter in December 2015 about the Alpha Pundit Challenge as a means of gauging interest for this project.¹² In response, we contacted him to inquire about his funding situation. We discussed several possible projects with him, including the Alpha Pundit Challenge and the Adversarial Collaboration Tournaments. After considering several options and combinations of options, we decided to provide him with $500,000 in unrestricted funding.

Other options we considered included:

Funding the operating costs of these projects during a pilot year.
Organizing and funding a small planning convening to refine Professor Tetlock’s proposal.
Running a large attention-generating conference.

5. Sources

DOCUMENT	SOURCE
FiveThirtyEight Website	Source (archive)
Good Judgment Project, Six Lessons about Crowd Prediction	Source (archive)
Good Judgment website	Source (archive)
Philip E. Tetlock, Publications	Source (archive)
Spiegel 2014	Source (archive)
Superforecasting Website, In the Media	Source (archive)
Tetlock 2005	Source (archive)
Tetlock and Gardner 2015	Source (archive)
Tetlock and Scoblic 2015	Source
Tetlock, Alpha Pundit Challenge Proposal	Source

Footnotes

1. See Tetlock 2005 for full descriptions of these tournaments and results. Summary from Philip E. Tetlock, Publications:

“Defining and Assessing Good Judgment

My 2005 book, Expert Political Judgment: How Good Is It? How Can We Know?, traces the evolution of this project. It reports a series of relatively small scale forecasting tournaments that I started in 1984 and wound down by 2003. A total of 284 experts participated as forecasters at various points. They came from a variety of backgrounds, including government officials, professors, journalists, and others, and subscribed to [a] variety of political-economic philosophies, from Marxists to libertarians.

Cumulatively they made 28,000 predictions bearing on a diverse array of geopolitical and economic outcomes.

The results were sobering. One widely reported finding was that forecasters were often only slightly more accurate than chance, and usually lost to simple extrapolation algorithms. Also, forecasters with the biggest news media profiles tended to lose to their lower profile colleagues, suggesting a rather perverse inverse relationship between fame and accuracy.”

2. “In year 1, GJP beat the official control group by 60%. In year 2, we beat the control group by 78%. GJP also beat its university-affiliated competitors, including the University of Michigan and MIT, by hefty margins, from 30% to 70%, and even outperformed professional intelligence analysts with access to classified data.”

Tetlock and Gardner 2015, chapter 1

3. “Training was delivered in a 1-hour online module and focused on forecasting reasoning tips, such as using base rates, mathematical models and updating one’s beliefs. Teaming allowed forecasters to collaborate as members of 12-15 person teams who had online tools for allocating effort, sharing information and rationales with one another. As the figure shows, training and teaming significantly reduced forecasting error in the tournament. These results are replicated across all four seasons of the tournament.”

Good Judgment Project, Six Lessons about Crowd Prediction

4. “Training was delivered in a 1-hour online module and focused on forecasting reasoning tips, such as using base rates, mathematical models and updating one’s beliefs. Teaming allowed forecasters to collaborate as members of 12-15 person teams who had online tools for allocating effort, sharing information and rationales with one another. As the figure shows, training and teaming significantly reduced forecasting error in the tournament. These results are replicated across all four seasons of the tournament.”

Good Judgment Project, Six Lessons about Crowd Prediction

5. “The first two lessons concerned persistence of individual skill and the importance of environmental factors. We wondered what would happen if we introduce highly skilled forecasters to an enriched environment. To test whether such “tracking” would further improve performance, we promoted the top 2% most accurate forecasters to “superforecaster” status and placed them in teams. The resulting super teams had an elite-egalitarian structure: they were composed of top past-performers, all of whom had equal rights and responsibilities within their teams.

The performance of super-teams was extremely strong. We document this using a simple version of discontinuity analysis. Namely, we compare superforecasters (top 2%) with those who just missed the cut (3-5%). In Year 1, when the selection took place, both groups performed much better than average. In both Years 2 & 3, super-teams increased their lead over the comparison group. Rather than regressing toward the mean, super teams increased their level of engagement and produced highly accurate forecasts.”

Good Judgment Project, Six Lessons about Crowd Prediction

6. “In fact, she’s so good she’s been put on a special team with other superforecasters whose predictions are reportedly 30 percent better than intelligence officers with access to actual classified information.”

Spiegel 2014

7. Tetlock and Gardner 2015

8. Archived copy of link: Good Judgment website

9. “Consider the debate over the nuclear deal with Iran, which was one of the nastiest foreign policy fights in recent memory. There was apocalyptic rhetoric, multimillion-dollar lobbying on both sides and a near-party-line Senate vote. But in another respect, the dispute was hardly unique: Like all policy debates, it was, at its core, a contest between competing predictions.

Opponents of the deal predicted that the agreement would not prevent Iran from getting the bomb, would put Israel at greater risk and would further destabilize the region. The deal’s supporters forecast that it would stop (or at least delay) Iran from fielding a nuclear weapon, would increase security for the United States and Israel and would underscore American leadership.

The problem with such predictions is that it is difficult to square them with objective reality. Why? Because few of them are specific enough to be testable. Key terms are left vague and undefined. (What exactly does “underscore leadership” mean?) Hedge words like “might” or “could” are deployed freely. And forecasts frequently fail to include precise dates or time frames. Even the most emphatic declarations — like former Vice President Dick Cheney’s prediction that the deal “will lead to a nuclear-armed Iran” — can be too open-ended to disconfirm.”

Tetlock and Scoblic 2015

10. See, for example, Superforecasting Website, In the Media.

11. Archived copy of link: FiveThirtyEight Website

12. “What we propose is new, even revolutionary, and could with proper support evolve into a systemic check on hyperbolic assertions made by opinion makers in the public sphere. It is rigorous, empirical, repeatable, and backed by the widely-recognized success of the Good Judgment Project based at the University of Pennsylvania.

Of course, we will need to aggregate many judgments of many forecasters to assess the relative accuracy of pundits versus superforecasters. This project is a long-haul effort. But it offers long-term hope. We don’t just have to sit idly by and bemoan the polarization of unnecessarily polarized debates. There are tangible steps that can be taken to increase the supply of an essential public good in democracies: thoughtful empirically grounded debates over consequences of policy options.

GJP and GJ Inc. are eager to collaborate with major foundations and media outlets interested in implementing these ideas with us. Those who embrace the ideas first will also advance their own long-term self-interest. They will co-own the sites that hold the punditocracy — left, center and right — accountable for the claims they make, sites to which an informed citizenry can turn to spot pundits who value polemical posturing over analytical accuracy.”

Tetlock, Alpha Pundit Challenge Proposal
