To better inform our thinking about long-term philanthropic investment and hits-based giving, I (Luke Muehlhauser) have begun to investigate the historical track record of long-range forecasting and planning. I hope to publish additional findings later, but for now, I’ll share just one example finding from this investigation,1 concerning one of the most famous and respected products of professional futurism: the 1967 book The Year 2000: A Framework for Speculation on the Next Thirty-three Years, co-authored by Herman Kahn and Anthony J. Wiener.
Background information on Herman Kahn
Herman Kahn was one of the most prominent futurists and military strategists of the 20th century, and is sometimes cited as a “father of scenario analysis.”2 During the 1950s and 60s he became well-known for his contributions to nuclear war strategy. His early research was conducted at RAND, the original research arm of the U.S. Air Force, which was in charge of the country’s nuclear arsenal.3
In 1961, Kahn co-founded his own “high-class RAND” called the Hudson Institute, which soon employed at least 80 research analysts, research aides, and other staff.4 For a while he continued to publish on nuclear strategy and consult for the Department of Defense, but in the mid-1960s he began to turn his attention from military strategy to long-term “futurology.” In 1967, Kahn and Wiener (hereafter “K&W”) cobbled together five years of future-oriented work by the Hudson Institute into the book The Year 2000. Kahn biographer (and occasional co-author) B. Bruce-Briggs later described it as “the fundamental text” of futurology and “the single most important work in the field,” and wrote that “as a result of the book, [Kahn] became established in a second public career… as the leader of futurology.”5
Richard Albright’s assessment of technology forecasts in The Year 2000
Perhaps the easiest-to-evaluate part of The Year 2000 is its list of 100 technology predictions (for the year 2000), which appear on pp. 51-55. Conveniently, Albright (2002) assessed the accuracy of these 100 forecasts using a method reasonable enough that I don’t think it would add much value for me to perform my own, independent assessment of them.
Albright assembled a panel of eight experts, who were “experienced in a range of scientific fields with a mix of industrial and academic backgrounds,” to (independently) judge the accuracy of the 100 forecasts on a 5-point scale:
1. Bingo: a truly remarkable prediction that has materialized.
2. Okay: a good prediction of innovation that has materialized.
3. Not Yet: a prediction that might occur but has not happened yet.
4. Oops: just wrong.
5. What?: as in: “What were they thinking?”
The final rating for each forecast was the average rating assigned by all panelists, such that a forecast was judged as accurate if it scored an average rating of 2 or lower.6
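As a rough illustration of the scoring rule just described, the sketch below averages panelist ratings and applies the "2 or lower" threshold. The ratings are invented for illustration and are not Albright's data; the `summarize` helper and the forecast labels are hypothetical names. It also reports the most common rating, which is one alternative summary for ordinal data.

```python
from statistics import mean, multimode

# Hypothetical ratings from eight panelists (1 = Bingo ... 5 = What?).
# These numbers are invented for illustration; they are not Albright's data.
ratings = {
    "forecast_a": [1, 2, 2, 1, 2, 3, 2, 1],
    "forecast_b": [4, 3, 5, 4, 4, 3, 4, 5],
}

ACCURATE_THRESHOLD = 2  # a mean rating of 2 or lower counts as accurate


def summarize(panel_ratings):
    """Return (mean rating, judged accurate?, most common rating(s))."""
    avg = mean(panel_ratings)
    return avg, avg <= ACCURATE_THRESHOLD, multimode(panel_ratings)


for name, panel in ratings.items():
    avg, accurate, modes = summarize(panel)
    print(f"{name}: mean={avg:.2f} accurate={accurate} mode(s)={modes}")
```

With these made-up numbers, the first forecast falls under the threshold and the second does not. Real panels could of course produce cases where the mean and the most common rating point in different directions.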
To illustrate some of Albright’s results, here are the four best-rated forecasts (I’ve preserved K&W’s original forecast numbering):
- 71. Inexpensive high-capacity, worldwide, regional, and local (home and business) communication (perhaps using satellites, lasers, and light pipes)
- 74. Pervasive business use of computers
- 82. Direct broadcasts from satellites to home receivers
- 1. Multiple applications for lasers and masers for sensing, measuring, communication, cutting, welding, power transmission, illumination, and destructive (defensive)
And here are the four worst-rated forecasts:
- 35. Human hibernation for relatively extensive periods (months to years)
- 27. The use of nuclear explosives for excavation and mining, generation of power, creation of high-temperature-pressure environments, or as a source of neutrons or other radiation
- 79. Inexpensive and reasonably effective ground-based BMD (ballistic missile defense)
- 19. Human hibernation for short periods (hours or days)
Unsurprisingly, panelists differed greatly on how many forecasts they rated as having occurred (see Albright Table 4), and some forecasts were rated with much more consensus than others (see Albright Table 6). The table of forecasts with greatest and least consensus makes intuitive sense to me: those with greatest consensus seem to have relatively straightforward interpretations (e.g. “Direct broadcasts from satellites to home receivers”), and those with least consensus are phrased ambiguously and thus easily allow many interpretations (e.g. “New techniques for keeping physically fit and/or acquiring physical skills”).
Overall, about 45% of the forecasts were judged as accurate.
How good are these results?
The obvious thing to say is that it’s hard to tell. First, we don’t have any baseline to which we can compare K&W’s performance, such as a contemporaneous poll of experts asked to predict the likelihood of each of these same 100 innovations occurring by 2000. Without such a baseline, it’s difficult for us to understand how surprising or obvious each of these forecasts would have seemed to informed experts at the time. Second, many of the forecasts were stated ambiguously and thus were difficult for the judges to assess for accuracy. Third, K&W’s book is somewhat ambiguous about the degree to which they were trying to forecast the future.7
One could argue that K&W seem to have been hugely overconfident in these forecasts, given their statement (p. 50) that for each of these forecasts, “a responsible opinion can be found to argue a great likelihood that the innovation will be achieved before the year 2000 — usually long before. (We would probably agree with about 90-95 per cent of these estimates.)” If we interpret this to mean that they thought 90-95% of the 100 forecasts would come true, and we now know that ~45% of them came true, then K&W exhibited extreme overconfidence in their forecasts — substantially worse than is typical of, say, untrained subjects asked general knowledge questions, or political pundits asked to make short- and medium-term geopolitical forecasts.8
It seems plausible (but not obvious) to me that K&W’s performance on these forecasts really was this poor: after all, I haven’t seen evidence that K&W engaged in any probability calibration training prior to making these forecasts, nor even that they were aware of the contemporary probability calibration literature. Also, it seems likely to be even more difficult to make well-calibrated long-term forecasts than it is to (e.g.) make well-calibrated estimates about general knowledge questions or about geopolitical events occurring in the short- to medium-term future.
Intuitively, it is somewhat hard to imagine K&W thinking that 90-95% of these predictions would come true, given how radical and specific many of them are. Perhaps when they wrote that they “would probably agree with about 90-95 per cent of these estimates,” what they meant is that they thought 90-95% of these predictions had “a great likelihood” of coming true by 2000, where a “great likelihood” meant something like 65%-90%. In that case, it would still seem that K&W were overconfident, though less grossly so than if they really expected 90-95% of these 100 forecasts to come true by 2000.
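To see why this more charitable reading would still suggest overconfidence, here is a minimal sketch of the arithmetic. It assumes the 65%-90% range guessed at above, treats the roughly 45% accuracy figure as the observed hit rate, and makes the most favorable possible assumptions about the forecasts K&W would not have agreed with. The function name and constants are hypothetical.

```python
# Assumed reading: 90-95% of the 100 forecasts each had a 65%-90% chance.
SHARE_RANGE = (0.90, 0.95)       # share of forecasts K&W "would agree with"
LIKELIHOOD_RANGE = (0.65, 0.90)  # assumed meaning of "a great likelihood"
OBSERVED = 0.45                  # approximate share judged accurate


def expected_hit_rate_bounds(share_range, likelihood_range):
    """Bound the expected share of forecasts that come true.

    Lower bound: smallest share, lowest likelihood, and the remaining
    forecasts never come true. Upper bound: largest share, highest
    likelihood, and the remaining forecasts always come true.
    """
    low = share_range[0] * likelihood_range[0]
    high = share_range[1] * likelihood_range[1] + (1 - share_range[1])
    return low, high


low, high = expected_hit_rate_bounds(SHARE_RANGE, LIKELIHOOD_RANGE)
print(f"expected hit rate: {low:.1%} to {high:.1%}; observed: {OBSERVED:.0%}")
print("still overconfident:", OBSERVED < low)
```

Even the lower bound (0.90 × 0.65, or about 58.5%) sits above the observed figure, which is the sense in which the forecasts would remain overconfident under this reading.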
However, I am inclined to believe the interpretation that K&W expected 90-95% of these 100 forecasts to come true by the year 2000. This is because, immediately after their list of “one hundred technical innovations very likely in the last third of the twentieth century,” K&W provide a shorter list of 25 “less likely but important possibilities,” clarifying that by “less likely,” they mean that these are “areas in which technological success by the year 2000 seems substantially less likely (even money bets, give or take a factor of five)…” (p. 55). This seems to mean that they’d bet at odds somewhere between 1:5 and 5:1 (1:1 being an “even money bet”), which implies a confidence of 16.67% (1 in 6) to 83.33% (5 in 6) for each of the 25 “less likely” forecasts. For these forecasts to be strictly “less likely” than the previously-listed 100 forecasts, K&W must have considered each of the previous 100 forecasts to be >83.33% likely to occur, which seems to support the interpretation that they thought 90-95% of those forecasts would come true.
And that, in turn, implies that K&W were hugely overconfident in those 100 forecasts.
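The odds-to-probability conversion used in this argument can be checked with a short sketch. It assumes that "give or take a factor of five" means odds ranging from 1:5 to 5:1, as interpreted above; `odds_to_probability` is a hypothetical helper name.

```python
from fractions import Fraction


def odds_to_probability(in_favor, against):
    """Convert betting odds in_favor:against into an implied probability."""
    return Fraction(in_favor, in_favor + against)


low = odds_to_probability(1, 5)   # even money divided by a factor of five
even = odds_to_probability(1, 1)  # an "even money bet"
high = odds_to_probability(5, 1)  # even money multiplied by a factor of five

for label, p in [("1:5", low), ("1:1", even), ("5:1", high)]:
    print(f"{label} -> {p} = {float(p):.2%}")
```

Using exact fractions makes it clear that the endpoints are 1/6 and 5/6, which round to the 16.67% and 83.33% figures quoted in the text.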
- 1. An earlier draft of this page underwent a light vet by GiveWell Research Analyst Isabel Arjmand. I asked that she pay special attention to whether I (1) correctly represented Albright’s findings, and whether I (2) provided a reasonable interpretation of Kahn & Wiener’s level of confidence in their 100 predictions, given the textual evidence available. Isabel said she agreed with my representations on those topics. She also suggested some minor edits for clarity and typo-fixing, and I accepted those edits.
- 2. See e.g. Cooke (1991), p. 10; Kuosa (2012), p. 37.
- 3. My sources for these initial biographical details are Wikipedia and Menand (2005).
- 4. Bruce-Briggs (2000), pp. 177-178.
- 5. For the details in this paragraph, see Bruce-Briggs (2000), pp. 220, 284-295.
- 6. Technically, computing means from ordinal data is inappropriate, but I would guess that in this case it wouldn’t affect the big-picture results much, and I don’t have access to the raw data from the study, with which I could compute other statistics, such as the most common rating for each forecast.
- 7. In The Year 2000 K&W write that their book is not “an attempt to ‘predict’ any particular aspect of the future” (p. xix). Nevertheless, the book contains hundreds of statements that, to me, are most naturally interpreted as predictions, and K&W describe some future scenarios that they think are more useful to analyze than other scenarios. Because of the subtlety of their approach, I will explain the senses in which I think The Year 2000 is making predictions that can be (to some degree) evaluated for accuracy.
In the introduction, the prominent futurist Daniel Bell provides some context for how we should interpret The Year 2000:
Reviewing the prophets of the past, one finds lacking in almost all of them… any notion of how a society hangs together, how its parts are related to one another, which elements are more susceptible to change than others, and, equally important, any sense of method. They are not systematic, and they have no awareness of the nature of social systems: their boundaries, the interplay of values, motivation, and resources, the levels of social organization, and the constraints of custom and privilege on change. If there is a decisive difference between the future studies that are now under way and those of the past, it consists in a growing sophistication about methodology and an effort to define the boundaries — intersections and interactions — of social systems that come into contact with each other. [p. xxiv]
In October 1965, the Academy created the Commission on the Year 2000, composed of thirty individuals, to stimulate [research on “alternative futures”]. Discussions at the first plenary session of the Commission established the need for statistical and other “baselines for the future”; that is, a compilation of likely and possible future developments that the Commission could take as a starting point for more detailed consideration of policy consequences and alternatives. Mr. Kahn, a member of the Commission, was asked to undertake this task; and this research is the result. [p. xxvii]
No one pretends that single “events” can be predicted. These are often contingent and even irrational. Nor can one predict what historians call “turning points” in the lives of men or nations… But all such events are constrained by various contexts: of resources, of customs, of will. And they are shaped, as well, by basic trends in human society: the growth of science, literacy, economic interdependence, and the like. This volume, therefore, is not an exercise in prophecy; it is an effort to sketch the constraints of social choice. [p. xxviii]
To further illuminate how K&W conceived of their project at the time, I will quote their own words at length:
This book is simply what the subtitle says - “a framework for speculation.” It is far from an exhaustive set of conjectures about every important element of the future; still less is it an attempt to “predict” any particular aspect of the future. In subsequent work we intend to build upon this study by filling in the framework, here and there, and by enlarging upon, qualifying, discarding, or making a better case for various speculations that are merely sketched here. In this initial report, however, our emphasis is necessarily methodological, synoptic, and contextual… We have emphasized problems, not solutions. [p. xix]
There are many good reasons for trying to imagine what the world may be like over the next thirty-three years. The most important, of course, is to try to predict conditions in reasonable detail and to evaluate how outcomes depend on current policy choices. If only this were feasible, we could expect with reasonable reliability to change the future through appropriate policy changes today. Unfortunately, the uncertainties in any study looking more than five or ten years ahead are usually so great that the simple chain of prediction, policy change, and new prediction is very tenuous indeed.
…Nevertheless, at the minimum, such studies… contribute to interesting lectures, provocative teaching, and stimulating conversation… More important, these studies can affect basic beliefs, assumptions, and emphases. Probably most important, at least for us at Hudson Institute, is that long-range studies provide a context in which to do five- to ten-year studies that can and do influence policy choices.
…Another important, but unfortunately often unattainable, objective for a long-range study is to anticipate some problem early enough for effective planning. Whether this can be accomplished… depends, of course, on the issue and the question: some variables change much more slowly and reliably than others, and some questions need much better answers than others. Trends or events that depend on large, aggregative phenomena are often more amenable to long-range planning than those that depend on unique circumstances or special sequences of events. Projects, such as educating an individual, carrying out city planning, projecting recreational demands, formulating anti-pollution, or perhaps population control policies, can normally be usefully considered much further in advance than problems of international relations or subtle and complex national security issues. This is true because gross, long-term trends are far more recognizable and projectable than complex sequences of unique events, such as those that will determine tomorrow morning’s headlines.
One answer, a partial one, to these problems… is deliberately to build greater flexibility into both systems and programs…
Thus in policy research we are not only concerned with anticipating future events and attempting to make the desirable more likely and the undesirable less likely. We are also trying to put policy-makers in a position to deal with whatever future actually arises, to be able to alleviate the bad and exploit the good. In doing this, one clearly cannot be satisfied with linear or simple projections: a range of futures must be considered. [pp. 1-3]
What are we to make of this? On the one hand, K&W write that their book is not “an attempt to predict any particular aspect of the future.” On the other hand, Bell describes the project as an attempt to compile “likely and possible future developments,” and K&W seem to acknowledge that there are at least some cases in which “a long-range study” can “anticipate some problem early enough for effective planning” — in particular, they write that this is most feasible in cases of “trends or events that depend on large, aggregative phenomena,” and such trends are the primary focus of the book.
My own resolution to this seeming contradiction is to assume that K&W’s statement that they are not attempting to predict “any particular aspect” of the future is meant to apply only to the big-picture scenarios that are the main focus of the book, and not to the many relatively concrete and specific predictions they seem to make along the way.
Thus, I am tempted to interpret statements like these as uncertain predictions (emphasis added in all cases):
The basic trends of Western society… can be seen as part of a common, complex trend of interacting elements. For analytic purposes, however, we shall separate them into thirteen rubrics… From the point of view of looking toward the future, the important consideration is that, as basic trends, these elements seem likely to continue at least for the next thirty-three years, though some may saturate or begin to recede beyond that point. [p. 7]
…we list in Table XVIII one hundred areas in which it is probable that technological innovation will occur in the next thirty-three years… We would probably agree with about 90-95 per cent of these estimates. [p. 50]
[Table XIX lists] areas in which technological success by the year 2000 seems substantially less likely [than those in Table XVIII] (even money bets, give or take a factor of five), but where, if it occurred, it would be quite important… [p. 55]
[Table XX lists] ten radical possibilities… We do not believe that any of them will occur by the year 2000, or perhaps ever. [p. 56]
We suspect that [fast-breeder nuclear reactors] will be started in the mid-1970s and widely built in the 1980s and 1990s. [p. 73]
In other cases — the majority of the book — K&W seem to be aiming to construct “surprise-free projections” and “canonical variations,” which are explicitly not to be interpreted as “likely” or “probable” future scenarios:
One can think of a surprise-free projection as being as sophisticated a projection as it seems reasonable to make given the available understanding of current trends. It thus differs, but not in spirit, from the “naive” projection of the economists which take current tendencies as certain. For most of the projections that we are discussing, in which we are looking twenty to thirty or more years ahead, perhaps the most surprising thing that could actually happen would be an absence of surprises. Therefore, there is no implication that a surprise-free projection is likely. It may, in fact, be the most likely of the various possible projections; that is to say that when contemplating a thousand things which could happen, the surprise-free projection may have a probability of much less than one in a hundred, yet be more probable than any of the other 999 possible occurrences. It could be “most probable” and still be quite improbable. [p. 38, emphasis in original]
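A toy example may help make the "most probable and still quite improbable" point concrete. The weights below are invented purely for illustration: they assume 1,000 possible outcomes, with the surprise-free projection five times as likely as each of the others.

```python
from fractions import Fraction

N_OUTCOMES = 1000
# Hypothetical weights: the surprise-free projection (index 0) is five
# times as likely as each of the other 999 outcomes.
weights = [5] + [1] * (N_OUTCOMES - 1)
total = sum(weights)
probabilities = [Fraction(w, total) for w in weights]

surprise_free = probabilities[0]
print(f"surprise-free probability: {float(surprise_free):.3%}")
print("most probable outcome:", surprise_free == max(probabilities))
print("below one in a hundred:", surprise_free < Fraction(1, 100))
```

Under these assumed weights, the surprise-free projection is the single most likely outcome while still having a probability of about half a percent.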
This study thus will present some fairly simple “surprise-free” political projections (Standard Worlds) and some nearly surprise-free Canonical Variations… The fact that they are surprise-free, or nearly surprise-free, makes them generally acceptable for use and discussion, but we should be quite clear that as we extrapolate these projections beyond the next decade into the mid-1980’s and to the year 2000…, the assumption that no grave surprises or great crises will occur becomes increasingly untenable. But this does not make these worlds entirely useless. They provide specific cases, or examples, with which one can disagree as well as agree, vary from as well as use without change, make use of to emphasize specific contradictions as well as to illustrate or elaborate hypotheses… Our most important caveat, however, is that almost any day has some chance of bringing up some new crisis or unexpected event that becomes a historical turning-point, diverting current tendencies so that expectations for the distant future must shift. Over any period of decades at least one such shift, and probably several, are likely — and the turning-point may come at any moment. Thus any study of the distant future may date rapidly. [pp. 12-13, emphasis in original]
Because K&W write that their “surprise-free” and “nearly surprise-free” projections might in fact be very unlikely to occur, we cannot simply check the accuracy of these projections and then, if the projections do not (in hindsight) match reality, conclude that K&W made poor forecasts. We can, however, use such an exercise as some evidence about the utility of K&W-style long-range scenario analysis. For example, if the real world of 2000 was more-or-less captured by one of K&W’s Standard Worlds or Canonical Variations, that would be a small but surprising (to me) update in favor of the utility of a certain kind of long-range scenario analysis. And if the real world of 2000 was radically different from any of K&W’s Standard Worlds and Canonical Variations, then we might question the plausible utility of such long-range scenario analysis, even if we cannot fairly say that K&W made poor forecasts. (However, evaluating the utility of K&W’s big-picture projections is not the aim of this page, which is focused only on evaluating a portion of K&W’s technology predictions.)
Hence, on this page I evaluate the accuracy of some of K&W’s technology predictions — which, unlike some other parts of the book, do seem to be intended as predictions — while acknowledging that (a) in doing so, I might be misinterpreting K&W’s original intent, and that (b) their predictions seem secondary to their big-picture “surprise-free projections” and “canonical variations.”
Below, I collect additional textual evidence for my interpretation of K&W’s intentions.
On p. 7 of The Year 2000, K&W write:
In projecting beyond the next decade, whether studying general trends and contexts or very specific areas, we must choose — perhaps more or less arbitrarily — among many plausible alternatives those which ought to be studied in greater detail. We have discussed some of the methodology of such choices in [a document later republished as “The Alternative World Futures Approach,” pp. 83-136 in Kaplan (1968)].
In that book chapter (“The Alternative World Futures Approach”), Kahn provides a nuanced (and somewhat difficult to interpret) account of the aims and utility of his approach. To the reader who seeks an easier-to-interpret, less-nuanced reading of Kahn’s intentions, Kahn might appear to be flipping back and forth between saying that his approach involves some attempt to predict the future and saying that it does not, as I illustrate with the quotes below. (Note that some of this material is repeated in similar form in The Year 2000.)
First, Kahn lists the objectives of the “alternative world futures” approach (pp. 83-84 of the chapter mentioned above):
- To stimulate and stretch the imagination.
- To clarify, define, name, expound, and argue major issues.
- To design and study alternative policy “packages” and contexts.
- To create propaedeutic and heuristic expositions, methodologies, paradigms and frameworks…
- To improve intellectual communication and cooperation…
- To increase the ability to identify new patterns and crises and understand their significance.
Noticeably missing from this list are objectives such as “forecast plausible or likely futures” or “make policy recommendations.” Lest the reader be confused, a footnote at the end of the list adds:
The reader may be puzzled by not finding three of the more conventional objectives of policy research: (1) To furnish specific knowledge and generate and document conclusions, recommendations, and suggestions. (2) To clarify currently realistic policy choices, with emphasis on those that retain flexibility for a broad range of contingencies. (3) To improve the “administrative” ability of decision-makers and their staffs to react appropriately to the new and unfamiliar.
While these objectives are clearly important, they are not the primary objectives of the Hudson Institute study of Alternative World Futures.
Later (p. 86), amidst a long discussion of the difficulties of forecasting the future or even usefully planning for it, Kahn writes:
The problem, however, is not entirely hopeless. While it may be impossible to predict the future in detail, it is possible to speculate usefully on many aspects of the future and even predict some. And even moderate care and prudence — hedging — can have spectacularly useful results should the unlikely occur.
But then, in a section on scenario analysis, Kahn writes (pp. 105-106):
[One criticism of scenario analysis] is that scenarios may be so divorced from reality as to be not only useless but misleading, and therefore dangerous. However, one must remember that the scenario ought not to be used as a predictive device. The analyst is dealing with the unknown and to some degree unknowable future.
On the other hand, later on the same page Kahn writes:
Since plausibility is a great virtue in a scenario, one should, subject to other considerations, try to achieve it, even though it is important not to limit oneself to the most plausible, conventional, or probable situations and behavior.
Even more revealingly, in a section illustrating the “alternative world futures” approach with examples, Kahn explains (p. 121):
We are attempting to examine… the future. We are not, of course, trying to pick the winner of a horse race, only to describe most of the important horses that are running — important perhaps because the probability of winning is high or because the payoff for winning is so spectacular, or for an appropriate combination of probability and intrinsic importance…
In trying to examine the variables which might affect important issues of the future or even determine them to some degree, we find it convenient to divide them into six categories as indicated below:
- Relatively Stable: Climate, gross topography, language, religion, “national character, institutions and style,” many frontiers, etc.
- Slowly (Exponentially or Linearly?) Changing: Natural resources, demography, capital resources, skill and training, technology, GNP, welfare policies, etc.
- “Predictable”: Typical scenarios, prime movers, overriding problems, etc.
- Constrained: More political changes, alliances, business activity, defense budget, morale, military posture, military skill, etc.
- Accidental: Some outcomes of war or revolution, many natural calamities, some kinds of personalities, some kinds of foreign pressures and intervention, some kinds of other events.
- Incalculable: Excessively complex or sensitive or involving unknown or unanalyzed mechanisms of causes in an important way.
To the extent that one feels the future is more or less predictable, one tends to emphasize the importance of the first categories — particularly the first four. To the extent that one feels the future is unpredictable, one tends to emphasize the latter categories — particularly the fifth and sixth. We, of course, will adopt the position that many important aspects of the future are predictable — particularly if “other things are equal” — that many important aspects are not, and that the effect of the predictable things may be quite different from what we think because of the effects of the unpredictable variables — yet that it still may be worthwhile to try to “predict” that which can be predicted, or at least to describe the possibilities and turning points. Indeed, it is the purpose of policy to plan for that which is more or less predictable and hedge against that which is uncertain, both to be able to exploit favorable events and to guard against the consequences of unfavorable ones.
One can read this chapter as evidence that Kahn contradicted himself, but I am inclined to read it as attempting to communicate points such as:
- Some things are more predictable than others.
- Big-picture world scenarios involve a broad mix of relatively predictable and unpredictable phenomena.
- Hence, it is impossible to say which particular long-range scenario is most likely, and it is difficult or perhaps impossible even to say which set of scenarios jointly capture the most likely long-range possible futures.
- Even given this, there are other uses for long-range, big-picture world scenarios, such as to “stretch the imagination” and “improve intellectual communication and cooperation.”
Back to The Year 2000. Additional clarifications concerning the extent to which K&W are attempting to “predict” the future are given on pp. 37-38:
If we say that something should range between five and ten, we do not mean that it cannot be less than five or greater than ten. We simply mean that we would be willing to make a bet at say two to one, or five to one, or even twenty-to-one odds, as the case may be, that the variable under discussion will, in fact, range between five and ten. Sometimes it is useful to make even the subjective probability (i.e., the bet) precise and explicit as well. But again we would normally do so because there is widespread understanding and agreement (except perhaps among a few of the experts) about the meaning of the term subjective probability, and not because we have independently calculated or studied carefully and precisely the numerical probability put forth. Putting the assertion in terms of willingness to make a bet with a small sum of money is intended to make clear our degree of certainty; it does not usually suggest just how educated our educated guess is. It tries to communicate how much confidence we feel in the prediction but often not the confidence we feel in our confidence. (The amount we might be willing to risk often indicates the latter confidence.)
Furthermore, quantitative statements necessarily refer to variables that are quantifiable - or to quantifiable aspects or models of a variable. The frequent use of such quantifications does not imply that less quantifiable variables and issues may not be important or even dominant - or that there may not be difficulties of principle in defining the variable to be quantified. It implies only that we believe that it is still of interest to ask, “what if … ,” or to communicate, quantitatively, some aspect or approximation of an issue, even if not the whole of it. This kind of communication creates difficulties and errors - particularly of emphasis - but may still be more useful than saying that something will be “small” or “unlikely.” Thus we use precise and quantitative statements usually to improve precision in communication, not because the variable can be precisely measured or estimated, or even precisely defined.
…Weather forecasters know that the best single prediction in the absence of strong contrary indications is that current conditions will continue tomorrow. While this prediction would obviously be very often wrong, it would be better than any other simple doctrine for predictions. Similarly we would be willing to wager small sums, at even odds, that the next third of a century will contain fewer big surprises than either of the previous thirds (i.e., that in this respect the world is more like 1815 than 1914). Whether this increased optimism is due to having learned something from doing this study or merely to having become too used to its assumptions is a matter that the reader must judge for himself.
- 8. A much-cited review of studies on probability calibration up to 1980, Lichtenstein et al. (1982), found that untrained subjects tested on general knowledge questions of medium or hard difficulty tend to be right 70%-80% of the time when they state a confidence level of 90%; see figure 2. As for political pundits, I refer to Tetlock (2005)’s famous large-scale study of the forecasting accuracy of political pundits; see figure 3.3.