This is the second in a series of posts summarizing the Open Philanthropy review of the evidence on the impacts of incarceration on crime. Read the full report here.
As I explain in the intro post, in thinking about how incarceration affects crime rates, it is useful to distinguish between “before,” “during,” and “after” effects. The “before” effects of incarceration are deterrence: the prospect of jail or prison time may dissuade people from committing crime. Surely this must happen to some extent, but how much at current policy margins is a question for research. The experimental and quasi-experimental studies I read and reproduced mostly said: not much.
Below, I review research on:
- laws criminalizing driving under the influence of alcohol;
- a mass prison sentence suspension in Italy;
- whether young people commit less crime as they attain the age of criminal majority, when they first face the risk of adult-level sanctions;
- California’s severe “Three Strikes and You’re Out” sentencing law;
- laws adopted in many states to increase minimum sentences for various crimes, or lengthen sentences for crimes involving guns.
For the last two, I obtained the data and computer code for the relevant studies and analyzed them afresh.
Deterring the drinking driver?
During my crawl through the web of literature on deterrence, I found a little book by H. Laurence Ross called Deterring the Drinking Driver, which reviews the history (through 1982) of research on the impacts of laws against driving under the influence (DUI). DUI has not contributed much to mass incarceration, the development that animates our interest in criminal justice reform. Barely 2% of people in prison in this country are there for DUI. Yet the DUI research still offers insight on the power of incarceration to deter.
Most of the studies Ross reviews relate to “Scandinavian” laws, which make driving under the influence a crime per se, regardless of whether harm is done. The approach originated in Norway in 1936, spread to Sweden in 1941, to much of Europe, Australia, New Zealand, and Canada in the 1960s and 1970s, and throughout the United States in the 1980s.
Ross concludes that most efforts to use punishment to deter drinking and driving have not clearly succeeded. Some did for a few months or years, especially when launched amid great publicity. But these effects tended to fade. The classic example took place in the UK. After the Road Safety Act of 1967 went into force, making it a crime to drive with a blood alcohol level above 0.08%, the national rate of fatalities and serious injuries from all causes dropped noticeably, especially during weekend nights, 10pm–4am, when drunk driving was likely most common. Here is Ross’s graph of night-time, weekend fatalities and serious injuries, by month, in the UK:
But as you can see, fatalities soon began returning toward the old level. Ross hypothesized that the publicity around the 1967 law initially led British drivers to overestimate the risk of getting caught. Over time, they recalibrated to the true risk, which was low. In 1970, British police administered one breath test for every 2 million vehicle miles driven.
Overall, Ross shows that threats of punishment can deter. But when the threat is uncertain, as it often is, so is the deterrence.
A mass amnesty in Italy
On August 1, 2006, the Italian government released a stunning 36% of all prisoners, some 22,000 people. The Parliament approved the great release after years of debate prompted by the advocacy of Pope John Paul II, who was concerned about harsh prison conditions and overcrowding. One academic paper coming out of this episode finds strong deterrence; just how takes explaining.
The Italian clemency suspended the last three years of current sentences for most crimes. People with less than three years left to serve were freed immediately. Those with more than three were brought that much closer to release. However, anyone receiving this clemency who recidivated within five years had the suspended portion of the old sentence tacked onto the new one, provided that the new crime was serious enough to earn a sentence of at least two years.
Thus, at least at first, the clemency was provisional rather than permanent. And in that distinction, economists Francesco Drago, Roberto Galbiati, and Pietro Vertova spotted a natural experiment: If prisoner A received the same sentence as B, but happened to have been imprisoned closer to the moment of clemency, then A had a larger suspended sentence hanging over her head. So would A, facing a longer potential sentence going forward, avoid crime more than B?
Drago, Galbiati, and Vertova answer with a strong yes. Over the seven months following the release (all they had data for), each additional month of suspended sentence cut the chance of prison reentry by 0.16 percentage points. That impact may seem larger when expressed another way: each 10% increase in prospective sentence led to 7.4% fewer convictions, for an “elasticity” of -0.74.
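To make the elasticity arithmetic concrete, here is a minimal Python sketch. The function names are mine, not the paper's, and the linear extrapolation in `predicted_change` is a simplifying assumption that is only reasonable for small changes in sentence length.

```python
def elasticity(pct_change_outcome: float, pct_change_sentence: float) -> float:
    """Ratio of the percent change in an outcome to the percent change in sentence length."""
    return pct_change_outcome / pct_change_sentence


def predicted_change(elasticity_value: float, pct_change_sentence: float) -> float:
    """Percent change in the outcome implied by an elasticity (linear approximation)."""
    return elasticity_value * pct_change_sentence


# Figures from the text: a 10% longer prospective sentence, 7.4% fewer convictions.
italy = elasticity(-7.4, 10.0)
print(round(italy, 2))

# Hypothetical extrapolation: what a 5% longer prospective sentence would imply.
print(round(predicted_change(italy, 5.0), 2))
```

The same ratio underlies every elasticity quoted in this post, so the estimates from different studies can be compared on one scale.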
But these numbers can be read another way. Prisoners A and B differ from each other in two respects, and we can’t be sure which matters more for recidivism. Yes, B emerged from prison with a shorter suspended sentence hanging over his head, and so may have been less deterred from reoffending. But before the release, B also spent more time in the potentially criminogenic environment of Italy’s overcrowded prisons. Maybe A committed less crime not for fear of reactivation of a larger suspended sentence, but because she lived for less time under the conditions that provoked the Pope. Then, the researchers’ findings would measure the harmful “after” effects of prison rather than the beneficial “before” effects of deterrence.
My best guess is that Drago, Galbiati, and Vertova are right to view deterrence as primary. Italy’s prisons refilled within a couple of years, yet the national crime rate stabilized below pre-release levels. (More on that in the next post.) That suggests that the released prisoners were less likely to commit crime than their replacements, despite probably having spent more time behind bars. And that makes it look less likely that doing time greatly increased crime.
Another source of doubt when deciding how to generalize from this study to the American criminal justice debate is the dramatic and personal nature of the threat of punishment. It is one thing to be vaguely aware that a legislature has raised or lowered sentences for a class of crimes—the sort of move that can feed or reverse mass incarceration. It is another to come suddenly but conditionally into grace—to be freed before your allotted time, yet be warned that the state will reinstate the residual of your time should you be reconvicted. That experience may have endowed the prospective punishment in Italy with unusual psychological power.
Coming of age
Like the heat-loving microbes that thrive near deep sea vents, researchers are drawn to cleavages in the firmament on which they subsist, which in their case is data. I found two studies that track what happens as young people become adults in the eyes of criminal law (in most states, at the 18th birthday). The studies compare kids a few weeks shy of the age of criminal majority to kids a few weeks past. The findings are especially pertinent since young adults commit the most crime.
David Lee and Justin McCrary studied arrest rates for adolescents in Florida, where the age of criminal majority is 18. Focusing on those who had already been arrested at least once by age 17, Lee and McCrary ask: what is the probability that a child’s first post-17 arrest occurs, if it has not already occurred, for each week of life? They get this graph, in which the dots are data and the curves on either side of the line are smoothed fits:
There is no sudden drop at 18: despite facing tougher sanctions after they cross the threshold of criminal responsibility, the young Floridians did not get arrested significantly less.
Randi Hjalmarsson did something similar on self-reported crime rather than police-reported arrests. Every year since 1997, the federally run National Longitudinal Survey of Youth has been interviewing some 9,000 people who were 12–16 years old as of December 31, 1996. The survey has included questions about whether respondents have committed common crimes such as stealing a car or selling drugs. From the answers to those questions, Hjalmarsson made these graphs of self-reported criminality as a function of months until or after local age of criminal majority:
Only the pettiest crime, theft of $50 or less, drops with much significance as young people become subject to the adult sentencing regime (with a p value of 0.11). Of course, the people responding to the survey may have withheld information even in a self-administered questionnaire backed by promises of anonymity. But it seems unlikely that they would suddenly have become more honest at the age of criminal majority, in a way that would hide real drops in crime.
Again, in the contemporary U.S. context, deterrence looks weak.
Two takes on Three Strikes
California’s “Three Strikes and You’re Out” policy was proposed by a wedding photographer whose daughter had been murdered by a parolee, and quickly became law in the heat of the 1994 gubernatorial campaign. The law was of a piece with the national “tough on crime” movement, yet singular in its severity. Criminologists Franklin Zimring, Gordon Hawkins, and Sam Kamin call it the “largest penal experiment in American history.” If researchers are going to find deterrence from long sentences, they should find it in California.
While avoiding the term, the law effectively defines a “strike” as a conviction for a serious or violent felony (defined in code sections §1192.7(c) and §667.5(c)). Having one strike doubles the sentence for a new felony even if the new one does not itself rate as a strike. Having two strikes triples the next sentence—or extends it to twenty-five-years-to-life, whichever is greater. And if convicted after one or two strikes, you could only be paroled after serving 80% of your lengthened sentence, compared to a more usual 50%.
As enacted, Three Strikes didn’t quite match the baseball metaphor. After serving sentences for two “strikeable” felonies, a person could be called “out” (that is, get twenty-five-to-life) even for a minor felony that itself would not count as a strike. In 2012, after the periods of the studies I found, California voters approved Proposition 36, which made the baseball metaphor more accurate by ending twenty-five-to-life sentences for non-violent felonies.
The first paper I read on Three Strikes, by Radha Iyengar, is premised on a clever idea. It is that only serious or violent felonies started a person’s strike count, but after that first strike, any felony conviction would increase the count. As a result, people who committed the same crimes in a different order would find themselves at different stages in the law’s escalating schedule of sanctions, producing a good quasi-experiment. For example, since burglary is considered a more serious crime than theft (burglary entails forcible entry onto private property), someone who was convicted of a burglary and then a theft would earn two strikes while someone who was convicted of a theft and then a burglary would earn one. Burglary would start the strike count and theft would add to the count only if it came after. Perhaps the threat of greater sanction would deter some crime.
Unfortunately, the clever idea is also incorrect. If you go back to my description of the law, you’ll see that both would have one strike. If burglary started a strike count, theft would not add to it.
The other Three Strikes study, by Eric Helland and Alexander Tabarrok, also builds a quasi-experiment on a clever observation about the operation of the law. This one looks plausible, though I ultimately came to question how well it works. The punishment under Three Strikes may be severe, but it is not certain. Its application is subject to the discretion of prosecutors, who decide which charges to bring, and subject to the judgement of judges, who typically decide which charges will stick and which will be downgraded or dropped. Thus two people might commit the same crimes, and even be charged for the same ones, yet be convicted of different ones, and end not only with different convictions but different strike counts. So Helland and Tabarrok compare people charged with and convicted of two strikeable offenses to people charged with two strikeable offenses and convicted twice, but once for a lesser, non-strikeable offense. The authors argue that what distinguishes these two groups is substantially random. And however unfortunate for justice, arbitrariness is gold for researchers wanting to study the impacts of severe sentences, for it generates natural experiments.
The researchers ask: after they finish serving their time for the second offense, do the people at two strikes, who stand at the precipice of twenty-five-to-life for a third felony, commit less crime than the one-strikers? Does Three Strikes deter?
This graph from the paper, of rearrest rate vs. days since release after second conviction, answers, “yes, a bit.”
The two curves show the share of one- or two-strikers who had not yet been rearrested, as a function of the number of days since their release after serving time for that second conviction. The curve for those carrying two strikes falls more slowly, meaning that people one felony conviction away from twenty-five-to-life indeed got arrested less: about 15% less per unit of time. I translate that finding into an elasticity of –0.12, meaning that each 10% increase in prospective sentence reduced arrests 1.2%.
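One way to see how a roughly 15% lower arrest rate shapes curves like these is the sketch below. It assumes, purely for illustration, that the hazard of rearrest is constant over time and that the two groups' hazards differ by a fixed ratio; the paper does not necessarily model things this way. The baseline hazard of 0.5 per year is a made-up number, not a figure from the study.

```python
import math


def share_not_rearrested(hazard_per_year: float, years: float) -> float:
    """Share not yet rearrested under a constant hazard: S(t) = exp(-h * t)."""
    return math.exp(-hazard_per_year * years)


BASELINE_HAZARD = 0.5  # hypothetical one-striker rearrest hazard per year
HAZARD_RATIO = 0.85    # two-strikers arrested about 15% less per unit of time

for years in (1, 2, 3):
    one = share_not_rearrested(BASELINE_HAZARD, years)
    two = share_not_rearrested(BASELINE_HAZARD * HAZARD_RATIO, years)
    print(f"year {years}: one-strikers {one:.3f}, two-strikers {two:.3f}")
```

Under these assumptions the two-striker curve is the one-striker curve raised to the power 0.85, so it always lies above it and the gap opens gradually, which is the qualitative pattern described above.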
In early drafts of my review, I took this finding of mild deterrence at face value. For Helland and Tabarrok show in several ways that the two groups resembled each other statistically before the quasi-experiment began, making for a compelling comparison.
But then I decided to use this finding in my cost-benefit analysis, as the estimate of the deterrence benefit of incarceration. To do that, I needed to break the effect out by crime type. How much did Three Strikes deter murder or assault or robbery? That matters because different crimes impose different harms. Alex Tabarrok kindly shared the data and computer code even though the publishing journal’s three-year data sharing period had expired. Replicating and breaking out the results brought two small surprises that sufficed to shift my reading.
First, the treatment and control groups came to seem less “balanced,” that is, less identical at the start of the quasi-experiment. An apparent error in Table 1 of the paper obscured a difference between the two groups: two-strikers had 7.57 prior arrests, compared to 8.53 for one-strikers, with a statistically significant p value of 0.01. And if the two-strikers came into the study with fewer priors, we don’t need deterrence to explain why they were arrested less after. Digging deeper, I found that the gap in priors was especially pronounced in two major crime categories not checked in the original paper: larceny (theft) and drug crimes. Because comparing the two groups on enough traits (age, race, etc.) is bound to produce a few low p values by pure chance, I also ran a test of whether all the differences taken together could have easily arisen by chance. The data reject that hypothesis at p = 0.02. All these findings are captured in this revised table of characteristics of one- and two-strikers in the Helland and Tabarrok study:
The second surprise arrived when I split the study’s results out by crime type. The seeming impact on rearrest is largely confined to drug crimes (trafficking or possession). For the large subcategory called “index crimes”—the violent and property crimes that the FBI tracks in order to compute the national crime rate—there was essentially no apparent deterrence. Here are graphs analogous to the earlier one, of rearrest rate vs. days since release after second conviction, for one- and two-strikers, by type of charge upon rearrest:
I can construct theories for why Three Strikes would only deter drug crime. But they feel contrived. Why would twenty-five-to-life deter drug dealing, but not the thefts and burglaries that sometimes pay for it? The theory of deterrence presumes some rationality in deciding whether to commit crime; so why would Three Strikes most deter the crimes most driven by addiction?
One can explain the results more simply by assuming that the natural experiment was imperfect after all. For whatever reason, two-strikers had been less involved in the illegal drug business before the study (as far as arrest records go) and remained that way for the duration.
In the end, I did not find compelling evidence that the “largest penal experiment in American history” deterred crime.
Mandatory sentencing minimums and gun add-ons
As part of the “tough on crime” movement, U.S. states adopted several kinds of laws to increase incarceration. One approach was to impose or increase minimum sentences for certain crimes, curtailing the discretion of judges and parole boards over time served behind bars. As in California, these often focused on people convicted of repeat offenses. Another popular step has been to increase sentences specifically for crimes involving guns. Such “gun add-on” laws appeal across the political spectrum because they take aim at gun violence without limiting gun rights.
Since such laws mostly extend sentences for people who would be incarcerated anyway, they can especially shed light on deterrence, as distinct from incapacitation and aftereffects. In the first year or so after adoption, most of the people subject to such laws would have been in prison anyway. So in the short term, the laws should not increase incapacitation or aftereffects much. But since the deterrence could start immediately, it could show up strongly in any short-term impacts found.
David Abrams studied both types of law, and from a wider-angle view than Helland and Tabarrok. In this study, the unit of observation is the city-year combination, where “cities” are the approximately 500 most populous law enforcement agency jurisdictions (LEAs) in the U.S., and years range from 1965 to 2002. The New York Police Department, for example, is the biggest LEA. Abrams analyzes whether, on average, robberies and assaults involving guns fell after states enacted mandatory minimums or gun add-ons.
Two types of law (mandatory minimums and gun add-ons) and two kinds of crime (robbery and assault) make four combinations. The paper finds a statistically significant impact in one: gun robberies fall after gun add-on laws are passed, with an elasticity of about –0.1. The paper stresses the non-zero finding in the abstract and conclusion. That looks a bit like data mining, which is the practice, often unconscious, of trying many things until a statistically significant result inevitably appears. Since mined results are statistical flukes filtered from noise, they tend to be fragile: modest changes in statistical method can make them go away.
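A rough calculation shows why one significant result out of four is weak evidence on its own. The sketch assumes the four tests are independent and that each uses a 5% significance threshold. Neither assumption comes from the paper, and the real tests are surely correlated, so treat the number as a ballpark.

```python
def prob_at_least_one_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of `num_tests` independent tests comes out
    significant at level `alpha` when no true effect exists."""
    return 1.0 - (1.0 - alpha) ** num_tests


# Two types of law times two kinds of crime gives four tests.
print(round(prob_at_least_one_false_positive(4), 3))
```

Under these assumptions, pure noise would produce at least one "significant" finding nearly one time in five.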
I obtained the data and code for the paper and tested its fragility to a few modifications. The paper’s data set is posted on the journal’s website. In addition to working with that, I went back to the publicly available primary source to reconstruct it—or a version of it. I modified Abrams’s methods in three ways:
- I switched from annual to monthly data, in order to observe more precisely how the gun-involved crime rate evolved right after a sentencing law was passed.
- I filled gaps in the crime data in a more sophisticated way. The FBI began collecting crime totals from LEAs in 1929; participation remains voluntary. Over time, more LEAs have supplied data, which means that some trends in state totals could reflect expansion in coverage rather than evolution in criminality. In addition, many LEAs’ series contain holes, which are not reliably flagged: sometimes zero means zero and sometimes it means “unknown.” Often a person can tell which by looking—but not always, and at any rate the data set is too big for a person to review. Abrams tackled the missingness by focusing on the largest LEAs with (nearly) complete data, which covered about 40% of the US population as of 2000. He then hand-cleaned the data. Since the process involved guesswork for missing values, it was inevitably imperfect.
Instead, I took an algorithmic approach, also inevitably imperfect. For example, my code interprets a zero for gun-involved robberies as “missing” rather than a true zero if total robberies exceeded 100. The code then applies a sophisticated technique called Multiple Imputation (MI), which allows one to make educated guesses for missing values while also incorporating the uncertainty about those guesses into the final results.
- Having an automated method for identifying and filling missing values allowed me to double the population coverage from 40% to 80%.
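The zero-versus-missing rule from the second change above can be sketched in a few lines. This is an illustration of the rule as described in this post, not the actual replication code; the function name and the sample records are hypothetical, and the multiple imputation step that follows is not shown.

```python
from typing import Optional

TOTAL_ROBBERY_THRESHOLD = 100  # cutoff named in the text


def clean_gun_robbery_count(gun_robberies: int, total_robberies: int) -> Optional[int]:
    """Return None (missing) when a zero gun-robbery count is implausible,
    i.e. when total robberies exceed the threshold; otherwise keep the count."""
    if gun_robberies == 0 and total_robberies > TOTAL_ROBBERY_THRESHOLD:
        return None
    return gun_robberies


# Hypothetical (gun robberies, total robberies) pairs for three agency-months.
records = [(0, 250), (0, 40), (12, 300)]
print([clean_gun_robbery_count(g, t) for g, t in records])
```

Values flagged as `None` would then be the ones handed to the imputation step, while a zero reported alongside few total robberies is kept as a true zero.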
The next graph shows how my changes affect the Abrams finding that passing a gun add-on sentencing law led to fewer gun-involved robberies. The upper-left pane provides the starting point. It corresponds exactly to a figure in the Abrams paper showing the average gun robbery rates in the years before and after adoption of a gun add-on, relative to the rate in the year of adoption. Each blue diamond depicts a 95% confidence interval for a year’s value. We see in this graph that on average the gun robbery rate held stable in the years before adoption, then fell right after, by about 25%. The rest of the top row shows how the picture evolves as I introduce my approach to filling in missing data (middle), and then double the sample (right). The bottom row does the same, but with monthly data, for precision.
Notice the sloped red lines I’ve superimposed on all the panes. I added those to help test what to me seems the decisive question: not whether gun robberies fell after gun add-ons were adopted, but whether they fell distinctly, rather than just continuing a long-term decline. The red lines show best fits to the data one, two, and three years forward and backward from the point of adoption. I tested whether corresponding lines have the same slope before and after, and put the p values at the bottom of the graph. For example, in the upper-left pane, the one that matches Abrams, the computer is unsure whether the one-year pre- and post-adoption trends differ (p = .52). But it is fairly certain that the two- and three-year trends bend distinctly downward at adoption, just as Abrams concluded. However, the p values rise as I introduce my changes, meaning that the trend break fades into statistical insignificance.
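The core of the trend-break comparison can be sketched as follows. This is a simplified illustration, not the code behind the graph: it fits separate least-squares lines before and after adoption and reports the change in slope, but it does not compute the standard errors needed for a p value. Treating the adoption point (time 0) as part of both windows is my assumption, and the series is made up.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def slope_change(times, values, window):
    """Post-adoption slope minus pre-adoption slope, using only points within
    `window` time units of adoption (time 0). Time 0 is in both windows."""
    pre = [(t, v) for t, v in zip(times, values) if -window <= t <= 0]
    post = [(t, v) for t, v in zip(times, values) if 0 <= t <= window]
    pre_slope = ols_slope(*zip(*pre))
    post_slope = ols_slope(*zip(*post))
    return post_slope - pre_slope


# Made-up series: flat before adoption, falling by 5 per year afterward.
times = [-3, -2, -1, 0, 1, 2, 3]
values = [100, 100, 100, 100, 95, 90, 85]
print(slope_change(times, values, window=3))
```

A series that simply continues a steady decline through the adoption date would give a slope change near zero, which is the case the test is meant to distinguish from a distinct drop.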
Laws meant to deter drunk driving mostly haven’t. Young people hardly offend less after they come of age under criminal law. U.S.-based studies of laws lengthening sentences find mild deterrence, with an elasticity of about –0.1, and my reanalyses call even those impacts into question. Italy’s extraordinary clemency in 2006 may have deterred crime by suspending rather than commuting sentences. But another theory competes to explain that result, and the unusual cognitive salience of the prospective punishment may make the Italian case a poor stand-in for most policy reforms.
All that is why I wrote in the opening post that “deterrence is de minimis.” My synthesis of the evidence is that at current policy margins, making incarceration sentences longer or more likely hardly deters crime. As I also mentioned in the opening post, my devil’s advocate interpretation accepts the Helland and Tabarrok Three Strikes study and the Abrams finding on gun add-ons and gun-involved robberies, and estimates the elasticity of deterrence at –0.1. In this contrarian reading, a 10% increase in sentences deters crime by a modest 1%.
If longer sentences do not deter, they may still reduce crime outside prisons by taking would-be criminals “off the streets.” That is incapacitation, and is the subject of the next post.
Code and data for all replications are here (800 MB). The cost-benefit spreadsheet is here.
 Section 1(e) of the law defines the key three-strike sentence enhancements, with phrases including “If a defendant has one prior felony conviction…” and “If a defendant has two or more prior felony convictions as defined in subdivision (d)…” As the latter quote makes explicit, section (d) defines “prior felony convictions” for purposes of the law; and it defines them as serious or violent felony convictions only: “[A] prior conviction of a felony shall be defined as…[a]ny offense defined in subdivision (c) of Section 667.5 as a violent felony or any offense defined in subdivision (c) of Section 1192.7 as a serious felony.” Less-serious felonies never count as strikes.