Virginia Judges, 1; Artificial Intelligence, 0

by James A. Bacon

It sounded like such a good idea: Develop a criminal-sentencing algorithm to help judges identify felons least likely to reoffend and either give them shorter jail sentences or divert them to probation or substance-abuse treatment programs. Virginia created just such an algorithm in 2001. Minimizing the subjective element in sentencing, it was thought, might even reduce sentencing disparities between the races.

The results didn’t turn out entirely as people hoped. In a deep dive into the data, Megan T. Stevenson, a George Mason University professor, and Jennifer L. Doleac, of Texas A&M, authors of “Algorithmic Risk Assessment in the Hands of Humans,” found that the Virginia algorithm does influence outcomes: Defendants with higher risk scores got longer sentences, and defendants with lower risk scores got shorter sentences. However, they found “no robust evidence that this reshuffling led to a decline in recidivism.”

While they found no evidence of an increase in racial disparities statewide, the authors did find that among the judges most likely to factor the risk scores into their sentencing decisions, there was a “relative increase in sentences” for black defendants.

“Judges have their own sets of priorities,” Stevenson and Doleac write. In Virginia, judges tend to be far more lenient toward young offenders than the algorithm suggests is optimal. “Attempts to nudge them towards particular policy goals via the risk assessment could backfire; judges may ignore the risk assessment altogether or respond strategically, using it to advance their own agenda.”

Since the 1980s Virginia has used voluntary sentencing guidelines; judges are encouraged, but not required, to sentence within a recommended range, the authors write. In 1995 the state adopted a “truth-in-sentencing” reform that abolished parole and mandated that offenders serve at least 85% of their sentences. To free up state prison space, the state also set a goal of diverting 25% of nonviolent offenders from jail or prison. To accomplish that goal, the Virginia Criminal Sentencing Commission developed an algorithm that computed a risk score for nonviolent offenders.

The score was developed by analyzing a randomly selected sample of 1,500 nonviolent offenders, considering such factors as age, employment, marital status, recent arrests, prior felonies and incarcerations, and the nature of the conviction (drug, larceny or fraud). Those whose risk scores fall in the bottom 25% are recommended for diversion from jail or prison. (The state uses a separate risk assessment tool for sex offenders.)
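To make the mechanics concrete, here is a minimal sketch of a points-based risk tool of the sort described above. The weights, point values, and cutoff rule are entirely hypothetical, not the Sentencing Commission’s actual instrument; only the list of factors and the bottom-25% diversion rule come from the article.

```python
# Hypothetical sketch of a points-based risk score. All weights and point
# values are invented for illustration; this is NOT the Virginia Criminal
# Sentencing Commission's actual instrument. Only the factor list and the
# bottom-25% diversion rule come from the article.

from dataclasses import dataclass

@dataclass
class Offender:
    age: int
    employed: bool
    married: bool
    recent_arrests: int        # arrests within a recent look-back window
    prior_felonies: int
    prior_incarcerations: int
    offense_type: str          # "drug", "larceny", or "fraud"

def risk_score(o: Offender) -> int:
    """Sum points across factors; a higher score means higher assessed risk."""
    score = 0
    if o.age < 25:             # youth carries heavy weight in such tools
        score += 8
    if not o.employed:
        score += 3
    if not o.married:
        score += 2
    score += 2 * o.recent_arrests
    score += 3 * o.prior_felonies
    score += 2 * o.prior_incarcerations
    score += {"drug": 2, "larceny": 3, "fraud": 1}.get(o.offense_type, 0)
    return score

def recommend_diversion(scores: list[int]) -> list[bool]:
    """Flag roughly the lowest-scoring 25% of a cohort for diversion,
    mirroring the bottom-25% rule described in the article."""
    cutoff = sorted(scores)[len(scores) // 4]  # ~25th percentile; ties may push past 25%
    return [s <= cutoff for s in scores]
```

The real instrument derives its point values from the 1,500-offender sample; the point here is only the shape of the computation: add up factor points, then compare against a cohort cutoff.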

In Virginia, juries determine sentences for about 2% of all felony convictions. Judges determine sentences in bench trials, which account for about 10% of convictions. The rest result from negotiated guilty pleas. However, the authors note, all plea negotiations must be approved by a judge. In sum, they argue, while judges are not the sole decision-makers in sentencing, they are the primary actors.

The authors perform a counterfactual analysis, asking what the likely outcomes would have been if the algorithm alone had determined every sentence. The results for youthful offenders were striking:

“The relative probability of incarceration for young defendants would have increased by 15 percentage points, and relative sentence lengths for young defendants would have increased by approximately 45%.

“These simulations suggest that, even though age disparities increased after risk assessment was adopted, judicial discretion minimized the full impact on young people. Young age is one of the most important predictors of future offending and, accordingly, is given large weight in virtually every risk assessment tool. If the goal at sentencing is to prevent future crime by incarcerating those who pose the highest risk of committing it, then jails and prisons should be full of young people. Sentencing by algorithm would achieve just that.”
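As a rough illustration of the kind of full-compliance comparison the authors run, the sketch below pairs each defendant’s actual sentence with an algorithm-recommended one and measures how outcomes for young defendants would shift. The data, field names, and simple differencing are invented for illustration; they are not the study’s data or estimation method.

```python
# Toy sketch of a full-compliance counterfactual. The cases below are
# invented; the real study uses Virginia sentencing records and a more
# careful estimation strategy. The point is only the comparison itself:
# actual sentences vs. what the risk tool would have recommended.

from statistics import mean

# (age, actual_months, algorithm_months)
cases = [
    (19, 0, 12), (22, 6, 18), (21, 0, 9),     # young defendants
    (34, 12, 10), (45, 8, 8), (52, 14, 11),   # older defendants
]

def incarceration_rate(sentences: list[int]) -> float:
    """Share of sentences involving any confinement."""
    return sum(s > 0 for s in sentences) / len(sentences)

young_actual = [actual for age, actual, _ in cases if age < 25]
young_algo = [algo for age, _, algo in cases if age < 25]

# Change in the probability of incarceration under full compliance.
delta = incarceration_rate(young_algo) - incarceration_rate(young_actual)
print(f"Incarceration probability shift: {delta * 100:+.0f} percentage points")

# Change in mean sentence length among those incarcerated.
served_actual = [s for s in young_actual if s > 0]
served_algo = [s for s in young_algo if s > 0]
if served_actual and served_algo:
    pct = mean(served_algo) / mean(served_actual) - 1
    print(f"Mean sentence length shift: {pct:+.0%}")
```

On the study’s real data, the analogous quantities are the ones quoted above: roughly a 15-percentage-point rise in incarceration probability and a 45% rise in sentence length for young defendants.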

But there is a long tradition of leniency for teenagers and young adults. In Virginia, the authors write, it appears that judges are pursuing goals “that are in conflict with risk-based sentencing.”

The authors’ what-if analysis also predicts that full compliance with the algorithm would have increased the incarceration of black defendants by 3.7%, and would have increased sentences for black defendants relative to white defendants by 8%. The flip side of that conclusion is that, thanks to judicial discretion, black incarceration is 3.7% lower than it would have been had all judges complied with the algorithm’s scores.

A statewide survey found that only half of judges “always” or “almost always” consider the results of the risk scores. By contrast, 38% rely “primarily on judicial experience” when making decisions. As one judge put it, “I also don’t go to psychics.” Among the judges who do rely upon the scores, Stevenson and Doleac find, the probability of incarceration for black defendants relative to white defendants increased by 4%, and the length of sentences increased by 17%.

Bacon’s bottom line: Creating sentencing algorithms that successfully reduce recidivism in a racially unbiased fashion is more difficult than people originally thought. For starters, as the authors write, “future criminal activity is hard to predict.” Then there is the issue of which factors to consider without creating unintentional racial biases; factors like employment, marital status and previous encounters with the criminal justice system are correlated with race. Next, there is the reality that many judges value their own judgment in specific cases over an algorithm score, with the potential that creates for bias — although the data suggest that the judges are less biased than the algorithm.

Finally, in an age when everything is viewed through a racial prism, it is unavoidable that algorithms will be judged by the degree to which they aggravate or diminish racial disparities in sentencing.

Seeking an unbiased approach to nonviolent sentencing is a worthy objective, and so is the goal of reducing the incarceration rate without endangering the public. Just because Virginia’s system is imperfect doesn’t mean we should abandon it. Rather, studies like this remind us that we need to continue tinkering and refining the algorithm.