Virginia Judges, 1; Artificial Intelligence, 0

by James A. Bacon

It sounded like such a good idea: Develop a criminal-sentencing algorithm to help judges identify felons least likely to reoffend and either give them shorter jail sentences or divert them to probation or substance-abuse treatment programs. Virginia created just such an algorithm in 2001. Minimizing the subjective element in sentencing, it was thought, might even reduce sentencing disparities between the races.

The results didn’t turn out entirely as people hoped. In a deep dive into the data, Megan T. Stevenson, a George Mason University professor, and Jennifer L. Doleac, of Texas A&M, authors of “Algorithmic Risk Assessment in the Hands of Humans,” found that the Virginia algorithm does influence outcomes: Defendants with higher risk scores got longer sentences and defendants with lower risk scores got shorter sentences. However, they found “no robust evidence that this reshuffling led to a decline in recidivism.”

While they found no evidence of an increase in racial disparities statewide, the authors did find that among the judges most likely to factor the risk scores into their sentencing decisions, there was a “relative increase in sentences.”

“Judges have their own sets of priorities,” Stevenson and Doleac write. In Virginia, judges tend to be far more lenient to young offenders than the algorithm suggests is optimal. “Attempts to nudge them towards particular policy goals via the risk assessment could backfire; judges may ignore the risk assessment altogether or respond strategically, using it to advance their own agenda.”

Since the 1980s Virginia has used voluntary sentencing guidelines; judges are recommended, but not required, to sentence within a particular range, the authors write. In 1995 the state adopted a “truth-in-sentencing” reform that abolished parole and mandated that offenders serve at least 85% of their sentence. To free up state prison space, the state also set the goal of diverting 25% of nonviolent offenders from jail or prison. To accomplish that goal, the Virginia Criminal Sentencing Commission developed an algorithm that computed a risk score for nonviolent offenders.

The score was developed by analyzing a randomly selected sample of 1,500 nonviolent offenders, considering such factors as age, employment, marital status, recent arrests, prior felonies and incarcerations, and the nature of the conviction (drug, larceny or fraud). Those whose risk scores fall in the bottom 25% are recommended for diversion from jail or prison. (The state uses a separate risk assessment tool for sex offenders.)
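Instruments like this are typically points-based: each factor contributes a weight, the weights are summed, and the lowest-scoring quartile is flagged for diversion. The sketch below illustrates that general mechanism only; the factor names mirror those listed above, but the point values, cutoffs, and function names are hypothetical, not the Sentencing Commission's actual formula.

```python
# Illustrative sketch of a points-based risk instrument. The factors follow
# those described in the article; all weights here are made up for demonstration.

def risk_score(offender):
    """Sum points for each risk factor; a higher total means higher assessed risk."""
    score = 0
    if offender["age"] < 25:
        score += 3                        # young age weighs heavily in most instruments
    if not offender["employed"]:
        score += 2
    if not offender["married"]:
        score += 1
    score += 2 * offender["prior_felonies"]
    score += offender["recent_arrests"]
    if offender["offense_type"] == "drug":
        score += 1
    return score

def recommend_diversion(offenders):
    """Recommend the lowest-scoring 25% of offenders for diversion from jail or prison."""
    ranked = sorted(offenders, key=risk_score)
    cutoff = len(ranked) // 4             # bottom quartile
    return ranked[:cutoff]
```

Note that even with race excluded, several of these inputs (employment, marital status, prior record) can correlate with race, which is exactly the indirect-bias concern raised later in the piece.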

In Virginia, juries determine sentences for about 2% of all felony convictions. Judges determine sentencing for bench trials, about 10% of convictions. The rest result from negotiated guilty pleas. However, the authors note, all plea negotiations must be approved by a judge. In sum, they argue, while judges are not the sole decision-makers in sentencing, they are the primary actors.

The authors perform a counterfactual analysis, asking what the likely outcomes would have been if the algorithm had determined all sentencing. The results for youthful offenders were striking.

“The relative probability of incarceration for young defendants would have increased by 15 percentage points, and relative sentence lengths for young defendants would have increased by approximately 45%.

“These simulations suggest that, even though age disparities increased after risk assessment was adopted, judicial discretion minimized the full impact on young people. Young age is one of the most important predictors of future offending and, accordingly, is given large weight in virtually every risk assessment tool. If the goal at sentencing is to prevent future crime by incarcerating those who pose the highest risk of committing it, then jails and prisons should be full of young people. Sentencing by algorithm would achieve just that.”

But there is a long tradition of leniency for teenagers and young adults. In Virginia, the authors write, it appears that judges are pursuing goals “that are in conflict with risk-based sentencing.”

The authors’ what-if analysis also predicts that full compliance with the algorithm would have increased the incarceration of black defendants by 3.7%. Likewise, hewing to the algorithm would have increased sentences for black defendants relative to white defendants by 8%. The flip side of that conclusion is that, thanks to judicial discretion, black incarceration is 3.7% lower than it would have been had all judges complied with the algorithm’s scores.

A statewide survey found that only half of judges “always” or “almost always” consider the results of the risk scores. By contrast, 38% rely “primarily on judicial experience” when making decisions. As one judge put it, “I also don’t go to psychics.” Among the judges who do rely upon the scores, Stevenson and Doleac find, the probability of incarceration for black defendants relative to white defendants increased by 4%, and the length of sentences increased by 17%.

Bacon’s bottom line: Creating sentencing algorithms that successfully reduce recidivism in a racially unbiased fashion is more difficult than people originally thought. For starters, as the authors write, “future criminal activity is hard to predict.” Then there is the issue of which factors to consider without creating unintentional racial biases; factors like employment, marital status and previous encounters with the criminal justice system are correlated with race. Next, there is the reality that many judges value their own judgment in specific cases over an algorithm score, with the potential that creates for bias — although the data suggest that the judges are less biased than the algorithm.

Finally, in an age when everything is viewed through a racial prism, it is unavoidable that algorithms will be judged by the degree to which they aggravate or diminish racial disparities in sentencing.

Seeking an unbiased approach to nonviolent sentencing is a worthy objective, and so is the goal of reducing the incarceration rate without endangering the public. Just because Virginia’s system is imperfect doesn’t mean we should abandon it. Rather, studies like this remind us that we need to continue tinkering and refining the algorithm.


8 responses to “Virginia Judges, 1; Artificial Intelligence, 0”

  1. Algorithms and actuarial studies have somewhat limited use. Up until a year and a half ago in Virginia, a man was considered for processing in court as a sexually violent predator (“SVP”) if he had an actuarial score of 5 on what’s known as the Static-99 (scored from -3 to 12). After a trial, if found to be an SVP, he was committed indefinitely to a secure facility in Burkeville.

    Some of you will remember the name Ariel Castro, the Cleveland man who kept four women locked up in his house for some 20 years, alternately raping and having children by them, then leaving them locked up while he played in a band in various bars. Ariel Castro would’ve scored a 2 on the Static-99.

    The law has recently changed in Virginia so that a score of 5 is not the only thing considered. I’ve yet to determine whether that’s better or worse than the prior system.

  2. I’d be curious to know if race is considered in the algorithm and, if it is not, whether the suggested guidelines turn out the same for black and white defendants.

    • Race is excluded. The argument is that other factors such as employment, marital status and previous encounters with the criminal justice system are so closely correlated with race that the algorithm is indirectly biased. I believe I recall reading that the Virginia Criminal Sentencing Commission has since deleted employment and marital status from the algorithm.

  3. Thanks. Here’s the thing – anytime any process or algorithm that is supposed to be random/unbiased ends up having a disparate impact – we are left with one of two reasons.

    1. – either the process/algorithm has an implicit bias, or
    2. – the folks affected actually have something inherent in their group that is different from the other groups.

    So for some folks, whether it be criminal behavior or (for instance), the ability to learn – they consider it a race/culture/similar difference.

    In other words, are certain culture/races more inclined to commit more crime?

    OR, do our processes/algorithms falsely indicate it?

  4. I note this article in WaPo this morning:

    With unanimous vote, Montgomery passes wide-ranging racial equity bill

    The Montgomery County Council unanimously passed a sweeping racial equity bill Tuesday, joining dozens of liberal jurisdictions across the country attempting to use legislation to address long-standing racial disparities.

    The Racial Equity and Social Justice Act, which some advocates commend as one of the strongest of its kind, will mandate racial equity training for more than 8,000 full-time government employees in the suburban county of 1 million people.

    It requires that every bill considered by the council include a statement detailing the proposal’s impact on equity among different demographic groups, and it establishes an Office of Racial Equity and Social Justice with an annual operating budget of $375,860.

    The legislation also requires every government agency and department to develop an action plan by Sept. 30 to address racial disparities, which include a poverty rate for black and Latino residents that is nearly triple that of white residents.

    I find the last sentence particularly striking – in a county that prides itself on diversity – the poverty rate for blacks and Hispanics is THREE TIMES that of whites.

    How can that disparity exist on an income basis without similar disparities showing up in criminal justice?

  5. I appreciate Jim highlighting the article on the nonviolent risk assessment instrument. I have not had a chance to read it, but I intend to. Nevertheless, I do have a few comments because I am familiar with the Virginia Criminal Sentencing Commission and its sentencing guidelines and the nonviolent risk assessment instrument.

    Reduction of recidivism is not the goal of the nonviolent sentencing instrument. As a recent annual report of the Sentencing Commission put it, “The goal of the nonviolent risk assessment instrument is to divert low-risk offenders who are recommended for incarceration on the guidelines to an alternative sanction other than prison or jail.” The late Gene Johnson, a long-time fixture in the Department of Corrections, including many years as director, used to say often, “We shouldn’t be putting people we are mad at in prison, just those we are afraid of.”

    The instrument identifies those nonviolent felony offenders who have the lowest risk of reoffending if given a sentence other than prison. The use of the instrument by judges is voluntary; it is intended to provide guidance. As the article points out, judges frequently sentence contrary to the guidance provided by the instrument.

    Race is not a factor used in the application of the instrument. However, age is strongly correlated to reoffending and thus is an important factor. The study authors point out that judges tend to be more lenient (provide an alternative sentence to prison) for young offenders than the risk assessment instrument would recommend. Therefore, to the extent that young black men are over-represented in the convicted felony pool, then letting the algorithm “play out” would increase the incarceration of black defendants, both absolutely and relative to those of white defendants.

    From what has been reported so far, the results of the study are heartening. They show that judges are taking individual circumstances into consideration and not “sentencing by the numbers”. In some cases, they may be taking a chance. In other words, they are doing their job.

  6. An algorithm put to work in 2001 is probably not really AI-based, but it is an algorithm nonetheless.

    • yep – not AI….

      As good as these programs are, the fact that we are still imprisoning blacks at a higher rate than their demographic percentages troubles me, and in my view that issue should be at the top of the heap…
