One possible remedy would be to impose fairness constraints, such as requiring algorithms to output similar proportions of positive outcomes across communities, particularly those defined by gender, race, or socioeconomic status. A model could be written so that if 10 percent of White defendants were admitted to a diversion program, 10 percent of Black defendants and 10 percent of Asian defendants would receive that same positive outcome. However, such criteria are often imposed without a full understanding of the underlying group dynamics (in this example, how different groups may react differently to the 10 percent admission rate), and research by the Ohio State University’s Xueru Zhang and Mahdi Khalili, Bilkent University’s Cem Tekin, and the University of Michigan’s Mingyan Liu has shown that this can have unintended and likely negative consequences.
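To make that parity criterion concrete, here is a minimal sketch in Python, with made-up group names and admission decisions, of how one might check whether a model’s outputs satisfy it:

```python
# Minimal sketch of a demographic-parity check: the share of positive
# outcomes (e.g., diversion-program admissions) should be roughly equal
# across groups. Group names and figures here are purely illustrative.

def positive_rate(decisions):
    """Fraction of cases in a group that received the positive outcome."""
    return sum(decisions) / len(decisions)

def satisfies_parity(decisions_by_group, tolerance=0.01):
    """True if every group's admission rate is within `tolerance` of the others."""
    rates = [positive_rate(d) for d in decisions_by_group.values()]
    return max(rates) - min(rates) <= tolerance

# 1 = admitted to the diversion program, 0 = not admitted
decisions = {
    "group_a": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # 10 percent admitted
    "group_b": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # 10 percent admitted
}
print(satisfies_parity(decisions))  # True: both groups at 10 percent
```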
In a 2022 paper about routing, Zhang, Shi, and Ward considered this thorny issue of fairness. If you force equal outcomes, you’re essentially ignoring real differences that may exist between groups, not because of inherent traits but because of circumstances. Say people with tattoos are, on average, less affluent than people without them. Over time, you would likely see a lower proportion of tattooed applicants succeed because they have less support.
The cost of fairness
This raises a question about fairness and the associated trade-offs. In practice, if judges or caseworkers knew that people from certain groups were more likely to complete treatment successfully, they could reduce crime more effectively by giving those groups preferential access to a program. But this efficiency gain would come at the expense of fairness. Is it acceptable to give some people an edge in accessing services and additional resources on the basis of group characteristics rather than individual merit?
Zhang, Shi, and Ward don’t directly address the question of fairness, but they do quantify how much it would cost a theoretical queueing system to treat all groups equally versus optimizing for the best overall outcomes. To do so, they used a mathematical “survival” model of the kind often employed to predict patient outcomes in medical research. The model involves two customer groups with different risk distributions and different responses to intensive versus standard services.
The researchers tested two policies: a “fair” approach that gives both groups equal chances at intensive services, and a potentially “unfair” policy that prioritizes the group more likely to succeed when receiving such services. The cost of fairness can be substantial: In some scenarios, the researchers find, fair policies reduced system efficiency by more than 15 percent. Even more troubling, policies designed to be fair in the short run can exacerbate unfairness in the long run. This reflects a broader issue: Metrics that look good after one year may look poor after 10. If a policy leads to most defendants being put behind bars for 18 months, the one-year metrics may look great because no one has recidivated. But after 10 years, without attempts to address root causes through rehabilitation or community support, recidivism may be much worse.
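The paper works with a formal queueing and survival model; the toy calculation below is only a stylized sketch of the trade-off it studies, with invented success probabilities, group sizes, and capacity, comparing an equal split of intensive-service slots against prioritizing the more responsive group:

```python
# Stylized sketch of the fairness-efficiency trade-off (not the authors'
# survival model): two equal-sized groups respond differently to intensive
# vs. standard services. All numbers are invented for illustration.

SUCCESS_PROB = {
    "A": {"intensive": 0.60, "standard": 0.30},
    "B": {"intensive": 0.40, "standard": 0.30},
}
GROUP_SIZE = 1000          # people per group
INTENSIVE_SLOTS = 1000     # capacity of the intensive program

def expected_successes(slots_for_a):
    """Expected completions given how many intensive slots go to group A."""
    slots_for_b = INTENSIVE_SLOTS - slots_for_a
    total = 0.0
    for group, slots in (("A", slots_for_a), ("B", slots_for_b)):
        p = SUCCESS_PROB[group]
        total += slots * p["intensive"] + (GROUP_SIZE - slots) * p["standard"]
    return total

fair = expected_successes(INTENSIVE_SLOTS // 2)                     # split slots equally
prioritized = expected_successes(min(INTENSIVE_SLOTS, GROUP_SIZE))  # group A first
print(f"fair: {fair:.0f}, prioritized: {prioritized:.0f}, "
      f"efficiency loss from fairness: {(prioritized - fair) / prioritized:.1%}")
```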
Optimizing for long-term good
In another study, still in its early stages, Shi and Ward, this time with Booth principal researcher Chuwen Zhang, employed an equilibrium approach similar to those used to study the long-term behavior of epidemics. It allowed them to treat resource allocation like a chess match, examining the interactions among three key players: an offender, a policymaker, and a member of the general public.
In their model, each individual makes decisions based on the others’ moves. The offender is less likely to commit a crime if provided support when reentering society. The policymaker allocates limited resources—probation officers, prison beds, or slots in diversion programs—to reduce crime while maintaining public support. Meanwhile, public attitudes toward crime and punishment influence both policy implementation and effectiveness.
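The study’s model isn’t reproduced here; as a rough illustration of the equilibrium idea, the toy loop below iterates best responses among three stylized players until their choices stop changing. Every functional form and number is an assumption made for illustration:

```python
# Toy best-response iteration (all functional forms invented) illustrating
# the equilibrium idea: each player's choice depends on the others', and the
# system is iterated until no one changes their behavior.

def offender_crime_rate(support_level):
    """More reentry support -> lower propensity to reoffend."""
    return max(0.05, 0.4 - 0.3 * support_level)

def policymaker_support(crime_rate, punitiveness):
    """Allocates reentry resources, constrained by public attitudes."""
    return max(0.0, min(1.0, 0.8 - 0.5 * punitiveness - 0.2 * crime_rate))

def public_punitiveness(crime_rate):
    """Higher crime -> more punitive public mood."""
    return min(1.0, 1.5 * crime_rate)

crime, support, mood = 0.3, 0.5, 0.5   # arbitrary starting point
for _ in range(100):
    new_crime = offender_crime_rate(support)
    new_mood = public_punitiveness(new_crime)
    new_support = policymaker_support(new_crime, new_mood)
    if abs(new_crime - crime) < 1e-6 and abs(new_support - support) < 1e-6:
        break                           # no one wants to change: equilibrium
    crime, support, mood = new_crime, new_support, new_mood

print(f"equilibrium: crime={crime:.3f}, support={support:.3f}, punitiveness={mood:.3f}")
```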
Their theoretical framework suggests there’s an ongoing struggle: When the justice system changes its approach, such as by increasing probation-officer staffing or expanding diversion programs, people involved in it are likely to adjust their behavior as a result.
Rather than examine the immediate effects of these algorithm-informed policies, the researchers focused on long-term equilibrium, or what happens when everyone has figured out how to respond and the system reaches a new balance. They find that well-intentioned policies can backfire unexpectedly, as would be the case with locking up everyone for 18 months. Understanding how the whole system evolves over time is crucial for designing effective policies, the researchers write. When evaluating an algorithm, we should ask what happens when we follow it in practice, across different groups and over years, to see the effects on long-term public safety and well-being. Their framework lays the groundwork for developing a tool to do this.
Models reflect values
An emphasis on long-term thinking leads to this crucial consideration: We shouldn’t be focusing on fully automating criminal-justice decisions but instead be determining when we can trust algorithms and when humans should intervene. Research suggests automation works best at the extremes: When program capacity is either scarce or abundant, the decision boundaries are clearer. Human oversight proves most valuable in the middle ranges, where admission thresholds are fuzzier.
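One way to picture that division of labor is a triage rule keyed to how full a program is. The sketch below is hypothetical, with invented thresholds, and is not a rule taken from the research:

```python
# Sketch of a triage rule dividing decisions between automation and human
# review, based on program capacity utilization. Thresholds are invented.

def route_decision(risk_score, capacity_utilization,
                   low_util=0.2, high_util=0.9):
    """Return 'automate' when capacity makes the call clear-cut,
    otherwise flag the case for a human reviewer."""
    if capacity_utilization <= low_util:
        return "automate: admit"        # abundant capacity, admit broadly
    if capacity_utilization >= high_util:
        # scarce capacity: only the clearest candidates, decided automatically
        return "automate: admit" if risk_score < 0.2 else "automate: decline"
    return "human review"               # middle range: thresholds are fuzzier

print(route_decision(risk_score=0.5, capacity_utilization=0.55))  # human review
```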
But technical decisions reflect profound value judgments. For example, emphasizing short-term savings versus long-term societal benefits can produce dramatically different recommendations from identical data. There’s no inherently optimal policy; the model simply reflects whatever values humans build into the system.
As policymakers, program directors, and researchers weigh the benefits of fairness versus efficiency, or algorithmic versus human decision-making, the big picture is that the status quo perpetuates cycles of incarceration and reoffending. Thoughtfully redesigned systems could help break these cycles while protecting communities. Used well, algorithms could help make this a reality. Used badly, they risk entrenching existing inequalities under the guise of scientific legitimacy.
Early results offer genuine hope. A spokesperson for the Adult Redeploy Illinois program says that “the evidence-informed practices employed by ARI sites, such as cognitive behavioral therapy, have been shown to reduce recidivism rates by 20 percent or more in some cases.” Sending someone to the Illinois Department of Corrections costs $49,000 per year, whereas the average ARI intervention costs $5,000. “For state fiscal year 2025, ARI reported an estimated $83 million in total costs avoided,” according to the spokesperson, who says that “programs can benefit from the use of tools that support objective decision-making and effective resource allocation.”
Could AI tools help make the financial impact even bigger? And could they keep more people from returning to prison? “The promise of AI is to improve our lives,” says Ward. “Let’s hold it to that standard.”
