Big Data And Bad Cops: Can An Algorithm Predict Police Misconduct? (Part 1)
Police misconduct comes in many forms, from a rude interaction with a civilian to a fatal, unjustified shooting. At worst, these acts are criminal. At best, they erode public trust and make it harder for good cops to do their jobs.
But what if there was a computer program that could identify problem officers BEFORE an incident takes place? What if that same program could even identify what makes an otherwise good officer go bad and allow the police department to predict and prevent misconduct?
Over the past year the Charlotte-Mecklenburg Police Department (CMPD) has partnered with big data researchers to see if such a program can be created.
THE THREE STRIKES SYSTEM
Computer programs that flag ‘problem officers’ aren’t exactly new. The CMPD has had one for more than a decade. Crystal Cody manages the program. It involves three strikes over a set period of time. “The longest period of time we look at is 180 days.”
The strikes could be something significant, like firing a gun. Or something more mundane: a traffic accident, or a citizen complaining the officer wasn’t courteous enough.
Whatever the case, the officer gets flagged, which sets off a range of official actions. “You know, maybe we need to intervene and speak to the officer,” explains Cody, “or give them additional training or counseling or whatever that may be.”
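Cody’s description suggests a simple rule: count an officer’s strikes inside a trailing window and raise a flag when they hit three. Here is a minimal sketch in Python, with the three-strike threshold and 180-day window taken from the article; the function name and example dates are invented for illustration.

```python
from datetime import date, timedelta

# Illustrative sketch of a "three strikes" flag. The threshold and
# window come from the article; everything else is hypothetical.
WINDOW = timedelta(days=180)  # "The longest period of time we look at is 180 days."
STRIKE_LIMIT = 3

def is_flagged(strike_dates, today):
    """Flag an officer if STRIKE_LIMIT or more strikes fall within the window."""
    recent = [d for d in strike_dates if today - d <= WINDOW]
    return len(recent) >= STRIKE_LIMIT

strikes = [date(2016, 1, 10), date(2016, 3, 2), date(2016, 5, 20)]
print(is_flagged(strikes, date(2016, 6, 1)))  # all three within 180 days -> True
```

Note what the rule ignores: every strike counts the same, whether it is a shooting or a fender bender, which is exactly the bluntness the department describes.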
It is a blunt system: efficient at identifying true problem officers, but one that doesn’t take nuance or key facts into account.
What if the traffic accident wasn’t the officer’s fault? What if the civilian complaint was unfounded, or the use of force justified?
Cody says all this led to, “a huge amount of false positives and that’s counterproductive to what we want to do.”
False positives eat up department resources and take good cops out of the rotation for unneeded extra training or talks with their supervisors. And there’s another problem: the system is reactive, says Major Johnny Jennings, the commander of CMPD’s Internal Affairs division, whose job is to police the police.
“There are times when we can fail our own employees,” says Jennings, especially by the time his investigators get involved: “If there are things that we could have identified earlier that could have probably thwarted that behavior and not reached that level.” It’s something he’s seen again and again during his time in Internal Affairs.
Moments when an otherwise good cop was pushed to the brink, and then cracked. The CMPD wanted to see if there was a way to predict those moments, allowing them to intervene ahead of time and stop the misconduct from happening.
In 2015, the CMPD opted to take part in the White House Police Data Initiative.
They were the first police force in the country to do so.
The experiment began when a small team of researchers was given access to a huge amount of CMPD data. “We’re talking millions of records.”
Joe Walsh is a researcher with the University of Chicago’s Center for Data Science and Public Policy. “We have all the arrests, all the field interviews, all the dispatches. All the training. All the internal affairs investigations data going back to at least 2005.”
The team’s next step was to come down to Charlotte to learn how to read the treasure trove of data they’d been given. They did ride-alongs, held officer focus groups and interviewed support staff.
Armed with a better understanding of the data, they went back to Chicago and started building a program to predict which officers would have an “adverse incident,” which Walsh says can be anything from “a sustained complaint to a preventable accident to an unjustified use of force.”
At the heart of that program is an algorithm.
And there are different types of algorithms out there. Some are good at deciding what ads you see online. Others affect what’s found in your Facebook feed and Google search results.
Walsh says the researchers tried a bunch of these before settling on one. “We’re using an algorithm called random forest.”
It’s cutting-edge big data that works on a principle that may date back to Aristotle: the wisdom of the crowd.
HOW 'RANDOM FOREST' WORKS
The ‘forest’ focuses on a big question. As an example, let’s say you want to know how likely it is that a particular person will drink a cup of coffee in the next hour.
Walsh explains the forest starts small. “There’s something called a decision tree.”
The decision tree answers, in essence, a yes-or-no question. Like: does that person drink coffee? Yes or no.
That eliminates a lot of people, but one question alone isn’t very accurate. Fear not, though: “you can go a step further and add another variable.” Like:
- Did the person just stop at a coffee shop?
- Did they buy a coffee?
- Did they brew some at home?
- Are they thirsty?
- Are they sleepy?
You get the point.
Each variable becomes another decision tree. Build enough trees and you have a random forest. In theory, if the forest is big enough, you can predict all kinds of things with a high degree of accuracy.
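The coffee example above can be sketched as a toy forest: each “tree” here is just one of the yes/no questions, and the forest takes a majority vote, the wisdom of the crowd. The person and the variable names are invented for illustration, and a real random forest also trains each tree on a random sample of the data and features, which this sketch skips.

```python
# A toy "random forest" for the coffee question. Each tree asks one
# yes/no question about the person; the forest takes a majority vote.
person = {
    "drinks_coffee": True,
    "stopped_at_coffee_shop": True,
    "bought_a_coffee": False,
    "brewed_at_home": True,
    "is_sleepy": True,
}

# Each "tree" is a single yes/no check on one variable.
trees = [
    lambda p: p["drinks_coffee"],
    lambda p: p["stopped_at_coffee_shop"],
    lambda p: p["bought_a_coffee"],
    lambda p: p["brewed_at_home"],
    lambda p: p["is_sleepy"],
]

def forest_predict(p):
    """Majority vote across all trees: the 'wisdom of the crowd'."""
    votes = sum(tree(p) for tree in trees)
    return votes > len(trees) / 2

print(forest_predict(person))  # 4 of 5 trees vote yes -> True
```

No single question is reliable on its own, but the vote smooths out any one tree’s mistake, which is the intuition behind the technique.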
And, in theory, going through the data you can identify what causes a good cop to go bad.
But build a forest that’s too big, with too many variables, and the results can be skewed and accuracy falls.
The algorithm is still being tested, but there are already results, says CMPD’s Crystal Cody: “It’s finding connections and patterns that we wouldn’t necessarily see from the human perspective.”
So what has the CMPD data shown? Can such an algorithm really change modern policing?
That story in part 2 of our series.