As a US military veteran and a patient of the Veterans Administration, I’m glad to see the VA using all the tools at its disposal to identify veterans at risk for suicide. I’ve lost friends to suicide, so it’s a subject I care about.
But as a data scientist, I was troubled by a recent NextGov article about the predictive analytics being used to identify veterans at risk for suicide.
Let’s start with the obvious:
“I will say that one of the challenges in applying predictive analytics and providing that information to clinicians, is it’s really important that clinicians understand why the veteran is being identified,” he said. “But that is really hard to do with a true machine-learning approach because we don’t really know why the machine is identifying somebody.” – Aaron Eagan, Deputy Director for Innovation and Program Development
I’m not sure what Mr. Eagan means by a “true” machine-learning approach, but I will offer this suggestion:
If you don’t understand the operation of a model well enough to know how it arrives at its output, you shouldn’t be using it.
As a data science educator, please let me explain.
There is a range of advanced statistical and machine-learning approaches that can be applied to the available data to generate a deeper understanding of the underlying phenomenon the data describes. Some are very simple and transparent (linear regression and decision trees), some are very complex and opaque (neural networks), and most fall somewhere along the wide spectrum of clarity and transparency between those extremes.
While the math powering these approaches can be complex, the underlying intuition is often fairly straightforward. In this case, there is a set of patient attributes that the model identifies as being predictive of whether a patient is at risk for suicide. These attributes can be predictive in isolation or in some combination (possibly many different combinations of many attributes), based on the past instances the model was trained on.
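As a simple illustration of that intuition (using entirely made-up attributes and labels, not anything from the VA’s actual model or data), a transparent approach can report exactly how strongly each attribute is associated with the at-risk label in the training data:

```python
# Hypothetical, simplified sketch: for each binary attribute, measure
# how often patients with that attribute carried the at-risk label.
# The attribute names and numbers below are invented for this post.
patients = [
    # (attributes, at_risk label)
    ({"prior_attempt": 1, "recent_loss": 0}, 1),
    ({"prior_attempt": 1, "recent_loss": 1}, 1),
    ({"prior_attempt": 0, "recent_loss": 1}, 0),
    ({"prior_attempt": 0, "recent_loss": 0}, 0),
    ({"prior_attempt": 0, "recent_loss": 1}, 1),
]

def at_risk_rate(attribute):
    """Fraction of patients with this attribute who were labeled at risk."""
    labels = [risk for attrs, risk in patients if attrs[attribute] == 1]
    return sum(labels) / len(labels)

for attribute in ("prior_attempt", "recent_loss"):
    print(attribute, at_risk_rate(attribute))
```

A real model combines many such signals in far more elaborate ways, but the point stands: a transparent approach lets you ask, attribute by attribute, why a given patient was flagged.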
This is important because models only “learn” from the attributes (or features) of the data we give them to train on. The model sees these features as numbers in a matrix on which it performs calculations to arrive at a result. It has no way of discerning whether a feature is actually related to the outcome; it only knows that, in the limited view of the world provided by the sample data, the feature proved significant.
For example, say the researchers designing this model included the patient’s height among the features, and in the dataset there were two patients who were 6’8″ or taller, both of whom reported suicidal ideation or died by suicide. In the model’s understanding of the world of patients at the VA, being 6’8″ or taller is a perfect predictor (100%) of suicidal ideation, because the model has no understanding of how rare that height is in the general population, or that height is likely not related to thoughts of suicide (or at least hasn’t been empirically shown to be).
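That height scenario can be reproduced in a few lines. The data here is purely hypothetical, invented to mirror the example above, but it shows how a coincidence in a small sample looks like a perfect rule to a naive model:

```python
# Hypothetical training data: (height_in_inches, at_risk label).
# Only two patients are 80" (6'8") or taller, and by coincidence
# both carry a positive label.
training_data = [
    (68, 0), (70, 1), (71, 0), (72, 0), (74, 1),
    (80, 1), (81, 1),  # the two 6'8"-or-taller patients
]

def tall_rule(height):
    """The spurious rule a naive model might learn: tall => at risk."""
    return 1 if height >= 80 else 0

# On the training data, the rule looks flawless for tall patients...
tall = [(h, y) for h, y in training_data if h >= 80]
accuracy_on_tall = sum(tall_rule(h) == y for h, y in tall) / len(tall)
print(accuracy_on_tall)  # 1.0 -- driven entirely by two coincidental cases
```

Nothing about the rule generalizes; it is an artifact of two data points. Validation on held-out data is what exposes this kind of false signal.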
I don’t know whether the VA submitted its model to the kind of rigorous validation and verification that would prevent these kinds of errors, but the quoted statement above doesn’t give me much confidence that they understand how the model arrives at its decisions.
Which leads me to this:
If you can’t explain the model, you don’t understand the model, and if you don’t understand the model but are making decisions based on its outputs, you’re just as likely to make the wrong decision as the right one.
No machine-learning model is right all of the time. Being able to identify those classification errors is important, and it is only possible when you understand how and why the model gets its predictions wrong.
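One standard starting point for that kind of error analysis is the confusion matrix, which separates a model’s mistakes into false positives and false negatives. The labels and predictions below are made up for illustration:

```python
# Hypothetical true labels and model predictions (1 = flagged at risk).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

# Tally the four cells of the confusion matrix.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))

print("true positives: ", tp)  # correctly flagged
print("false positives:", fp)  # flagged but not actually at risk
print("false negatives:", fn)  # at risk but missed -- the costly error here
print("true negatives: ", tn)
```

In a clinical setting the two error types carry very different costs: a false positive may mean an unnecessary intervention, while a false negative may mean a missed chance to save a life. Understanding which way the model fails, and for whom, is part of understanding the model.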
As a patient in the VA system, I was further concerned that the outputs of this model were shared with clinicians without any context for the decision. What does it mean if I get flagged in the system? How do I know it wasn’t because of my height, or hair color, or some unimportant characteristic the machine falsely flagged as significant? Will I get different treatment? Will I be in a situation where my rights as a patient are denied because a black box system no one can describe says that’s what’s best for me?
This is a pivotal issue whenever we bring predictive analytics, machine learning, or artificial intelligence into the public space. It’s bad enough when a private company uses these tools to create a black box for providing its services, but in the public space, decisions are being made without the transparency we expect. As a person in this system, I’d rather have a less accurate decision that I can understand and that can be explained to me than a potentially more accurate one that I can’t and that can’t be.
At best, these models are decision support tools, and like any tool, they need to be understood in order to be employed properly without causing any undue harm to others. I just hope the VA gives more thought to this than they seem to present in this article. My brothers and sisters in arms deserve this, as do we all.
This post was originally published September 24, 2019 on LinkedIn as “Predictive Analytics at the VA: The Dangers of the Black Box”.