Are We Building Robust AI for Mental Health Prediction in Social Media?

[Image: a drawing of a brain being converted into a circuit board]

[A shortened version of this post appears on the npj digital medicine website]

Did you know that Facebook has a “suicide prevention AI”? The popular social media platform uses behavioral and linguistic patterns to guess whether someone may harm themselves. NPR reports that this AI flags about 10 people every day, and from there, Facebook can intervene and potentially save a life. Facebook isn’t the only one using social media data to make predictions about people’s well-being. In fact, researchers have been investigating this for nearly a decade, and these systems hold great promise for lifesaving and cost-reducing applications.

A crucial part of this work is the trustworthy evaluation of mental health from social media data. Without a doctor to make a diagnosis or a screening instrument to administer, how can we be sure that social media signals measure what we hope they measure? If these answers are wrong and we act on them without human oversight, we risk making life-altering mistakes. We may allocate resources to a person who is not in distress and be overbearing, or, in the reverse, miss a person who desperately needs assistance. As researchers who conduct this work, we were curious about practices in the larger research community: what methods and datasets do researchers choose when fields like machine learning and clinical psychiatry work together? We set out to answer those questions.

In our paper in npj Digital Medicine, we studied 75 scientific papers on predicting mental illness with machine learning and social media data, representing the state of the art in the space over the last five years. We looked at two dimensions: whether the data could plausibly capture signals of mental illness, and whether the AI systems themselves were mathematically sound and adhered to scientific standards for machine learning.

What we found was that most papers have large gaps in scientific reporting and in the validation strategies for the “clinical” assessments essential to trustworthy predictions. This has serious consequences. Failing to validate one’s data can produce the risks we mentioned earlier: incorrectly allocating already limited monetary and human resources. Not following reporting standards can make it difficult, even borderline impossible, for fellow scientists to independently evaluate and confirm a study. Either alone can be enough to undermine accurate research; together, they pose a major barrier to solving the pressing challenges of AI applied to mental health.

Social Media and Mental Health Prediction

You have probably felt “anxious” in your life — public speaking makes almost all of us uneasy, for example. But you may have also felt anxious as a longer-term emotional state around an uncertain job search or challenging problem. Perhaps you know someone who suffers from anxiety.

Anxiety is a word with many meanings: the term is overloaded because it has several definitions, both formal and informal. It can refer to a category of disorders, a symptom of anxiety (which accompanies other disorders), an emotion people occasionally feel, or, in casual use, a personality trait.

In our study, we examined whether the papers defined what they were studying, for anxiety and for other disorders and symptoms such as depression, eating disorders, and suicidal thoughts. We found seven types of “proxy signals” used to make these decisions, such as hashtags like #depressed or self-disclosures like posting “I was diagnosed with anorexia”. At first glance, this seems promising: who would participate in a depression community if they weren’t depressed?

What we discovered is that very few studies define what they mean when they use these terms; we often couldn’t tell whether authors wanted to study anxiety the emotion, anxiety the symptom, or anxiety the disorder. Even more concerning, very few studies went back to check whether those proxy signals actually mapped to a valid measurement of mental illness and its symptoms.
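To make the fragility of proxy signals concrete, here is a minimal sketch, entirely our own illustration rather than code from any surveyed paper, of two proxy rules of the kinds described above: a hashtag match and a self-disclosure pattern.

```python
import re
from typing import Optional

# Two illustrative proxy-signal rules of the kind the survey describes.
# These patterns are hypothetical examples, not rules from any actual study.
HASHTAG_RULE = re.compile(r"#depress(?:ed|ion)\b", re.IGNORECASE)
DISCLOSURE_RULE = re.compile(
    r"\bI (?:was|am|have been) diagnosed with (?:depression|anorexia|anxiety)\b",
    re.IGNORECASE,
)

def proxy_label(post: str) -> Optional[str]:
    """Return which proxy rule (if any) fires for a post."""
    if DISCLOSURE_RULE.search(post):
        return "self-disclosure"
    if HASHTAG_RULE.search(post):
        return "hashtag"
    return None

posts = [
    "I was diagnosed with depression last year.",  # genuine self-disclosure
    "Feeling #depressed that my team lost again",  # casual use of the term
    "Great weather today!",                        # no signal at all
]
print([proxy_label(p) for p in posts])  # ['self-disclosure', 'hashtag', None]
```

The second post shows the validity problem: the rule fires on a casual use of “depressed” that has nothing to do with the clinical disorder, which is exactly why such proxies need to be checked against clinical ground truth.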

In addition to studying proxy signals from social media, we also wondered whether the AI models were mathematically sound and reported in enough detail that later work could reproduce the results. We cataloged the data collection and study design, as almost all papers in the dataset used machine learning to predict the presence of mental illness or an important symptom.

To assess whether these papers could be reproduced, we looked at five factors essential for machine learning applications, including the size and composition of the dataset, the variables used for prediction, and the statistics for how well the model performed, such as accuracy. What we found alarmed us: only 32 of 75 papers, about 43%, reported all five of these factors and could therefore be reproduced. We noticed that many papers did not say how many variables they used or what those variables measured, which leads to models that will not perform reliably over time.
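As a rough illustration of the kind of reporting checklist this evaluation implies, here is a minimal sketch. The factor names are our own placeholders: the text above names dataset size and composition, prediction variables, and performance statistics among the five, and the full list is in the paper.

```python
# A minimal sketch of a reporting checklist; factor names are illustrative
# placeholders, not the paper's exact wording.
REQUIRED_FACTORS = {
    "dataset_size",          # how big the dataset is
    "dataset_composition",   # who is in the dataset
    "prediction_variables",  # which variables the model uses
    "performance_metrics",   # e.g. accuracy on held-out data
    "method_details",        # placeholder for the remaining reporting factor
}

def missing_factors(reported: set) -> set:
    """Return the required reporting factors a paper omits."""
    return REQUIRED_FACTORS - reported

# A hypothetical paper that reports only two of the five factors:
paper = {"dataset_size", "performance_metrics"}
print(sorted(missing_factors(paper)))
# ['dataset_composition', 'method_details', 'prediction_variables']
```

A set difference makes the check order-independent and lets reviewers see at a glance which reporting gaps remain.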

In light of these results, we are hopeful the field can improve in these areas. In the paper, we propose a list of reporting standards for what must be included in these papers. We also discuss promising opportunities for collaboration with medical researchers to align proxy signals with clinical findings. Both are crucial for developing trustworthy and accurate predictions that lead to better research, and eventually to better tools for helping solve the pressing challenge of mental illness diagnosis and treatment.

The paper is available here (open access, so everyone can read it!). We’d love to hear your thoughts or comments on this phenomenon.

Short version of the methods: This paper is based on a larger literature review of the state of the art in predicting mental illness from social media data. From 4,400 papers, we identified 75 published between 2013 and 2018 that discuss prediction, mental illness, and social media data. We read and qualitatively analyzed each paper, examining its data collection, methods, results, and analysis against the questions we mentioned above.

Full paper citation: Chancellor, S., De Choudhury, M. Methods in predictive techniques for mental health status on social media: a critical review. npj Digit. Med. 3, 43 (2020).

Professor at Minnesota CS, Georgia Tech PhD. Human-centered machine learning, work/life balance, and productivity. @snchancellor on Twitter