Dealing with a global pandemic has taken a toll on the mental health of millions of people. A team of MIT and Harvard University researchers has shown that they can measure those effects by analyzing the language that people use to express their anxiety online.
Using machine learning to analyze the text of more than 800,000 Reddit posts, the researchers were able to identify changes in the tone and content of language that people used as the first wave of the Covid-19 pandemic progressed, from January to April of 2020. Their analysis revealed several key changes in conversations about mental health, including an overall increase in discussion about anxiety and suicide.
“We found that there were these natural clusters that emerged related to suicidality and loneliness, and the amount of posts in these clusters more than doubled during the pandemic as compared to the same months of the preceding year, which is a grave concern,” says Daniel Low, a graduate student in the Program in Speech and Hearing Bioscience and Technology at Harvard and MIT and the lead author of the study.
Natural Language Processing Reveals Vulnerable Mental Health Support Groups and Heightened Health Anxiety on Reddit During COVID-19: Observational Study (Journal of Medical Internet Research). From the abstract:
- Background: The COVID-19 pandemic is impacting mental health, but it is not clear how people with different types of mental health problems were differentially impacted as the initial wave of cases hit.
- Objective: The aim of this study is to leverage natural language processing (NLP) with the goal of characterizing changes in 15 of the world’s largest mental health support groups (eg, r/schizophrenia, r/SuicideWatch, r/Depression) found on the website Reddit, along with 11 non–mental health groups (eg, r/PersonalFinance, r/conspiracy) during the initial stage of the pandemic.
- Results: We found that the r/HealthAnxiety forum showed spikes in posts about COVID-19 early on in January, approximately 2 months before other support groups started posting about the pandemic. There were many features that significantly increased during COVID-19 for specific groups including the categories “economic stress,” “isolation,” and “home,” while others such as “motion” significantly decreased. We found that support groups related to attention-deficit/hyperactivity disorder, eating disorders, and anxiety showed the most negative semantic change during the pandemic out of all mental health groups. Health anxiety emerged as a general theme across Reddit through independent supervised and unsupervised machine learning analyses … Using unsupervised clustering, we found the suicidality and loneliness clusters more than doubled in the number of posts during the pandemic.
- Conclusions: By using a broad set of NLP techniques and analyzing a baseline of prepandemic posts, we uncovered patterns of how specific mental health problems manifest in language, identified at-risk users, and revealed the distribution of concerns across Reddit, which could help provide better resources to its millions of users. We then demonstrated that textual analysis is sensitive to uncover mental health complaints as they appear in real time, identifying vulnerable groups and alarming themes during COVID-19, and thus may have utility during the ongoing pandemic and other world-changing events such as elections and protests.
The authors further explain: “Throughout many subreddits, we found significant increases in the use of tokens related to isolation (eg, ‘lonely,’ ‘can’t see anyone,’ ‘quarantine’), economic stress (eg, ‘rent,’ ‘debt,’ ‘pay the bills’), and home (‘fridge,’ ‘pet,’ ‘lease’), and a decrease in the lexicon related to motion (eg, ‘walk,’ ‘visit,’ ‘travel’).”
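To make the lexicon idea concrete, here is a minimal Python sketch of counting how often terms from each category appear per 100 words in a set of posts. It is an illustration only, not the authors' pipeline: the tiny `LEXICONS` dictionary is built from the example terms quoted above (the study's actual lexicons are larger), and the function name `lexicon_rates` and the sample posts are hypothetical.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons taken from the example terms quoted in the article.
# The study's real lexicons are larger and are not reproduced here.
LEXICONS = {
    "isolation": ["lonely", "can't see anyone", "quarantine"],
    "economic stress": ["rent", "debt", "pay the bills"],
    "home": ["fridge", "pet", "lease"],
    "motion": ["walk", "visit", "travel"],
}


def _compile(terms):
    # Match whole words or phrases only, ignoring case ("rent" will not match "parent").
    alternatives = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


PATTERNS = {name: _compile(terms) for name, terms in LEXICONS.items()}


def lexicon_rates(posts):
    """Return, for each category, the number of lexicon matches per 100 words."""
    total_words = sum(len(post.split()) for post in posts)
    counts = Counter()
    for post in posts:
        text = post.replace("\u2019", "'")  # normalize curly apostrophes
        for name, pattern in PATTERNS.items():
            counts[name] += len(pattern.findall(text))
    if total_words == 0:
        return {name: 0.0 for name in LEXICONS}
    return {name: 100.0 * counts[name] / total_words for name in LEXICONS}


# Made-up posts, used only to exercise the function.
pre_pandemic = [
    "I went for a walk and plan to travel next week.",
    "Going to visit my sister.",
]
mid_pandemic = [
    "I feel so lonely in quarantine.",
    "Not sure how I will pay the bills or the rent.",
]

before = lexicon_rates(pre_pandemic)
during = lexicon_rates(mid_pandemic)
for category in LEXICONS:
    print(f"{category}: {before[category]:.2f} -> {during[category]:.2f} per 100 words")
```

Normalizing by word count keeps the comparison fair when the two periods contain different amounts of text. Exact matching is a deliberate simplification: it misses inflected forms such as "pets" or "walking", which a fuller analysis would need to handle.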
The Study in Context:
- Microsoft announces support for three innovative mental health services harnessing artificial intelligence (AI)
- Three ways to protect your mental health during –and after– COVID-19
- Debate the Future of Mental Health in North America: In ten years, will we see DSM‑6 or Something Much Better (SMB‑1)?