Using Reddit as a population-level “mental health tracker” during the COVID pandemic

Figure 1. Mention of COVID-19–related words across mental health support groups. Timeline landmarks were chosen from NBC News timeline given that US users are the most prevalent across Reddit. Global, China, and US confirmed COVID-19 cases are displayed. Low et al (2020).

Using machine learning to track the pandemic’s impact on mental health (MIT News):

Dealing with a global pandemic has taken a toll on the mental health of millions of people. A team of MIT and Harvard University researchers has shown that they can measure those effects by analyzing the language that people use to express their anxiety online.

Using machine learning to analyze the text of more than 800,000 Reddit posts, the researchers were able to identify changes in the tone and content of language that people used as the first wave of the Covid-19 pandemic progressed, from January to April of 2020. Their analysis revealed several key changes in conversations about mental health, including an overall increase in discussion about anxiety and suicide.

“We found that there were these natural clusters that emerged related to suicidality and loneliness, and the amount of posts in these clusters more than doubled during the pandemic as compared to the same months of the preceding year, which is a grave concern,” says Daniel Low, a graduate student in the Program in Speech and Hearing Bioscience and Technology at Harvard and MIT and the lead author of the study.

The Study:

Natural Language Processing Reveals Vulnerable Mental Health Support Groups and Heightened Health Anxiety on Reddit During COVID-19: Observational Study (Journal of Medical Internet Research). From the abstract:

  • Background: The COVID-19 pandemic is impacting mental health, but it is not clear how people with different types of mental health problems were differentially impacted as the initial wave of cases hit.
  • Objective: The aim of this study is to leverage natural language processing (NLP) with the goal of characterizing changes in 15 of the world’s largest mental health support groups (eg, r/schizophrenia, r/SuicideWatch, r/Depression) found on the website Reddit, along with 11 non–mental health groups (eg, r/PersonalFinance, r/conspiracy) during the initial stage of the pandemic.
  • Results: We found that the r/HealthAnxiety forum showed spikes in posts about COVID-19 early on in January, approximately 2 months before other support groups started posting about the pandemic. There were many features that significantly increased during COVID-19 for specific groups including the categories “economic stress,” “isolation,” and “home,” while others such as “motion” significantly decreased. We found that support groups related to attention-deficit/hyperactivity disorder, eating disorders, and anxiety showed the most negative semantic change during the pandemic out of all mental health groups. Health anxiety emerged as a general theme across Reddit through independent supervised and unsupervised machine learning analyses … Using unsupervised clustering, we found the suicidality and loneliness clusters more than doubled in the number of posts during the pandemic.
  • Conclusions: By using a broad set of NLP techniques and analyzing a baseline of prepandemic posts, we uncovered patterns of how specific mental health problems manifest in language, identified at-risk users, and revealed the distribution of concerns across Reddit, which could help provide better resources to its millions of users. We then demonstrated that textual analysis is sensitive to uncover mental health complaints as they appear in real time, identifying vulnerable groups and alarming themes during COVID-19, and thus may have utility during the ongoing pandemic and other world-changing events such as elections and protests.

The authors further explain that “Throughout many subreddits, we found significant increases in the use of tokens related to isolation (eg, “lonely,” “can’t see anyone,” “quarantine”), economic stress (eg, “rent,” “debt,” “pay the bills”), and home (“fridge,” “pet,” “lease”), and a decrease in the lexicon related to motion (eg, “walk,” “visit,” “travel”).”
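
For readers curious how a lexicon-based comparison like this can work in principle, here is a minimal Python sketch. It is not the authors' pipeline. The tiny lexicons contain only the example tokens quoted above, and the function name `category_rates`, the per-1,000-words normalization, and the two sample posts are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons built only from the example tokens quoted above;
# the study's actual lexicons are larger and are not reproduced here.
LEXICONS = {
    "isolation": ["lonely", "can't see anyone", "quarantine"],
    "economic stress": ["rent", "debt", "pay the bills"],
    "home": ["fridge", "pet", "lease"],
    "motion": ["walk", "visit", "travel"],
}


def category_rates(posts):
    """Return, per category, the number of lexicon matches per 1,000 words."""
    counts = Counter()
    total_words = 0
    for post in posts:
        text = post.lower()
        total_words += len(text.split())
        for category, phrases in LEXICONS.items():
            for phrase in phrases:
                # Word boundaries keep "rent" from matching inside "parents".
                pattern = r"\b" + re.escape(phrase) + r"\b"
                counts[category] += len(re.findall(pattern, text))
    if total_words == 0:
        return {category: 0.0 for category in LEXICONS}
    return {c: 1000 * counts[c] / total_words for c in LEXICONS}


# Made-up sample posts, standing in for a prepandemic baseline and a pandemic period.
baseline = category_rates(["I like to walk and travel to visit my parents."])
pandemic = category_rates(["I feel lonely in quarantine and worry about rent and debt."])

for category in LEXICONS:
    print(category, baseline[category], pandemic[category])
```

Comparing the two sets of rates category by category is the basic idea behind statements such as "isolation increased while motion decreased". A real analysis would also need many posts per period and a statistical test before calling any change significant.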

The Study in Context:

