More and more people are reading health-related news stories on the internet because of its ease of access. However, not all health-related news on the internet is reliable, and there is mounting evidence of an “infodemic” of misinformation, which is exacerbated by sharing on social media (Wang et al. 2019). Evaluating health-related news stories from the internet is difficult for two reasons: (1) news stories are abundant, and (2) evaluating health-related information requires expertise in the medical field, which most people do not have. Both health institutions and social media companies have undertaken several initiatives to combat the spread of health-related misinformation online. For example, the U.S. National Library of Medicine and National Institutes of Health (NIH) have developed tutorials to help people evaluate health-related information found on the internet. These tutorials require answering a list of questions to validate the information source and the veracity of the claims (NIH 2011; U.S. National Library of Medicine 2018). In addition, social media companies have employed third-party fact-checkers that issue warning labels to social media users about misleading or false information in an attempt to help people distinguish between reliable and unreliable news stories (Facebook n.d.).
Health tutorials and single-dimensional warnings issued by third-party fact-checkers may not be very effective for curbing misinformation, for several reasons. Health tutorials are long and tedious, and most people neither have the skills to evaluate health-related information nor the desire to do so themselves. Single-dimensional warning labels issued by third-party fact-checkers may unnecessarily reduce complex health-related topics to simply being true or false. Healthcare experts, who have the expertise and knowledge to evaluate highly specialized health-related information, are better equipped to evaluate health-related news on a wide range of dimensions.
In this study, we combine lexicon-based text analysis with machine learning algorithms to create a model that evaluates health-related news stories disseminated online. We use Linguistic Inquiry and Word Count (LIWC) to analyze the linguistic content of 663 health-related news stories about medical interventions published in various media outlets during 2013–2018. We use Information Manipulation Theory (McCornack 1992; Zhou et al. 2020) to identify the LIWC textual features relevant to detecting health-related misinformation. We then develop an ensemble classifier that predicts the reliability of news stories on ten different criteria, and a second algorithm that predicts the overall reliability score of individual news stories. The results of this study can inform the implementation of reliability badges on health-related news stories, which could be applicable in the context of social media and may help people identify unreliable health-related news more easily.
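To make the general approach concrete, the following is a minimal, purely illustrative sketch of lexicon-based feature extraction combined with a majority-vote ensemble. It is not the model developed in this study: the lexicons, base classifiers, and threshold below are hypothetical stand-ins for LIWC categories and trained learners, chosen only to show the pipeline's shape.

```python
# Illustrative sketch only (not the authors' model): simple lexicon-based
# features feeding a majority-vote ensemble of rule-based classifiers.

# Hypothetical mini-lexicons standing in for LIWC category dictionaries.
HEDGE_WORDS = {"may", "might", "could", "possibly", "suggests"}
CERTAINTY_WORDS = {"proves", "cure", "always", "never", "guaranteed"}

def lexical_features(text):
    """Count lexicon hits per word, analogous to LIWC category proportions."""
    words = text.lower().split()
    n = max(len(words), 1)
    return {
        "hedge_ratio": sum(w in HEDGE_WORDS for w in words) / n,
        "certainty_ratio": sum(w in CERTAINTY_WORDS for w in words) / n,
        "exclaim": text.count("!"),
    }

# Three toy base classifiers; each votes True if the story looks reliable.
def clf_hedging(f):    return f["hedge_ratio"] >= f["certainty_ratio"]
def clf_tone(f):       return f["exclaim"] == 0
def clf_certainty(f):  return f["certainty_ratio"] < 0.05

def ensemble_predict(text):
    """Label a story reliable if a majority of base classifiers vote True."""
    f = lexical_features(text)
    votes = [clf_hedging(f), clf_tone(f), clf_certainty(f)]
    return sum(votes) >= 2

# Example: cautious, hedged reporting vs. overclaiming sensational text.
ensemble_predict("This treatment may help some patients, evidence suggests.")  # True
ensemble_predict("Miracle cure proves it always works!")                       # False
```

In the actual study, the base learners would be trained classifiers over the full LIWC feature set rather than hand-written rules, but the structure of the ensemble vote is the same.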