Dynamics of the Negative Discourse Toward COVID-19 Vaccines: Topic Modeling Study and an Annotated Data Set of Twitter Posts

A1 Original article in a scientific journal
BACKGROUND: Since the onset of the COVID-19 pandemic, vaccines have been an important topic in public discourse. The discussions around vaccines are polarized, as some see them as an important measure to end the pandemic, and others are hesitant or find them harmful. A substantial portion of these discussions occurs openly on social media platforms. This allows us to closely monitor the opinions of different groups and their changes over time.

OBJECTIVE: This study investigated posts related to COVID-19 vaccines on Twitter (Twitter Inc) and focused on those that had a negative stance toward vaccines. It examined the evolution of the percentage of negative tweets over time. It also examined the different topics discussed in these tweets to understand the concerns and discussion points of those holding a negative stance toward the vaccines.

METHODS: A data set of 16,713,238 English tweets related to COVID-19 vaccines was collected, covering the period from March 1, 2020, to July 31, 2021. We used the scikit-learn Python library to apply a support vector machine classifier to identify the tweets with a negative stance toward COVID-19 vaccines. A total of 5163 tweets were used to train the classifier, of which a subset of 2484 tweets was manually annotated by us and made publicly available along with this paper. We used the BERTopic model to extract the topics discussed within the negative tweets and investigate them, including how they changed over time.

RESULTS: We showed that the negativity with respect to COVID-19 vaccines has decreased over time along with the vaccine rollouts. We identified 37 topics of discussion and presented their respective importance over time. We showed that popular topics not only consisted of conspiratorial discussions, such as 5G towers and microchips, but also contained legitimate concerns around vaccination safety and side effects as well as concerns about policies. The most prevalent topic among vaccine-hesitant tweets was related to the use of messenger RNA and fears about its speculated negative effects on our DNA.

CONCLUSIONS: Hesitancy toward vaccines existed before the COVID-19 pandemic. However, given the dimension of and circumstances surrounding the COVID-19 pandemic, some new areas of hesitancy and negativity toward COVID-19 vaccines have arisen, for example, whether there has been enough time for them to be properly tested. There is also an unprecedented number of conspiracy theories associated with them. Our study shows that even unpopular opinions or conspiracy theories can become widespread when paired with a widely popular discussion topic such as COVID-19 vaccines. Understanding the concerns, the discussed topics, and how they change over time is essential for policy makers and public health authorities to provide better in-time information and policies to facilitate the vaccination of the population in future similar crises.
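
As an illustration of the time-series step mentioned in the abstract (the percentage of negative tweets over time), the following is a minimal sketch using only the Python standard library. It assumes each tweet has already been labeled by a stance classifier; the function name `monthly_negative_share`, the monthly granularity, and the toy records are hypothetical and are not taken from the study's code or data.

```python
from collections import defaultdict
from datetime import date


def monthly_negative_share(tweets):
    """Return {(year, month): percent of tweets labeled negative}.

    `tweets` is an iterable of (posting_date, is_negative) pairs, where
    is_negative is the output of some stance classifier (assumed here).
    """
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for posted_on, is_negative in tweets:
        key = (posted_on.year, posted_on.month)
        totals[key] += 1
        if is_negative:
            negatives[key] += 1
    # Sort keys so the result reads chronologically.
    return {key: 100.0 * negatives[key] / totals[key] for key in sorted(totals)}


# Toy records invented for illustration only; these are not study results.
sample = [
    (date(2020, 3, 5), True),
    (date(2020, 3, 9), False),
    (date(2021, 7, 1), False),
    (date(2021, 7, 2), False),
    (date(2021, 7, 3), True),
    (date(2021, 7, 4), False),
]

shares = monthly_negative_share(sample)
for (year, month), pct in shares.items():
    print(f"{year}-{month:02d}: {pct:.1f}% negative")
```

Grouping by calendar month is one possible choice; the same counting pattern would work for weekly or daily bins by changing the grouping key.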
Publisher Copyright: ©Gabriel Lindelöf, Talayeh Aledavood, Barbara Keller. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 12.04.2023.
Keywords: COVID-19 vaccines, machine learning, natural language processing, SARS-CoV-2, social media, stance detection, topic modeling, Twitter, vaccine hesitancy
Lindelöf, G, Aledavood, T & Keller, B 2023, 'Dynamics of the Negative Discourse Toward COVID-19 Vaccines: Topic Modeling Study and an Annotated Data Set of Twitter Posts', Journal of Medical Internet Research, vol. 25, e41319, pp. 1-16. https://doi.org/10.2196/41319