Sentiment Analysis and Paired Sample T-test on Indian Tweets during COVID-19
Analysis by: Kopal Sharma, Heetakshi Shah, Shubham Kanade, Manas Malhotra, Rutuja Suryavanshi, Abhimanyu Bhola
ABSTRACT
COVID-19 (Coronavirus Disease 2019) has had wide-ranging psychological consequences. The aim of this study is to explore the impact of COVID-19 on people’s mental health, to help policy makers develop actionable policies, and to help clinical practitioners (e.g., social workers, psychiatrists, and psychologists) provide timely services to affected populations.
For this project we extracted and analyzed 10,000 tweets from active Indian Twitter users posted in the week before the first lockdown was announced (18 March to 20 March 2020) and 10,000 tweets posted just after the lockdown came into effect (25 March to 27 March 2020).
We calculated word frequencies and scores of emotional indicators (e.g., anxiety, depression, indignation, and happiness) from the collected data. Sentiment analysis and a paired sample t-test were then performed to examine the differences in the same group before and after the declaration of the all-India lockdown on 25 March 2020.
EXTRACTING THE TWEETS
Our team extracted 10,000 tweets about coronavirus and related terms for each timeframe, before and after the lockdown, using the Tweepy library in a Python script.
The datasets for the two periods were extracted independently of each other and stored in CSV format for analysis.
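The storage step can be illustrated with a minimal sketch that uses only the Python standard library. The column names (created_at, user, text) and the sample row are assumptions made for illustration. The Tweepy call that fetches the tweets is deliberately left out, because it depends on credentials and on the library version in use.

```python
import csv
import io

# Hypothetical column names; the original script's columns are not given.
FIELDS = ["created_at", "user", "text"]

def write_tweets_csv(rows, handle):
    """Write a list of tweet dictionaries to an open text handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)

def read_tweets_csv(handle):
    """Read the CSV back into a list of dictionaries, one per tweet."""
    return list(csv.DictReader(handle))

# Illustrative row only, not real data from the study.
rows = [{"created_at": "2020-03-18", "user": "example_user",
         "text": "Stay safe, everyone!"}]

# An in-memory buffer stands in for a file here; with a real file use
# open(path, "w", newline="", encoding="utf-8").
buffer = io.StringIO()
write_tweets_csv(rows, buffer)
buffer.seek(0)
print(read_tweets_csv(buffer))
```

Keeping one CSV per period, as described above, lets the later analysis steps load the before and after datasets separately.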
DATA CLEANING AND TOKENIZATION
First, the text column is pre-processed: all characters are converted to lowercase, and punctuation marks, extra white space, common words (stop words), and the usernames associated with tweets are removed. The cleaned tweets are stored in a column named tidy_tweets in the data frame.
After pre-processing, we tokenized the tweets to obtain the individual words and word combinations in the two datasets.
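The cleaning and tokenization steps above can be sketched as follows. This is a minimal illustration, not the script used in the study: the stop-word list is a tiny placeholder, the function names clean_tweet and tokenize are hypothetical, and only ASCII punctuation is removed.

```python
import re
import string

# Tiny illustrative stop-word list (an assumption); a real run would use a
# much fuller list.
STOP_WORDS = {"a", "an", "the", "is", "are", "in", "of", "to", "and",
              "for", "on", "at", "this"}

def clean_tweet(text):
    """Lowercase a tweet and strip usernames, punctuation, extra white
    space and stop words."""
    text = text.lower()
    text = re.sub(r"@\w+", " ", text)  # drop @usernames
    # Drop ASCII punctuation characters.
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()  # split() also collapses repeated white space
    return " ".join(t for t in tokens if t not in STOP_WORDS)

def tokenize(tidy_text):
    """Split an already cleaned tweet into word tokens."""
    return tidy_text.split()

sample = "@user  Stay HOME, stay safe: the lockdown is on!"
tidy = clean_tweet(sample)
print(tidy)
print(tokenize(tidy))
```

In a data frame workflow, clean_tweet would be applied to every entry of the text column and the result stored in tidy_tweets.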
SENTIMENT WORD CLOUD FOR BEFORE LOCK-DOWN
SENTIMENT WORD CLOUD FOR AFTER LOCK-DOWN
The word clouds show that before the lock-down the emotions of joy and trust had higher word frequencies than after it. After the lock-down, the word frequency for joy decreased, and trust is no longer observed.
BIGRAMS AND TRIGRAMS
Word frequencies of bi-grams (two-word sequences) and tri-grams (three-word sequences) can give better insight into the dataset than single words alone.
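As a rough sketch of how such counts can be produced, the snippet below builds n-grams from cleaned tweets and tallies them. The function names and the two sample tweets are illustrative assumptions, not material from the study.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_counts(tidy_tweets, n):
    """Count n-grams across a list of cleaned tweets.

    N-grams are built per tweet, so they never span two tweets.
    """
    counts = Counter()
    for tweet in tidy_tweets:
        counts.update(ngrams(tweet.split(), n))
    return counts

# Illustrative cleaned tweets only.
tweets = ["stay home stay safe", "stay home save lives"]
print(ngram_counts(tweets, 2).most_common(3))  # bi-grams
print(ngram_counts(tweets, 3).most_common(3))  # tri-grams
```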
SENTIMENT WORD FREQUENCY
Before lock-down
After lock-down
We can see that the polarity of emotions increases after the lock-down. Both the frequency and the polarity of negative emotions are evidently higher than those of positive emotions.
MOST POSITIVE AND NEGATIVE WORDS USED IN THE EXTRACTED TWEETS
To get a broader picture of how positive and negative words are used, we assigned each word a sentiment using the ‘bing’ lexicon and counted occurrences to find the 10 most common positive and negative words in the extracted tweets.
Before lock-down
After lock-down
It can be observed that the usage of words like death has doubled. The usage of positive words has also increased, but not by as much in absolute terms as that of negative words: negative words rose from 2132 to 2738 (an increase of 606), while positive words rose from 1190 to 1539 (an increase of 349). Overall, negative emotion is clearly more prevalent than positive emotion.
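The lexicon-based count described in this section can be sketched as below. The LEXICON dictionary is a toy stand-in with a handful of hand-picked words, not the actual ‘bing’ lexicon, and the function name and sample tweets are assumptions for illustration.

```python
from collections import Counter

# Toy stand-in for a sentiment lexicon (assumption): word -> label.
LEXICON = {
    "death": "negative", "crisis": "negative", "fear": "negative",
    "safe": "positive", "support": "positive", "hope": "positive",
}

def top_sentiment_words(tidy_tweets, lexicon, k=10):
    """Return the k most common positive and negative words."""
    counts = {"positive": Counter(), "negative": Counter()}
    for tweet in tidy_tweets:
        for word in tweet.split():
            label = lexicon.get(word)  # None for words not in the lexicon
            if label is not None:
                counts[label][word] += 1
    return {label: c.most_common(k) for label, c in counts.items()}

# Illustrative cleaned tweets only.
tweets = ["death toll rises fear grows", "stay safe hope", "death crisis"]
result = top_sentiment_words(tweets, LEXICON)
print(result["negative"])
print(result["positive"])
```

Running the same function on the before and after datasets gives two top-10 lists that can be compared side by side.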
CATEGORIZING THE SENTIMENTS INTO TEN TYPES OF EMOTIONS
Besides sorting words into just two categories (positive and negative), we can also label them with multiple emotional states. Here we grouped the words into ten emotions and compared the total frequency of words in each group before and after the lockdown.
It is evident from the graph that all 10 emotions into which the words were classified have increased. There is a significant increase in the emotions of fear, anger, sadness and trust.
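A minimal sketch of this kind of per-emotion tally is shown below. It assumes a lexicon in which one word may carry several emotion labels. The EMOTION_LEXICON entries are invented for illustration and are not taken from the lexicon used in the study.

```python
from collections import Counter

# Toy emotion lexicon (assumption): each word maps to a set of labels.
EMOTION_LEXICON = {
    "death": {"fear", "sadness", "negative"},
    "panic": {"fear", "negative"},
    "hope": {"anticipation", "joy", "trust", "positive"},
    "doctor": {"trust", "positive"},
}

def emotion_totals(tidy_tweets, lexicon):
    """Total word occurrences per emotion across cleaned tweets."""
    totals = Counter()
    for tweet in tidy_tweets:
        for word in tweet.split():
            for emotion in lexicon.get(word, ()):
                totals[emotion] += 1
    return totals

# Illustrative cleaned tweets only.
before = emotion_totals(["hope doctor", "panic"], EMOTION_LEXICON)
after = emotion_totals(["death panic", "death hope doctor"], EMOTION_LEXICON)

# Side-by-side comparison; a missing emotion counts as 0 in a Counter.
for emotion in sorted(set(before) | set(after)):
    print(emotion, before[emotion], after[emotion])
```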
PAIRED SAMPLE T-TEST
The paired samples t-test compares two means that come from the same individual, object, or related units. The two means can represent, for example, a measurement taken at two different times, in our case pre-lockdown and post-lockdown.
The purpose of the test is to determine whether there is statistical evidence that the mean difference between paired observations on a particular outcome is significantly different from zero. The paired samples t-test is a parametric test.
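To make the computation concrete: the test statistic is the mean of the paired differences divided by its standard error, t = mean(d) / (sd(d) / sqrt(n)), with n - 1 degrees of freedom. The sketch below computes it with the Python standard library. The numbers are made up for illustration, and what forms a pair (for example, a score per emotion or per user) is an assumption, since the text does not specify it.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for a paired samples t-test."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1

# Made-up paired scores, not data from the study.
before = [10, 12, 9, 14, 11]
after = [13, 15, 10, 18, 12]
t, df = paired_t_statistic(before, after)
print(round(t, 3), df)
```

The p-value is then read from a t-distribution with df degrees of freedom. In practice a library routine such as scipy.stats.ttest_rel returns both the statistic and the p-value.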
After performing the paired sample t-test, it is evident that there is a significant difference in the emotions fear, negative and sadness before and after the lockdown (p<0.05).
Overall Paired Sample T-test
All the negative emotions before and after the lockdown (anger, anticipation, fear, disgust, negative, and sadness) were considered together, and a paired sample t-test was performed on them.
There is a significant change in the frequency of negative emotions before and after the lock-down (p<0.05), and the means show that negative emotions have increased.
CONCLUSION
In this study, we compared sentiment and psychological profiles before and after 25 March. We found an increase in negative emotions (anxiety, depression, and indignation) and sensitivity to social risks after the declaration of the lock-down in India. Twitter data may provide a timely understanding of the impact of public health emergencies on the public’s mental health during an epidemic.