Sentiment Analysis and Paired Sample T-test on Indian Tweets during COVID-19

Kopal Sharma
Apr 9, 2021

Analysis by: Kopal Sharma, Heetakshi Shah, Shubham Kanade, Manas Malhotra, Rutuja Suryavanshi, Abhimanyu Bhola

ABSTRACT

COVID-19 (Coronavirus Disease 2019) has had wide-ranging psychological consequences. The aim of this study is to explore the impact of COVID-19 on people’s mental health, to assist policy makers in developing actionable policies, and to help clinical practitioners (e.g., social workers, psychiatrists, and psychologists) provide timely services to affected populations.

For this project we extracted and analyzed 10,000 tweets from active Indian Twitter users in the week before the first lockdown was announced (18 to 20 March 2020) and 10,000 tweets from just after the lockdown began (25 to 27 March 2020).

We calculated word frequencies and scores for emotional indicators (e.g., anxiety, depression, indignation, and happiness) from the collected data. Sentiment analysis and a paired sample t-test were performed to examine the differences in the same group before and after the declaration of the all-India lockdown on 25 March 2020.

EXTRACTING THE TWEETS

Our team extracted 10,000 tweets about coronavirus and related words for each timeframe, before and after lockdown, using the Tweepy library in a Python script.

The datasets for the two periods, before and after, were extracted independently of each other and then stored in CSV format for analysis.

DATA CLEANING AND TOKENIZATION

First, the text column is pre-processed by converting all characters to lowercase and removing punctuation marks, extra white space, common words (stop words), and the usernames associated with tweets. The cleaned tweets are stored in a column named tidy_tweets in the DataFrame.

After pre-processing, we used tokenization to extract the word combinations in the two datasets.
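The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the script used for the analysis; the function name `clean_tweet` and the tiny stop-word list are assumptions made for the example.

```python
import re
import string

# Tiny illustrative stop-word list (an assumption); a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "are", "in", "of", "to", "and", "for", "on"}

def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip usernames, punctuation, extra spaces and stop words."""
    text = text.lower()
    text = re.sub(r"@\w+", " ", text)  # drop @usernames
    text = text.translate(str.maketrans("", "", string.punctuation))  # drop ASCII punctuation
    words = [w for w in text.split() if w not in STOP_WORDS]  # split() also collapses white space
    return " ".join(words)

raw_tweets = ["@user The lockdown is HERE, stay safe!!"]
tidy_tweets = [clean_tweet(t) for t in raw_tweets]
print(tidy_tweets)
```

With a pandas DataFrame, the same function could be applied to the text column to fill tidy_tweets.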

SENTIMENT WORD CLOUD FOR BEFORE LOCK-DOWN

SENTIMENT WORD CLOUD FOR AFTER LOCK-DOWN

The word clouds show that words associated with joy and trust were more frequent before the lockdown than after it. The frequency of joy-related words decreased, while trust-related words no longer appear.

BIGRAMS AND TRIGRAMS

Word frequency of bi-grams (2-word) and tri-grams (3-word) can give us better insights from the dataset we are analyzing.
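As a rough sketch of how such counts can be produced, the snippet below builds n-grams from cleaned tweets and counts them with `collections.Counter`. The helper names `ngrams` and `top_ngrams` and the sample tweets are assumptions for illustration only.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams(tweets, n, k=10):
    """Count n-grams within each tweet and return the k most frequent."""
    counts = Counter()
    for tweet in tweets:
        counts.update(ngrams(tweet.split(), n))
    return counts.most_common(k)

# Made-up cleaned tweets, for illustration only.
sample_tweets = ["stay home stay safe", "please stay home", "stay home save lives"]
print(top_ngrams(sample_tweets, 2, k=3))  # bi-grams
print(top_ngrams(sample_tweets, 3, k=3))  # tri-grams
```

Counting within each tweet, rather than across one concatenated string, avoids creating n-grams that span two unrelated tweets.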

SENTIMENT WORD FREQUENCY

Before lock-down

After lock-down

We can see that the polarity of emotions increases after lockdown. Both the frequency and the polarity of negative emotions are evidently higher than those of positive emotions.

MOST POSITIVE AND NEGATIVE WORDS USED IN THE EXTRACTED TWEETS

To get a broader picture of how positive and negative words are used, we assigned the words with a sentiment using the ‘bing’ lexicon and did a simple count to generate the top 10 most common positive and negative words used in the extracted tweets.
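The counting step can be sketched as below. The dictionary is a hypothetical miniature stand-in for a positive/negative word list, not the actual contents of the ‘bing’ lexicon, and `top_sentiment_words` is a name made up for this example.

```python
from collections import Counter

# Hypothetical toy lexicon (an assumption); the real 'bing' lexicon is far larger.
TOY_LEXICON = {
    "death": "negative", "fear": "negative", "crisis": "negative",
    "safe": "positive", "support": "positive", "thank": "positive",
}

def top_sentiment_words(tweets, lexicon, k=10):
    """Label each word via the lexicon and return the k most common words per sentiment."""
    counts = {"positive": Counter(), "negative": Counter()}
    for tweet in tweets:
        for word in tweet.split():
            label = lexicon.get(word)  # None for words not in the lexicon
            if label is not None:
                counts[label][word] += 1
    return {label: c.most_common(k) for label, c in counts.items()}

# Made-up cleaned tweets, for illustration only.
toy_tweets = ["death toll rising fear", "stay safe thank doctors", "fear death crisis"]
result = top_sentiment_words(toy_tweets, TOY_LEXICON)
print(result["negative"])
print(result["positive"])
```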

Before lock-down

After lock-down

It can be observed that the usage of words like "death" has doubled. Positive words also became more frequent, but the absolute increase was smaller than for negative words: negative words rose from 2132 to 2738, while positive words rose from 1190 to 1539. Overall, it is clear that negative emotion is more prevalent than positive emotion.
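For reference, the changes implied by these four counts can be worked out directly. The snippet below only does arithmetic on the numbers quoted above; the variable names are made up for the example.

```python
# Word counts quoted in the text: (before lockdown, after lockdown).
negative_before, negative_after = 2132, 2738
positive_before, positive_after = 1190, 1539

# Absolute increase in the number of sentiment words.
neg_abs = negative_after - negative_before
pos_abs = positive_after - positive_before

# Relative increase, as a percentage of the pre-lockdown count.
neg_rel = 100 * neg_abs / negative_before
pos_rel = 100 * pos_abs / positive_before

print(f"negative: +{neg_abs} words ({neg_rel:.1f}%)")
print(f"positive: +{pos_abs} words ({pos_rel:.1f}%)")
```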

CATEGORIZING THE SENTIMENTS INTO TEN TYPES OF EMOTIONS

Beyond sorting words into just two categories (positive and negative), we can also label them with multiple emotional states. Here we grouped the words into ten emotions and compared the total frequency of words in each group before and after the lockdown.

It is evident from the graph that word frequency increased in all 10 emotion categories. There is a marked increase in the emotions of fear, anger, sadness and trust.
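A comparison of this kind can be sketched as follows. The lexicon below is a hypothetical toy mapping from words to emotion labels, since the lexicon used for the ten categories is not named here; the word lists, tweets, and the function name `emotion_totals` are all assumptions for illustration.

```python
from collections import Counter

# Hypothetical toy lexicon (an assumption): one word can carry several emotion labels.
TOY_EMOTION_LEXICON = {
    "death": {"fear", "sadness", "negative"},
    "virus": {"fear", "negative"},
    "hope": {"anticipation", "joy", "trust", "positive"},
    "doctor": {"trust", "positive"},
}

def emotion_totals(tweets, lexicon):
    """Total word frequency per emotion label across all tweets."""
    totals = Counter()
    for tweet in tweets:
        for word in tweet.split():
            totals.update(lexicon.get(word, ()))  # add 1 to every label of this word
    return totals

# Made-up cleaned tweets for the two periods, for illustration only.
before = emotion_totals(["hope doctor", "virus"], TOY_EMOTION_LEXICON)
after = emotion_totals(["death virus", "virus death hope"], TOY_EMOTION_LEXICON)

for emotion in sorted(set(before) | set(after)):
    print(emotion, before[emotion], after[emotion])
```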

PAIRED SAMPLE T-TEST

The Paired Samples t Test compares two means that are from the same individual, object, or related units. The two means of a paired sample t-test can represent things like a measurement taken at two different times, in our case pre-lockdown and post-lockdown.

The purpose of the test is to determine whether there is statistical evidence that the mean difference between paired observations on a particular outcome is significantly different from zero. The Paired Samples t Test is a parametric test.
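To make the calculation concrete, here is a minimal sketch of the paired t statistic: take the difference within each pair, then divide the mean difference by its standard error. The numbers are made up for illustration and are not the study's data, and the function name `paired_t_statistic` is an assumption.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for paired observations."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1

# Hypothetical paired scores (same units measured pre- and post-lockdown).
pre_scores = [10, 12, 9, 14, 11]
post_scores = [13, 15, 10, 18, 12]

t_value, dof = paired_t_statistic(pre_scores, post_scores)
print(f"t = {t_value:.2f}, degrees of freedom = {dof}")
```

The p-value is then read from a t distribution with n - 1 degrees of freedom; in practice a statistics library such as SciPy (`scipy.stats.ttest_rel`) returns both the statistic and the p-value.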

After performing the paired sample t-test, it is evident that there is a significant difference before and after the lockdown in the fear, negative, and sadness categories (p < 0.05).

Overall Paired Sample T-test

All the negative emotion categories before and after the lockdown (anger, anticipation, fear, disgust, negative, and sadness) were considered together, and a paired sample t-test was performed on them.

It is observed that there is a significant change in the frequency of negative emotions before and after the lockdown (p < 0.05), and it is evident from the means that the negative emotions have increased.

CONCLUSION

In this study, we compared sentiment and psychological profiles before and after 25 March. We found an increase in negative emotions (anxiety, depression, and indignation) and in sensitivity to social risks after the declaration of the lockdown in India. Twitter data may provide a timely understanding of the impact of public health emergencies on the public’s mental health during the epidemic period.
