https://github.com/mariaorabi/data-mining-disease-tweets-analysis
Analyze tweets related to diseases using data mining techniques to derive insights and patterns.
data-analysis-python data-anlysis data-mining data-processing disease juypter-notebook nlp python tweet-analysis
Last synced: 8 months ago
- Host: GitHub
- URL: https://github.com/mariaorabi/data-mining-disease-tweets-analysis
- Owner: Mariaorabi
- Created: 2023-11-20T19:54:17.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-11-20T20:32:51.000Z (almost 2 years ago)
- Last Synced: 2025-01-07T07:13:54.156Z (10 months ago)
- Topics: data-analysis-python, data-anlysis, data-mining, data-processing, disease, juypter-notebook, nlp, python, tweet-analysis
- Homepage:
- Size: 21.7 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Disease Tweets Analysis
## Introduction
In this project, I aim to analyze tweets related to four diseases: AIDS/HIV, cancer, Corona (COVID-19), and diabetes. The analysis involves preprocessing the tweets, extracting key information, and deriving insights through natural language processing (NLP) techniques.
## Data Files
- Four files containing tweets related to the diseases, each categorized by specific keywords.
## Preprocessing
- Remove user mentions and the word "LINK@" from the tweets.
- Keep social media symbols (for example, hashtags) together as single words so that tokenization does not split them into meaningless standalone signs.
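
The two preprocessing steps above can be sketched with the standard `re` module. This is a minimal illustration, not the notebook's actual code: the function names `clean_tweet` and `tokenize` are hypothetical, and the set of symbols kept whole (hashtags and a few simple emoticons) is an assumption about what "social media symbols" covers.

```python
import re

# Assumption: mentions look like "@name"; "LINK@" is the literal string named above.
LINK_RE = re.compile(r"LINK@")
MENTION_RE = re.compile(r"@\w+")

# Assumption: hashtags and simple emoticons are the symbols worth keeping whole.
TOKEN_RE = re.compile(
    r"#\w+"                 # hashtag kept as one token, e.g. "#health"
    r"|[:;=][-']?[)(DPp]"   # simple emoticons such as ":)" or ";-("
    r"|\w+(?:'\w+)?"        # ordinary words, including contractions
    r"|[^\w\s]"             # any other single punctuation mark
)


def clean_tweet(text: str) -> str:
    """Remove "LINK@" and user mentions, then collapse extra whitespace."""
    text = LINK_RE.sub(" ", text)      # strip the link marker first
    text = MENTION_RE.sub(" ", text)   # then strip @mentions
    return " ".join(text.split())


def tokenize(text: str) -> list:
    """Split a cleaned tweet into tokens without breaking up hashtags or emoticons."""
    return TOKEN_RE.findall(text)
```

For example, `tokenize(clean_tweet("@who LINK@ great news :) #WorldAIDSDay!"))` keeps `:)` and `#WorldAIDSDay` as single tokens instead of splitting them into separate punctuation marks.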
## Parts of Speech Analysis
### a. Grammatical Analysis Using spaCy
- Use the spaCy package for grammatical analysis.
- Ignore stop words, perform lemmatization, and filter words based on the English dictionary.
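
A rough sketch of this filtering step follows. The helper names `content_lemmas` and `analyze_tweets`, the `vocabulary` set standing in for "the English dictionary", and the `en_core_web_sm` model are all assumptions made for illustration; the token attributes used (`is_stop`, `is_alpha`, `lemma_`, `pos_`) are standard spaCy `Token` attributes.

```python
from typing import Iterable, Iterator, Set, Tuple


def content_lemmas(tokens: Iterable, vocabulary: Set[str]) -> Iterator[Tuple[str, str]]:
    """Yield (lemma, part-of-speech) pairs for tokens that pass the filters.

    `tokens` can be a spaCy Doc or any iterable of objects exposing
    is_stop, is_alpha, lemma_ and pos_.
    """
    for tok in tokens:
        if tok.is_stop or not tok.is_alpha:
            continue                      # skip stop words, punctuation, numbers
        lemma = tok.lemma_.lower()
        if lemma in vocabulary:           # keep only words found in the word list
            yield lemma, tok.pos_


def analyze_tweets(texts: Iterable[str], vocabulary: Set[str],
                   model: str = "en_core_web_sm") -> Iterator[Tuple[str, str]]:
    """Run spaCy over the tweets and yield the filtered (lemma, pos) pairs."""
    import spacy  # imported here so content_lemmas stays usable without spaCy

    nlp = spacy.load(model)               # assumes the model is already installed
    for doc in nlp.pipe(texts):
        yield from content_lemmas(doc, vocabulary)
```

The word list itself could come from any English dictionary file; which one the notebook uses is not stated here, so it is left as a parameter.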
### b. Most Common Words
- Report the 20 most common words for each disease.
- Discuss the relevance and value of the results.
### c. Most Common Adjective Words
- Identify the 20 most common adjective words for each disease.
- Evaluate whether the results are logical and meaningful.
### d. Most Common Verbs
- Extract the 20 most common verbs for each disease.
- Discuss whether the results are logical and meaningful.
### e. Most Common Noun Words
- Analyze the 20 most common noun words for each disease.
- Discuss whether the results make sense and are meaningful.
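
Sections b through e all reduce to the same counting step with a different part-of-speech filter. A minimal sketch, assuming the tweets for one disease have already been turned into `(lemma, pos)` pairs using spaCy's coarse tags (`"ADJ"`, `"VERB"`, `"NOUN"`); the function name `top_words` is hypothetical:

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def top_words(pairs: Iterable[Tuple[str, str]],
              pos: Optional[str] = None,
              n: int = 20) -> List[Tuple[str, int]]:
    """Return the n most common lemmas, optionally limited to one part of speech."""
    counts = Counter(
        lemma for lemma, tag in pairs
        if pos is None or tag == pos      # pos=None counts every word (section b)
    )
    return counts.most_common(n)
```

Calling `top_words(pairs)` gives the overall list for section b, while `top_words(pairs, pos="ADJ")`, `pos="VERB"` and `pos="NOUN"` give the lists for sections c, d and e.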
## Parsing
### a. Dependency Parsing Function
- Implement a function for checking words directly related to the disease names in tweets.
- Convert verbs to their base forms (lemmas) and summarize the 10 common verbs along with their relative frequency for each disease.
- Discuss differences between diseases and compare with the verb results from section d of Parts of Speech Analysis (question 2).
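
One possible shape for such a function is sketched below. It treats "directly related" as the syntactic head and the children of the disease token, and it computes relative frequency as a share of all verbs found this way; both are assumptions, as are the names `directly_related` and `verb_frequencies`. The attributes used (`i`, `text`, `head`, `children`, `pos_`, `lemma_`) exist on spaCy `Token` objects.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Set, Tuple


def directly_related(token) -> Iterator:
    """Yield the head and the children of a token in the dependency tree."""
    if token.head.i != token.i:           # in spaCy the root is its own head
        yield token.head
    yield from token.children


def verb_frequencies(docs: Iterable, disease_terms: Set[str],
                     n: int = 10) -> List[Tuple[str, float]]:
    """Return the n most common verb lemmas linked to a disease name.

    Each verb is paired with its share of all verbs linked to the disease.
    `disease_terms` holds lowercase single-token names, e.g. {"cancer"}.
    """
    counts = Counter()
    for doc in docs:
        for tok in doc:
            if tok.text.lower() in disease_terms:
                counts.update(
                    rel.lemma_.lower()
                    for rel in directly_related(tok)
                    if rel.pos_ == "VERB"
                )
    total = sum(counts.values())
    return [(verb, count / total) for verb, count in counts.most_common(n)]
```

Disease names written as several tokens would need extra handling that this sketch does not attempt.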
### b. Most Common Adjective Heads for Disease Names
- Find the adjectives for which the disease name is the most common head.
- Discuss differences between diseases and compare with the adjective results from section c of Parts of Speech Analysis (question 2).
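
Read literally, this step asks for every adjective whose most frequent syntactic head is a disease name. The sketch below follows that reading; the function name `adjectives_headed_by` is hypothetical, and ties between equally frequent heads are resolved by whichever head was seen first, which is a choice made here rather than something the task specifies.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set


def adjectives_headed_by(docs: Iterable, disease_terms: Set[str]) -> Dict[str, int]:
    """Map each adjective whose most common head is a disease name to that count.

    `docs` are spaCy Docs (or iterables of tokens with i, text, head, pos_, lemma_);
    `disease_terms` holds lowercase single-token names, e.g. {"cancer"}.
    """
    heads = defaultdict(Counter)          # adjective lemma -> Counter of head words
    for doc in docs:
        for tok in doc:
            if tok.pos_ == "ADJ" and tok.head.i != tok.i:   # skip root adjectives
                heads[tok.lemma_.lower()][tok.head.text.lower()] += 1

    result = {}
    for adjective, head_counts in heads.items():
        top_head, count = head_counts.most_common(1)[0]
        if top_head in disease_terms:
            result[adjective] = count
    return result
```

Sorting the returned dictionary by count gives a ranked list that can be set side by side with the adjective list from section c.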
## Conclusion
- Summarize key findings from the analysis.
- Reflect on any variations between diseases and the reasons behind them.
## How to Submit
- The code must be submitted in a Jupyter notebook file.