https://github.com/jakobzmrzlikar/fake-news-analysis
An analysis of the FakeNewsNet dataset using NLP techniques.
data-analysis fake-news ipynb-jupyter-notebook nlp-machine-learning
Last synced: 3 days ago
- Host: GitHub
- URL: https://github.com/jakobzmrzlikar/fake-news-analysis
- Owner: jakobzmrzlikar
- License: mit
- Created: 2021-10-29T09:31:09.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2022-03-18T09:51:12.000Z (over 2 years ago)
- Last Synced: 2024-10-12T15:43:48.580Z (about 1 month ago)
- Topics: data-analysis, fake-news, ipynb-jupyter-notebook, nlp-machine-learning
- Language: Jupyter Notebook
- Homepage:
- Size: 16.6 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# fake-news-analysis
## This is a Python project analyzing the [FakeNewsNet dataset](https://github.com/KaiDMML/FakeNewsNet) using NLP techniques.
### Description
The Jupyter notebook contains an exploratory data analysis of news articles and tweets from the PolitiFact part of the FakeNewsNet dataset. This includes:
- NLP preprocessing steps including cleaning, tokenization, lemmatization and stemming
- Term-Document and TF-IDF matrix construction
- visualization of PCA and t-SNE projections
- training and testing linear and nonlinear SVM classifiers on the data
- and much more...
### Some interesting takeaways
- 95% of news articles and 99% of the tweets containing the word 'transcript' were about real news.
- Tweets talking about presidents are mostly about real news, while tweets talking about vaccines are mostly about fake ones.
- Using SVM models trained on news articles to predict the trustworthiness of tweets is often impractical and produces mediocre results.
- The FakeNewsNet dataset is enormous, and this analysis barely touches the surface of what could be learned from it.
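
Keyword statistics like the 'transcript' figure above can be computed with a few lines of Python. The snippet below is a minimal sketch, not the notebook's actual code: the function name `real_share_by_keyword`, the `(text, label)` input format and the toy documents are all assumptions made for illustration, and the toy data is invented rather than taken from FakeNewsNet.

```python
import re
from collections import Counter


def real_share_by_keyword(docs, keyword):
    """Return the fraction of documents containing `keyword` that are labelled "real".

    `docs` is an iterable of (text, label) pairs, where label is "real" or "fake"
    (an assumed format, not the dataset's own schema).
    Returns None when no document contains the keyword.
    """
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    counts = Counter(label for text, label in docs if pattern.search(text))
    total = sum(counts.values())
    if total == 0:
        return None
    return counts["real"] / total


# Toy data, invented for illustration only (not taken from FakeNewsNet).
toy_docs = [
    ("Full transcript of the press briefing", "real"),
    ("Transcript: the senator's remarks", "real"),
    ("Leaked transcript proves secret plot", "fake"),
    ("New vaccine turns people magnetic", "fake"),
]

share = real_share_by_keyword(toy_docs, "transcript")
print(f"Share of 'transcript' documents labelled real: {share:.2f}")
```

Running the same function separately over the articles and the tweets would give one figure per source, which is how the two percentages in the first takeaway could be compared.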