https://github.com/drleucine/cleantheburgh
Detecting fake news with Machine Learning: HackTheBurgh 2018
- Host: GitHub
- URL: https://github.com/drleucine/cleantheburgh
- Owner: DrLeucine
- Created: 2018-03-10T13:47:37.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2022-12-08T00:53:28.000Z (over 2 years ago)
- Last Synced: 2025-03-20T04:13:42.613Z (2 months ago)
- Topics: 2018, hackathon, hacktheburgh
- Language: Python
- Homepage:
- Size: 17 MB
- Stars: 2
- Watchers: 4
- Forks: 0
- Open Issues: 11
Metadata Files:
- Readme: README.md
CleanTheBurgh
Detecting fake news in English and Italian
## Inspiration
Inspired by JP Morgan's challenge and by Botometer, which detects Twitter bots, we created a similar tool that detects fake news in both English and Italian.
## What it does
When a user enters a link to an online news article, the programme outputs the probability that the article is real.
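As a rough illustration of what "outputting a probability" can look like, here is a minimal sketch using scikit-learn. It is not the project's actual code: the TF-IDF features, the toy training sentences, and the helper name `probability_real` are all assumptions made for this example. The only detail taken from this README is that a Random Forest was the chosen model.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Toy, made-up training data (NOT the project's dataset): 1 = real, 0 = fake.
texts = [
    "Council approves budget for new tram line after public consultation",
    "University publishes peer-reviewed study on local air quality",
    "Central bank keeps interest rates unchanged this quarter",
    "Miracle fruit cures every disease, doctors hate this trick",
    "Shocking: celebrity secretly replaced by a clone, insiders say",
    "You won't believe what this politician is hiding from you",
]
labels = [1, 1, 1, 0, 0, 0]

# TF-IDF features are an assumption; the README does not say how text was encoded.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
model = make_pipeline(TfidfVectorizer(), forest)
model.fit(texts, labels)


def probability_real(article_text: str) -> float:
    """Return the model's estimated probability that the article is real."""
    # Look up the column for label 1 instead of assuming the column order.
    real_column = list(forest.classes_).index(1)
    return float(model.predict_proba([article_text])[0][real_column])


print(probability_real("City council publishes budget after consultation"))
```

With only six training sentences the number printed is meaningless; the point is the shape of the interface, where article text goes in and a value between 0 and 1 comes out.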
## How we built it
1. We first collected and parsed a large amount of data, including the titles, content and URLs of both fake and real news.
2. We then used this data to train machine learning models. In the end we chose and tuned a Random Forest model for prediction.
3. We also compiled a list of websites that frequently post fake news in both English and Italian. This list is used as a cross-check.
4. For users, we created a web scraper which returns the news content when a link is provided.
5. Lastly, we built a GUI for the tool.
## Challenges we ran into
1. We had insufficient data for deep learning, so we used Random Forest, a supervised machine learning model, instead.
2. The GUI had some bugs that stopped images from displaying. These are now fixed.
## Accomplishments that we're proud of
1. The tool works for both English and Italian.
2. The tool works correctly: accuracy on test data reaches up to 80%.
## What we learned
We worked through the whole pipeline: data collection, machine learning, web scraping and GUI development.
## What's next for CleanTheBurgh
1. Add detection of fake pictures and of pictures with wrong titles/captions.
2. Ideally, the news database and the list of fake news domains should be updated regularly to maintain accuracy.
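
One way to keep the domain cross-check easy to update is to hold the flagged domains in a plain set (or a file loaded into one) and match incoming links against it. The sketch below uses only the Python standard library. The domain names are hypothetical placeholders, and `normalise_domain` and `is_flagged` are names invented for this example, not functions from the project.

```python
from urllib.parse import urlparse

# Hypothetical entries for illustration only; in practice these would be
# loaded from a file that is updated regularly.
FAKE_NEWS_DOMAINS = {"fake-news.example", "notizie-false.example"}


def normalise_domain(url: str) -> str:
    """Extract the lower-cased host from a URL, dropping a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def is_flagged(url: str, flagged=FAKE_NEWS_DOMAINS) -> bool:
    """Return True if the URL's host is a flagged domain or a subdomain of one."""
    host = normalise_domain(url)
    return any(host == d or host.endswith("." + d) for d in flagged)


print(is_flagged("https://www.fake-news.example/story/123"))  # True
print(is_flagged("https://news.example.org/story/123"))       # False
```

Matching on the host rather than the full URL means a single entry covers every article and subdomain of a flagged site, so updating the list is a matter of adding or removing one line per site.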