https://github.com/jeugregg/fakenewsdetectionfr
Detect French Fake News with a camemBERT model
- Host: GitHub
- URL: https://github.com/jeugregg/fakenewsdetectionfr
- Owner: jeugregg
- Created: 2019-12-26T07:18:05.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2020-02-21T01:46:03.000Z (over 4 years ago)
- Last Synced: 2024-10-11T15:41:12.912Z (about 1 month ago)
- Topics: camembert-model, french-newspapers, ipynb, newspaper-crawler, notebook, rss, scrapy
- Language: Jupyter Notebook
- Homepage:
- Size: 8.49 MB
- Stars: 8
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# FakeNewsDetectionFr
Detect French Fake News
These notebooks scrape French news articles and train a camemBERT model to classify them as true or fake.
More information can be found in the /doc/ folder (in French).

- Scraping TRUE news from French newspapers using "gbolmier/newspaper-crawler":
  - Futura Sciences
  - Liberation
  - Telerama
  - Le Monde
  - Le Figaro

  For Le Monde, the spider must be modified:
  newspaper_crawler/spiders/lemonde_spider.py
  - issue #1: fixing lemonde body scraping
- Scraping TRUE news using "scrapy" for:
  - 20 minutes
- Scraping FAKE news from French parody newspapers using "scrapy":
  - Le Gorafi
  - NordPresse.be
  - BuzzBeed.com
- Train camemBERT model
- Evaluate
- Compare to baseline

Files:
01_Scraping_French_newspaper_crawler.ipynb:
This notebook scrapes French news from the RSS feeds of these newspapers:
- Figaro
- Futura Sciences
- Liberation
- Le Monde

It doesn't work for:
- Nouvel Obs
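The RSS step can be illustrated with the Python standard library alone. This is a minimal sketch, not the notebook's code (which relies on "gbolmier/newspaper-crawler"): it parses an RSS 2.0 document and returns the title, link and description of each `<item>`. The helper name `parse_rss_items` and the sample feed are hypothetical, and downloading the feed and scraping the full article body are left out.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_rss_items(rss_xml: str) -> List[Dict[str, str]]:
    """Return title, link and description for each <item> of an RSS 2.0 feed.

    Hypothetical helper for illustration only; the notebook delegates
    scraping to gbolmier/newspaper-crawler.
    """
    root = ET.fromstring(rss_xml)
    items = []
    # RSS 2.0 nests entries as <rss><channel><item>...</item></channel></rss>
    for item in root.iter("item"):
        items.append(
            {
                # findtext returns None when the child is missing
                "title": (item.findtext("title") or "").strip(),
                "link": (item.findtext("link") or "").strip(),
                "description": (item.findtext("description") or "").strip(),
            }
        )
    return items


if __name__ == "__main__":
    sample = """<rss version="2.0"><channel><title>Exemple</title>
  <item><title>Titre 1</title><link>https://example.org/a1</link>
        <description>Résumé 1</description></item>
  <item><title>Titre 2</title><link>https://example.org/a2</link></item>
</channel></rss>"""
    for entry in parse_rss_items(sample):
        print(entry["link"], "-", entry["title"])
```

Each returned link would then be fetched to extract the article body, which is the part the newspaper-specific spiders handle.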
This notebook can be run several times to add new articles to the database: newspaper_db.
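Running the scraper repeatedly only helps if articles already stored are not added twice. The storage format of newspaper_db is not described here, so the sketch below assumes a hypothetical SQLite table `articles` keyed by URL; `INSERT OR IGNORE` skips any row whose URL is already present, and the function reports how many rows were really added.

```python
import sqlite3
from typing import Dict, Iterable


def add_new_articles(conn: sqlite3.Connection, articles: Iterable[Dict[str, str]]) -> int:
    """Insert articles not yet stored and return how many were actually added.

    The "articles" table and its columns are hypothetical; the URL is the
    deduplication key, so repeated scraping runs do not store duplicates.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "url TEXT PRIMARY KEY, source TEXT, title TEXT, body TEXT)"
    )
    before = conn.total_changes
    # INSERT OR IGNORE silently skips rows whose primary key (url) exists
    conn.executemany(
        "INSERT OR IGNORE INTO articles (url, source, title, body) "
        "VALUES (:url, :source, :title, :body)",
        articles,
    )
    conn.commit()
    # total_changes only counts rows that were really inserted
    return conn.total_changes - before


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    batch = [{"url": "https://example.org/a1", "source": "lemonde", "title": "T1", "body": "..."}]
    print(add_new_articles(conn, batch))  # first run stores the article
    print(add_new_articles(conn, batch))  # second run adds nothing
```

Keying on the URL is a design choice for the sketch; a hash of the article text would also catch the same story republished under a different URL.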
02_Scraping_French_News.ipynb:
This notebook uses scrapy classes to retrieve the latest news content from:
- Le Gorafi (societe, politique)
- Nord Presse.be (France)
- BuzzBeed
- 20 Minutes

Two possible sources:
- page links and next page (best)
- RSS feed (allows updating the data)

To select only RSS:
- execute only the RSS parts, then finish with the Export parts.

03_Train_evaluate_camemBERT.ipynb:
French fake news detection model based on camemBERT
This notebook contains:
- Exploration of the news data
- Preparation of input data
- Training a camemBERT sequence classifier (using "simpletransformers")
- Evaluation
- Works on Google Colab
- Choose the GPU runtime type
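Since the labels come from where an article was scraped (newspapers give TRUE, parody sites give FAKE), preparing the input data mostly means attaching a label per source and splitting the set. A minimal standard-library sketch, assuming hypothetical source keys, the convention 0 = true / 1 = fake, title plus body as the input text, and a stratified 80/20 split. As far as its documentation describes, simpletransformers' `ClassificationModel.train_model` takes a pandas DataFrame with `text` and `labels` columns, so the returned rows could be wrapped with `pd.DataFrame(train, columns=["text", "labels"])`.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical source keys: real newspapers -> 0 (TRUE), parody sites -> 1 (FAKE)
SOURCE_LABELS: Dict[str, int] = {
    "lemonde": 0, "figaro": 0, "liberation": 0, "futura": 0,
    "telerama": 0, "20minutes": 0,
    "gorafi": 1, "nordpresse": 1, "buzzbeed": 1,
}

Row = Tuple[str, int]  # (text, label)


def label_articles(articles: Sequence[Dict[str, str]]) -> List[Row]:
    """Label each article from its source; skip unknown sources and empty texts."""
    rows = []
    for art in articles:
        label = SOURCE_LABELS.get(art.get("source", ""))
        # Assumption: the model input is the title followed by the body
        text = (art.get("title", "") + " " + art.get("body", "")).strip()
        if label is not None and text:
            rows.append((text, label))
    return rows


def stratified_split(
    rows: Sequence[Row], eval_frac: float = 0.2, seed: int = 0
) -> Tuple[List[Row], List[Row]]:
    """Split rows so each class keeps the same proportion in train and eval."""
    rng = random.Random(seed)
    train: List[Row] = []
    evaluation: List[Row] = []
    for label in sorted({lbl for _, lbl in rows}):
        group = [r for r in rows if r[1] == label]
        rng.shuffle(group)
        n_eval = round(len(group) * eval_frac)
        evaluation.extend(group[:n_eval])
        train.extend(group[n_eval:])
    # Mix the classes again so batches are not ordered by label
    rng.shuffle(train)
    rng.shuffle(evaluation)
    return train, evaluation
```

Stratifying matters here because the TRUE and FAKE sources are scraped separately and their volumes can differ a lot; a plain random split could leave the evaluation set with very few fake articles.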
04_Train_evaluate_baseline.ipynb:
French fake news detection baseline model
This notebook contains:
- Preparation of TF-IDF input data
- Training a baseline classifier (using "LogisticRegression")
- Evaluation
- Works on Google Colab
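The TF-IDF features of the baseline can be illustrated without any dependency. This sketch assumes scikit-learn's default `TfidfVectorizer` conventions: lowercasing, the token pattern `(?u)\b\w\w+\b`, raw term counts, smoothed IDF idf(t) = ln((1 + n) / (1 + df(t))) + 1, and L2-normalized rows with an alphabetically sorted vocabulary. It returns dense rows for readability; it is not the notebook's code.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Same default token pattern as scikit-learn: words of 2+ word characters
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text.lower())


def tfidf_fit_transform(docs: Sequence[str]) -> Tuple[Dict[str, int], List[List[float]]]:
    """Return (vocabulary, dense L2-normalized TF-IDF rows) for the documents."""
    tokenized = [tokenize(d) for d in docs]
    terms = sorted({t for toks in tokenized for t in toks})
    vocab = {term: i for i, term in enumerate(terms)}
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, matching scikit-learn's default smooth_idf=True
    idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1 for term in vocab}
    rows = []
    for toks in tokenized:
        row = [0.0] * len(vocab)
        for term, count in Counter(toks).items():
            row[vocab[term]] = count * idf[term]  # raw term count times IDF
        norm = math.sqrt(sum(v * v for v in row))
        rows.append([v / norm for v in row] if norm > 0 else row)
    return vocab, rows


if __name__ == "__main__":
    vocab, rows = tfidf_fit_transform(["le chat dort", "le chien dort", "un chat"])
    print(sorted(vocab, key=vocab.get))
    print([round(v, 3) for v in rows[1]])
```

With scikit-learn installed, `TfidfVectorizer().fit_transform(texts)` should give the same values as a sparse matrix under its default settings, and the baseline classifier is then `LogisticRegression().fit(X, labels)`. Terms that appear in few documents (such as "chien" above) receive a higher weight than frequent ones (such as "le").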