{"id":16653163,"url":"https://github.com/anuraganalog/bullshit-detector","last_synced_at":"2026-05-08T10:37:16.576Z","repository":{"id":112669225,"uuid":"363390813","full_name":"AnuragAnalog/bullshit-detector","owner":"AnuragAnalog","description":"Trying a Different Approach on Fake News Detection","archived":false,"fork":false,"pushed_at":"2021-05-14T08:52:48.000Z","size":66,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-08T12:05:25.985Z","etag":null,"topics":["count","dataset","detection","ensemble","fake","learning","liar","news","nlp","python3","sklearn","tfidf","vectorizer"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AnuragAnalog.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2021-05-01T11:13:51.000Z","updated_at":"2021-08-19T09:32:27.000Z","dependencies_parsed_at":"2023-06-10T18:30:34.257Z","dependency_job_id":null,"html_url":"https://github.com/AnuragAnalog/bullshit-detector","commit_stats":{"total_commits":11,"total_committers":2,"mean_commits":5.5,"dds":0.09090909090909094,"last_synced_commit":"d538c6deec188d8b46d2fea6587fef5a03c1b0de"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/AnuragAnalog/bullshit-detector","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AnuragAnalog%2Fbullshit-detector","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AnuragAnalog%2Fbullshit-detector/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/repositories/AnuragAnalog%2Fbullshit-detector/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AnuragAnalog%2Fbullshit-detector/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AnuragAnalog","download_url":"https://codeload.github.com/AnuragAnalog/bullshit-detector/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AnuragAnalog%2Fbullshit-detector/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32776983,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-08T08:22:46.396Z","status":"ssl_error","status_checked_at":"2026-05-08T08:22:45.650Z","response_time":54,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["count","dataset","detection","ensemble","fake","learning","liar","news","nlp","python3","sklearn","tfidf","vectorizer"],"created_at":"2024-10-12T09:43:24.656Z","updated_at":"2026-05-08T10:37:16.558Z","avatar_url":"https://github.com/AnuragAnalog.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Fake News Detection\n\n## Datasets\n\nThe dataset which was used in the detector was [Liar](https://www.cs.ucsb.edu/~william/data/liar_dataset.zip)\n\n**LIAR** is a publicly available dataset for fake news detection. 
It comprises 12.8K manually labeled short statements, collected over a decade in various contexts from POLITIFACT.COM, which provides a detailed analysis report and links to source documents for each case. This dataset can be used for fact-checking research as well. Notably, this new dataset is an order of magnitude larger than the previously largest public fake news datasets of similar type. The LIAR dataset includes 12.8K human-labeled short statements from POLITIFACT.COM’s API, and each statement is evaluated by a POLITIFACT.COM editor for its truthfulness.\n\nYou can use the **[download.sh](./download.sh)** file to download the dataset from the site.\n\n## Preprocessing\n\nThe six original classes were mapped to Real and Fake labels:\n```\nTrue -\u003e Real\nMostly-True -\u003e Real\nHalf-True -\u003e Real\nMostly-False -\u003e Fake\nFalse -\u003e Fake\nPants-On-Fire -\u003e Fake\n```\n\nThe following text-cleaning steps were also applied:\n\n* Stemming\n* Removal of punctuation\n* Removal of stopwords\n\n## Strategy\n\nThe detector uses an ensemble of classifiers.\n\n## Classifiers\n\nThe ensemble combines the five classifiers below. Hyperparameters were tuned with the Optuna library.\n\n* Random Forest\n* Logistic Regression\n* Multinomial Naive Bayes\n* Support Vector Classifiers\n* SGD Classifier\n\n## Reproduce the results\n\nIf you wish to see the results of the trained models, you can use my models, which can be found in the `saved_models` directory.\n\n\u003e The ensemble models are large, so they are hosted on [archive.org](https://archive.org) and I have included a download script to fetch them.\n\n## Downloads\n\nIf the download script is very slow, you can use the links below:\n\n[Tfidf Ensemble](https://archive.org/download/bull-shit-detector-tfidf-ensemble/tfidf_ensemble.pkl)\n\n[Count Ensemble](https://archive.org/download/bullshit-detector-count-ensemble/count_ensemble.pkl)\n\n## Project Directory Structure\n\n```\n|-- Detector.ipynb\n|-- LICENSE\n|-- README.md\n|-- download.sh\n|-- metric_data\n|   |-- 
count_test_metric.csv\n|   |-- count_train_metric.csv\n|   |-- count_valid_metric.csv\n|   |-- tfidf_test_metrics.csv\n|   |-- tfidf_train_metrics.csv\n|   `-- tfidf_valid_metrics.csv\n|-- requirements.txt\n`-- saved_models\n    |-- count_logreg.pkl\n    |-- count_nb.pkl\n    |-- count_rf.pkl\n    |-- count_sgd.pkl\n    |-- count_svc.pkl\n    |-- download_models.sh\n    |-- tfidf_logreg.pkl\n    |-- tfidf_nb.pkl\n    |-- tfidf_rf.pkl\n    |-- tfidf_sgd.pkl\n    `-- tfidf_svc.pkl\n\n2 directories, 22 files\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanuraganalog%2Fbullshit-detector","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fanuraganalog%2Fbullshit-detector","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanuraganalog%2Fbullshit-detector/lists"}