https://github.com/sayamalt/fake-news-classification-using-fine-tuned-bert
Successfully developed a text classification model that predicts whether a given news text is fake by fine-tuning a pretrained BERT transformer model from Hugging Face.
bert-embeddings bert-model data-analysis data-visualization deep-learning fine-tuning-bert model-evaluation model-training-and-evaluation text-classification text-preprocessing text-tokenization tokenizer-nlp wordcloud-visualization
- Host: GitHub
- URL: https://github.com/sayamalt/fake-news-classification-using-fine-tuned-bert
- Owner: SayamAlt
- License: apache-2.0
- Created: 2024-12-10T01:48:31.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2024-12-10T01:59:46.000Z (2 months ago)
- Last Synced: 2024-12-18T22:13:41.228Z (about 2 months ago)
- Topics: bert-embeddings, bert-model, data-analysis, data-visualization, deep-learning, fine-tuning-bert, model-evaluation, model-training-and-evaluation, text-classification, text-preprocessing, text-tokenization, tokenizer-nlp, wordcloud-visualization
- Language: Jupyter Notebook
- Homepage:
- Size: 18 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Fake News Classification Dataset
## Overview
The **Fake News Classification Dataset** is an English-language collection of over 45,000 unique news articles, each labeled as fake (0) or true (1). It is intended for researchers and practitioners working on fake news detection, particularly with Transformer-based models, and supports text classification, fact-checking, and intent classification tasks.
---
## Dataset Summary
- **Title**: Fake News Classification Dataset
- **Language**: English (en-US)
- **Size**: 45,000+ news articles
- **Labels**: Binary (0 = Fake, 1 = True)
- **Supported Tasks**:
  - Text Classification
  - Fact-checking
  - Intent Classification

---
## Dataset Structure
The dataset comprises 40,587 records (the total across the train, validation, and test splits), each with the following fields:
### Key Fields
- **Title**: The title of the news article.
- **Text**: The content of the news article.
- **Label**: A binary classification indicating whether the news is fake (0) or true (1).

### Example Instance
```json
{
"id": "1",
"title": "Palestinians switch off Christmas lights in Bethlehem in anti-Trump protest",
"text": "RAMALLAH, West Bank (Reuters) - Palestinians switched off Christmas lights at Jesus' traditional birthplace in Bethlehem on Wednesday night in protest at U.S. President Donald Trump's decision to recognize Jerusalem as Israel's capital...",
"label": "1"
}
```

---
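The example instance stores `label` as a string, so a loader will typically cast it to an integer before training. A minimal sketch, assuming records arrive as JSON objects shaped like the example instance; the `NewsRecord` class and `parse_record` helper are illustrative names, not part of the repository:

```python
import json
from dataclasses import dataclass

# Label convention taken from the README: 0 = fake, 1 = true.
LABEL_NAMES = {0: "fake", 1: "true"}


@dataclass
class NewsRecord:
    """Hypothetical container for one dataset row."""
    id: str
    title: str
    text: str
    label: int


def parse_record(raw: str) -> NewsRecord:
    """Parse one JSON record and cast its string label to an int."""
    obj = json.loads(raw)
    label = int(obj["label"])
    if label not in LABEL_NAMES:
        raise ValueError(f"unexpected label: {obj['label']!r}")
    return NewsRecord(
        id=str(obj["id"]),
        title=obj["title"],
        text=obj["text"],
        label=label,
    )


# Abbreviated copy of the example instance, used only for demonstration.
raw = json.dumps({
    "id": "1",
    "title": "Palestinians switch off Christmas lights in Bethlehem in anti-Trump protest",
    "text": "RAMALLAH, West Bank (Reuters) - Palestinians switched off Christmas lights...",
    "label": "1",
})
record = parse_record(raw)
print(record.label, LABEL_NAMES[record.label])
```

Rejecting labels outside `{0, 1}` at load time is a design choice that surfaces malformed rows early, before they reach a tokenizer or model.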
### Data Splits
The dataset is split into three subsets to support supervised learning:

- **Train**: 24,353 instances
- **Validation**: 8,117 instances
- **Test**: 8,117 instances

---
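These split sizes are consistent with an approximately 60/20/20 partition of 40,587 records. A sketch of that arithmetic, assuming the two held-out splits are sized first (rounded down) and the remainder goes to training; the `split_sizes` and `split_indices` helpers and the seed are illustrative, and the repository may use a different procedure:

```python
import random


def split_sizes(n_total: int, val_frac: float = 0.2, test_frac: float = 0.2):
    """Return (train, validation, test) sizes for a fractional split."""
    n_val = int(n_total * val_frac)    # rounded down
    n_test = int(n_total * test_frac)  # rounded down
    n_train = n_total - n_val - n_test  # remainder goes to training
    return n_train, n_val, n_test


def split_indices(n_total: int, seed: int = 42):
    """Shuffle row indices and cut them into three disjoint subsets."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # assumed seed, for reproducibility
    n_train, n_val, _ = split_sizes(n_total)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(40_587)
print(len(train_idx), len(val_idx), len(test_idx))  # 24353 8117 8117
```

Because the three index lists are slices of one shuffled permutation, no record can appear in more than one subset.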
## Dataset Creation
This dataset was built with Python, using the Pandas library as the primary data processing tool. It combines multiple fake news datasets sourced from Kaggle to broaden coverage and diversity.

- **Version**: 1.0.0
- **Focus**: Supervised learning with modern Transformer-based NLP models.

---
## Source Data
The dataset is compiled from multiple fake news datasets available on Kaggle, chosen to give broad coverage for machine learning tasks.
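
Combining several source files with Pandas usually comes down to aligning column names, concatenating, and dropping duplicates. A minimal sketch under those assumptions; the two in-memory frames stand in for Kaggle CSV files, and their rows and column names are placeholders, not the actual sources:

```python
import pandas as pd

# Stand-ins for two source datasets (hypothetical rows and column names).
source_a = pd.DataFrame({
    "title": ["Headline A", "Headline B"],
    "text": ["Body of article A.", "Body of article B."],
    "label": [1, 0],
})
source_b = pd.DataFrame({
    "headline": ["Headline B", "Headline C"],
    "body": ["Body of article B.", "Body of article C."],
    "is_real": [0, 1],
})

# Align the second source to the shared schema: title, text, label.
source_b = source_b.rename(
    columns={"headline": "title", "body": "text", "is_real": "label"}
)

combined = (
    pd.concat([source_a, source_b], ignore_index=True)
    .dropna(subset=["title", "text"])           # drop rows missing content
    .drop_duplicates(subset=["title", "text"])  # keep first copy of each article
    .reset_index(drop=True)
)
print(combined.shape)  # (3, 3)
```

Deduplicating on `title` and `text` together, rather than on `title` alone, avoids discarding distinct articles that happen to share a headline.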
---
## Considerations for Using the Data
This dataset is structured for use in three phases of a machine learning workflow:

- **Training Phase**: Fit NLP models on the training split.
- **Validation Phase**: Monitor model performance during development and detect overfitting.
- **Testing Phase**: Evaluate final model accuracy and identify where further fine-tuning is needed.

---
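For the testing phase, accuracy alone can hide class imbalance, so precision, recall, and F1 for the fake class are often reported alongside it. A self-contained sketch, assuming predictions and gold labels are lists of 0/1 integers; the `binary_metrics` helper and the toy labels are illustrative, not results from this project:

```python
def binary_metrics(y_true, y_pred, positive=0):
    """Compute accuracy, precision, recall, and F1 for the `positive` class.

    `positive=0` treats the fake class (label 0) as the class of interest.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")

    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels for demonstration only (0 = fake, 1 = true).
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Comparing these metrics on the validation and training splits is one simple way to spot overfitting: a large gap suggests the model has memorized the training data.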
## Repository
All processes and code used to create this dataset are available in the repository:
Fake News Detection Repository

---
## License
Please check the repository for licensing information.
---
## Contributing
Contributions, issues, and feature requests are welcome! Feel free to open a pull request or raise an issue for discussion.
---
## Acknowledgments
Special thanks to Kaggle for providing access to the source datasets.
---
## Contact
For any inquiries or issues, please contact me through my email address '[email protected]'.