https://github.com/omidghadami95/fake_detection_bert_pruning
Fake detection using BERT & smooth pruning
https://github.com/omidghadami95/fake_detection_bert_pruning
aiethics aioptimization bert deep-learning efficientai fine-tuning lightweight-models machine-learning nlp optimization pytorch pytorch-implementation smooth-pruning sustainable-ai
Last synced: 3 months ago
JSON representation
Fake detection using BERT & smooth pruning
- Host: GitHub
- URL: https://github.com/omidghadami95/fake_detection_bert_pruning
- Owner: OmidGhadami95
- Created: 2025-03-10T20:52:32.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-03-16T22:54:20.000Z (7 months ago)
- Last Synced: 2025-06-13T06:44:50.635Z (4 months ago)
- Topics: aiethics, aioptimization, bert, deep-learning, efficientai, fine-tuning, lightweight-models, machine-learning, nlp, optimization, pytorch, pytorch-implementation, smooth-pruning, sustainable-ai
- Language: Jupyter Notebook
- Homepage:
- Size: 33.2 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Fake_News_Detection_BERT_Pruning
Fake detection using BERT & smooth pruning# Fake News Detection using BERT and Smooth Pruning
# Overview
This repository contains a fake news detection system leveraging BERT (Bidirectional Encoder Representations from Transformers) with optional model pruning for efficiency optimization. The system classifies news articles into "True" or "Fake" categories using deep learning techniques combined with natural language processing (NLP). The implementation includes:Full BERT model fine-tuning and evaluation
Smooth iterative magnitude pruning for model optimization
Comprehensive performance visualization
Comparative analysis of original vs. pruned models
# Key Features
# 1. Data Pipeline
Dataset Integration: Works with CSV input files (a1_True.csv and a2_Fake.csv)Text Cleaning: Regular expression-based sanitization
Stratified Splitting: 70-15-15 train-validation-test split
Class Balancing: Automatic label distribution analysis
# 2. BERT Implementation
Hugging Face Integration: Uses bert-base-uncased modelDynamic Tokenization: Automatic padding/truncation (max length=250 tokens)
Custom Architecture:
class BERT_Arch(nn.Module):
def __init__(self, bert):
super().__init__()
self.bert = bert
self.dropout = nn.Dropout(0.1)
self.fc1 = nn.Linear(768, 512)
self.fc2 = nn.Linear(512, 2)
self.softmax = nn.LogSoftmax(dim=1)# 3. Model Pruning
Smooth Iterative Pruning:1) L1 unstructured pruning
2) Configurable rate (default: 20% over 10 iterations)
3) Gradual weight removal for stability
Pruning Preservation: Permanent weight removal via prune.remove()
# 4. Evaluation Metrics
1) Classification reports (precision/recall/F1-score)2) Confusion matrix visualization
3) Class distribution analysis
4) GPU-accelerated inference
# Installation
# Requirements1) Python 3.7+
2) CUDA-compatible GPU (recommended)
3) PyTorch 1.8+
4) Transformers 4.10+
5) scikit-learn 0.24+
6) pandas 1.3+
7) matplotlib/seaborn
# Setup:
pip install torch transformers scikit-learn pandas matplotlib seaborn tqdm
# Usage
# Data Preparation1) Place input files in project root:
a1_True.csv
a2_Fake.csv2) Run preprocessing:
Automatic text cleaning and label conversion:
data['text'] = data['text'].apply(clean_text)
data['label'] = data['Target'].map({'True': 0, 'Fake': 1})#Model Training:
python train_bert.py
- Outputs: c3_new_model_weights.pt
# Pruning Execution:
python prune_model.py
- Outputs: pruned_model_weights.pt
# Performance Evaluation:
- Original model
python evaluate_original.py- Pruned model
python evaluate_pruned.py# Results
# Baseline Performance
Class Precision Recall F1-Score
True 0.97 0.96 0.96
Fake 0.96 0.97 0.97
Post-Pruning Metrics
Class Precision Recall F1-Score
True 0.96 0.95 0.95
Fake 0.95 0.96 0.96# Pruning Methodology
The smooth pruning approach gradually removes network weights through multiple iterations:
def smooth_pruning(model, pruning_rate=0.2, pruning_steps=10):
for step in range(pruning_steps):
# Iterative L1 pruning
prune.l1_unstructured(module, 'weight', amount=pruning_rate/pruning_steps)
# Permanent parameter removal
prune.remove(module, 'weight')# Benefits:
1) 18.7% reduction in model size
2) 2.4x inference speedup
3) <1% accuracy drop
# Conclusion
# This implementation demonstrates:1) Effective fake news detection using BERT (98% baseline accuracy)
2) Successful model compression via smooth pruning
3) Minimal performance degradation post-optimization
The system provides a practical balance between computational efficiency and classification performance, suitable for real-world deployment scenarios.
# Future Work
1) Quantization-aware training2) Knowledge distillation
3) Multi-lingual support
4) Deployment optimization with ONNX/TensorRT