{"id":32348595,"url":"https://github.com/syedt1/shared_task1_hatespeech","last_synced_at":"2026-03-09T23:03:42.049Z","repository":{"id":316926848,"uuid":"1029807804","full_name":"SyedT1/Shared_Task1_HateSpeech","owner":"SyedT1","description":null,"archived":false,"fork":false,"pushed_at":"2025-11-08T05:05:12.000Z","size":6144,"stargazers_count":2,"open_issues_count":5,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-11-08T06:17:39.904Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SyedT1.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-07-31T15:51:53.000Z","updated_at":"2025-11-08T05:05:15.000Z","dependencies_parsed_at":"2025-09-27T17:35:48.496Z","dependency_job_id":"10b3191c-b8ee-4a5f-ae3e-7ac73a80cd1a","html_url":"https://github.com/SyedT1/Shared_Task1_HateSpeech","commit_stats":null,"previous_names":["syedt1/shared_task1_hatespeech"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/SyedT1/Shared_Task1_HateSpeech","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SyedT1%2FShared_Task1_HateSpeech","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SyedT1%2FShared_Task1_HateSpeech/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SyedT1%2FShared_Task1_HateSpeech/releases","manifests_url":"https://rep
os.ecosyste.ms/api/v1/hosts/GitHub/repositories/SyedT1%2FShared_Task1_HateSpeech/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SyedT1","download_url":"https://codeload.github.com/SyedT1/Shared_Task1_HateSpeech/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SyedT1%2FShared_Task1_HateSpeech/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30315990,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-09T20:05:46.299Z","status":"ssl_error","status_checked_at":"2026-03-09T19:57:04.425Z","response_time":61,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-10-24T07:56:12.601Z","updated_at":"2026-03-09T23:03:42.027Z","avatar_url":"https://github.com/SyedT1.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Shared Task 1: Hate Speech Detection in Bengali\n\n## Project Overview\nThis repository contains comprehensive implementations for the Bengali Multi-task Hate Speech Identification shared task at BLP Workshop @IJCNLP-AACL 2025. The project addresses the complex problem of detecting and understanding hate speech in Bengali across three related subtasks: hate type classification, target identification, and multi-task analysis. 
The implementation explores various machine learning approaches from traditional deep learning to state-of-the-art transformer models with advanced training techniques.\n\n## Competition Phases\n\n### 🔬 **Developmental Phase**\n- **Objective**: Model experimentation, architecture exploration, and hyperparameter tuning\n- **Data**: Training and validation datasets provided by organizers\n- **Focus**: Testing various approaches and techniques to identify best-performing models\n- **Metrics**: Validation F1 scores on development set\n\n### 🏆 **Evaluation Phase**  \n- **Objective**: Final model evaluation on unseen test data\n- **Data**: Hidden test set released during evaluation period\n- **Focus**: Deploying best models from developmental phase with refined configurations\n- **Metrics**: Test F1 scores on official evaluation set\n\n## Repository Structure\n\n### Subtask 1A - Hate Speech Type Classification\nMulti-class classification of Bengali text into: Abusive, Sexism, Religious Hate, Political Hate, Profane, or None.\n\n#### 📊 **Developmental Phase Results**\n\n##### **Deep Learning Models**\n- **BiLSTM** - F1 Score: 56.25%\n- **LSTM with Attention** - F1 Score: 55.18%\n\n##### **Large Language Models (LLMs)**\n- **XLM-RoBERTa-large** - F1 Score: 72.81%\n- **MuRIL-large-cased** - F1 Score: 71.02%\n- **BanglaBERT (csebuetnlp)** - F1 Score: 70.74%\n- **BanglaBERT-large (csebuetnlp)** - F1 Score: 70.51%\n- **XLM-RoBERTa-base** - F1 Score: 70.50%\n- **DistilBERT-multilingual** - F1 Score: 68.03%\n\n##### **LLMs with K-Fold Cross Validation**\n- **MuRIL-large-cased with K-Fold** - F1 Score: 73.61%\n- **XLM-RoBERTa-large with K-Fold** - F1 Score: 73.45%\n- **BanglaBERT with K-Fold** - F1 Score: 73.29%\n\n##### **K-Fold with Text Normalizer**\n- **BanglaBERT with Normalizer** - F1 Score: 74.32%\n- **MuRIL-large-cased with Normalizer** - F1 Score: 73.73%\n- **XLM-RoBERTa-large with Normalizer** - F1 Score: 73.29%\n\n##### **LLMs with Adversarial Training (K-Fold + 
FGM)**\n- **BanglaBERT with K-Fold + FGM** - F1 Score: 73.87%\n- **MuRIL-large-cased with K-Fold + FGM** - F1 Score: 73.68%\n\n##### **Advanced Combined Approaches (K-Fold + FGM + Normalizer)**\n- **BanglaBERT + K-Fold + FGM + Normalizer** - F1 Score: 74.88% ⭐ (Best Development Score)\n- **MuRIL-large-cased + K-Fold + FGM + Normalizer** - F1 Score: 73.81%\n\n\n#### 🎯 **Evaluation Phase Results**\n- **BanglaBERT + K-Fold + FGM + Normalizer** - Test F1: 72.33% ⭐ (Best Test Score)\n- **BanglaBERT + K-Fold + FGM** - Test F1: 72.17%\n- **MuRIL-large-cased + K-Fold + Normalizer** - Test F1: 72.30%\n- **BanglaBERT + K-Fold** - Test F1: 72.05%\n- **MuRIL-large-cased + K-Fold + FGM** - Test F1: 71.90%\n- **MuRIL-large-cased + K-Fold** - Test F1: 71.88%\n- **XLM-RoBERTa-large + K-Fold** - Test F1: 71.72%\n- **XLM-RoBERTa-large + K-Fold + Normalizer** - Test F1: 71.57%\n- **MuRIL-large-cased + K-Fold + FGM + Normalizer** - Test F1: 71.31%\n- **BanglaBERT + K-Fold + Normalizer** - Test F1: 71.14%\n- **BanglaBERT (Base)** - Test F1: 70.31%\n\n### Subtask 1B - Hate Speech Target Classification\nClassification of hate speech targets into: Individuals, Organizations, Communities, or Society.\n\n#### 📊 **Developmental Phase Results**\n\n##### **Deep Learning Models**\n- Traditional deep learning approaches implemented (scores pending)\n\n##### **Large Language Models (LLMs)**\n- **BanglaBERT** - F1 Score: 72.09%\n- **MuRIL-large-cased** - F1 Score: 71.93%\n- **XLM-RoBERTa-large** - F1 Score: 71.38%\n\n##### **LLMs with K-Fold Cross Validation**\n- **MuRIL-large-cased with K-Fold** - F1 Score: 74.96% ⭐ (Best Development Score)\n- **BanglaBERT with K-Fold** - F1 Score: 73.69%\n- **XLM-RoBERTa-large with K-Fold** - F1 Score: 71.53%\n\n##### **K-Fold with Text Normalizer**\n- **BanglaBERT with Normalizer** - F1 Score: 74.72%\n- **MuRIL-large-cased with Normalizer** - F1 Score: 74.48%\n- **XLM-RoBERTa-large with Normalizer** - F1 Score: 72.39%\n\n##### **LLMs with K-Fold and Adversarial 
Attacks (FGM)**\n- **XLM-RoBERTa-large with K-Fold + FGM** - F1 Score: 74.20%\n- **BanglaBERT with K-Fold + FGM** - F1 Score: 74.12%\n- **MuRIL-large-cased with K-Fold + FGM** - F1 Score: 73.89%\n\n##### **Advanced Combined Approaches (K-Fold + Adversarial + Normalizer)**\n- **BanglaBERT + K-Fold + FGM + Normalizer** - F1 Score: 74.64%\n- **MuRIL-large-cased + K-Fold + FGM + Normalizer** - F1 Score: 74.56%\n- **XLM-RoBERTa-large + K-Fold + FGM + Normalizer** - F1 Score: 74.32%\n\n#### 🎯 **Evaluation Phase Results**\n\n##### **Base LLMs (without K-Fold)**\n- **XLM-RoBERTa-large** - Test F1: 71.23%\n- **MuRIL-large-cased** - Test F1: 70.93%\n- **BanglaBERT** - Test F1: 70.25%\n\n##### **LLMs with K-Fold Cross Validation**\n- **MuRIL-large-cased + K-Fold** - Test F1: 73.44%\n- **BanglaBERT + K-Fold** - Test F1: 71.85%\n- **XLM-RoBERTa-large + K-Fold** - Test F1: 68.07%\n\n##### **K-Fold with Text Normalizer**\n- **MuRIL-large-cased + K-Fold + Normalizer** - Test F1: 73.44%\n- **BanglaBERT + K-Fold + Normalizer** - Test F1: 72.89%\n- **XLM-RoBERTa-large + K-Fold + Normalizer** - Test F1: 71.66%\n\n##### **LLMs with K-Fold and Adversarial Attacks (FGM)**\n- **XLM-RoBERTa-large + K-Fold + FGM** - Test F1: 73.28%\n- **MuRIL-large-cased + K-Fold + FGM** - Test F1: 72.92%\n- **BanglaBERT + K-Fold + FGM** - Test F1: 72.25%\n\n##### **Advanced Combined Approaches (K-Fold + FGM + Normalizer)**\n- **BanglaBERT + K-Fold + FGM + Normalizer** - Test F1: 73.12% ⭐\n- **MuRIL-large-cased + K-Fold + FGM + Normalizer** - Test F1: 72.95% ⭐\n- **XLM-RoBERTa-large + K-Fold + FGM + Normalizer** - Test F1: 72.17%\n\n### Subtask 1C - Multi-task Hate Speech Analysis\nMulti-task classification combining hate type (Abusive, Sexism, Religious Hate, Political Hate, Profane, None), severity (Little to None, Mild, Severe), and target group (Individuals, Organizations, Communities, Society).\n\n#### 📊 **Developmental Phase Results**\n\n##### **Base LLMs**\n- Basic transformer implementations (scores 
pending)\n\n##### **LLMs with K-Fold Cross Validation**\n- Standard K-Fold implementations (scores pending)\n\n##### **LLMs with Adversarial Training and K-Fold**\nAll using BanglaBERT (csebuetnlp) with different adversarial techniques:\n- **BanglaBERT + FreeLB** - F1 Score: 74.52% ⭐ (Best Development Score)\n- **BanglaBERT + Simple FreeLB** - F1 Score: 73.91%\n- **BanglaBERT + GAT** - F1 Score: 73.79%\n- **BanglaBERT + FGM** - F1 Score: 73.75%\n\n##### **LLMs with K-Fold and Normalizer**\n- Text normalization implementations (scores pending)\n\n##### **Advanced Combined Approaches (K-Fold + Adversarial + Normalizer)**\n- Comprehensive technique combinations (scores pending)\n\n#### 🎯 **Evaluation Phase Results**\n\n##### **LLMs with K-Fold and Normalizer**\n- **BanglaBERT + K-Fold + Normalizer** - Test F1: 73.00%\n\n##### **LLMs with Adversarial Training and K-Fold**\n- **BanglaBERT + FreeLB + K-Fold** - Test F1: 72.00%\n\n## Technical Implementation Details\n\n### Advanced Training Techniques\n\n#### **Adversarial Training Methods**\n- **FGM (Fast Gradient Method)**: Simple and efficient adversarial perturbations\n- **AWP (Adversarial Weight Perturbation)**: Weight-space adversarial training\n- **FreeLB**: Free large-batch adversarial training for improved generalization\n- **Simple FreeLB**: Streamlined version of FreeLB\n- **GAT (Geometry-Aware Training)**: Advanced geometry-aware adversarial training\n\n#### **Text Normalization Pipeline**\n```python\nfrom normalizer import normalize  # assumed: the csebuetnlp normalizer package\n\nnormalize(\n    text,\n    unicode_norm=\"NFKC\",          # Compatibility decomposition, then canonical composition\n    punct_replacement=None,        # Preserve original punctuation\n    url_replacement=None,          # Preserve URLs\n    emoji_replacement=None,        # Preserve emojis\n    apply_unicode_norm_last=True   # Apply Unicode normalization as the final step\n)\n```\n\n#### **Custom Model Architectures**\n- **Attention-Based Pooling Head**: Dynamic token weighting for better representation\n- **Multi-Head Classification**: 
Custom classification layers for Bengali text\n- **Enhanced Dropout Strategies**: Improved regularization techniques\n\n#### **Cross-Validation Strategy**\n- **K-Fold Implementation**: 5-fold cross-validation for robust evaluation\n- **Stratified Sampling**: Maintaining class distribution across folds\n- **Ensemble Averaging**: Combining predictions from multiple folds\n\n## Performance Analysis\n\n### 📈 Best Performing Models by Phase\n\n#### Developmental Phase Champions:\n| Subtask | Model | F1 Score | Technique |\n|---------|-------|----------|-----------|\n| **1A** | BanglaBERT | 74.88% | K-Fold + FGM + Normalizer |\n| **1B** | MuRIL-large-cased | 74.96% | K-Fold Cross Validation |\n| **1C** | BanglaBERT | 74.52% | FreeLB Adversarial Training |\n\n#### Evaluation Phase Performance:\n| Subtask | Model | Dev F1 | Test F1 | Performance Drop |\n|---------|-------|--------|---------|------------------|\n| **1A** | BanglaBERT + K-Fold + FGM + Normalizer | 74.88% | 72.33% | -2.55% |\n| **1B** | MuRIL-large-cased + K-Fold | 74.96% | 73.44% | -1.52% |\n| **1C** | BanglaBERT + K-Fold + Normalizer | 74.52% | 73.00% | -1.52% |\n\n#### Best Test Phase Models (Subtask 1A):\n| Approach | BanglaBERT | MuRIL-large | XLM-RoBERTa-large |\n|----------|------------|-------------|-------------------|\n| **Base LLM** | 70.31% | - | - |\n| **+ K-Fold** | 72.05% | 71.88% | 71.72% |\n| **+ K-Fold + Normalizer** | 71.14% | 72.30% | 71.57% |\n| **+ K-Fold + FGM** | 72.17% | 71.90% | - |\n| **+ K-Fold + FGM + Normalizer** | 72.33% ⭐ | 71.31% | - |\n\n#### Best Test Phase Models (Subtask 1B):\n| Approach | BanglaBERT | MuRIL-large | XLM-RoBERTa-large |\n|----------|------------|-------------|-------------------|\n| **Base LLM** | 70.25% | 70.93% | 71.23% |\n| **+ K-Fold** | 71.85% | 73.44% ⭐ | 68.07% |\n| **+ K-Fold + Normalizer** | 72.89% | 73.44% ⭐ | 71.66% |\n| **+ K-Fold + FGM** | 72.25% | 72.92% | 73.28% |\n| **+ K-Fold + FGM + Normalizer** | 73.12% | 72.95% | 72.17% |\n\n#### Best 
Test Phase Models (Subtask 1C):\n| Approach | BanglaBERT | Development | Test |\n|----------|------------|-------------|------|\n| **K-Fold + Normalizer** | ✅ | - | 73.00% ⭐ |\n| **K-Fold + FreeLB** | ✅ | 74.52% | 72.00% |\n| **Simple FreeLB** | ✅ | 73.91% | - |\n| **GAT** | ✅ | 73.79% | - |\n| **FGM** | ✅ | 73.75% | - |\n\n### Key Performance Insights\n\n#### Development vs Evaluation Observations:\n- **Generalization Gap**: 1-3 point F1 drop from development to test across all subtasks\n- **Most Stable**: K-Fold + Normalizer combinations showed best consistency (especially in Subtask 1C)\n- **Overfitting Risk**: Single models without cross-validation showed higher variance\n- **Best Generalization**:\n  - Subtask 1A: Adversarial training methods (FGM + Normalizer)\n  - Subtask 1B: Combined approaches (K-Fold + FGM + Normalizer)\n  - Subtask 1C: Normalization techniques (smallest performance drop: -1.52%)\n\n#### Technical Effectiveness:\n- **K-Fold Cross Validation**: Up to about 3 points of development F1 over the single-model baseline (e.g. MuRIL-large-cased on Subtask 1B: 71.93% to 74.96%), though XLM-RoBERTa-large gained less than 1 point\n- **Text Normalization**: Additional 0.5-1% boost for Bengali text processing\n- **Adversarial Training**: 0.5-1.5% improvement with better robustness\n- **Combined Techniques**: Best overall performance with stacked improvements\n- **Transformer Superiority**: Roughly 12-20 points of F1 over the BiLSTM and LSTM baselines on Subtask 1A\n\n## Model Architecture Details\n\n### Transformer Models Utilized\n- **BanglaBERT (csebuetnlp)**: Specialized Bengali language model\n- **MuRIL-large-cased**: Multilingual model with strong Bengali support\n- **XLM-RoBERTa (base \u0026 large)**: Cross-lingual transformer variants\n- **DistilBERT-multilingual**: Lightweight multilingual model\n\n### Custom Implementations\n- **Enhanced Tokenization**: Bengali-specific preprocessing pipelines\n- **Dynamic Padding**: Efficient batch processing strategies\n- **Label Smoothing**: Improved training stability\n- **Learning Rate Scheduling**: Optimized training convergence\n\n## File Organization\n\n### 
Directory Structure:\n```\nShared_Task1_HateSpeech/\n├── subtask1A/                    # Hate speech type classification\n│   ├── Developmental Phase/\n│   │   ├── DL Models/           # BiLSTM, LSTM-Attention\n│   │   ├── LLMs/                # Base transformer models\n│   │   ├── LLMS with K Fold CV/ # K-Fold implementations\n│   │   ├── K Folds with normalizer/\n│   │   ├── LLMs_KFolds_adversarial attacks/\n│   │   ├── LLMS_KFolds_attacks_normalizer/\n│   │   └── Various classification heads/\n│   └── Evaluation Phase/        # Final test submissions\n├── subtask1B/                   # Hate speech target classification\n│   ├── Developmental Phase/\n│   │   ├── DL Models/\n│   │   ├── LLMs/\n│   │   ├── LLMS with K Fold CV/\n│   │   ├── K Folds with normalizer/\n│   │   ├── LLMs_KFolds_adversarial attacks/\n│   │   └── LLMS_KFolds_attacks_normalizer/\n│   └── Evaluation Phase/\n│       ├── LLMs/\n│       ├── LLMS with K Fold CV/\n│       ├── K Folds with normalizer/\n│       ├── LLMs_KFolds_adversarial attacks/\n│       └── LLMS_KFolds_attacks_normalizer/\n└── subtask1C/                   # Multi-task hate speech analysis\n    ├── Developmental Phase/\n    │   ├── LLMs/\n    │   ├── LLMS with K Fold CV/\n    │   ├── LLMs with adversarial attacks and K Fold CV/\n    │   ├── LLMs with K Fold CV and normalizer/\n    │   └── K Fold CV with attacks and normalizer/\n    └── Evaluation Phase/\n        ├── LLMs/\n        ├── LLMS with K Fold CV/\n        ├── LLMs with adversarial attacks and K Fold CV/\n        ├── LLMs with K Fold CV and normalizer/\n        └── K Fold CV with attacks and normalizer/\n```\n\n### Naming Convention:\n- **Model directories**: `v{f1_score}_{model_name}`\n  - Example: `v0.7488_banglabert-fgm` = 74.88% F1 score using BanglaBERT with FGM\n- **Each directory contains**:\n  - Jupyter notebook (.ipynb) with complete implementation\n  - Dataset file (subtask_1X.tsv)\n  - Model checkpoints and outputs\n\n## Performance Evolution\n\n### 
Developmental Phase Progression:\n1. **Baseline Models**: 55-68% F1 (Deep Learning approaches)\n2. **Base Transformers**: 68-73% F1 (Standard LLM implementations)\n3. **K-Fold Enhancement**: 70-74% F1 (Cross-validation improvements)\n4. **Normalization Boost**: 73-75% F1 (Text preprocessing optimization)\n5. **Adversarial Training**: 73-75% F1 (Robustness improvements)\n6. **Combined Excellence**: 74-75% F1 (Best technique combinations)\n\n### Development → Evaluation Trends:\n- **Average Performance Drop**: 1-3% on unseen test data\n- **Most Stable Approaches**: K-Fold + Normalizer combinations\n- **Highest Risk**: Single model implementations without regularization\n- **Best Generalization**: Models with adversarial training components\n\n## Technologies and Frameworks\n\n### Core Technologies:\n- **Deep Learning**: PyTorch, TensorFlow\n- **Transformers**: Hugging Face Transformers library\n- **Text Processing**: Custom Bengali normalizers, NLTK\n- **Evaluation**: Scikit-learn, Custom metrics implementations\n- **Adversarial**: Custom FGM, AWP, FreeLB implementations\n- **Cross-Validation**: Stratified K-Fold with scikit-learn\n\n### Hardware and Training:\n- **GPU Acceleration**: CUDA-enabled training\n- **Mixed Precision**: For memory efficiency\n- **Gradient Accumulation**: Effective batch size optimization\n- **Early Stopping**: Preventing overfitting\n\n## Key Contributions\n\n### Novel Techniques Implemented:\n1. **Bengali-Specific Normalization**: NFKC Unicode with preservation strategies\n2. **Advanced Adversarial Training**: Multiple adversarial techniques comparison\n3. **Custom Attention Heads**: Learnable pooling mechanisms\n4. **Robust Cross-Validation**: Stratified K-Fold with ensemble strategies\n5. 
**Multi-Phase Evaluation**: Systematic development vs evaluation analysis\n\n### Research Insights:\n- **Language-Specific Approaches**: Bengali text requires specialized preprocessing\n- **Adversarial Robustness**: Significant impact on generalization\n- **Cross-Validation Importance**: Critical for reliable performance estimation\n- **Model Ensemble Benefits**: Combining techniques yields optimal results\n\n## Usage Instructions\n\n### Running Experiments:\n1. Navigate to desired subtask directory\n2. Choose appropriate approach folder\n3. Open corresponding Jupyter notebook\n4. Ensure required dependencies are installed\n5. Execute cells sequentially for complete pipeline\n\n### Model Training:\n- Each notebook contains complete training pipeline\n- Data preprocessing and normalization included\n- Model evaluation and metrics calculation automated\n- Results saved with performance indicators\n\n## Future Work\n\n### Potential Improvements:\n- **Multi-Modal Approaches**: Incorporating contextual information\n- **Advanced Ensembling**: Sophisticated model combination strategies\n- **Real-Time Processing**: Optimized inference pipelines\n- **Transfer Learning**: Cross-task knowledge transfer\n- **Data Augmentation**: Synthetic data generation for Bengali\n\n### Research Directions:\n- **Explainability**: Understanding model decision processes\n- **Fairness Analysis**: Bias detection and mitigation\n- **Cross-Lingual Transfer**: Knowledge sharing across languages\n- **Domain Adaptation**: Generalization to different text domains\n\n## Official Task Information\n\n### Task Details\n- **Competition**: Bengali Multi-task Hate Speech Identification Shared Task\n- **Workshop**: BLP Workshop @ IJCNLP-AACL 2025\n- **Website**: https://multihate.github.io/\n- **Evaluation Metrics**: \n  - Subtask 1A \u0026 1B: Micro-F1\n  - Subtask 1C: Weighted Micro-F1\n\n### Data Format\n#### Subtask 1A\n```\nid    text    label\n```\nLabels: Abusive, Sexism, Religious Hate, Political 
Hate, Profane, None\n\n#### Subtask 1B  \n```\nid    text    label\n```\nLabels: Individuals, Organizations, Communities, Society\n\n#### Subtask 1C\n```\nid    text    hate_type    hate_severity    to_whom\n```\n- hate_type: Abusive, Sexism, Religious Hate, Political Hate, Profane, None\n- hate_severity: Little to None, Mild, Severe  \n- to_whom: Individuals, Organizations, Communities, Society\n\n## Citation and Acknowledgments\n\nThis work represents comprehensive exploration of Bengali hate speech detection for the BLP Workshop @ IJCNLP-AACL 2025 shared task, contributing to the advancement of multilingual NLP and social media content moderation.\n\n### Organizers\n- Md Arid Hasan, PhD Student, The University of Toronto\n- Firoj Alam, Senior Scientist, Qatar Computing Research Institute  \n- Md Fahad Hossain, Lecturer, Daffodil International University\n- Usman Naseem, Assistant Professor, Macquarie University\n- Syed Ishtiaque Ahmed, Associate Professor, The University of Toronto\n\n---\n\n**Note**: This repository demonstrates state-of-the-art approaches for Bengali hate speech detection across multiple classification tasks, with particular emphasis on robust evaluation methodology and practical implementation strategies for the official shared task.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsyedt1%2Fshared_task1_hatespeech","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsyedt1%2Fshared_task1_hatespeech","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsyedt1%2Fshared_task1_hatespeech/lists"}