https://github.com/raju-2003/indiaai-cyberguard-ai-hackathon
An NLP-powered system to simplify cybercrime reporting by analyzing descriptions, categorizing incidents, and providing actionable insights.
matplotlib nltk numpy pandas python random-forest-classifier re scikit-learn seaborn shap spacy wordcloud
- Host: GitHub
- URL: https://github.com/raju-2003/indiaai-cyberguard-ai-hackathon
- Owner: raju-2003
- Created: 2024-11-22T15:53:58.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2024-11-22T16:10:39.000Z (about 1 month ago)
- Last Synced: 2024-11-22T16:37:50.689Z (about 1 month ago)
- Topics: matplotlib, nltk, numpy, pandas, python, random-forest-classifier, re, scikit-learn, seaborn, shap, spacy, wordcloud
- Language: Jupyter Notebook
- Homepage: https://drive.google.com/file/d/1mwxB3uQAXCGdjd3nYwsymlLlXZ4fcv8y/view?usp=sharing
- Size: 0 Bytes
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# IndiaAI CyberGuard AI Hackathon
# Cybercrime Reporting NLP System
This project leverages advanced **Natural Language Processing (NLP)** techniques to enhance the process of filing cybercrime reports on the **National Cyber Crime Reporting Portal (NCRP)**. The system streamlines the reporting process by analyzing textual descriptions and supporting media files, identifying themes, and ensuring accurate and user-friendly submissions.
---
## 📌 Project Goals
- Simplify the process of filing cybercrime reports.
- Accurately classify cybercrime categories based on user descriptions.
- Provide actionable insights to law enforcement using data trends.

---
## 🚀 Features
- **Sentiment Analysis**: Detects tone and urgency of reports to prioritize cases.
- **Topic Modeling**: Identifies commonly reported themes like phishing, identity theft, and financial fraud.
- **Text Classification**: Categorizes criminal activities with high accuracy using a Random Forest Classifier.
- **Visual Insights**: Generates word clouds, confusion matrices, and sentiment trend graphs.

---
## 🛠️ Tech Stack
- **Programming Language**: Python
- **Libraries**:
  - Data Handling: `Pandas`, `NumPy`
  - Text Processing: `spaCy`, `NLTK`, `re`
  - Machine Learning: `Scikit-learn` (with TF-IDF vectorization)
  - Visualization: `Matplotlib`, `Seaborn`, `WordCloud`

---
## 🔄 Workflow
1. **Data Loading and Cleaning**
   - Load datasets, remove duplicates, and handle missing values.
2. **Exploratory Data Analysis (EDA)**
   - Generate word clouds and analyze sentiment trends.
3. **Text Preprocessing**
   - Perform tokenization, lemmatization, stopword removal, and feature extraction.
4. **Model Training and Evaluation**
   - Train the Random Forest Classifier and evaluate using precision, recall, and F1-score.
5. **Visualization**
   - Display feature importance, confusion matrices, and sentiment trends.

---
## 📊 Results and Insights
- The system captures key cybercrime themes and provides structured feedback for users.
- Sentiment analysis identifies peaks in negative sentiment to indicate spikes in criminal activity.
- The Random Forest Classifier achieves robust accuracy and handles high-dimensional text data effectively.

---
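One way to surface peaks in negative sentiment is to aggregate per-report scores by month and flag months that fall well below the average. The sketch below assumes sentiment scores in the range [-1, 1] have already been computed; the scores, dates, and the "one standard deviation below the mean" rule are illustrative assumptions, not values or methods taken from the project.

```python
import pandas as pd

# Hypothetical per-report sentiment scores; how the project computes
# them is not stated, so they are hard-coded for illustration.
scores = pd.DataFrame({
    "date": pd.to_datetime([
        "2024-01-05", "2024-01-20", "2024-02-03", "2024-02-17",
        "2024-03-02", "2024-03-15", "2024-03-28",
    ]),
    "sentiment": [-0.2, -0.1, -0.3, -0.2, -0.8, -0.7, -0.9],
})

# Average sentiment per calendar month ("MS" = month-start bins).
monthly = scores.set_index("date")["sentiment"].resample("MS").mean()

# Assumed rule: a month is a negative peak if its mean sentiment is more
# than one standard deviation below the overall monthly mean.
threshold = monthly.mean() - monthly.std()
peaks = monthly[monthly < threshold]

print(monthly.round(2))
print("Negative-sentiment peaks:", list(peaks.index.strftime("%Y-%m")))
```

A flagged month is only a prompt for closer inspection; a dip in sentiment does not by itself establish a rise in criminal activity.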
## 💡 How to Use
1. Download and Use:
```bash
https://drive.google.com/file/d/1mwxB3uQAXCGdjd3nYwsymlLlXZ4fcv8y/view?usp=sharing