https://github.com/mohammedsaim-quadri/intrusion_detection-system
This project is an Intrusion Detection System (IDS) using machine learning (ML) and deep learning (DL) to detect network intrusions. It leverages the CICIDS2018 dataset to classify traffic as normal or malicious. Key features include data preprocessing, model training, hyperparameter tuning, and Docker containerization for scalable deployment.
bayesian-optimization cicids2018 cybersecurity datapreprocessing deep-learning docker hyperparameter-tuning intrusion-detection machinelearning neural-networks
- Host: GitHub
- URL: https://github.com/mohammedsaim-quadri/intrusion_detection-system
- Owner: MohammedSaim-Quadri
- License: other
- Created: 2024-10-23T10:21:42.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-07-04T22:25:26.000Z (12 months ago)
- Last Synced: 2025-07-04T23:26:49.756Z (12 months ago)
- Topics: bayesian-optimization, cicids2018, cybersecurity, datapreprocessing, deep-learning, docker, hyperparameter-tuning, intrusion-detection, machinelearning, neural-networks
- Language: Python
- Homepage:
- Size: 8.61 MB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Intrusion Detection System (IDS)




---
## UPDATE! [July 2025]
**Demo Video:** [Watch the demo on YouTube](https://youtu.be/PJu7FfHhPmQ)

**Live App:** [Try the deployed IDS app here](https://ids-api-frontend.onrender.com)
> We've launched the first version of the web-based IDS system using the XGBoost model from Part 1!
> This includes fast inference on network flow data, batch upload support, and a clean UI for predictions and visualizations.
> Stay tuned!
---
## Table of Contents
1. [Overview](#overview)
2. [Project Structure](#project-structure)
3. [Features](#key-features)
4. [Performance](#performance-metrics)
5. [Installation](#installation)
6. [Usage](#usage)
7. [Hyperparameter Tuning](#hyperparameter-tuning)
8. [Model Evaluation](#model-evaluation)
9. [Docker Setup](#docker-setup)
10. [Future Work](#future-work)
11. [Contributing](#contributing)
## Overview
This project develops an Intrusion Detection System (IDS) that uses machine learning techniques to identify and prevent network intrusions. The final model is an XGBoost classifier trained on the CICIDS2018 dataset. The project automates the full data pipeline, from ingestion to deployment, and is production-ready with Docker support.

## Project Structure
``` bash
├── .gitattributes
├── .gitignore
├── Dockerfile                        # Docker configuration for containerization
├── README.md
├── requirements.txt                  # Project dependencies
├── setup.py                          # Package setup script
├── artifacts/                        # Folder for artifacts like trained models and preprocessed data
│   ├── IDS_data.csv                  # Original dataset
│   ├── model_trained.keras           # Trained model
│   ├── model_trained.pkl             # Trained model
│   ├── preprocessor.pkl              # Data preprocessor
│   ├── test.csv                      # Test dataset
│   ├── train.csv                     # Training dataset
├── dataset/                          # Folder containing dataset files
│   ├── train_data.csv                # Raw training data
├── logs/                             # Log files for tracking execution
├── src/                              # Source code for the project
│   ├── exception.py                  # Custom exception handling
│   ├── logger.py                     # Logging module
│   ├── utils.py                      # Utility functions
│   ├── components/                   # Folder containing main components
│   │   ├── data_ingestion.py         # Data ingestion logic
│   │   ├── data_transformation.py    # Data preprocessing and feature engineering
│   │   ├── model_trainer.py          # Model training and evaluation
│   │   ├── bayesian_tuner.py         # Bayesian hyperparameter tuning
│   │   ├── optuna_tuner.py           # Optuna hyperparameter tuning
│   │   ├── __init__.py
│   ├── __init__.py
```
## Key Features
- **Data Preprocessing**:
  - Data ingestion and transformation processes to clean and prepare the CICIDS2018 dataset.
  - Handling missing values, encoding categorical features, and scaling numerical data.
- **Model Training**:
  - Neural networks are included but commented out; XGBoost was selected as the final model based on performance.
  - Training and evaluation of the model with performance metrics.
- **Hyperparameter Tuning**:
  - Optuna is used to optimize hyperparameters and improve model performance.
- **Model Evaluation**:
  - Metrics used include accuracy, precision, recall, F1 score, and ROC AUC.
- **Docker Containerization**:
  - The project includes a Dockerfile to simplify the deployment of the IDS. This allows the application to run consistently across various environments.
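
The steps listed under Data Preprocessing above can be sketched without any dependencies. This is a minimal illustration, not the code in `src/components/data_transformation.py`: the function names and the toy flow records are hypothetical, and median imputation and z-score scaling are assumed choices that the README does not specify.

```python
import math
import statistics


def impute_median(values):
    """Replace None, NaN, and infinite entries with the median of the valid ones."""
    valid = [v for v in values if v is not None and math.isfinite(v)]
    median = statistics.median(valid)
    return [v if v is not None and math.isfinite(v) else median for v in values]


def standardize(values):
    """Scale values to zero mean and unit variance (population standard deviation)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # avoid dividing by zero for constant columns
    return [(v - mean) / std for v in values]


def label_encode(labels):
    """Map each distinct label to an integer, in sorted order for reproducibility."""
    mapping = {label: index for index, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


# Hypothetical flow records: a numeric feature with gaps and a categorical one.
flow_duration = [120.0, None, 80.0, float("inf"), 100.0]
protocol = ["tcp", "udp", "tcp", "icmp", "udp"]

cleaned = impute_median(flow_duration)
scaled = standardize(cleaned)
encoded, mapping = label_encode(protocol)
```

In a real pipeline the median, mean, and standard deviation should be computed on the training split only and reused on the test split, which is presumably the role of `artifacts/preprocessor.pkl`.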
## Performance Metrics
Final model: XGBoost Classifier
- Testing Accuracy Score: 89.75%
- Training Accuracy Score: 89.87%
- Testing F1 Score: 88.27%
- Training F1 Score: 88.40%
- Testing Recall Score: 89.75%
- Training Recall Score: 89.87%
- Testing Precision Score: 89.08%
- Training Precision Score: 89.31%
- Balanced Accuracy Score: 86.55%
- ROC AUC (Testing): 99.17%
- ROC AUC (Training): 99.21%
Training and testing scores differ by less than 0.25 percentage points on every metric, which suggests the model generalizes to unseen data with little overfitting. Precision and recall are also close to each other (89.08% and 89.75% on the test set).
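
The recall figures above are identical to the accuracy figures (89.75% and 89.87%). That is what support-weighted averaging of per-class recall produces, because the weighted mean of per-class recall is algebraically equal to accuracy. The README does not state which averaging mode was used, so treat that as an assumption. The sketch below uses made-up labels to show the identity, and to show how balanced accuracy (the unweighted mean of per-class recall) can fall below accuracy when a minority class is detected poorly.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {class: recall}, where recall = correct predictions / true instances."""
    support = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / support[label] for label in support}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_recall(y_true, y_pred):
    """Per-class recall averaged with each class weighted by its support."""
    support = Counter(y_true)
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls[c] * support[c] for c in support) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Per-class recall averaged with every class weighted equally."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Made-up labels: 8 benign flows, 2 attack flows, one attack missed.
y_true = ["benign"] * 8 + ["attack"] * 2
y_pred = ["benign"] * 8 + ["attack", "benign"]

print(accuracy(y_true, y_pred))           # 0.9
print(weighted_recall(y_true, y_pred))    # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.75
```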
## Installation
1. Clone the repository:
```bash
git clone https://github.com/mohammedsaim-quadri/intrusion_detection-system.git
```
2. Navigate to the project directory:
```bash
cd intrusion_detection-system
```
3. Create and activate a virtual environment (optional but recommended):
```bash
python -m venv venv
source venv/bin/activate # For Windows: venv\Scripts\activate
```
4. Install required dependencies:
```bash
pip install -r requirements.txt
```
## Usage
1. Run the data ingestion pipeline:
```bash
python src/components/data_ingestion.py
```
2. (Optional) Perform hyperparameter tuning:
```bash
python src/components/bayesian_tuner.py
python src/components/optuna_tuner.py
```
3. View logs for detailed execution info:
```bash
tail -f logs/*.log
```
## Hyperparameter Tuning
This project includes two methods for tuning model hyperparameters:
1. **Bayesian Optimization:** This uses probabilistic models to explore the hyperparameter space. Run it using:
```bash
python src/components/bayesian_tuner.py
```
2. **Optuna:** A popular framework for efficient hyperparameter optimization. To use Optuna, run:
```bash
python src/components/optuna_tuner.py
```
Both methods aim to improve the model's accuracy while reducing training time.
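
Optuna tuners are typically written as an objective function that takes a `trial` object, asks it for hyperparameter values, and returns a score. The sketch below shows that shape only. It is not the code in `optuna_tuner.py`: the parameter names and ranges are assumptions, the scoring formula is a synthetic placeholder for training and validating a model, and `RandomTrial` is a hypothetical stand-in that samples uniformly so the example runs without Optuna installed. Plain random search is used here purely for illustration; Optuna's own samplers are more sophisticated.

```python
import random


class RandomTrial:
    """Hypothetical stand-in exposing the two suggest_* methods the objective uses."""

    def __init__(self, rng):
        self.rng = rng
        self.params = {}

    def suggest_int(self, name, low, high):
        self.params[name] = self.rng.randint(low, high)  # inclusive bounds
        return self.params[name]

    def suggest_float(self, name, low, high):
        self.params[name] = self.rng.uniform(low, high)
        return self.params[name]


def objective(trial):
    """Ask the trial for hyperparameters and return a score to maximize."""
    max_depth = trial.suggest_int("max_depth", 3, 10)
    learning_rate = trial.suggest_float("learning_rate", 0.01, 0.3)
    # Placeholder score peaking at max_depth=6, learning_rate=0.1.
    # A real objective would train a model here and return a validation metric.
    return 1.0 - 0.01 * (max_depth - 6) ** 2 - (learning_rate - 0.1) ** 2


def random_search(objective, n_trials, seed=0):
    """Run n_trials random trials and return the best (score, params) pair."""
    rng = random.Random(seed)
    best_score, best_params = float("-inf"), {}
    for _ in range(n_trials):
        trial = RandomTrial(rng)
        score = objective(trial)
        if score > best_score:
            best_score, best_params = score, trial.params
    return best_score, best_params


best_score, best_params = random_search(objective, n_trials=50)
print(best_score, best_params)
```

With Optuna installed, the same `objective` can be passed to a study created with `optuna.create_study(direction="maximize")` by calling `study.optimize(objective, n_trials=50)`.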
## Model Evaluation
After training, the model is evaluated using the following metrics:
- **Accuracy:** Measures the percentage of correct predictions.
- **Precision & Recall:** Useful for understanding the trade-off between false positives and false negatives.
- **F1-score:** A balanced measure between precision and recall.
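
As a worked example of these definitions, the sketch below computes the metrics for a binary benign/attack case from raw confusion counts. The counts are invented for illustration and are not results from this project.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # share of alerts that were real attacks
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # share of real attacks that were flagged
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented counts: 80 attacks caught, 20 false alarms, 20 attacks missed, 880 benign flows passed.
metrics = binary_metrics(tp=80, fp=20, fn=20, tn=880)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Raising the decision threshold typically trades recall for precision: fewer false alarms, but more missed attacks.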
## Docker Setup
The Dockerfile provided sets up the environment with TensorFlow and Python 3 support, installs the necessary dependencies, and exposes the required port for monitoring.
### Option 1: Build the Docker Image Locally
1. **Build the Docker image**:
```bash
docker build -t ids-system .
```
2. **Run the Docker container**:
```bash
docker run -p 6006:6006 ids-system
```
This exposes port 6006 for TensorBoard or other monitoring tools.
3. **Default Command**: By default, the container runs the data ingestion script:
```dockerfile
CMD ["python", "src/components/data_ingestion.py"]
```
You can modify the command to run other scripts as needed.
4. **Access Monitoring Tools**: Access TensorBoard or any other monitoring tools at http://localhost:6006.
### Option 2: Pull the Docker Image from Docker Hub
If you prefer not to build the image locally, you can directly pull the pre-built Docker image from Docker Hub:
1. **Pull the Docker image**:
```bash
docker pull saimquadri/ids-project
```
2. **Run the Docker container**:
```bash
docker run -p 6006:6006 saimquadri/ids-project
```
This will expose port 6006 for monitoring tools like TensorBoard.
## Future Work
- Add support for additional machine learning algorithms.
- Implement real-time intrusion detection using streaming data.
- Improve model accuracy with advanced feature engineering techniques.
- Expand Docker support to Kubernetes for large-scale deployments.
- Feature selection and ensemble model stacking.
- Integration with cloud-based dashboards for alerts.
## Contributing
We welcome contributions from the community! Please feel free to fork the repository and submit a pull request with your improvements. For major changes, please open an issue first to discuss what you would like to change.
To contribute:
1. Fork the repository.
2. Create a new branch for your feature:
```bash
git checkout -b feature/your-feature
```
3. Make your changes and push to your branch:
```bash
git push origin feature/your-feature
```
4. Create a pull request.
## License
This project is licensed under the MIT License (Modified for Attribution and Non-Commercial Use). See the [LICENSE](./LICENSE) file for details.