https://github.com/shinie19/hybrid-log-classification-system
Log Classification using hybrid classification framework
- Host: GitHub
- URL: https://github.com/shinie19/hybrid-log-classification-system
- Owner: shinie19
- Created: 2025-03-08T18:13:03.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2025-03-08T18:36:07.000Z (2 months ago)
- Last Synced: 2025-03-08T19:28:55.298Z (2 months ago)
- Language: Jupyter Notebook
- Size: 0 Bytes
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Log Classification With Hybrid Classification Framework
This project implements a hybrid log classification system, combining three complementary approaches to handle varying levels of complexity in log patterns. The classification methods ensure flexibility and effectiveness in processing predictable, complex, and poorly-labeled data patterns.
---
## Classification Approaches
1. **Regular Expression (Regex)**:
- Handles the simplest and most predictable patterns.
- Useful for patterns that are easily captured using predefined rules.
2. **Sentence Transformer + Logistic Regression**:
- Manages complex patterns when there is sufficient labeled training data.
- Utilizes embeddings generated by Sentence Transformers and applies Logistic Regression as the classification layer.
3. **LLM (Large Language Model)**:
- Used for handling complex patterns when sufficient labeled training data is not available.
- Provides a fallback or complementary approach to the other methods (see the routing sketch after this list).
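The sketch below shows one way these three stages could be chained. It is illustrative only: the rule table, the helper names (`classify_with_regex`, `classify_with_model`, `classify_with_llm`), the label names, and the confidence threshold are assumptions rather than the repository's actual API, and the stage-2 and stage-3 helpers are stubbed out.

```python
import re

# Illustrative regex rules for the simplest, most predictable patterns
# (patterns and label names are invented for this sketch).
REGEX_RULES = {
    r"user \w+ logged (in|out)": "User Action",
    r"backup (started|completed)": "System Notification",
}


def classify_with_regex(log_message: str) -> str | None:
    """Stage 1: return a label if any predefined rule matches, else None."""
    for pattern, label in REGEX_RULES.items():
        if re.search(pattern, log_message, flags=re.IGNORECASE):
            return label
    return None


def classify_with_model(log_message: str) -> tuple[str | None, float]:
    """Stage 2 stub: Sentence Transformer embedding + Logistic Regression.

    A real implementation would embed the message and call `predict_proba`
    on the trained classifier; here it simply reports no confident prediction.
    """
    return None, 0.0


def classify_with_llm(log_message: str) -> str:
    """Stage 3 stub: prompt an LLM when labeled training data is scarce."""
    return "Unclassified"


def classify(log_message: str, threshold: float = 0.6) -> str:
    """Route one log line through regex -> trained classifier -> LLM."""
    label = classify_with_regex(log_message)
    if label is not None:
        return label

    label, confidence = classify_with_model(log_message)
    if label is not None and confidence >= threshold:
        return label

    return classify_with_llm(log_message)


if __name__ == "__main__":
    print(classify("User admin logged in from 10.0.0.5"))  # -> "User Action"
```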
---
## Folder Structure
1. **`training/`**:
- Contains the code for training models using Sentence Transformer and Logistic Regression (a rough training sketch is shown after this list).
- Includes the code for regex-based classification.
2. **`models/`**:
- Stores the saved models, including the Sentence Transformer embeddings and the Logistic Regression model.
3. **`resources/`**:
- Contains resource files such as test CSV files, output files, images, etc.
4. **Root Directory**:
- Contains the FastAPI server code (`server.py`).
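Below is a rough sketch of what a training script under `training/` might look like, assuming a labeled CSV with `log_message` and `target_label` columns; the file paths, column names, and the `all-MiniLM-L6-v2` encoder are assumptions, not details taken from this repository.

```python
import joblib
import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Assumed training data: one log per row with a manually assigned label.
df = pd.read_csv("resources/training_logs.csv")  # hypothetical path

# Embed each log message with a pre-trained Sentence Transformer.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = encoder.encode(df["log_message"].tolist())

X_train, X_test, y_train, y_test = train_test_split(
    embeddings, df["target_label"], test_size=0.2, random_state=42
)

# Logistic Regression is the classification layer on top of the embeddings.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))

# Persist the classifier so the server can load it at inference time.
joblib.dump(clf, "models/log_classifier.joblib")  # hypothetical path
```

---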
## Setup Instructions
1. **Install Dependencies**:
Make sure you have Python installed on your system. Install the required Python libraries by running the following command:

```bash
pip install -r requirements.txt
```

2. **Run the FastAPI Server**:
To start the server, use the following command:

```bash
uvicorn server:app --reload
```

Once the server is running, you can access the API at:
- `http://127.0.0.1:8000/` (Main endpoint)
- `http://127.0.0.1:8000/docs` (Interactive Swagger documentation)
- `http://127.0.0.1:8000/redoc` (Alternative API documentation)
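The repository's `server.py` is not reproduced here; the sketch below only illustrates how a FastAPI app could accept a log CSV and return it with a `target_label` column, matching the Usage section below. The `/classify/` route, the file paths, and the imported `classify` helper are assumptions.

```python
import pandas as pd
from fastapi import FastAPI, HTTPException, UploadFile
from fastapi.responses import FileResponse

# `classify` stands in for the hybrid pipeline (regex -> Sentence Transformer
# + Logistic Regression -> LLM); the module and function names are assumed.
from classification import classify

app = FastAPI()


@app.post("/classify/")  # hypothetical endpoint path
async def classify_logs(file: UploadFile):
    if not file.filename.endswith(".csv"):
        raise HTTPException(status_code=400, detail="Please upload a CSV file.")

    df = pd.read_csv(file.file)
    if not {"source", "log_message"}.issubset(df.columns):
        raise HTTPException(
            status_code=400,
            detail="CSV must contain 'source' and 'log_message' columns.",
        )

    # Add the predicted label for each log entry and return the enriched CSV.
    df["target_label"] = [classify(msg) for msg in df["log_message"]]
    output_path = "resources/output.csv"  # hypothetical output location
    df.to_csv(output_path, index=False)
    return FileResponse(output_path, media_type="text/csv", filename="output.csv")
```

---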
## Usage
Upload a CSV file containing logs to the FastAPI endpoint for classification. Ensure the file has the following columns:
- `source`
- `log_message`

The output will be a CSV file with an additional column, `target_label`, which represents the classified label for each log entry.
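For example, assuming the server exposes an upload route like the `/classify/` path sketched above (the route and file names are assumptions), a client could submit logs with the `requests` library:

```python
import requests

# "resources/test.csv" is a hypothetical input file; it must contain the
# `source` and `log_message` columns described above.
with open("resources/test.csv", "rb") as f:
    response = requests.post(
        "http://127.0.0.1:8000/classify/",  # assumed endpoint path
        files={"file": ("test.csv", f, "text/csv")},
    )
response.raise_for_status()

# The response body is the same CSV with the extra `target_label` column.
with open("classified_logs.csv", "wb") as out:
    out.write(response.content)
print("Saved classified logs to classified_logs.csv")
```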