https://github.com/shinie19/hybrid-log-classification-system
Log Classification using hybrid classification framework
- Host: GitHub
- URL: https://github.com/shinie19/hybrid-log-classification-system
- Owner: shinie19
- Created: 2025-03-08T18:13:03.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2025-03-08T18:36:07.000Z (2 months ago)
- Last Synced: 2025-03-08T19:28:55.298Z (2 months ago)
- Language: Jupyter Notebook
- Size: 0 Bytes
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Log Classification With Hybrid Classification Framework
This project implements a hybrid log classification system, combining three complementary approaches to handle varying levels of complexity in log patterns. The classification methods ensure flexibility and effectiveness in processing predictable, complex, and poorly-labeled data patterns.
---
## Classification Approaches
1. **Regular Expression (Regex)**:
- Handles the simplest and most predictable patterns.
- Useful for patterns that are easily captured using predefined rules.
2. **Sentence Transformer + Logistic Regression**:
- Manages complex patterns when there is sufficient labeled training data.
- Utilizes embeddings generated by Sentence Transformers and applies Logistic Regression as the classification layer.
3. **LLM (Large Language Model)**:
- Used for handling complex patterns when sufficient labeled training data is not available.
- Provides a fallback or complementary approach to the other methods (see the routing sketch after this list).
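The sketch below shows one way these three stages could be chained. It is illustrative only: the rule table, the helper names (`classify_with_regex`, `classify_with_model`, `classify_with_llm`), the label names, and the confidence threshold are assumptions rather than the repository's actual API, and the stage-2 and stage-3 helpers are stubbed out.

```python
import re

# Illustrative regex rules for the simplest, most predictable patterns
# (patterns and label names are invented for this sketch).
REGEX_RULES = {
    r"user \w+ logged (in|out)": "User Action",
    r"backup (started|completed)": "System Notification",
}


def classify_with_regex(log_message: str) -> str | None:
    """Stage 1: return a label if any predefined rule matches, else None."""
    for pattern, label in REGEX_RULES.items():
        if re.search(pattern, log_message, flags=re.IGNORECASE):
            return label
    return None


def classify_with_model(log_message: str) -> tuple[str | None, float]:
    """Stage 2 stub: Sentence Transformer embedding + Logistic Regression.

    A real implementation would embed the message and call `predict_proba`
    on the trained classifier; here it simply reports no confident prediction.
    """
    return None, 0.0


def classify_with_llm(log_message: str) -> str:
    """Stage 3 stub: prompt an LLM when labeled training data is scarce."""
    return "Unclassified"


def classify(log_message: str, threshold: float = 0.6) -> str:
    """Route one log line through regex -> trained classifier -> LLM."""
    label = classify_with_regex(log_message)
    if label is not None:
        return label

    label, confidence = classify_with_model(log_message)
    if label is not None and confidence >= threshold:
        return label

    return classify_with_llm(log_message)


if __name__ == "__main__":
    print(classify("User admin logged in from 10.0.0.5"))  # -> "User Action"
```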
---
## Folder Structure
1. **`training/`**:
- Contains the code for training models using Sentence Transformer and Logistic Regression (a rough training sketch is shown after this list).
- Includes the code for regex-based classification.
2. **`models/`**:
- Stores the saved models, including the Sentence Transformer embeddings and the Logistic Regression model.
3. **`resources/`**:
- Contains resource files such as test CSV files, output files, images, etc.
4. **Root Directory**:
- Contains the FastAPI server code (`server.py`).
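Below is a rough sketch of what a training script under `training/` might look like, assuming a labeled CSV with `log_message` and `target_label` columns; the file paths, column names, and the `all-MiniLM-L6-v2` encoder are assumptions, not details taken from this repository.

```python
import joblib
import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Assumed training data: one log per row with a manually assigned label.
df = pd.read_csv("resources/training_logs.csv")  # hypothetical path

# Embed each log message with a pre-trained Sentence Transformer.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = encoder.encode(df["log_message"].tolist())

X_train, X_test, y_train, y_test = train_test_split(
    embeddings, df["target_label"], test_size=0.2, random_state=42
)

# Logistic Regression is the classification layer on top of the embeddings.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))

# Persist the classifier so the server can load it at inference time.
joblib.dump(clf, "models/log_classifier.joblib")  # hypothetical path
```

---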
## Setup Instructions
1. **Install Dependencies**:
Make sure you have Python installed on your system. Install the required Python libraries by running the following command:

```bash
pip install -r requirements.txt
```

2. **Run the FastAPI Server**:
To start the server, use the following command:

```bash
uvicorn server:app --reload
```

Once the server is running, you can access the API at:
- `http://127.0.0.1:8000/` (Main endpoint)
- `http://127.0.0.1:8000/docs` (Interactive Swagger documentation)
- `http://127.0.0.1:8000/redoc` (Alternative API documentation)
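The repository's `server.py` is not reproduced here; the sketch below only illustrates how a FastAPI app could accept a log CSV and return it with a `target_label` column, matching the Usage section below. The `/classify/` route, the file paths, and the imported `classify` helper are assumptions.

```python
import pandas as pd
from fastapi import FastAPI, HTTPException, UploadFile
from fastapi.responses import FileResponse

# `classify` stands in for the hybrid pipeline (regex -> Sentence Transformer
# + Logistic Regression -> LLM); the module and function names are assumed.
from classification import classify

app = FastAPI()


@app.post("/classify/")  # hypothetical endpoint path
async def classify_logs(file: UploadFile):
    if not file.filename.endswith(".csv"):
        raise HTTPException(status_code=400, detail="Please upload a CSV file.")

    df = pd.read_csv(file.file)
    if not {"source", "log_message"}.issubset(df.columns):
        raise HTTPException(
            status_code=400,
            detail="CSV must contain 'source' and 'log_message' columns.",
        )

    # Add the predicted label for each log entry and return the enriched CSV.
    df["target_label"] = [classify(msg) for msg in df["log_message"]]
    output_path = "resources/output.csv"  # hypothetical output location
    df.to_csv(output_path, index=False)
    return FileResponse(output_path, media_type="text/csv", filename="output.csv")
```

---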
## Usage
Upload a CSV file containing logs to the FastAPI endpoint for classification. Ensure the file has the following columns:
- `source`
- `log_message`

The output will be a CSV file with an additional column, `target_label`, which represents the classified label for each log entry.
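For example, assuming the server exposes an upload route like the `/classify/` path sketched above (the route and file names are assumptions), a client could submit logs with the `requests` library:

```python
import requests

# "resources/test.csv" is a hypothetical input file; it must contain the
# `source` and `log_message` columns described above.
with open("resources/test.csv", "rb") as f:
    response = requests.post(
        "http://127.0.0.1:8000/classify/",  # assumed endpoint path
        files={"file": ("test.csv", f, "text/csv")},
    )
response.raise_for_status()

# The response body is the same CSV with the extra `target_label` column.
with open("classified_logs.csv", "wb") as out:
    out.write(response.content)
print("Saved classified logs to classified_logs.csv")
```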