https://github.com/architj6/log-classification-system
A hybrid log classification system integrating Regex ✅, BERT + Logistic Regression 🤖, and LLMs 📚 to efficiently classify log messages via a FastAPI interface.
- Host: GitHub
- URL: https://github.com/architj6/log-classification-system
- Owner: ArchitJ6
- License: mit
- Created: 2025-03-13T11:23:18.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-13T13:06:37.000Z (over 1 year ago)
- Last Synced: 2025-03-22T15:18:23.956Z (over 1 year ago)
- Topics: ai, bert, fastapi, large-language-models, log-analysis, log-classification, logistic-regression, machine-learning, natural-language-processing, regex
- Language: Jupyter Notebook
- Homepage:
- Size: 136 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
# 🚀 Log Classification System
This project implements a **hybrid log classification system** using three complementary approaches to handle varying complexity in log patterns. It integrates **Regular Expressions (Regex) ✅, Sentence Transformer + Logistic Regression 🤖, and Large Language Models (LLMs)** 📚 to ensure flexibility and accuracy in classifying log messages.
## ✨ Features
- ⚡ **FastAPI Interface**: Provides an API endpoint for classifying log messages from CSV files.
- 🔍 **Three-Tier Classification**:
  - **Regex-based classification** ✅ for structured patterns.
  - **BERT + Logistic Regression** 🤖 for complex, labeled data.
  - **LLM fallback** 📚 for handling unknown or insufficiently labeled patterns.
- 📂 **Efficient Model Handling**: Uses a pre-trained model (`log_classifier.joblib`) for inference.
## 🔄 Classification Flow
1. 📥 **Log Message Input**
2. 📝 **Regex Classification**
   - If a valid class is found, return it.
   - If the pattern is unknown, proceed to step 3.
3. 🧠 **BERT-based Classification (if enough training samples exist)**
   - If confident, return the predicted class.
   - If uncertain, proceed to step 4.
4. 🤯 **LLM-based Classification** 📚
   - Uses a large language model to predict the class for unknown patterns.
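
The flow above can be sketched as a cascade of three tiers. This is a minimal illustration, not the repository's actual code: the regex patterns, the label names, the `0.5` confidence threshold, and the function names (`classify_with_regex`, `classify_log`) are all assumptions, and the embedding-based classifier and the LLM are passed in as plain callables so the sketch runs without loading any model.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical patterns and labels, for illustration only.
REGEX_RULES = {
    r"User \w+ logged (in|out)": "User Action",
    r"Backup (started|completed|ended)": "System Notification",
}


def classify_with_regex(log_message: str) -> Optional[str]:
    """Tier 1: return a label if any known pattern matches, else None."""
    for pattern, label in REGEX_RULES.items():
        if re.search(pattern, log_message, re.IGNORECASE):
            return label
    return None


def classify_log(
    log_message: str,
    ml_classifier: Callable[[str], Tuple[str, float]],
    llm_classifier: Callable[[str], str],
    min_confidence: float = 0.5,  # assumed threshold, not taken from the project
) -> str:
    """Try regex first, then the ML model, then fall back to the LLM."""
    label = classify_with_regex(log_message)
    if label is not None:
        return label
    # Tier 2: the ML model returns (label, probability of that label).
    label, confidence = ml_classifier(log_message)
    if confidence >= min_confidence:
        return label
    # Tier 3: the LLM handles whatever the first two tiers could not.
    return llm_classifier(log_message)


# Stand-ins for the real models so the example runs on its own.
def fake_ml(msg: str) -> Tuple[str, float]:
    return ("Error", 0.9) if "failed" in msg else ("Unclassified", 0.2)


def fake_llm(msg: str) -> str:
    return "Workflow Error"


print(classify_log("User alice logged in", fake_ml, fake_llm))         # regex tier
print(classify_log("Disk write failed on node 3", fake_ml, fake_llm))  # ML tier
print(classify_log("Escalation rule skipped", fake_ml, fake_llm))      # LLM tier
```

Ordering the tiers from cheapest to most expensive means the LLM is only called for messages that neither the patterns nor the trained classifier can label with confidence.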
## 🎯 Decision Flow

## 📂 File Structure
```
├── models
│   ├── log_classifier.joblib
├── testing
│   ├── test.csv
│   ├── output.csv
├── training
│   ├── dataset
│   │   ├── data.csv
│   ├── train.ipynb
├── bert_helper.py
├── classify.py
├── llm_helper.py
├── main.py
├── regex_helper.py
```
## 🌐 API Usage
### 📌 **Endpoint: `/classify/`**
- 📤 **Method**: `POST`
- 📥 **Request**: Upload a CSV file with `source` and `log_message` columns.
- 📄 **Response**: A classified CSV file with an additional `target_label` column.
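
The request and response contract above amounts to checking for two columns and appending a third. Here is a minimal sketch of that step using only the standard library `csv` module. The function name `add_target_labels`, the `classify` callable, and the sample source name are hypothetical, and the project itself may read and write CSV files differently.

```python
import csv
import io
from typing import Callable

REQUIRED_COLUMNS = ("source", "log_message")


def add_target_labels(csv_text: str, classify: Callable[[str, str], str]) -> str:
    """Return the input CSV with a `target_label` column appended."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in fieldnames]
    if missing:
        raise ValueError(f"CSV is missing required columns: {missing}")

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=[*fieldnames, "target_label"], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # `classify` stands in for the hybrid classifier (hypothetical signature).
        row["target_label"] = classify(row["source"], row["log_message"])
        writer.writerow(row)
    return out.getvalue()


# "ModernCRM" is a made-up source name used only for this example.
sample = "source,log_message\nModernCRM,User alice logged in\n"
print(add_target_labels(sample, lambda source, msg: "User Action"))
```

Rejecting a file that lacks either required column before any classification runs lets the API return a clear error instead of failing partway through the rows.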
### 📌 **Example Request (Python)**
```python
import requests

url = "http://localhost:8000/classify/"

# Upload the CSV as multipart form data; the form field is named "file".
with open("test.csv", "rb") as csv_file:
    response = requests.post(url, files={"file": csv_file})

if response.status_code == 200:
    # Save the classified CSV returned by the server.
    with open("classified_output.csv", "wb") as f:
        f.write(response.content)
    print("✅ Classified file saved as classified_output.csv")
else:
    print("❌ Error:", response.json())
```
## ⚙️ Setup & Installation
### **1️⃣ Clone the Repository**
```sh
git clone https://github.com/ArchitJ6/Log-Classification-System.git
cd Log-Classification-System
```
### **2️⃣ Install Dependencies**
```sh
pip install -r requirements.txt
```
### **3️⃣ Run FastAPI Server**
```sh
fastapi run main.py
```
## 🏋️ Model Training
To train the classification model, run the Jupyter notebook:
```sh
jupyter notebook training/train.ipynb
```
The model will be saved as `models/log_classifier.joblib`.
## 🧑‍💻 Contributing
Contributions are welcome! Fork the project and submit your pull requests.
## 📜 License
This project is licensed under the **MIT License**.