{"id":26562644,"url":"https://github.com/architj6/log-classification-system","last_synced_at":"2026-05-01T17:33:33.210Z","repository":{"id":282235983,"uuid":"947867145","full_name":"ArchitJ6/Log-Classification-System","owner":"ArchitJ6","description":"A hybrid log classification system integrating Regex ✅, BERT + Logistic Regression 🤖, and LLMs 📚 to efficiently classify log messages via a FastAPI interface.","archived":false,"fork":false,"pushed_at":"2025-03-13T13:06:37.000Z","size":139,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-22T15:18:23.956Z","etag":null,"topics":["ai","bert","fastapi","large-language-models","log-analysis","log-classification","logistic-regression","machine-learning","natural-language-processing","regex"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ArchitJ6.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-13T11:23:18.000Z","updated_at":"2025-03-13T13:12:23.000Z","dependencies_parsed_at":"2025-03-13T14:23:15.557Z","dependency_job_id":null,"html_url":"https://github.com/ArchitJ6/Log-Classification-System","commit_stats":null,"previous_names":["architj6/log-classification-system"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ArchitJ6%2FLog-Classification-System","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ArchitJ6%2FLog-Classific
ation-System/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ArchitJ6%2FLog-Classification-System/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ArchitJ6%2FLog-Classification-System/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ArchitJ6","download_url":"https://codeload.github.com/ArchitJ6/Log-Classification-System/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244973809,"owners_count":20541025,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","bert","fastapi","large-language-models","log-analysis","log-classification","logistic-regression","machine-learning","natural-language-processing","regex"],"created_at":"2025-03-22T15:18:28.868Z","updated_at":"2026-05-01T17:33:33.161Z","avatar_url":"https://github.com/ArchitJ6.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🚀 Log Classification System  \n\nThis project implements a **hybrid log classification system** using three complementary approaches to handle varying complexity in log patterns. It integrates **Regular Expressions (Regex) ✅, Sentence Transformer + Logistic Regression 🤖, and Large Language Models (LLMs)** 📚 to ensure flexibility and accuracy in classifying log messages.  \n\n## ✨ Features  \n\n- ⚡ **FastAPI Interface**: Provides an API endpoint for classifying log messages from CSV files.  \n- 🔍 **Three-Tier Classification**:  \n  - **Regex-based classification** ✅ for structured patterns.  
\n  - **BERT + Logistic Regression** 🤖 for complex, labeled data.  \n  - **LLM fallback** 📚 for handling unknown or insufficiently labeled patterns.  \n- 📂 **Efficient Model Handling**: Uses a pre-trained model (`log_classifier.joblib`) for inference.  \n\n## 🔄 Classification Flow  \n\n1. 📥 **Log Message Input**  \n2. 📝 **Regex Classification**  \n   - If a valid class is found, return it.  \n   - If the pattern is unknown, proceed to step 3.  \n3. 🧠 **BERT-based Classification (if enough training samples exist)**  \n   - If confident, return the predicted class.  \n   - If uncertain, proceed to step 4.  \n4. 🤯 **LLM-based Classification** 📚 \n   - Uses a large language model to predict the class for unknown patterns.  \n\n## 🎯 Decision Flow\n![decision_flow](decision_flow.png)\n\n## 📂 File Structure  \n\n```\n├── models  \n│   ├── log_classifier.joblib  \n├── testing  \n│   ├── test.csv  \n│   ├── output.csv  \n├── training  \n│   ├── dataset  \n│   │   ├── data.csv  \n│   ├── train.ipynb  \n├── bert_helper.py  \n├── classify.py  \n├── llm_helper.py  \n├── main.py  \n├── regex_helper.py  \n```\n\n## 🌐 API Usage  \n\n### 📌 **Endpoint: `/classify/`**  \n\n- 📤 **Method**: `POST`  \n- 📥 **Request**: Upload a CSV file with `source` and `log_message` columns.  \n- 📄 **Response**: A classified CSV file with an additional `target_label` column.  
\n\n### 📌 **Example Request (Python)**  \n\n```python\nimport requests\n\nurl = \"http://localhost:8000/classify/\"\n\n# Open the CSV in a context manager so the file handle is closed after the upload.\nwith open(\"test.csv\", \"rb\") as csv_file:\n    response = requests.post(url, files={\"file\": csv_file})\n\nif response.status_code == 200:\n    # On success the response body is the classified CSV.\n    with open(\"classified_output.csv\", \"wb\") as f:\n        f.write(response.content)\n    print(\"✅ Classified file saved as classified_output.csv\")\nelse:\n    # The error body is not guaranteed to be JSON, so print the raw text.\n    print(\"❌ Error:\", response.status_code, response.text)\n```\n\n\n## ⚙️ Setup \u0026 Installation  \n\n### **1️⃣ Clone the Repository**  \n```sh\ngit clone https://github.com/ArchitJ6/Log-Classification-System.git\ncd Log-Classification-System\n```\n\n### **2️⃣ Install Dependencies**  \n```sh\npip install -r requirements.txt\n```\n\n### **3️⃣ Run FastAPI Server**  \n```sh\nfastapi run main.py\n```\n\n## 🏋️ Model Training  \n\nTo train the classification model, run the Jupyter notebook:  \n\n```sh\njupyter notebook training/train.ipynb\n```\n\nThe model will be saved as `models/log_classifier.joblib`.\n\n## 🧑‍💻 Contributing\nContributions are welcome! Fork the project and submit a pull request.\n\n## 📜 License\nThis project is licensed under the **MIT License**.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farchitj6%2Flog-classification-system","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Farchitj6%2Flog-classification-system","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farchitj6%2Flog-classification-system/lists"}