https://github.com/scorpia2004/lazarus
poc for Erdemir
- Host: GitHub
- URL: https://github.com/scorpia2004/lazarus
- Owner: SCORPIA2004
- Created: 2026-04-01T14:15:05.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-04-01T14:16:43.000Z (3 months ago)
- Last Synced: 2026-04-04T01:00:02.259Z (3 months ago)
- Size: 2.93 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Idea Verification System — RAG Double-Checker for BERT Classifier
> **POC Project** | Erdemir | POC estimate: 6–8 person-days
---
## Overview
Erdemir is a steel production company with ~10,000 employees. As part of a long-standing company tradition, every worker submits ideas for improving work processes and production. To date, over **300,000 ideas** have been collected.
An existing **BERT-based NLP classifier** has been developed to automatically evaluate incoming ideas and output an `Approve` / `Reject` decision, achieving **>70% accuracy**. The goal of this project is to build a **RAG (Retrieval-Augmented Generation) system** that acts as a **secondary verification layer** on top of the BERT classifier.
---
## Problem Statement
The BERT classifier processes a plain-text idea submission and outputs a binary decision:
```
Input: Plain-text idea (submitted by employee)
Output: Approve | Reject
```
An accuracy above 70% still leaves up to 30% of ideas misclassified, so a second-opinion layer is needed to increase confidence in each decision and catch those errors. This project builds that layer as a RAG system.
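
The input/output contract above can be captured in a few lines of Python. This is only a sketch: the names `Decision`, `BertResult`, and `parse_decision` are hypothetical and do not come from the existing classifier code, and the label strings are assumed to be exactly `Approve` and `Reject`.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    """The two labels the BERT classifier can emit (assumed spelling)."""
    APPROVE = "Approve"
    REJECT = "Reject"


@dataclass(frozen=True)
class BertResult:
    """Hypothetical record handed to the RAG layer."""
    idea_text: str      # plain-text idea as submitted by the employee
    decision: Decision  # binary decision produced by BERT


def parse_decision(label: str) -> Decision:
    """Map a raw label string to a Decision, ignoring case and whitespace."""
    normalized = label.strip().lower()
    for decision in Decision:
        if decision.value.lower() == normalized:
            return decision
    raise ValueError(f"unknown classifier label: {label!r}")
```

Parsing the label once at the boundary means the rest of the pipeline never has to compare raw strings.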
---
## Solution Architecture
```
Employee Idea (plain text)
          │
          ▼
  ┌──────────────┐
  │     BERT     │ ──── Approve / Reject
  │  Classifier  │
  └──────────────┘
          │
          ▼  (BERT output + original idea)
  ┌──────────────┐
  │  RAG System  │ ──── Confirm / Override
  │  (this repo) │
  └──────────────┘
          │
          ▼
    Final Decision
```
The RAG system receives both the **original idea text** and the **BERT classification output**, retrieves relevant context from the historical idea dataset, and produces a verification judgment — either confirming or challenging the BERT result.
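
A minimal sketch of that verification flow follows. It is not the repository's implementation. Retrieval here uses plain token overlap, and the "generation" step is replaced by a majority vote over the labels of the retrieved ideas. Both are stand-ins for an embedding index and a prompted LLM. All names (`LabeledIdea`, `retrieve`, `verify`) are hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledIdea:
    text: str
    label: str  # "Approve" or "Reject", taken from the historical dataset


def tokens(text: str) -> set[str]:
    """Lowercased word tokens of a text."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-overlap similarity; a stand-in for embedding similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(idea: str, history: list[LabeledIdea], k: int = 3) -> list[LabeledIdea]:
    """Return the k historical ideas most similar to the new idea."""
    query = tokens(idea)
    ranked = sorted(history, key=lambda h: jaccard(query, tokens(h.text)), reverse=True)
    return ranked[:k]


def verify(idea: str, bert_label: str, history: list[LabeledIdea], k: int = 3) -> tuple[str, str]:
    """Return (verdict, final_label), where verdict is "Confirm" or "Override"."""
    votes = Counter(n.label for n in retrieve(idea, history, k)).most_common()
    # Assumption: with no evidence or a tied vote, the BERT decision stands.
    if not votes or (len(votes) > 1 and votes[0][1] == votes[1][1]):
        return ("Confirm", bert_label)
    # Placeholder judgment: majority label of the neighbours.
    # A real RAG layer would pass the neighbours to an LLM as context instead.
    rag_label = votes[0][0]
    verdict = "Confirm" if rag_label == bert_label else "Override"
    return (verdict, rag_label)
```

For example, if the two nearest historical ideas were both approved and BERT returned `Reject`, `verify` returns an `Override` verdict with `Approve` as the final label. Defaulting to `Confirm` on weak evidence keeps the RAG layer from overriding BERT without support.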
---
## Scope
This repository covers the **POC (Proof of Concept)** phase:
- Set up and configure the RAG system for deployment on a local machine (RTX 5090 or equivalent GPU), targeting a locally hosted model such as QwQ 3.5
- Integrate with the existing BERT classifier output
- Evaluate RAG verification accuracy against the labeled dataset
- Determine whether fine-tuning is required based on data characteristics
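
One way to frame the evaluation step is to compare BERT alone against the combined pipeline on the same labeled records. The sketch below assumes each record carries the ground-truth label, the BERT decision, and the final decision after RAG verification. The `Record` type and the metric names are illustrative, not part of the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    truth: str  # ground-truth label from the labeled dataset
    bert: str   # decision produced by the BERT classifier
    final: str  # decision after RAG verification


def evaluate(records: list[Record]) -> dict[str, float]:
    """Compare BERT-only accuracy with accuracy after RAG verification."""
    n = len(records)
    if n == 0:
        raise ValueError("no records to evaluate")
    # Overrides are the cases where the RAG layer changed the BERT decision.
    overrides = [r for r in records if r.final != r.bert]
    helpful = sum(r.final == r.truth for r in overrides)
    return {
        "bert_accuracy": sum(r.bert == r.truth for r in records) / n,
        "final_accuracy": sum(r.final == r.truth for r in records) / n,
        "override_rate": len(overrides) / n,
        # Share of overrides that corrected BERT rather than breaking it.
        "override_precision": helpful / len(overrides) if overrides else 0.0,
    }
```

Tracking override precision alongside accuracy shows whether the RAG layer corrects BERT's mistakes or merely swaps them for new ones.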
---
## Current Status
| Component | Status |
|---|---|
| BERT classifier | ✅ Exists, working |
| Labeled idea dataset | ✅ Available |
| RAG system | 🔧 In development (this repo) |
| Fine-tuning | ❓ TBD — pending data review |
---
## Project Details
- **Client:** Erdemir
- **Scale:** 300,000+ historical ideas, 10,000 employees
- **Deployment target:** Local machine (RTX 5090-class GPU running QwQ 3.5 or a similar model)
- **POC estimate:** 6–8 person-days
- **Full build timeline:** 5–6 months
---
## Action Items
- [ ] Obtain project proposal and technical documentation from Eren
- [ ] Review dataset to determine fine-tuning requirements
- [ ] Set up RAG pipeline and connect to BERT output
- [ ] Evaluate end-to-end system performance
---
## Open Questions
- Is fine-tuning required for the RAG model? (Pending data review)
- What retrieval strategy best fits the idea domain? (semantic search, keyword, hybrid?)
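
To make the retrieval question concrete, here is a toy hybrid scorer that blends a keyword-match score with a vector-similarity score. It is a sketch under loud assumptions: term-frequency cosine stands in for real embedding similarity, the `alpha` weight is an arbitrary illustrative parameter, and none of the function names come from the project.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens, duplicates kept."""
    return re.findall(r"\w+", text.lower())


def keyword_score(query: str, doc: str) -> float:
    """Fraction of distinct query terms that appear in the document."""
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q) if q else 0.0


def cosine_score(query: str, doc: str) -> float:
    """Cosine similarity of term-frequency vectors (stand-in for embeddings)."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0


def hybrid_score(query: str, doc: str, alpha: float = 0.5) -> float:
    """alpha=1.0 is purely vector-based, alpha=0.0 is purely keyword-based."""
    return alpha * cosine_score(query, doc) + (1 - alpha) * keyword_score(query, doc)


def rank(query: str, docs: list[str], alpha: float = 0.5) -> list[str]:
    """Order documents from most to least relevant under the hybrid score."""
    return sorted(docs, key=lambda doc: hybrid_score(query, doc, alpha), reverse=True)
```

Sweeping `alpha` over a held-out slice of the labeled dataset would be one way to compare semantic, keyword, and hybrid retrieval on equal footing.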