https://github.com/ryu-ryuk/kyoka

A CLI-first system that uses Large Language Models (LLMs) to interpret natural-language queries and retrieve relevant information from complex, unstructured insurance documents (PDF, DOCX, and email).

cli decision-making llm parser python rag reasoning

Host: GitHub
URL: https://github.com/ryu-ryuk/kyoka
Owner: ryu-ryuk
License: other
Created: 2025-07-11T15:25:41.000Z (12 months ago)
Default Branch: master
Last Pushed: 2025-07-27T07:35:35.000Z (12 months ago)
Last Synced: 2025-08-11T08:23:34.715Z (11 months ago)
Topics: cli, decision-making, llm, parser, python, rag, reasoning
Language: Python
Homepage:
Size: 69.3 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

Kyoka

The name Kyoka (狂歌, “mad poem”) reflects the system’s role in turning formal documents into intelligent, structured responses, just as kyōka poetry reimagines tradition with clarity and wit.

---

**Kyoka** is a high-performance retrieval-augmented generation (RAG) system that processes documents and answers complex queries with precise, source-cited responses. It is built for the insurance, legal, HR, and compliance domains.

## Features

- **Multi-format Support**: PDF, DOCX, and email document processing
- **Hybrid Retrieval**: Combines dense (Pinecone) and sparse (BM25) search for superior accuracy
- **Semantic Chunking**: Intelligent text splitting with contextual overlap
- **Re-ranking**: Uses Flashrank for refined result relevance
- **Structured Responses**: JSON output with answers, rationale, and source attribution
- **Async Processing**: Concurrent question handling for optimal performance
- **GPU Acceleration**: CUDA support for faster embedding generation
- **Intelligent Caching**: Document and retriever caching for repeated queries
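
The hybrid retrieval feature above merges a dense ranking with a sparse one. This README does not say how Kyoka combines the two, so the sketch below uses reciprocal rank fusion (RRF), a common technique, purely as an illustration. The function name, the `k` constant, and the chunk ids are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk ids into one ranking.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Illustrative only; not taken from Kyoka's code.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


dense_hits = ["c3", "c1", "c7"]   # hypothetical dense (vector) results
sparse_hits = ["c1", "c9", "c3"]  # hypothetical BM25 results
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Chunks that rank well in both lists (`c1` and `c3` here) rise to the top, which is the main reason to fuse by rank rather than by raw scores that live on different scales.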

## Architecture

The system operates in two phases:

1. **Document Ingestion**: Structured parsing → Semantic chunking → Vector embedding → Pinecone indexing
2. **Query Processing**: Hybrid retrieval → Re-ranking → LLM generation → Structured response
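
As a rough illustration of the chunking step in phase 1, the sketch below splits text into fixed-size windows that overlap. It approximates tokens with whitespace-separated words and is not semantic; Kyoka's actual splitter is not shown in this README, and `chunk_with_overlap` and its parameters are hypothetical.

```python
def chunk_with_overlap(text, max_tokens=500, overlap=50):
    """Split text into windows of at most max_tokens words.

    Consecutive windows share `overlap` words, so content that straddles
    a boundary appears in both neighbouring chunks.
    """
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be >= 0 and smaller than max_tokens")
    words = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # this window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_with_overlap(sample, max_tokens=4, overlap=1)
```

With ten words, a window of four, and an overlap of one, this yields three chunks, each starting on the last word of the previous one.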

#### Preview

[Figure: architecture preview diagram]

---

[Figure: detailed architecture diagram]

## Tech Stack

- **Backend**: FastAPI
- **Vector Database**: Pinecone
- **LLM**: Google Gemini 1.5 Flash
- **Embeddings**: HuggingFace (intfloat/e5-small-v2)
- **Re-ranker**: Flashrank (ms-marco-TinyBERT-L-2-v2)
- **Document Processing**: PyMuPDF, python-docx
- **Framework**: LangChain

## Setup

1. **Install Dependencies**
```bash
pip install -r requirements.txt
```

2. **Environment Configuration**

Create a `.env` file:
```env
PINECONE_API_KEY=your_pinecone_api_key
GEMINI_API_KEY=your_gemini_api_key
```

3. **Run the Application**
```bash
python main.py
```
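
The two keys from the `.env` file in step 2 must be present in the process environment before the application starts. A minimal fail-fast check could look like the sketch below, assuming the variables have already been loaded into the environment (for example by a dotenv loader). `load_api_keys` is a hypothetical helper, not part of Kyoka.

```python
import os

REQUIRED_KEYS = ("PINECONE_API_KEY", "GEMINI_API_KEY")


def load_api_keys(environ=os.environ):
    """Return the required API keys, or raise if any is missing or empty."""
    missing = [name for name in REQUIRED_KEYS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_KEYS}
```

Checking at startup surfaces a misconfigured key immediately instead of on the first Pinecone or Gemini call.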

## API Endpoints

### Process Document from URL
**POST** `/hackrx/run`

```json
{
  "documents": "https://neow.com/document.pdf",
  "questions": ["What is the grace period?", "What are the waiting periods?"]
}
```
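
A Python client could build this request with the standard library alone. The sketch below assumes the server listens on `http://localhost:8000` (the address used in the curl examples in this README) and that no authentication header is needed, which the README does not state. `build_run_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_run_request(base_url, document_url, questions):
    """Build a POST request for /hackrx/run without sending it."""
    payload = {"documents": document_url, "questions": list(questions)}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/hackrx/run",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request(
    "http://localhost:8000",
    "https://neow.com/document.pdf",
    ["What is the grace period?", "What are the waiting periods?"],
)
# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```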

### Upload Document File
**POST** `/hackrx/upload`

```bash
curl -X POST "http://localhost:8000/hackrx/upload" \
  -F "file=@document.pdf" \
  -F "questions=What is the grace period?" \
  -F "questions=What are the waiting periods?"
```

### Response Format
```json
{
  "answers": [
    {
      "query": "What is the grace period?",
      "answer": "A 15-day grace period is provided for installment premium payments...",
      "rationale": "Based on policy document analysis of premium payment terms",
      "source": "page 31, page 3"
    }
  ]
}
```
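
On the client side, the response can be mapped to typed objects. The sketch below assumes every answer carries exactly the four fields shown above and that `source` always follows the `page N, page M` pattern from the example; neither is guaranteed by this README. `Answer`, `parse_answers`, and `source_pages` are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass
class Answer:
    query: str
    answer: str
    rationale: str
    source: str


def parse_answers(body):
    """Parse a JSON response body into a list of Answer objects."""
    data = json.loads(body)
    return [
        Answer(
            query=item["query"],
            answer=item["answer"],
            rationale=item["rationale"],
            source=item["source"],
        )
        for item in data["answers"]
    ]


def source_pages(answer):
    """Extract page numbers from a source string such as 'page 31, page 3'."""
    pages = []
    for part in answer.source.split(","):
        words = part.split()
        if len(words) == 2 and words[0] == "page" and words[1].isdigit():
            pages.append(int(words[1]))
    return pages


sample_body = json.dumps({
    "answers": [{
        "query": "What is the grace period?",
        "answer": "A 15-day grace period is provided for installment premium payments...",
        "rationale": "Based on policy document analysis of premium payment terms",
        "source": "page 31, page 3",
    }]
})
answers = parse_answers(sample_body)
```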

## Performance

- **Response Time**: ~8.5 seconds for 2 questions (with caching: ~4.5 seconds)
- **Supported Formats**: PDF, DOCX, email (.eml, .msg, .txt)
- **Concurrent Processing**: Up to 8 parallel questions
- **GPU Acceleration**: 3-5x faster embedding generation

## Configuration

Key parameters can be adjusted in `constants.py`:

```python
MAX_CHUNK_TOKENS = 500          # Chunk size for processing speed
MAX_CONTEXT_CHUNKS = 3          # Context chunks sent to LLM
CONCURRENT_QUESTION_LIMIT = 8   # Parallel question processing
GEMINI_MAX_OUTPUT_TOKENS = 300  # Response length limit
```
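
To show how a limit like `CONCURRENT_QUESTION_LIMIT` can bound parallel work, the sketch below caps in-flight questions with an `asyncio.Semaphore`. This is one common pattern, not necessarily how Kyoka enforces the limit. `answer_all` and `fake_answer` are hypothetical, and `fake_answer` stands in for the real retrieval and generation step.

```python
import asyncio

CONCURRENT_QUESTION_LIMIT = 8  # same value as in constants.py above


async def answer_all(questions, answer_one, limit=CONCURRENT_QUESTION_LIMIT):
    """Answer questions concurrently with at most `limit` in flight.

    `answer_one` is any coroutine function that takes a question string.
    Results are returned in the same order as `questions`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def bounded(question):
        async with semaphore:
            return await answer_one(question)

    return await asyncio.gather(*(bounded(q) for q in questions))


async def fake_answer(question):
    """Stand-in for retrieval plus LLM generation (hypothetical)."""
    await asyncio.sleep(0.01)
    return f"answer to: {question}"


results = asyncio.run(answer_all([f"q{i}" for i in range(20)], fake_answer))
```

`asyncio.gather` preserves input order, so each result lines up with its question even though the calls finish at different times.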

## Evaluation Criteria Compliance

- **Accuracy**: Hybrid retrieval + re-ranking ensures precise context matching
- **Token Efficiency**: Optimized chunking and context selection minimizes LLM usage
- **Latency**: Async processing and caching keep responses under 10 seconds (about 8.5 seconds for 2 questions, about 4.5 seconds with caching)
- **Reusability**: Modular design with configurable components
- **Explainability**: Structured responses with rationale and source traceability

## Health Check

```bash
curl http://localhost:8000/health
```

Returns system status, GPU availability, cache statistics, and configuration details.

## License

[License](LICENSE)
