https://github.com/chaman2003/printchakra-ai
AI-powered document scanning and processing system with real-time desktop-mobile synchronization. Built with a Flask (Python) backend, a React + TypeScript frontend, OpenCV image enhancement, Tesseract OCR, and Socket.IO WebSockets for seamless printing and workflow management.
computer-vision flask html-css-javascript ngrok python react-js typescript websocket
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/chaman2003/printchakra-ai
- Owner: chaman2003
- Created: 2025-11-13T17:06:35.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2026-03-10T21:31:53.000Z (3 months ago)
- Last Synced: 2026-03-11T01:48:23.926Z (3 months ago)
- Topics: computer-vision, flask, html-css-javascript, ngrok, python, react-js, typescript, websocket
- Language: Jupyter Notebook
- Homepage: https://printchakra.vercel.app/
- Size: 134 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README




---
## Overview
PrintChakra is a Windows-first document workflow platform built for scanning, OCR, print configuration, phone-assisted capture, and voice-driven interaction. It combines a Flask backend and a React frontend into a single experience for processing documents from intake to output.
It is designed around practical operations:
- Upload and manage document images and PDFs
- Clean and enhance scans before OCR or printing
- Extract text with OCR pipelines
- Configure print and scan workflows from the browser
- Capture documents from a phone-oriented flow
- Use voice sessions for transcription, orchestration, and spoken responses
- Keep UI state synchronized in real time through Socket.IO
---
## Quick Links
- [Stack](#stack)
- [Repository Layout](#repository-layout)
- [Setup](#setup)
- [Docker](#docker)
- [Run Locally](#run-locally)
- [Environment](#environment-configuration)
- [Features](#feature-highlights)
- [Architecture](#architecture)
- [Troubleshooting](#troubleshooting)
---
## Stack
### Backend
- Python
- Flask
- Flask-SocketIO
- OpenCV
- PaddleOCR
- Tesseract
- PyMuPDF and PDF tooling
- pywin32 for Windows printer integration
- Local Whisper, TTS, and LLM support
- Groq fallback for chat, STT, and TTS
### Frontend
- React 19
- TypeScript
- Chakra UI
- Framer Motion
- Axios
- Socket.IO client
- React Router
- Responsive dashboard and landing page
---
## Feature Highlights
### OCR Pipeline
Advanced document cleanup and OCR flow for scanned or photographed pages.
- Image enhancement
- Text extraction
- PDF and image handling
- Notebook-driven pipeline experimentation
### Print Workflow
Browser-based print setup and orchestration for Windows environments.
- Print configuration UI
- Queue and device awareness
- Real-time status updates
- Workflow-driven execution
### Voice Workflow
Voice session startup, transcription, chat, and speech response.
- Local-first voice stack
- Groq fallback support
- Frontend voice UI integration
- Orchestration-ready responses
### Phone Capture
A phone-oriented intake flow for documents captured outside the desktop UI.
- Capture handoff
- Document intake path
- Processing-ready uploads
### Real-Time Dashboard
Live file browsing, previews, system info, and document actions.
- Socket updates
- File previews
- Device panels
- Workflow access points
### Windows Integration
Built around Windows printer and local device workflows.
- pywin32 printing
- Local file paths
- Windows-friendly setup
- Optional HTTPS locally
---
## Repository Layout
```text
printchakra/
├── README.md
├── Document_Processing_Pipeline.ipynb
├── backend/
│ ├── app.py
│ ├── requirements.txt
│ ├── .venv/
│ ├── app/
│ │ ├── api/
│ │ ├── config/
│ │ ├── core/
│ │ ├── features/
│ │ ├── modules/
│ │ ├── sockets/
│ │ ├── utils/
│ │ ├── print_scripts/
│ │ └── .env
│ ├── public/
│ │ └── data/
│ └── test/
├── frontend/
│ ├── package.json
│ ├── public/
│ └── src/
└── phase-2/
```
### Important Files
- Backend entry point: [backend/app.py](backend/app.py)
- Backend dependencies: [backend/requirements.txt](backend/requirements.txt)
- Frontend dependencies: [frontend/package.json](frontend/package.json)
- Backend environment file used by settings: [backend/app/.env](backend/app/.env)
- Notebook pipeline: [Document_Processing_Pipeline.ipynb](Document_Processing_Pipeline.ipynb)
---
## Setup
### Requirements
- Windows 10 or 11
- Python 3.10 recommended
- Node.js 18+
- npm
### Backend Setup
```powershell
cd backend
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
```
If [backend/.venv](backend/.venv) already exists and is working, reuse it.
### Frontend Setup
```powershell
cd frontend
npm install
```
---
## Docker
PrintChakra includes a production-oriented Docker setup for the full app:
- Backend container on port 5000
- Frontend container on port 3000
- Persistent backend data mounted from [backend/public/data](backend/public/data)
- Linux-native OCR and PDF runtime packages baked into the backend image
- Optional host Ollama access through `host.docker.internal`
### Start With Compose
```powershell
docker compose up --build
```
### Container URLs
- Frontend: http://localhost:3000
- Backend: http://localhost:5000
### Important Docker Notes
- Browser-to-backend routing is controlled by `REACT_APP_API_URL` at frontend build time.
- The backend image sets `POPPLER_PATH=/usr/bin` and `TESSERACT_CMD=/usr/bin/tesseract`.
- Ollama is not bundled; by default Compose points the backend to `http://host.docker.internal:11434`.
- Windows-native printing is not available inside the default Linux container. Linux printing can work if the host exposes CUPS.
---
## Run Locally
### Backend
```powershell
cd backend
.\.venv\Scripts\Activate.ps1
python app.py
```
### Frontend
```powershell
cd frontend
npm run dev
```
### Local URLs
- Frontend: http://localhost:3000
- Backend: http://localhost:5000
If port 3000 is occupied, the frontend may move to another port such as 3001.
---
## Environment Configuration
The backend settings currently load environment variables from [backend/app/.env](backend/app/.env).
When running with Docker Compose, container environment variables override local file-based defaults.
### Example
```env
FRONTEND_URL=http://localhost:3000
BACKEND_PUBLIC_URL=http://localhost:5000
API_CORS_ORIGINS=http://localhost:3000
VOICE_AI_MODEL=smollm2:135m
GROQ_API_KEY=your_key_here
GROQ_LLM_MODEL=llama-3.1-8b-instant
GROQ_STT_MODEL=whisper-large-v3-turbo
GROQ_TTS_ENDPOINT=https://api.groq.com/openai/v1/audio/speech
GROQ_TTS_MODEL=canopylabs/orpheus-v1-english
```
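The override rule described above (file-based defaults, with container environment variables taking precedence) can be sketched in a few lines of Python. This is a minimal illustration, not PrintChakra's actual settings loader: `parse_env_text` and `layered_settings` are hypothetical names, and the parser handles only plain `KEY=VALUE` lines.

```python
import os
from typing import Dict, Mapping, Optional


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def layered_settings(
    file_text: str, environ: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Start from file defaults, then let process environment values win.

    Only keys that appear in the file are considered (an assumption made
    to keep the sketch small).
    """
    environ = os.environ if environ is None else environ
    settings = parse_env_text(file_text)
    for key in settings:
        if key in environ:
            settings[key] = environ[key]
    return settings
```

Calling `layered_settings(open("backend/app/.env").read())` would then return the file values, except where the process environment (for example, values set by Compose) defines the same key.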
### Optional HTTPS
The backend defaults to HTTP. HTTPS is opt-in.
```env
USE_HTTPS=1
SSL_CERT=certs/cert.pem
SSL_KEY=certs/key.pem
```
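As a rough sketch of how an opt-in switch like this can be interpreted, the function below turns the three variables into either `None` (plain HTTP) or a `(cert, key)` tuple. The function name `resolve_ssl_context` and the set of accepted truthy values are assumptions; the README only shows `USE_HTTPS=1`, and the real backend may parse it differently.

```python
from typing import Mapping, Optional, Tuple

# Assumption: values other than "1" are accepted here only for convenience.
TRUTHY = {"1", "true", "yes", "on"}


def resolve_ssl_context(env: Mapping[str, str]) -> Optional[Tuple[str, str]]:
    """Return None for HTTP, or (cert_path, key_path) when HTTPS is enabled."""
    if env.get("USE_HTTPS", "").strip().lower() not in TRUTHY:
        return None  # HTTP is the default
    cert = env.get("SSL_CERT", "").strip()
    key = env.get("SSL_KEY", "").strip()
    if not cert or not key:
        raise ValueError("USE_HTTPS is set but SSL_CERT or SSL_KEY is missing")
    return cert, key
```

Flask's development server accepts such a tuple through `app.run(ssl_context=...)`, so a result of `None` keeps the server on HTTP and a tuple switches it to HTTPS.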
---
## Architecture
```mermaid
flowchart TD
A[Phone Capture / Dashboard / Voice UI] --> B[React Frontend]
B --> C[Axios + Socket.IO]
C --> D[Flask Backend]
D --> E[Document Processing Modules]
D --> F[OCR + Image Enhancement]
D --> G[Print and Scan Orchestration]
D --> H[Voice Services]
H --> I[Local Whisper / Local TTS / Local LLM]
H --> J[Groq Fallback]
D --> K[Windows Printing + Local File Storage]
```
---
## Voice Fallback Behavior
PrintChakra uses a local-first voice strategy and can fall back to Groq when local services are unavailable.
Configured fallback areas:
- LLM chat
- Speech-to-text
- Text-to-speech
The `/voice/status` endpoint reports current readiness for local and fallback providers.
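The local-first strategy can be pictured as a small wrapper that tries the local provider and switches to the fallback only when the local call fails. The sketch below is illustrative: `local_first` is a hypothetical helper, and the two callables stand in for whatever local and Groq-backed functions the backend actually uses.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def local_first(local: Callable[[], T], fallback: Callable[[], T]) -> Tuple[str, T]:
    """Try the local provider; on any failure, use the fallback provider.

    Returns a (provider_label, result) pair so callers can report which
    path served the request.
    """
    try:
        return "local", local()
    except Exception:
        return "fallback", fallback()


def unavailable_local_stt() -> str:
    # Stand-in for a local speech-to-text call that is not available.
    raise RuntimeError("local STT is not running")


if __name__ == "__main__":
    provider, text = local_first(unavailable_local_stt, lambda: "transcript from fallback")
    print(provider, text)
```

Returning the provider label alongside the result makes it easy to surface in the UI or logs which provider answered.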
---
## Data and Output Locations
Runtime files are stored inside the backend tree under [backend/public/data](backend/public/data), the same directory the Docker setup mounts for persistence.
Canonical backend test outputs are kept in:
- [backend/test/test_outputs](backend/test/test_outputs)
Generated output folders outside that canonical path were redundant and have been removed.
---
## Troubleshooting
### Backend does not start
Check:
- Python version is compatible
- The backend virtual environment is activated
- Port 5000 is not occupied by another process
- Dependencies from [backend/requirements.txt](backend/requirements.txt) are installed
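Some of these checks can be scripted. The following standard-library sketch reports the Python version, whether a virtual environment is active, and whether port 5000 can be bound. The helper names are hypothetical and the script is not part of the repository.

```python
import socket
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-style virtual environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Try to bind the port; a failed bind means another process holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    print("python version:", ".".join(map(str, sys.version_info[:3])))
    print("virtual environment active:", in_virtualenv())
    print("port 5000 free:", port_is_free(5000))
```

Run it with the same interpreter you use for `python app.py`, so the version and virtual environment results describe the backend's real environment.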
### Frontend cannot reach backend
Check:
- Backend is running on port 5000
- Frontend dev server is running
- CORS points to the correct frontend origin
- Backend is not accidentally running under HTTPS while the frontend expects HTTP
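The CORS and HTTP/HTTPS checks can be approximated by comparing the configured values directly. This sketch assumes that `API_CORS_ORIGINS` is a comma-separated list and that `USE_HTTPS=1` is the only enabling value; neither is confirmed by the backend code, and `find_url_mismatches` is a hypothetical helper.

```python
from typing import List, Mapping
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    """Reduce a URL to its scheme://host[:port] origin, lowercased."""
    parts = urlsplit(url.strip())
    return f"{parts.scheme}://{parts.netloc}".lower()


def find_url_mismatches(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems found in the URL-related settings."""
    problems: List[str] = []

    # Assumption: origins are comma-separated; "*" means allow everything.
    raw_origins = [o.strip() for o in env.get("API_CORS_ORIGINS", "").split(",") if o.strip()]
    if "*" not in raw_origins:
        allowed = {origin_of(o) for o in raw_origins}
        if origin_of(env.get("FRONTEND_URL", "")) not in allowed:
            problems.append("FRONTEND_URL origin is not listed in API_CORS_ORIGINS")

    https_on = env.get("USE_HTTPS", "").strip() == "1"
    backend_scheme = urlsplit(env.get("BACKEND_PUBLIC_URL", "").strip()).scheme
    if https_on and backend_scheme != "https":
        problems.append("USE_HTTPS is on but BACKEND_PUBLIC_URL is not https")
    if not https_on and backend_scheme == "https":
        problems.append("BACKEND_PUBLIC_URL is https but USE_HTTPS is off")
    return problems
```

An empty list means the settings are at least internally consistent; it does not prove the servers are reachable.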
### Voice features fail
Check:
- Local voice dependencies are installed correctly
- Groq settings are present in [backend/app/.env](backend/app/.env) if fallback is expected
- `/voice/status` reports the providers you expect
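A quick way to verify the Groq settings is to look for empty or placeholder values. The key list below is taken from the example in Environment Configuration; treating all five as required is an assumption, since the backend may supply defaults for some of them. `missing_groq_settings` is a hypothetical helper.

```python
from typing import List, Mapping

# Keys from the README's environment example. Whether each one is strictly
# required by the backend is an assumption.
GROQ_KEYS = (
    "GROQ_API_KEY",
    "GROQ_LLM_MODEL",
    "GROQ_STT_MODEL",
    "GROQ_TTS_ENDPOINT",
    "GROQ_TTS_MODEL",
)


def missing_groq_settings(env: Mapping[str, str]) -> List[str]:
    """Return the Groq keys that are unset, blank, or still the placeholder."""
    missing: List[str] = []
    for key in GROQ_KEYS:
        value = env.get(key, "").strip()
        if not value or value == "your_key_here":
            missing.append(key)
    return missing
```

Passing `os.environ`, or a dictionary parsed from the `.env` file, returns the names that still need attention.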
### OCR is unavailable or slow
Check:
- PaddleOCR and image dependencies are installed
- PDF tooling is available for conversion paths
- `TESSERACT_CMD` points to a valid binary when running in a container
- GPU support is optional; without it, OCR falls back to the CPU and may run slower
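To confirm that `TESSERACT_CMD` resolves to something executable, a short standard-library check is enough. `resolve_binary` is a hypothetical helper, and falling back to a `tesseract` lookup on `PATH` when the variable is unset is an assumption about reasonable behavior, not a description of the backend.

```python
import os
import shutil
from typing import Mapping, Optional


def resolve_binary(
    env_var: str, default_name: str, environ: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Return the executable path for env_var, or None if it is not usable.

    When the variable is unset or blank, default_name is looked up on PATH.
    """
    environ = os.environ if environ is None else environ
    candidate = environ.get(env_var, "").strip() or default_name
    # shutil.which checks explicit paths directly and bare names against PATH.
    return shutil.which(candidate)


if __name__ == "__main__":
    found = resolve_binary("TESSERACT_CMD", "tesseract")
    print("tesseract:", found or "not found")
```

Inside the backend container, the expected result is `/usr/bin/tesseract`, matching the value the image sets.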
### Docker printing does not work
Keep in mind:
- The default containers are Linux-based and cannot use Windows `pywin32` printing
- Linux printing requires host CUPS access and compatible printer visibility
- For Windows printer integration, run the backend locally on Windows instead of inside Docker
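The platform split described here can be expressed as a simple dispatch on `sys.platform`. The labels returned below are illustrative only, and `select_print_backend` is a hypothetical helper; the repository may organize its print code differently.

```python
import sys


def select_print_backend(platform: str = sys.platform) -> str:
    """Pick a printing path label for the given platform string."""
    if platform == "win32":
        return "pywin32"  # native Windows printer integration
    if platform.startswith("linux"):
        return "cups"  # requires the host to expose CUPS to the container
    return "unsupported"


if __name__ == "__main__":
    print("print backend:", select_print_backend())
```

Run inside the default backend container, this reports `cups`, which is why Windows printers are not visible there.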
---
## Notebook
The repository includes a standalone notebook for experimenting with the document pipeline:
- [Document_Processing_Pipeline.ipynb](Document_Processing_Pipeline.ipynb)
---
## Summary
PrintChakra is a document workflow app centered on OCR, print and scan control, voice interaction, and phone-assisted capture. For local development, use Python 3.10, run the backend on port 5000, run the frontend with `npm run dev`, and keep backend environment values in [backend/app/.env](backend/app/.env).
