https://github.com/brints/unraveldocs-api
UnravelDocs is a File Extractor API that extracts information from uploaded files and converts them into editable docs and PDFs.
apache-pdfbox hibernate java junit maven mockito spring-boot spring-security tesseract-ocr
- Host: GitHub
- URL: https://github.com/brints/unraveldocs-api
- Owner: Brints
- License: mit
- Created: 2025-02-08T16:46:12.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2026-02-27T15:04:17.000Z (about 1 month ago)
- Last Synced: 2026-02-27T20:48:42.045Z (about 1 month ago)
- Topics: apache-pdfbox, hibernate, java, junit, maven, mockito, spring-boot, spring-security, tesseract-ocr
- Language: Java
- Homepage: https://api.unraveldocs.xyz/swagger-ui/index.html
- Size: 2.02 MB
- Stars: 3
- Watchers: 0
- Forks: 3
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# UnravelDocs API
> A comprehensive, enterprise-grade document processing and management platform designed for extracting insights from documents with OCR, AI-powered analysis, secure storage, and multi-provider payment integrations.
---
## Table of Contents
- [Features](#-features)
- [Architecture](#architecture)
- [Tech Stack](#tech-stack)
- [Prerequisites](#prerequisites)
- [Getting Started](#getting-started)
- [Configuration](#-configuration)
- [Docker Deployment](#-docker-deployment)
- [API Documentation](#api-documentation)
- [Testing](#-testing)
- [CI/CD Pipeline](#cicd-pipeline)
- [Project Structure](#project-structure)
- [Contributing](#-contributing)
- [License](#license)
---
## ✨ Features
### Document Processing
- **OCR Processing**: Extract text from images and scanned documents using Tesseract OCR and Google Cloud Vision API
- **PDF Processing**: Extract and analyze content from PDF documents using Apache PDFBox
- **Word Export**: Convert processed documents to Microsoft Word format using Apache POI
- **AI-Powered Analysis** *(Planned)*: Entity extraction, classification, and document summarization
### User Management & Security
- **User Authentication**: JWT-based authentication with access and refresh tokens
- **OAuth 2.0 Integration**: Social login support via Google, GitHub, etc.
- **Role-Based Access Control (RBAC)**: Differentiated access for users and administrators
- **Login Attempt Tracking**: Monitor and limit failed login attempts for security
- **Email Verification**: OTP-based email verification for new accounts
- **Password Reset**: Secure password reset flow with email notifications
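
As a rough illustration of the login attempt tracking listed above, here is a minimal in-memory sketch. The class name `LoginAttemptTracker`, its methods, and the limit of 3 used in the example are assumptions for illustration; the project's own `loginattempts` package may work differently and presumably persists its state.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory tracker, not the project's actual implementation. */
final class LoginAttemptTracker {
    private final int maxAttempts;
    private final Map<String, Integer> failures = new ConcurrentHashMap<>();

    LoginAttemptTracker(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    /** Records one failed login and returns the running failure count. */
    int recordFailure(String email) {
        return failures.merge(email, 1, Integer::sum);
    }

    /** Clears the counter after a successful login. */
    void recordSuccess(String email) {
        failures.remove(email);
    }

    /** True once the account has reached the allowed number of failures. */
    boolean isLocked(String email) {
        return failures.getOrDefault(email, 0) >= maxAttempts;
    }

    public static void main(String[] args) {
        LoginAttemptTracker tracker = new LoginAttemptTracker(3); // limit is an assumed value
        tracker.recordFailure("user@example.com");
        tracker.recordFailure("user@example.com");
        tracker.recordFailure("user@example.com");
        System.out.println(tracker.isLocked("user@example.com"));
    }
}
```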
### Team Management
- **Team Creation**: OTP-verified team creation for Premium/Enterprise subscribers
- **Subscription Tiers**:
  - **Team Premium**: $29/month or $290/year, 200 docs/month, max 10 members
  - **Team Enterprise**: $79/month or $790/year, unlimited docs, max 15 members
- **10-Day Free Trial**: Automatic trial period with 3-day warning emails
- **Flexible Billing**: Monthly or yearly subscription with auto-renewal
- **Subscription Management**: Cancel at any time and keep access until the current billing period ends
- **Member Management**: Add, remove, and batch remove members
- **Role-Based Access**: Owner, Admin, and Member roles with distinct permissions
- **Admin Promotion**: Enterprise-only feature to promote members to admin
- **Email Invitations**: Enterprise-only email invitation system with unique tokens
- **Team Lifecycle**: Close and reactivate teams
- **Privacy Controls**: Email masking for non-owner member views
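
The email masking mentioned under Privacy Controls could look something like the sketch below. The masking format (first character, three asterisks, full domain) and the `EmailMasker` name are assumptions; the README does not specify how the API masks addresses.

```java
/** Hypothetical helper showing one possible masking format. */
final class EmailMasker {
    private EmailMasker() {}

    /** Keeps the first character of the local part and the full domain. */
    static String mask(String email) {
        int at = email.indexOf('@');
        if (at <= 0) {
            return "***"; // no usable local part, so hide everything
        }
        return email.charAt(0) + "***" + email.substring(at);
    }

    public static void main(String[] args) {
        System.out.println(mask("jane.doe@example.com")); // prints "j***@example.com"
    }
}
```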
### Payment Processing
- **Multi-Gateway Support**:
  - **Stripe**: Full integration with webhooks, subscriptions, and one-time payments
  - **Paystack**: Complete African payment gateway integration
  - **PayPal**: International payment support *(stub)*
  - **Flutterwave**: African payment gateway *(stub)*
  - **Chappa**: Ethiopian payment gateway *(stub)*
### Subscription Plans
| Plan | Monthly | Yearly | Docs/Month | OCR Pages |
|-----------------|---------|---------|------------|-----------|
| Free | $0 | - | 5 | 25 |
| Starter | $9.99 | $89.99 | 30 | 150 |
| Pro | $19.99 | $189.99 | 100 | 500 |
| Business | $49.99 | $489.99 | 500 | 2,500 |
| Team Premium | $29.00 | $290.00 | 200 | 1,000 |
| Team Enterprise | $79.00 | $790.00 | Unlimited | |
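> Yearly plans cost less than twelve monthly payments: about 17% less for the Team plans and roughly 18% to 25% less for the individual plans.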
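
The yearly discount can be checked directly from the prices in the table. The sketch below compares each yearly price with twelve monthly payments; the class and method names are illustrative and not part of the API.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Illustrative arithmetic only; prices are copied from the table above. */
final class YearlySavings {
    private YearlySavings() {}

    /** Percentage saved by paying yearly instead of twelve monthly payments. */
    static BigDecimal savingsPercent(String monthly, String yearly) {
        BigDecimal twelveMonths = new BigDecimal(monthly).multiply(BigDecimal.valueOf(12));
        BigDecimal saved = twelveMonths.subtract(new BigDecimal(yearly));
        return saved.multiply(BigDecimal.valueOf(100))
                .divide(twelveMonths, 1, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        System.out.println("Starter:         " + savingsPercent("9.99", "89.99") + "%");   // 24.9%
        System.out.println("Pro:             " + savingsPercent("19.99", "189.99") + "%"); // 20.8%
        System.out.println("Business:        " + savingsPercent("49.99", "489.99") + "%"); // 18.3%
        System.out.println("Team Premium:    " + savingsPercent("29.00", "290.00") + "%"); // 16.7%
        System.out.println("Team Enterprise: " + savingsPercent("79.00", "790.00") + "%"); // 16.7%
    }
}
```

The exact figure varies by plan, so a single percentage is only an approximation.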
### Storage Allocation
| Plan | Storage Limit |
|-----------------|---------------|
| Free | 120 MB |
| Starter | 2.66 GB |
| Pro | 12.66 GB |
| Business | 29.66 GB |
| Team Premium | 199.66 GB |
| Team Enterprise | Unlimited |
> Storage is automatically tracked when documents are uploaded and reclaimed when deleted.
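
One way to picture the tracking described above is a per-account counter that grows on upload and shrinks on delete. This is a sketch under assumptions: `StorageQuota` and its methods are hypothetical, and 1 MB is taken to be 1024 × 1024 bytes, which the README does not state.

```java
/** Hypothetical quota counter, not the project's storage module. */
final class StorageQuota {
    static final long MB = 1024L * 1024L; // assumed definition of a megabyte

    private final long limitBytes;
    private long usedBytes;

    StorageQuota(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Reserves space for an upload; returns false if it would exceed the limit. */
    synchronized boolean recordUpload(long sizeBytes) {
        if (sizeBytes < 0 || usedBytes + sizeBytes > limitBytes) {
            return false;
        }
        usedBytes += sizeBytes;
        return true;
    }

    /** Reclaims space when a document is deleted; never drops below zero. */
    synchronized void recordDelete(long sizeBytes) {
        usedBytes = Math.max(0, usedBytes - sizeBytes);
    }

    synchronized long remainingBytes() {
        return limitBytes - usedBytes;
    }

    public static void main(String[] args) {
        StorageQuota free = new StorageQuota(120 * MB); // Free plan limit from the table
        System.out.println(free.recordUpload(100 * MB)); // true
        System.out.println(free.recordUpload(30 * MB));  // false, would exceed 120 MB
        free.recordDelete(100 * MB);
        System.out.println(free.recordUpload(30 * MB));  // true
    }
}
```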
### Currency Conversion
- **Real-time Exchange Rates**: Prices displayed in user's local currency
- **60+ Supported Currencies**: USD, EUR, GBP, NGN, INR, JPY, AUD, CAD, and more
- **Daily Rate Updates**: Exchange rates refreshed automatically via exchangerate-api.com
- **Fallback Rates**: Cached rates ensure service availability
- **Multi-Currency Support**: Accept payments in multiple currencies
- **Receipt Generation**: Automatic PDF receipt generation with AWS S3 storage
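
The fallback behaviour described above (use the live rate when available, otherwise the last cached one) can be sketched as follows. `PriceConverter`, the `Function`-based rate source, and the NGN rate of 1500 are all hypothetical; the actual service and its exchangerate-api.com client are not shown in this README.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical converter that falls back to the last known rate. */
final class PriceConverter {
    // Assumed rate source: returns empty when the live lookup fails.
    private final Function<String, Optional<BigDecimal>> liveRates;
    private final Map<String, BigDecimal> cachedRates = new ConcurrentHashMap<>();

    PriceConverter(Function<String, Optional<BigDecimal>> liveRates) {
        this.liveRates = liveRates;
    }

    /** Converts a USD amount, preferring the live rate and caching it for later. */
    BigDecimal convertFromUsd(BigDecimal usdAmount, String currency) {
        BigDecimal rate = liveRates.apply(currency)
                .map(live -> {
                    cachedRates.put(currency, live);
                    return live;
                })
                .orElseGet(() -> cachedRates.get(currency));
        if (rate == null) {
            throw new IllegalStateException("No rate available for " + currency);
        }
        return usdAmount.multiply(rate).setScale(2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        // The rate below is a made-up example value, not a real quote.
        PriceConverter converter = new PriceConverter(code -> Optional.of(new BigDecimal("1500")));
        System.out.println(converter.convertFromUsd(new BigDecimal("9.99"), "NGN")); // 14985.00
    }
}
```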
### Search & Analytics
- **Elasticsearch Integration**: Full-text search across documents, users, and payments
- **Kibana Dashboard**: Visual analytics and monitoring
### Communication & Notifications
- **Email Services**: Multi-provider email support (AWS SES, Mailgun)
- **SMS Notifications**: Twilio integration for SMS/voice notifications
- **Push Notifications**: Real-time notification system
### Administration
- **User Management**: View, activate/deactivate users, manage roles
- **Subscription Plan Management**: CRUD operations for subscription plans
- **Document Oversight**: Monitor, view, and moderate documents
- **System Statistics**: Real-time metrics on users, documents, and subscriptions
- **Admin Action Audit Logging**: Track all administrative actions
### Cloud & Storage
- **AWS S3**: Secure document and receipt storage
- **Cloudinary**: Image optimization and CDN delivery
- **CloudFront**: Content delivery network integration
### Internationalization
- **Multi-Language Support**: i18n ready for multiple languages and regional formats
---
## Architecture
```mermaid
graph TB
    subgraph "Client Layer"
        WEB[Web Client]
        MOBILE[Mobile Client]
    end
    subgraph "API Gateway"
        API[Spring Boot API<br/>Port: 8080]
    end
    subgraph "Message Brokers"
        RABBIT[RabbitMQ<br/>Port: 5672]
        KAFKA[Apache Kafka<br/>Port: 9092]
    end
    subgraph "Data Layer"
        PG[(PostgreSQL<br/>Port: 5432)]
        REDIS[(Redis<br/>Port: 6379)]
        ES[(Elasticsearch<br/>Port: 9200)]
    end
    subgraph "Cloud Services"
        S3[AWS S3]
        SES[AWS SES]
        CLOUD[Cloudinary]
        GCP[Google Vision]
    end
    subgraph "Monitoring"
        KIBANA[Kibana<br/>Port: 5601]
        KAFKA_UI[Kafka UI<br/>Port: 8090]
    end
    WEB --> API
    MOBILE --> API
    API --> PG
    API --> REDIS
    API --> ES
    API --> RABBIT
    API --> KAFKA
    API --> S3
    API --> SES
    API --> CLOUD
    API --> GCP
    ES --> KIBANA
    KAFKA --> KAFKA_UI
```
---
## Tech Stack
### Core Framework
| Technology | Version | Purpose |
|------------|---------|---------|
| Java | 25 | Programming Language |
| Spring Boot | 4.0.1 | Application Framework |
| Spring Security | 6.x | Authentication & Authorization |
| Spring Data JPA | 3.x | Data Persistence |
| Spring Data Redis | 3.x | Caching |
| Spring Data Elasticsearch | 3.x | Search Engine |
| Spring AMQP | 3.x | RabbitMQ Messaging |
| Spring Kafka | 3.x | Kafka Messaging |
### Database & Storage
| Technology | Version | Purpose |
|------------|---------|---------|
| PostgreSQL | 17 | Primary Database |
| Redis | 7 (Alpine) | Caching & Session Store |
| Elasticsearch | 8.11.0 | Full-Text Search |
| Flyway | 10.x | Database Migrations |
### Message Brokers
| Technology | Version | Purpose |
|------------|---------|---------|
| RabbitMQ | Latest | Event-Driven Messaging |
| Apache Kafka | 3.7.0 | Stream Processing |
### Cloud Services
| Service | Purpose |
|---------|---------|
| AWS S3 | File Storage |
| AWS SES | Email Delivery |
| AWS SNS | Push Notifications |
| Cloudinary | Image CDN |
| Google Cloud Vision | OCR Processing |
### Document Processing
| Library | Version | Purpose |
|---------|---------|---------|
| Tesseract (Tess4J) | 5.15.0 | OCR Engine |
| Apache PDFBox | 3.0.4 | PDF Processing |
| Apache POI | 5.4.1 | Word Document Export |
| OpenCV | 4.9.0 | Image Processing |
| OpenPDF | 1.3.35 | PDF Generation |
### Payment Gateways
| Provider | SDK Version | Status |
|----------|-------------|--------|
| Stripe | 31.0.0 | ✅ Full |
| Paystack | Custom | ✅ Full |
| PayPal | - | 🔲 Stub |
| Flutterwave | - | 🔲 Stub |
| Chappa | - | 🔲 Stub |
### Security & Authentication
| Technology | Version | Purpose |
|------------|---------|---------|
| JWT (jjwt) | 0.12.6 | Token Authentication |
| OAuth 2.0 | Spring Security | Social Login |
| Bucket4j | 8.1.0 | Rate Limiting |
### Communication
| Service | Purpose |
|---------|---------|
| Mailgun | Email Delivery |
| AWS SES | Email Delivery |
| Twilio | SMS & Voice |
### Development & Utilities
| Tool | Version | Purpose |
|------|---------|---------|
| Lombok | 1.18.42 | Boilerplate Reduction |
| MapStruct | 1.5.5 | Object Mapping |
| SpringDoc OpenAPI | 3.0.0 | API Documentation |
| Logstash Logback | 7.4 | Structured Logging |
| Micrometer | Latest | Metrics & Observability |
### Testing
| Framework | Purpose |
|-----------|---------|
| JUnit 5 | Unit Testing |
| Mockito | Mocking Framework |
| Spring Security Test | Security Testing |
| Kafka Test | Kafka Integration Testing |
| RabbitMQ Test | RabbitMQ Integration Testing |
### Containerization & CI/CD
| Tool | Purpose |
|------|---------|
| Docker | Containerization |
| Docker Compose | Multi-Container Orchestration |
| GitHub Actions | CI/CD Pipeline |
---
## Prerequisites
- **JDK 25** or higher
- **Apache Maven 3.9.x** or higher
- **Docker** and **Docker Compose** (for containerized deployment)
- **PostgreSQL 17** (if running locally without Docker)
- **Redis 7** (if running locally without Docker)
- **Tesseract OCR** installed locally for OCR processing (optional)
- API keys for desired integrations:
  - Payment gateways (Stripe, Paystack)
  - Email services (Mailgun, AWS SES)
  - Cloud storage (AWS S3, Cloudinary)
  - Google Cloud Vision API
---
## Getting Started
### Option 1: Docker Compose (Recommended)
```bash
# 1. Clone the repository
git clone https://github.com/Brints/unraveldocs-api.git
cd unraveldocs-api
# 2. Copy environment template
cp .env.example .env
# 3. Configure your environment variables
# Edit .env with your credentials
# 4. Start all services
docker-compose up -d
# 5. View logs
docker-compose logs -f unraveldocs-api
```
### Option 2: Local Development
```bash
# 1. Clone the repository
git clone https://github.com/Brints/unraveldocs-api.git
cd unraveldocs-api
# 2. Start required services (PostgreSQL, Redis)
docker run --name postgres-unraveldocs -p 5432:5432 \
-e POSTGRES_USER=postgres \
-e POSTGRES_PASSWORD=postgres \
-e POSTGRES_DB=unraveldocs \
-d postgres:17
docker run --name redis-unraveldocs -p 6379:6379 -d redis:7-alpine
# 3. Configure application properties
# Edit src/main/resources/application.properties or use environment variables
# 4. Build the project
mvn clean install
# 5. Run the application
mvn spring-boot:run
# 6. Access the application
# http://localhost:8080/unraveldocs
```
---
## ⚙️ Configuration
### Environment Variables
Create a `.env` file from the template:
```bash
cp .env.example .env
```
#### Key Configuration Sections
| Section | Description |
|---------|-------------|
| Application | Base URLs, support email, frontend URL |
| Database | PostgreSQL connection details |
| Redis | Cache configuration |
| RabbitMQ | Message broker settings |
| Kafka | Stream processing configuration |
| AWS | S3, SES, SNS credentials |
| JWT | Token secrets and expiration |
| Mailgun | Email service credentials |
| Cloudinary | Image CDN configuration |
| Twilio | SMS/Voice settings |
| Stripe | Payment gateway credentials |
| Paystack | African payment gateway |
| Google Cloud | Vision API credentials |
| Elasticsearch | Search engine configuration |
### Application Properties
```properties
# Server Configuration
server.port=8080
server.servlet.context-path=/unraveldocs
# Database Configuration
spring.datasource.url=jdbc:postgresql://localhost:5432/unraveldocs
spring.datasource.username=postgres
spring.datasource.password=postgres
spring.jpa.hibernate.ddl-auto=validate
# JWT Configuration
jwt.secret=your-very-strong-jwt-secret-key
jwt.expiration.ms=86400000
# Flyway Migration
spring.flyway.enabled=true
spring.flyway.locations=classpath:db/migration
```
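
The `jwt.expiration.ms` value above is given in milliseconds. A quick conversion with `java.time.Duration` shows the token lifetime it implies; the `JwtExpiry` class is only an illustration and not part of the project.

```java
import java.time.Duration;

/** Illustrative conversion of the configured expiration value. */
final class JwtExpiry {
    private JwtExpiry() {}

    /** Interprets a millisecond setting such as jwt.expiration.ms as a Duration. */
    static Duration fromMillis(long millis) {
        return Duration.ofMillis(millis);
    }

    public static void main(String[] args) {
        Duration lifetime = fromMillis(86_400_000L);
        System.out.println(lifetime.toHours() + " hours"); // prints "24 hours"
    }
}
```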
---
## 🐳 Docker Deployment
### Services Overview
| Service | Port(s) | Description |
|---------|---------|-------------|
| unraveldocs-api | 8080 | Main application |
| postgres | 5432 | Primary database |
| redis | 6379 | Cache & sessions |
| rabbitmq | 5672, 15672 | Message broker |
| kafka | 9092 | Stream processing |
| kafka-ui | 8090 | Kafka dashboard |
| elasticsearch | 9200, 9300 | Search engine |
| kibana | 5601 | ES dashboard |
| localstack | 4566 | AWS local emulation |
### Docker Commands
```bash
# Start all services
docker-compose up -d
# Start specific services
docker-compose up -d postgres redis
# Stop all services
docker-compose down
# View logs
docker-compose logs -f [service-name]
# Rebuild application
docker-compose build unraveldocs-api
docker-compose up -d unraveldocs-api
# Remove volumes (clean slate)
docker-compose down -v
```
---
## API Documentation
### Swagger UI
Once the application is running, access the interactive API documentation:
- **Swagger UI**: http://localhost:8080/unraveldocs/swagger-ui.html
- **OpenAPI Spec**: http://localhost:8080/unraveldocs/v3/api-docs
### API Endpoints Overview
| Category | Base Path | Description |
|----------|-----------|-------------|
| Plans | `/api/v1/plans` | Plan pricing & currency conversion (public) |
| Auth | `/api/v1/auth` | Authentication & registration |
| Users | `/api/v1/users` | User management |
| Teams | `/api/v1/teams` | Team subscriptions & member management |
| Organizations | `/api/v1/organizations` | Enterprise organization management |
| Documents | `/api/v1/documents` | Document operations |
| OCR | `/api/v1/ocr` | OCR processing |
| Payments | `/api/v1/payments` | Payment operations |
| Stripe | `/api/v1/stripe` | Stripe-specific endpoints |
| Paystack | `/api/v1/paystack` | Paystack-specific endpoints |
| Subscriptions | `/api/v1/subscriptions` | Individual subscription management |
| Storage | `/api/v1/storage` | Storage usage and limits |
| Admin | `/api/v1/admin` | Administrative operations |
| Search | `/api/v1/search` | Elasticsearch queries |
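
As a hedged example of calling one of these endpoints, the sketch below builds a request for the public plans endpoint with the JDK's `java.net.http` client. It assumes the base paths in the table are appended to the `/unraveldocs` context path from the configuration section; that combination is an assumption, so confirm the exact URL in Swagger UI.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Illustrative request builder; the full URL layout is an assumption. */
final class PlansRequest {
    private PlansRequest() {}

    /** Builds a GET request for the plans endpoint under the given base URL. */
    static HttpRequest build(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/plans"))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("http://localhost:8080/unraveldocs");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

With the application running, the request can be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.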
---
## 🧪 Testing
### Run All Tests
```bash
mvn test
```
### Run Specific Test Class
```bash
mvn test -Dtest=DocumentServiceTest
```
### Run Specific Test Method
```bash
mvn test -Dtest=FileProcessingServiceTest#testProcessSingleFile
```
### Generate Coverage Report
```bash
mvn clean test jacoco:report
```
The coverage report is written to `target/site/jacoco/index.html`.
### Integration Tests
```bash
mvn verify -P integration-tests
```
---
## CI/CD Pipeline
The project uses **GitHub Actions** for continuous integration and deployment:
### Workflows
| Workflow | Trigger | Purpose |
|----------|---------|---------|
| `test.yml` | Push/PR to main | Run tests & build |
| `linting.yml` | Push/PR | Code style checks |
| `security.yml` | Push/PR | Security scanning |
| `deploy.yml` | Push to main | Deploy to staging/prod |
| `release.yml` | Tag creation | Create releases |
| `flyway.yml` | Manual | Database migrations |
### Pipeline Features
- Automated testing on every push
- Code quality checks (Checkstyle, SpotBugs)
- Security vulnerability scanning
- JaCoCo test coverage reporting
- Docker image building and pushing
- Automated deployments
---
## Project Structure
```
unraveldocs-api/
├── .github/
│   ├── workflows/                  # CI/CD pipeline definitions
│   └── scripts/                    # Automation scripts
├── src/
│   ├── main/
│   │   ├── java/com/extractor/unraveldocs/
│   │   │   ├── admin/              # Admin management
│   │   │   ├── auth/               # Authentication & authorization
│   │   │   ├── brokers/            # Message broker integrations
│   │   │   ├── config/             # Application configurations
│   │   │   ├── documents/          # Document management
│   │   │   ├── elasticsearch/      # Search functionality
│   │   │   ├── exceptions/         # Custom exceptions & handlers
│   │   │   ├── googlevision/       # Google Cloud Vision integration
│   │   │   ├── loginattempts/      # Login attempt tracking
│   │   │   ├── messaging/          # Email & notification services
│   │   │   ├── ocrprocessing/      # OCR processing services
│   │   │   ├── organization/       # Enterprise organization management
│   │   │   │   ├── controller/     # REST endpoints
│   │   │   │   ├── dto/            # Request/response DTOs
│   │   │   │   ├── impl/           # Service implementations
│   │   │   │   ├── model/          # Entity models
│   │   │   │   └── repository/     # Data repositories
│   │   │   ├── team/               # Team subscription management
│   │   │   │   ├── controller/     # Team REST endpoints
│   │   │   │   ├── dto/            # Team request/response DTOs
│   │   │   │   ├── impl/           # Team service implementations
│   │   │   │   ├── model/          # Team entity models
│   │   │   │   ├── repository/     # Team data repositories
│   │   │   │   └── service/        # Team service interfaces
│   │   │   ├── payment/            # Payment gateway integrations
│   │   │   │   ├── common/         # Shared payment utilities
│   │   │   │   ├── stripe/         # Stripe integration
│   │   │   │   ├── paystack/       # Paystack integration
│   │   │   │   ├── paypal/         # PayPal stub
│   │   │   │   ├── flutterwave/    # Flutterwave stub
│   │   │   │   ├── chappa/         # Chappa stub
│   │   │   │   └── receipt/        # Receipt generation
│   │   │   ├── pushnotification/   # Push notification services
│   │   │   ├── security/           # Security configurations
│   │   │   ├── shared/             # Shared utilities & DTOs
│   │   │   ├── storage/            # Storage allocation tracking
│   │   │   ├── subscription/       # Subscription management
│   │   │   ├── user/               # User management
│   │   │   ├── utils/              # Common utilities
│   │   │   └── wordexport/         # Word document export
│   │   └── resources/
│   │       ├── db/migration/       # Flyway migrations
│   │       ├── templates/          # Email templates (Thymeleaf)
│   │       └── application.properties
│   └── test/                       # Test sources
├── docker-compose.yml              # Multi-container setup
├── Dockerfile                      # Application container
├── pom.xml                         # Maven dependencies
├── .env.example                    # Environment template
└── README.md
```
---
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
### Code Style
- Follow Java coding conventions
- Use meaningful variable and method names
- Write comprehensive unit tests
- Document public APIs with Javadoc
---
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
---
## Support
- **Email**: support@unraveldocs.xyz
- **Issues**: [GitHub Issues](https://github.com/Brints/unraveldocs-api/issues)
---
Made with ❤️ by the UnravelDocs Team