https://github.com/matheus-rech/clinical-extraction-system

Clinical Study Extraction System - Interactive PDF annotation and data extraction with full traceability. TypeScript + Vite + PDF.js. 100% test coverage.
clinical-research data-extraction pdf pdf-annotation playwright testing typescript vite


# Clinical Study Extraction System

[![CI/CD](https://github.com/matheus-rech/clinical-extraction-system/actions/workflows/ci.yml/badge.svg)](https://github.com/matheus-rech/clinical-extraction-system/actions)
[![Tests](https://img.shields.io/badge/tests-52%2F52%20passing-brightgreen)](https://github.com/matheus-rech/clinical-extraction-system)
[![TypeScript](https://img.shields.io/badge/TypeScript-5.7-blue)](https://www.typescriptlang.org/)
[![License](https://img.shields.io/badge/license-MIT-green)](LICENSE)

Interactive PDF annotation and clinical data extraction tool with full traceability. Built with TypeScript, Vite, and PDF.js.

![Clinical Extraction System](https://img.shields.io/badge/Status-Production%20Ready-brightgreen)

---

## ✨ Features

### 📄 PDF Integration
- **Interactive PDF Viewer** with zoom, navigation, and fit-to-width
- **Text Selection & Extraction** with precise coordinate tracking
- **Annotation Markers** showing extraction locations
- **Multi-page Support** with keyboard shortcuts
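
Coordinate tracking is most robust when selections are stored in PDF user space (points, origin at the bottom-left of the page) instead of screen pixels, so a saved extraction still lines up after the zoom level changes. A minimal sketch, assuming an unrotated page; `toPdfRect` and the rectangle types are hypothetical, and PDF.js itself offers `PageViewport.convertToPdfPoint` for the general (rotated) case:

```typescript
// Hypothetical types: a selection rectangle in CSS pixels measured from the
// top-left corner of the rendered page, and the same rectangle in PDF points.
interface ScreenRect { x: number; y: number; width: number; height: number; }
interface PdfRect { x1: number; y1: number; x2: number; y2: number; }

/**
 * Convert a screen-space rectangle to PDF user space for an unrotated page.
 * `scale` is the ratio of rendered CSS pixels to PDF points (the value passed to
 * PDF.js `page.getViewport({ scale })`); `pageHeight` is the page height in points.
 * PDF y-coordinates grow upward from the bottom edge, so the y-axis is flipped.
 */
function toPdfRect(rect: ScreenRect, scale: number, pageHeight: number): PdfRect {
  const x1 = rect.x / scale;
  const x2 = (rect.x + rect.width) / scale;
  // The top edge on screen becomes the larger y value in PDF space.
  const y2 = pageHeight - rect.y / scale;
  const y1 = pageHeight - (rect.y + rect.height) / scale;
  return { x1, y1, x2, y2 };
}

// A 200x30 px selection at (100, 50) on a US Letter page (792 pt tall) rendered at 2x.
const pdfRect = toPdfRect({ x: 100, y: 50, width: 200, height: 30 }, 2, 792);
console.log(pdfRect); // { x1: 50, y1: 752, x2: 150, y2: 767 }
```

Storing the rectangle in points means re-rendering at any zoom only requires multiplying by the new scale to place the annotation marker again.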

### 📝 Smart Forms
- **8-Step Multi-Section Form** for comprehensive data collection
- **Dynamic Field Generation** for flexible data entry
- **Real-time Validation** with accessibility support
- **Auto-advance** between fields for efficient workflow
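
Real-time validation usually means running a cheap check on every input event and surfacing the messages immediately. A minimal sketch of such a check; `FieldRules` and `validateField` are hypothetical names, and the project's actual `FormValidator` API is not documented here:

```typescript
// Hypothetical rule set for a single form field.
interface FieldRules {
  required?: boolean;
  min?: number;     // inclusive lower bound for numeric fields
  max?: number;     // inclusive upper bound for numeric fields
  pattern?: RegExp; // format check for free-text fields (avoid the /g flag)
}

/** Return human-readable errors; an empty array means the value is valid. */
function validateField(label: string, value: string, rules: FieldRules): string[] {
  const errors: string[] = [];
  const trimmed = value.trim();
  if (trimmed === "") {
    if (rules.required) errors.push(`${label} is required.`);
    return errors; // Skip format checks on empty optional fields.
  }
  if (rules.min !== undefined || rules.max !== undefined) {
    const n = Number(trimmed);
    if (Number.isNaN(n)) {
      errors.push(`${label} must be a number.`);
    } else {
      if (rules.min !== undefined && n < rules.min) errors.push(`${label} must be at least ${rules.min}.`);
      if (rules.max !== undefined && n > rules.max) errors.push(`${label} must be at most ${rules.max}.`);
    }
  }
  if (rules.pattern && !rules.pattern.test(trimmed)) {
    errors.push(`${label} has an invalid format.`);
  }
  return errors;
}

// Example: a sample-size field checked on every input event.
console.log(validateField("Sample size", "-5", { required: true, min: 1 }));
// [ 'Sample size must be at least 1.' ]
```

In the browser, the returned messages could be written into an element referenced by the input's `aria-describedby` and marked `aria-live="polite"`, so screen readers announce them as the user types.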

### 🔍 Advanced Search
- **Markdown File Support** for reference materials
- **Full-text Search** across PDF documents
- **Search Result Highlighting** with visual markers
- **Context Preview** for quick verification
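
Full-text search with a context preview can be reduced to scanning each page's text for the query and slicing a window around every match. A minimal sketch, assuming the page text has already been extracted (for example by joining the `str` fields of PDF.js `page.getTextContent()` items); `searchPages` and `SearchHit` are hypothetical:

```typescript
interface SearchHit {
  page: number;    // 1-based page number
  index: number;   // character offset of the match within the page text
  context: string; // surrounding text for the preview
}

/**
 * Case-insensitive, non-overlapping search across pages.
 * Assumes lowercasing does not change string length (true for ASCII text),
 * so offsets in the lowercased copy match offsets in the original.
 */
function searchPages(pages: string[], query: string, contextChars = 30): SearchHit[] {
  const hits: SearchHit[] = [];
  const needle = query.toLowerCase();
  if (needle.length === 0) return hits;
  pages.forEach((text, i) => {
    const haystack = text.toLowerCase();
    let from = 0;
    let index: number;
    while ((index = haystack.indexOf(needle, from)) !== -1) {
      const start = Math.max(0, index - contextChars);
      const end = Math.min(text.length, index + needle.length + contextChars);
      hits.push({ page: i + 1, index, context: text.slice(start, end) });
      from = index + needle.length; // Continue after this match.
    }
  });
  return hits;
}

const pages = [
  "Randomized controlled trial of drug A.",
  "Mortality at 90 days was lower in the drug A arm.",
];
console.log(searchPages(pages, "drug a", 10).map((h) => h.page)); // [ 1, 2 ]
```

The `page` and `index` fields give the highlighter enough information to locate each match, while `context` feeds the preview list.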

### 📊 Data Management
- **Real-time Extraction Tracking** with complete audit trail
- **Multiple Export Formats** (JSON, CSV, Audit Reports)
- **State Persistence** via localStorage
- **Coordinate Tracking** for full traceability
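
CSV export is mostly a matter of quoting correctly: extracted clinical text often contains commas and quotation marks. A minimal sketch following RFC 4180 quoting rules; the `ExtractionRecord` columns and the `csvEscape`/`toCsv` helpers are hypothetical, not the project's actual export format:

```typescript
// Hypothetical extraction record.
interface ExtractionRecord {
  fieldId: string;
  value: string;
  page: number;
  timestamp: string; // ISO 8601
}

/** RFC 4180: wrap in double quotes if the value contains a comma, quote, or line break. */
function csvEscape(value: string | number): string {
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function toCsv(records: ExtractionRecord[]): string {
  const header = ["fieldId", "value", "page", "timestamp"];
  const rows = records.map((r) =>
    [r.fieldId, r.value, r.page, r.timestamp].map(csvEscape).join(","),
  );
  // RFC 4180 specifies CRLF line endings.
  return [header.join(","), ...rows].join("\r\n");
}

console.log(toCsv([
  { fieldId: "population", value: 'Adults, aged 18-65 ("ITT")', page: 3, timestamp: "2024-01-01T00:00:00Z" },
]));
```

If exported files will be opened in spreadsheet software, values beginning with `=`, `+`, `-`, or `@` may also need neutralizing to prevent formula (CSV) injection; this sketch does not handle that.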

### ♿ Accessibility
- **WCAG 2.1 Compliant** with ARIA labels
- **Keyboard Navigation** throughout the application
- **Screen Reader Support** for all interactive elements
- **Semantic HTML** structure

---

## 🚀 Quick Start

### Prerequisites
- Node.js 18+
- npm or yarn

### Installation

```bash
# Clone the repository
git clone https://github.com/matheus-rech/clinical-extraction-system.git
cd clinical-extraction-system

# Install dependencies
npm install

# Start development server
npm run dev
```

Open [http://localhost:3000](http://localhost:3000) in your browser.

### Using with Your PDFs

1. Place PDF files in the `PDFs/` directory
2. Click "📄 Load PDF" in the application
3. Select a form field
4. Highlight text in the PDF to extract
5. Export your data when complete

---

## 🧪 Testing

**All 52 tests passing** (28 unit, 24 E2E).

```bash
# Run unit tests (28 tests)
npm run test
npm run test:ui # Interactive UI

# Run E2E tests (24 tests)
npm run test:e2e
npm run test:e2e:ui # Interactive UI

# Run all tests
npm run test:all
```

### Test Coverage
- ✅ **Unit Tests**: 28/28 passing
  - AppState management
  - Security utilities
  - Extraction tracking
- ✅ **E2E Tests**: 24/24 passing
  - PDF upload & rendering
  - Form navigation & validation
  - Data extraction workflow
  - Export functionality

---

## 📦 Build & Deploy

### Build for Production
```bash
npm run build
# Output: dist/ directory
```

### Preview Production Build
```bash
npm run preview
```

### Deploy

[![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https://github.com/matheus-rech/clinical-extraction-system)

See [DEPLOYMENT_GUIDE.md](DEPLOYMENT_GUIDE.md) for detailed instructions on:
- Vercel (1-click deploy)
- Netlify
- GitHub Pages
- Custom hosting

---

## 🏗️ Architecture

### Tech Stack
- **Frontend**: TypeScript, Vite, SCSS
- **PDF**: PDF.js (bundled, not CDN)
- **Testing**: Vitest (unit) + Playwright (E2E)
- **Backend** (optional): Convex for data persistence

### Project Structure
```
clinical-extraction-system/
├── src/
│   ├── core/              # Core utilities
│   │   ├── AppState.ts    # State management
│   │   ├── ErrorHandler.ts
│   │   └── SecurityUtils.ts
│   ├── modules/
│   │   ├── pdf/           # PDF handling
│   │   ├── form/          # Form management
│   │   ├── extraction/    # Data extraction
│   │   └── export/        # Export functionality
│   ├── styles/            # SCSS stylesheets
│   └── types/             # TypeScript definitions
├── tests/
│   ├── unit/              # Unit tests
│   └── e2e/               # E2E tests
├── convex/                # Backend (optional)
└── PDFs/                  # Sample PDFs for testing
```

### Key Modules

#### PDF Module
- `PDFLoader.ts` - Handles PDF file loading and initialization
- `PDFRenderer.ts` - Renders PDF pages with text layers
- `TextSelection.ts` - Manages text selection and extraction
- `PDFSearch.ts` - Full-text search functionality

#### Form Module
- `FormManager.ts` - Multi-step form navigation
- `FormValidator.ts` - Input validation
- `DynamicFields.ts` - Dynamic field generation
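
Multi-step navigation can be modelled as a bounded step index where moving forward is gated by validation but moving back is always allowed. A minimal sketch; `StepNavigator` and its `canLeave` callback are hypothetical and not the actual `FormManager` API:

```typescript
class StepNavigator {
  private current = 0;
  private readonly totalSteps: number;
  // Called before leaving a step forward; return false to block navigation.
  private readonly canLeave: (step: number) => boolean;

  constructor(totalSteps: number, canLeave: (step: number) => boolean = () => true) {
    if (!Number.isInteger(totalSteps) || totalSteps < 1) {
      throw new Error("A form needs at least one step.");
    }
    this.totalSteps = totalSteps;
    this.canLeave = canLeave;
  }

  get step(): number {
    return this.current;
  }

  next(): boolean {
    if (this.current >= this.totalSteps - 1 || !this.canLeave(this.current)) return false;
    this.current += 1;
    return true;
  }

  previous(): boolean {
    // Going back never requires the current step to be valid.
    if (this.current === 0) return false;
    this.current -= 1;
    return true;
  }

  goTo(target: number): boolean {
    if (!Number.isInteger(target) || target < 0 || target >= this.totalSteps) return false;
    // Jumping forward requires every step in between to pass validation.
    for (let s = this.current; s < target; s++) {
      if (!this.canLeave(s)) return false;
    }
    this.current = target;
    return true;
  }
}

// Eight steps, where step index 2 is (hypothetically) still incomplete.
const nav = new StepNavigator(8, (s) => s !== 2);
nav.next();
nav.next();
console.log(nav.step, nav.next()); // 2 false
```

Keeping navigation logic free of DOM code like this makes it straightforward to unit-test separately from the rendered form.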

#### Extraction Module
- `ExtractionTracker.ts` - Tracks all extractions with coordinates
  - Provides complete audit trail
  - localStorage persistence
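
An audit trail with localStorage persistence can be kept as an append-only list of extraction records that is serialized after every change. A minimal sketch; `ExtractionLog`, `KeyValueStore`, and `MemoryStore` are hypothetical names, and the storage is injected so the same class works with `window.localStorage` in the browser or an in-memory store in tests:

```typescript
// Hypothetical record shape; coordinates are in PDF points.
interface Extraction {
  id: number;
  fieldId: string;
  text: string;
  page: number;
  rect: { x1: number; y1: number; x2: number; y2: number };
  createdAt: string; // ISO 8601 timestamp
}

// The subset of the Web Storage API used here; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

class ExtractionLog {
  private entries: Extraction[];
  private readonly store: KeyValueStore;
  private readonly key: string;

  constructor(store: KeyValueStore, key = "extractions") {
    this.store = store;
    this.key = key;
    const saved = store.getItem(key);
    this.entries = saved ? (JSON.parse(saved) as Extraction[]) : [];
  }

  /** Append-only: entries are never edited in place, which preserves the audit trail. */
  add(fieldId: string, text: string, page: number, rect: Extraction["rect"]): Extraction {
    const entry: Extraction = {
      id: this.entries.length + 1,
      fieldId,
      text,
      page,
      rect,
      createdAt: new Date().toISOString(),
    };
    this.entries.push(entry);
    this.store.setItem(this.key, JSON.stringify(this.entries));
    return entry;
  }

  /** Full history for one field; the last entry is the current value. */
  history(fieldId: string): Extraction[] {
    return this.entries.filter((e) => e.fieldId === fieldId);
  }

  all(): readonly Extraction[] {
    return this.entries;
  }
}

// In-memory store for Node or tests; in the browser, pass window.localStorage instead.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  getItem(key: string): string | null { return this.data.get(key) ?? null; }
  setItem(key: string, value: string): void { this.data.set(key, value); }
}

const store = new MemoryStore();
const log = new ExtractionLog(store);
log.add("sampleSize", "n = 120", 2, { x1: 72, y1: 600, x2: 160, y2: 612 });
log.add("sampleSize", "n = 118", 4, { x1: 72, y1: 300, x2: 160, y2: 312 });
// A new instance reloads the saved history from the same store.
const reloaded = new ExtractionLog(store);
console.log(reloaded.history("sampleSize").map((e) => e.text)); // [ 'n = 120', 'n = 118' ]
```

Recording a correction as a new entry, rather than overwriting the old one, is what lets a reviewer see both the original and revised values with their page locations.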

---

## 🔧 Configuration

### PDF.js Configuration
Located in `src/config/pdf.config.ts`:
- Worker source (bundled)
- CMap URL for character encoding
- Document options
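
The three settings above might be wired up roughly as follows. This is a configuration sketch, not the contents of `pdf.config.ts`: it assumes pdfjs-dist v4 or later (which ships an ES-module worker at `build/pdf.worker.min.mjs`), Vite's `?url` import suffix (typed via `vite/client`), and that the CMap files from `pdfjs-dist/cmaps/` are copied to `/cmaps/` in the build output. `openPdf` is a hypothetical helper name.

```typescript
import * as pdfjsLib from "pdfjs-dist";
// Vite resolves `?url` imports to the emitted asset URL, so the worker is bundled, not loaded from a CDN.
import workerUrl from "pdfjs-dist/build/pdf.worker.min.mjs?url";

pdfjsLib.GlobalWorkerOptions.workerSrc = workerUrl;

// Hypothetical helper: open a PDF from raw bytes with CMap support enabled.
export function openPdf(data: Uint8Array) {
  return pdfjsLib.getDocument({
    data,
    cMapUrl: "/cmaps/", // Assumed location; PDF.js expects a trailing slash.
    cMapPacked: true,   // The bundled CMaps are in the packed (.bcmap) format.
  }).promise;
}
```

CMaps are only needed for PDFs that use non-embedded CJK or other predefined encodings; without them, text from such files may render or extract incorrectly.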

### Security
- Content Security Policy configured
- Input sanitization
- XSS protection
- CSRF considerations
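
Input sanitization for text pulled out of a PDF typically means escaping it before it is placed into HTML. A minimal sketch of HTML escaping; `escapeHtml` is a hypothetical helper and the project's `SecurityUtils.ts` may take a different approach:

```typescript
// The five characters that are significant in HTML text and attribute values.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

/** Escape untrusted text (e.g. extracted PDF content) before inserting it into markup. */
function escapeHtml(input: string): string {
  return input.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}

console.log(escapeHtml('<img src=x onerror="alert(1)">'));
// &lt;img src=x onerror=&quot;alert(1)&quot;&gt;
```

Where possible, assigning to `textContent` instead of `innerHTML` avoids the need for escaping altogether; a helper like this is a fallback for code paths that must build HTML strings.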

---

## 📖 Documentation

- [Deployment Guide](DEPLOYMENT_GUIDE.md) - Deploy to various platforms
- [Quick Start](QUICKSTART.md) - Get started quickly
- [Production Checklist](PRODUCTION_CHECKLIST.md) - Pre-deployment checklist
- [Supabase Integration](docs/SUPABASE_INTEGRATION.md) - Supabase backend setup
- [Supabase Migration Status](SUPABASE_MIGRATION_STATUS.md) - Migration analysis and recommendations
- [API Documentation](convex/README.md) - Convex backend API

---

## 🤝 Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

### Development Guidelines
- Write tests for new features
- Follow TypeScript best practices
- Maintain accessibility standards
- Update documentation

---

## ๐Ÿ“Š Use Cases

### Clinical Research
- Extract study metadata from PDFs
- Track PICO-T criteria
- Collect baseline demographics
- Document interventions and outcomes

### Systematic Reviews
- Standardized data extraction
- Multiple reviewer workflow
- Complete audit trail
- Export for meta-analysis

### General PDF Data Extraction
- Form filling from PDF documents
- Data validation and cleaning
- Multi-format export
- Traceability requirements

---

## 🔒 Security

- ✅ Client-side processing (no server upload)
- ✅ Input sanitization
- ✅ Content Security Policy
- ✅ XSS protection
- ✅ Type-safe operations

For security concerns, please email: [Your security email]

---

## 📝 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

## 🙏 Acknowledgments

- **PDF.js** - Mozilla's PDF rendering library
- **Vite** - Next generation frontend tooling
- **Playwright** - End-to-end testing
- **Convex** - Backend infrastructure

---

## 📧 Contact

- **GitHub**: [@matheus-rech](https://github.com/matheus-rech)
- **Repository**: [clinical-extraction-system](https://github.com/matheus-rech/clinical-extraction-system)
- **Issues**: [Report a bug](https://github.com/matheus-rech/clinical-extraction-system/issues)

---

## 🌟 Star History

If you find this project useful, please consider giving it a star ⭐

[![Star History Chart](https://api.star-history.com/svg?repos=matheus-rech/clinical-extraction-system&type=Date)](https://star-history.com/#matheus-rech/clinical-extraction-system&Date)

---

**Made with ❤️ for clinical researchers and data scientists**