https://github.com/matheus-rech/clinical-extraction-system
Clinical Study Extraction System - Interactive PDF annotation and data extraction with full traceability. TypeScript + Vite + PDF.js. 100% test coverage.
clinical-research data-extraction pdf pdf-annotation playwright testing typescript vite
- Host: GitHub
- URL: https://github.com/matheus-rech/clinical-extraction-system
- Owner: matheus-rech
- License: mit
- Created: 2025-09-30T10:26:39.000Z (9 months ago)
- Default Branch: master
- Last Pushed: 2025-09-30T10:46:43.000Z (9 months ago)
- Last Synced: 2025-09-30T12:31:23.319Z (9 months ago)
- Topics: clinical-research, data-extraction, pdf, pdf-annotation, playwright, testing, typescript, vite
- Language: TypeScript
- Size: 14.6 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Clinical Study Extraction System
Interactive PDF annotation and clinical data extraction tool with full traceability. Built with TypeScript, Vite, and PDF.js.

---
## Features
### PDF Integration
- **Interactive PDF Viewer** with zoom, navigation, and fit-to-width
- **Text Selection & Extraction** with precise coordinate tracking
- **Annotation Markers** showing extraction locations
- **Multi-page Support** with keyboard shortcuts
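Coordinate tracking under zoom can be pictured as storing each selection in zoom-independent page units and rescaling it when a marker is redrawn. The following is a minimal sketch of that idea, not the project's implementation: `Rect`, `toPageCoords`, and `toViewportCoords` are hypothetical names, and page rotation is ignored.

```typescript
// Hypothetical helpers for illustration; not taken from the repository.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Convert a rectangle measured in rendered (zoomed) pixels, relative to the
// page's top-left corner, into zoom-independent page units.
function toPageCoords(viewportRect: Rect, scale: number): Rect {
  if (scale <= 0) throw new RangeError("scale must be positive");
  return {
    x: viewportRect.x / scale,
    y: viewportRect.y / scale,
    width: viewportRect.width / scale,
    height: viewportRect.height / scale,
  };
}

// Inverse operation: place a stored rectangle on a page rendered at `scale`.
function toViewportCoords(pageRect: Rect, scale: number): Rect {
  if (scale <= 0) throw new RangeError("scale must be positive");
  return {
    x: pageRect.x * scale,
    y: pageRect.y * scale,
    width: pageRect.width * scale,
    height: pageRect.height * scale,
  };
}

// A selection made at 150% zoom, redrawn as a marker at 200% zoom.
const selectedAtZoom: Rect = { x: 150, y: 300, width: 240, height: 18 };
const storedRect = toPageCoords(selectedAtZoom, 1.5);
const redrawnRect = toViewportCoords(storedRect, 2);
console.log(storedRect, redrawnRect);
```

Storing the unscaled rectangle means one saved extraction can be highlighted correctly at any zoom level.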
### Smart Forms
- **8-Step Multi-Section Form** for comprehensive data collection
- **Dynamic Field Generation** for flexible data entry
- **Real-time Validation** with accessibility support
- **Auto-advance** between fields for efficient workflow
### Advanced Search
- **Markdown File Support** for reference materials
- **Full-text Search** across PDF documents
- **Search Result Highlighting** with visual markers
- **Context Preview** for quick verification
### Data Management
- **Real-time Extraction Tracking** with complete audit trail
- **Multiple Export Formats** (JSON, CSV, Audit Reports)
- **State Persistence** via localStorage
- **Coordinate Tracking** for full traceability
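As an illustration of the JSON and CSV export formats, the sketch below serializes a few extraction rows. The `ExportRow` shape and the `toCsv` helper are assumptions made for this example; the project's actual export code and column layout may differ.

```typescript
// Hypothetical row shape for illustration; the real export schema may differ.
interface ExportRow {
  field: string;
  value: string;
  page: number;
}

// Quote a cell only when it contains a comma, quote, or line break,
// doubling any embedded quotes (the usual CSV convention).
function escapeCsvCell(cell: string): string {
  return /[",\r\n]/.test(cell) ? `"${cell.replace(/"/g, '""')}"` : cell;
}

function toCsv(rows: ExportRow[]): string {
  const header = ["field", "value", "page"];
  const body = rows.map((row) => [row.field, row.value, String(row.page)]);
  return [header, ...body]
    .map((cells) => cells.map(escapeCsvCell).join(","))
    .join("\r\n");
}

const exportRows: ExportRow[] = [
  { field: "sample_size", value: "120", page: 3 },
  { field: "intervention", value: 'Drug A, 10 mg "daily"', page: 4 },
];

// JSON export is a direct serialization of the same rows.
console.log(JSON.stringify(exportRows, null, 2));
console.log(toCsv(exportRows));
```

Quoting cells on demand keeps simple values readable while still handling free-text fields that contain commas or quotes.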
### Accessibility
- **WCAG 2.1 Compliant** with ARIA labels
- **Keyboard Navigation** throughout the application
- **Screen Reader Support** for all interactive elements
- **Semantic HTML** structure
---
## Quick Start
### Prerequisites
- Node.js 18+
- npm or yarn
### Installation
```bash
# Clone the repository
git clone https://github.com/matheus-rech/clinical-extraction-system.git
cd clinical-extraction-system
# Install dependencies
npm install
# Start development server
npm run dev
```
Open [http://localhost:3000](http://localhost:3000) in your browser.
### Using with Your PDFs
1. Place PDF files in the `PDFs/` directory
2. Click "Load PDF" in the application
3. Select a form field
4. Highlight text in the PDF to extract
5. Export your data when complete
---
## Testing
**100% Test Coverage** - All tests passing!
```bash
# Run unit tests (28 tests)
npm run test
npm run test:ui # Interactive UI
# Run E2E tests (24 tests)
npm run test:e2e
npm run test:e2e:ui # Interactive UI
# Run all tests
npm run test:all
```
### Test Coverage
- ✅ **Unit Tests**: 28/28 passing
  - AppState management
  - Security utilities
  - Extraction tracking
- ✅ **E2E Tests**: 24/24 passing
  - PDF upload & rendering
  - Form navigation & validation
  - Data extraction workflow
  - Export functionality
---
## Build & Deploy
### Build for Production
```bash
npm run build
# Output: dist/ directory
```
### Preview Production Build
```bash
npm run preview
```
### Deploy
[Deploy with Vercel](https://vercel.com/new/clone?repository-url=https://github.com/matheus-rech/clinical-extraction-system)
See [DEPLOYMENT_GUIDE.md](DEPLOYMENT_GUIDE.md) for detailed instructions on:
- Vercel (1-click deploy)
- Netlify
- GitHub Pages
- Custom hosting
---
## Architecture
### Tech Stack
- **Frontend**: TypeScript, Vite, SCSS
- **PDF**: PDF.js (bundled, not CDN)
- **Testing**: Vitest (unit) + Playwright (E2E)
- **Backend** (optional): Convex for data persistence
### Project Structure
```
clinical-extraction-system/
├── src/
│   ├── core/                  # Core utilities
│   │   ├── AppState.ts        # State management
│   │   ├── ErrorHandler.ts
│   │   └── SecurityUtils.ts
│   ├── modules/
│   │   ├── pdf/               # PDF handling
│   │   ├── form/              # Form management
│   │   ├── extraction/        # Data extraction
│   │   └── export/            # Export functionality
│   ├── styles/                # SCSS stylesheets
│   └── types/                 # TypeScript definitions
├── tests/
│   ├── unit/                  # Unit tests
│   └── e2e/                   # E2E tests
├── convex/                    # Backend (optional)
└── PDFs/                      # Sample PDFs for testing
```
### Key Modules
#### PDF Module
- `PDFLoader.ts` - Handles PDF file loading and initialization
- `PDFRenderer.ts` - Renders PDF pages with text layers
- `TextSelection.ts` - Manages text selection and extraction
- `PDFSearch.ts` - Full-text search functionality
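To illustrate full-text search with a context preview, the sketch below scans per-page text and returns each match with its surrounding characters. It assumes the text of every page has already been collected into an array of strings (for example from PDF.js `getTextContent()`); `SearchHit` and `searchPages` are hypothetical names, not the API of `PDFSearch.ts`.

```typescript
// Hypothetical result shape for illustration.
interface SearchHit {
  page: number; // 1-based page number
  index: number; // character offset of the match within the page text
  context: string; // the match plus surrounding characters, for preview
}

// Case-insensitive substring search over per-page text.
// Assumes lowercasing does not change string length (true for ASCII text).
function searchPages(pageTexts: string[], query: string, contextChars = 20): SearchHit[] {
  const hits: SearchHit[] = [];
  const needle = query.toLowerCase();
  if (needle.length === 0) return hits;

  pageTexts.forEach((text, pageIndex) => {
    const haystack = text.toLowerCase();
    let from = 0;
    while (true) {
      const index = haystack.indexOf(needle, from);
      if (index === -1) break;
      const start = Math.max(0, index - contextChars);
      const end = Math.min(text.length, index + needle.length + contextChars);
      hits.push({ page: pageIndex + 1, index, context: text.slice(start, end) });
      from = index + needle.length; // continue after this match
    }
  });
  return hits;
}

const samplePages = [
  "Patients were randomized 1:1.",
  "Randomized allocation was concealed.",
];
const searchHits = searchPages(samplePages, "randomized", 10);
console.log(searchHits);
```

Each hit carries the page number and offset, which is enough to scroll to the match and highlight it.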
#### Form Module
- `FormManager.ts` - Multi-step form navigation
- `FormValidator.ts` - Input validation
- `DynamicFields.ts` - Dynamic field generation
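The interplay between step navigation and validation might look like the sketch below, where a step only advances once its fields validate. All names here (`Validator`, `FieldSpec`, `validateStep`, `StepNavigator`) are hypothetical and chosen for the example; they are not the interfaces of `FormManager.ts` or `FormValidator.ts`.

```typescript
// A validator returns an error message, or null when the value is acceptable.
type Validator = (value: string) => string | null;

interface FieldSpec {
  name: string;
  validators: Validator[];
}

const required: Validator = (value) =>
  value.trim() === "" ? "This field is required" : null;

const positiveInteger: Validator = (value) =>
  /^[1-9]\d*$/.test(value.trim()) ? null : "Enter a positive whole number";

// Collect the first error message for each invalid field in a step.
function validateStep(fields: FieldSpec[], values: Record<string, string>): Record<string, string> {
  const errors: Record<string, string> = {};
  for (const field of fields) {
    const value = values[field.name] ?? "";
    for (const check of field.validators) {
      const message = check(value);
      if (message !== null) {
        errors[field.name] = message;
        break;
      }
    }
  }
  return errors;
}

class StepNavigator {
  private steps: FieldSpec[][];
  private current = 0;

  constructor(steps: FieldSpec[][]) {
    this.steps = steps;
  }

  get index(): number {
    return this.current;
  }

  // Advance only when the current step has no validation errors.
  next(values: Record<string, string>): Record<string, string> {
    const errors = validateStep(this.steps[this.current], values);
    const isLast = this.current === this.steps.length - 1;
    if (Object.keys(errors).length === 0 && !isLast) this.current += 1;
    return errors;
  }

  previous(): void {
    if (this.current > 0) this.current -= 1;
  }
}

const navigator = new StepNavigator([
  [{ name: "studyId", validators: [required] }],
  [{ name: "sampleSize", validators: [required, positiveInteger] }],
]);
const blockedErrors = navigator.next({ studyId: "" }); // stays on step 0
const passedErrors = navigator.next({ studyId: "STUDY-1" }); // moves to step 1
console.log(blockedErrors, passedErrors, navigator.index);
```

Returning the error map from `next` lets the caller render messages next to the offending fields, which fits the real-time validation described above.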
#### Extraction Module
- `ExtractionTracker.ts` - Tracks all extractions with coordinates
  - Provides complete audit trail
  - localStorage persistence
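A tracker that records extractions with coordinates and persists them could be sketched as below. This is an assumption-laden illustration rather than the contents of `ExtractionTracker.ts`: `KeyValueStore`, `MemoryStore`, `Extraction`, and `TrackerSketch` are hypothetical names, and the in-memory store stands in for `localStorage` so the example also runs outside a browser.

```typescript
// Minimal subset of the Web Storage API; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// In-memory stand-in for localStorage, used here for demonstration only.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}

// Hypothetical record shape: what was extracted, where, and when.
interface Extraction {
  fieldName: string;
  text: string;
  page: number;
  x: number;
  y: number;
  width: number;
  height: number;
  timestamp: string; // ISO 8601
}

class TrackerSketch {
  private store: KeyValueStore;
  private key: string;
  private extractions: Extraction[];

  constructor(store: KeyValueStore, key = "extractions") {
    this.store = store;
    this.key = key;
    const saved = store.getItem(key);
    this.extractions = saved ? (JSON.parse(saved) as Extraction[]) : [];
  }

  // Append a record and persist the whole list after every change.
  add(entry: Omit<Extraction, "timestamp">): Extraction {
    const record: Extraction = { ...entry, timestamp: new Date().toISOString() };
    this.extractions.push(record);
    this.store.setItem(this.key, JSON.stringify(this.extractions));
    return record;
  }

  // Return a copy so callers cannot rewrite history.
  auditTrail(): Extraction[] {
    return [...this.extractions];
  }
}

const demoStore = new MemoryStore();
const tracker = new TrackerSketch(demoStore);
tracker.add({ fieldName: "sample_size", text: "120", page: 3, x: 100, y: 200, width: 160, height: 12 });

// A new tracker over the same store sees the earlier extraction.
const reloadedTracker = new TrackerSketch(demoStore);
console.log(reloadedTracker.auditTrail());
```

In a browser, passing `window.localStorage` instead of `MemoryStore` would give persistence across page reloads.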
---
## Configuration
### PDF.js Configuration
Located in `src/config/pdf.config.ts`:
- Worker source (bundled)
- CMap URL for character encoding
- Document options
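For orientation, a configuration object covering these three items might look like the sketch below. The `PdfConfig` shape, the `buildDocumentParams` helper, and all paths are placeholders invented for this example; the real `src/config/pdf.config.ts` may be organized differently. The option names `cMapUrl`, `cMapPacked`, `disableAutoFetch`, and `disableStream` are parameters accepted by PDF.js `getDocument()`.

```typescript
// Hypothetical shape; the real src/config/pdf.config.ts may differ.
interface PdfConfig {
  workerSrc: string; // would be assigned to pdfjsLib.GlobalWorkerOptions.workerSrc
  cMapUrl: string; // directory of character maps; should end with "/"
  cMapPacked: boolean; // true when the CMap files are in packed (binary) form
  documentOptions: { disableAutoFetch: boolean; disableStream: boolean };
}

// Placeholder paths, not the project's actual asset locations.
const pdfConfig: PdfConfig = {
  workerSrc: "/assets/pdf.worker.min.mjs",
  cMapUrl: "/assets/cmaps/",
  cMapPacked: true,
  documentOptions: { disableAutoFetch: false, disableStream: false },
};

// Merge the shared settings with a document URL to form the object that
// would be passed to pdfjsLib.getDocument().
function buildDocumentParams(url: string, config: PdfConfig) {
  return {
    url,
    cMapUrl: config.cMapUrl,
    cMapPacked: config.cMapPacked,
    ...config.documentOptions,
  };
}

const documentParams = buildDocumentParams("/PDFs/example.pdf", pdfConfig);
console.log(documentParams);
```

Keeping these values in one module means the worker path and CMap location only need to change in a single place when the build output moves.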
### Security
- Content Security Policy configured
- Input sanitization
- XSS protection
- CSRF considerations
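As one example of input sanitization, text extracted from a PDF can be HTML-escaped before it is inserted into the page, so markup inside a document is displayed rather than executed. The `escapeHtml` helper below is a generic sketch, not the code in `SecurityUtils.ts`.

```typescript
// Characters with special meaning in HTML and their entity replacements.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Escape untrusted text so it is rendered as text, not interpreted as markup.
function escapeHtml(input: string): string {
  return input.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}

const untrusted = '<img src=x onerror="alert(1)">';
console.log(escapeHtml(untrusted));
```

Where possible, assigning untrusted text through `textContent` instead of `innerHTML` avoids the need for manual escaping altogether.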
---
## Documentation
- [Deployment Guide](DEPLOYMENT_GUIDE.md) - Deploy to various platforms
- [Quick Start](QUICKSTART.md) - Get started quickly
- [Production Checklist](PRODUCTION_CHECKLIST.md) - Pre-deployment checklist
- [Supabase Integration](docs/SUPABASE_INTEGRATION.md) - Supabase backend setup
- [Supabase Migration Status](SUPABASE_MIGRATION_STATUS.md) - Migration analysis and recommendations
- [API Documentation](convex/README.md) - Convex backend API
---
## Contributing
Contributions are welcome! Please:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
### Development Guidelines
- Write tests for new features
- Follow TypeScript best practices
- Maintain accessibility standards
- Update documentation
---
## Use Cases
### Clinical Research
- Extract study metadata from PDFs
- Track PICO-T criteria
- Collect baseline demographics
- Document interventions and outcomes
### Systematic Reviews
- Standardized data extraction
- Multiple reviewer workflow
- Complete audit trail
- Export for meta-analysis
### General PDF Data Extraction
- Form filling from PDF documents
- Data validation and cleaning
- Multi-format export
- Traceability requirements
---
## Security
- ✅ Client-side processing (no server upload)
- ✅ Input sanitization
- ✅ Content Security Policy
- ✅ XSS protection
- ✅ Type-safe operations
For security concerns, please email: [Your security email]
---
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
---
## Acknowledgments
- **PDF.js** - Mozilla's PDF rendering library
- **Vite** - Next generation frontend tooling
- **Playwright** - End-to-end testing
- **Convex** - Backend infrastructure
---
## Contact
- **GitHub**: [@matheus-rech](https://github.com/matheus-rech)
- **Repository**: [clinical-extraction-system](https://github.com/matheus-rech/clinical-extraction-system)
- **Issues**: [Report a bug](https://github.com/matheus-rech/clinical-extraction-system/issues)
---
## Star History
If you find this project useful, please consider giving it a star ⭐
[Star history chart](https://star-history.com/#matheus-rech/clinical-extraction-system&Date)
---
**Made with ❤️ for clinical researchers and data scientists**