{"id":31663954,"url":"https://github.com/matheus-rech/clinical-extraction-system","last_synced_at":"2026-05-07T13:05:19.170Z","repository":{"id":317358302,"uuid":"1067059716","full_name":"matheus-rech/clinical-extraction-system","owner":"matheus-rech","description":"Clinical Study Extraction System - Interactive PDF annotation and data extraction with full traceability. TypeScript + Vite + PDF.js. 100% test coverage.","archived":false,"fork":false,"pushed_at":"2025-09-30T10:46:43.000Z","size":15322,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-09-30T12:31:23.319Z","etag":null,"topics":["clinical-research","data-extraction","pdf","pdf-annotation","playwright","testing","typescript","vite"],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/matheus-rech.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-30T10:26:39.000Z","updated_at":"2025-09-30T10:46:39.000Z","dependencies_parsed_at":"2025-09-30T12:41:34.252Z","dependency_job_id":null,"html_url":"https://github.com/matheus-rech/clinical-extraction-system","commit_stats":null,"previous_names":["matheus-rech/clinical-extraction-system"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/matheus-rech/clinical-extraction-system","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matheus-rech%2Fclinical-extraction-system","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matheus-rech%2Fclinical-extraction-system/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matheus-rech%2Fclinical-extraction-system/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matheus-rech%2Fclinical-extraction-system/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/matheus-rech","download_url":"https://codeload.github.com/matheus-rech/clinical-extraction-system/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matheus-rech%2Fclinical-extraction-system/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278846316,"owners_count":26056090,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-07T02:00:06.786Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clinical-research","data-extraction","pdf","pdf-annotation","playwright","testing","typescript","vite"],"created_at":"2025-10-07T20:52:31.570Z","updated_at":"2025-10-07T20:52:32.902Z","avatar_url":"https://github.com/matheus-rech.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Clinical Study Extraction System\n\n[![CI/CD](https://github.com/matheus-rech/clinical-extraction-system/actions/workflows/ci.yml/badge.svg)](https://github.com/matheus-rech/clinical-extraction-system/actions)\n[![Tests](https://img.shields.io/badge/tests-52%2F52%20passing-brightgreen)](https://github.com/matheus-rech/clinical-extraction-system)\n[![TypeScript](https://img.shields.io/badge/TypeScript-5.7-blue)](https://www.typescriptlang.org/)\n[![License](https://img.shields.io/badge/license-MIT-green)](LICENSE)\n\nInteractive PDF annotation and clinical data extraction tool with full traceability. Built with TypeScript, Vite, and PDF.js.\n\n![Clinical Extraction System](https://img.shields.io/badge/Status-Production%20Ready-brightgreen)\n\n---\n\n## ✨ Features\n\n### 📄 PDF Integration\n- **Interactive PDF Viewer** with zoom, navigation, and fit-to-width\n- **Text Selection \u0026 Extraction** with precise coordinate tracking\n- **Annotation Markers** showing extraction locations\n- **Multi-page Support** with keyboard shortcuts\n\n### 📝 Smart Forms\n- **8-Step Multi-Section Form** for comprehensive data collection\n- **Dynamic Field Generation** for flexible data entry\n- **Real-time Validation** with accessibility support\n- **Auto-advance** between fields for efficient workflow\n\n### 🔍 Advanced Search\n- **Markdown File Support** for reference materials\n- **Full-text Search** across PDF documents\n- **Search Result Highlighting** with visual markers\n- **Context Preview** for quick verification\n\n### 📊 Data Management\n- **Real-time Extraction Tracking** with complete audit trail\n- **Multiple Export Formats** (JSON, CSV, Audit Reports)\n- **State Persistence** via localStorage\n- **Coordinate Tracking** for full traceability\n\n### ♿ Accessibility\n- **WCAG 2.1 Compliant** with ARIA labels\n- **Keyboard Navigation** throughout the application\n- **Screen Reader Support** for all interactive elements\n- **Semantic HTML** structure\n\n---\n\n## 🚀 Quick Start\n\n### Prerequisites\n- Node.js 18+ \n- npm or yarn\n\n### Installation\n\n```bash\n# Clone the repository\ngit clone https://github.com/matheus-rech/clinical-extraction-system.git\ncd clinical-extraction-system\n\n# Install dependencies\nnpm install\n\n# Start development server\nnpm run dev\n```\n\nOpen [http://localhost:3000](http://localhost:3000) in your browser.\n\n### Using with Your PDFs\n\n1. Place PDF files in the `PDFs/` directory\n2. Click \"📄 Load PDF\" in the application\n3. Select a form field\n4. Highlight text in the PDF to extract\n5. Export your data when complete\n\n---\n\n## 🧪 Testing\n\n**100% Test Coverage** - All tests passing!\n\n```bash\n# Run unit tests (28 tests)\nnpm run test\nnpm run test:ui      # Interactive UI\n\n# Run E2E tests (24 tests)\nnpm run test:e2e\nnpm run test:e2e:ui  # Interactive UI\n\n# Run all tests\nnpm run test:all\n```\n\n### Test Coverage\n- ✅ **Unit Tests**: 28/28 passing\n  - AppState management\n  - Security utilities\n  - Extraction tracking\n- ✅ **E2E Tests**: 24/24 passing\n  - PDF upload \u0026 rendering\n  - Form navigation \u0026 validation\n  - Data extraction workflow\n  - Export functionality\n\n---\n\n## 📦 Build \u0026 Deploy\n\n### Build for Production\n```bash\nnpm run build\n# Output: dist/ directory\n```\n\n### Preview Production Build\n```bash\nnpm run preview\n```\n\n### Deploy\n\n[![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https://github.com/matheus-rech/clinical-extraction-system)\n\nSee [DEPLOYMENT_GUIDE.md](DEPLOYMENT_GUIDE.md) for detailed instructions on:\n- Vercel (1-click deploy)\n- Netlify\n- GitHub Pages\n- Custom hosting\n\n---\n\n## 🏗️ Architecture\n\n### Tech Stack\n- **Frontend**: TypeScript, Vite, SCSS\n- **PDF**: PDF.js (bundled, not CDN)\n- **Testing**: Vitest (unit) + Playwright (E2E)\n- **Backend** (optional): Convex for data persistence\n\n### Project Structure\n```\nclinical-extraction-system/\n├── src/\n│   ├── core/              # Core utilities\n│   │   ├── AppState.ts    # State management\n│   │   ├── ErrorHandler.ts\n│   │   └── SecurityUtils.ts\n│   ├── modules/\n│   │   ├── pdf/           # PDF handling\n│   │   ├── form/          # Form management\n│   │   ├── extraction/    # Data extraction\n│   │   └── export/        # Export functionality\n│   ├── styles/            # SCSS stylesheets\n│   └── types/             # TypeScript definitions\n├── tests/\n│   ├── unit/              # Unit tests\n│   └── e2e/               # E2E tests\n├── convex/                # Backend (optional)\n└── PDFs/                  # Sample PDFs for testing\n```\n\n### Key Modules\n\n#### PDF Module\n- `PDFLoader.ts` - Handles PDF file loading and initialization\n- `PDFRenderer.ts` - Renders PDF pages with text layers\n- `TextSelection.ts` - Manages text selection and extraction\n- `PDFSearch.ts` - Full-text search functionality\n\n#### Form Module\n- `FormManager.ts` - Multi-step form navigation\n- `FormValidator.ts` - Input validation\n- `DynamicFields.ts` - Dynamic field generation\n\n#### Extraction Module\n- `ExtractionTracker.ts` - Tracks all extractions with coordinates\n- Provides complete audit trail\n- localStorage persistence\n\n---\n\n## 🔧 Configuration\n\n### PDF.js Configuration\nLocated in `src/config/pdf.config.ts`:\n- Worker source (bundled)\n- CMap URL for character encoding\n- Document options\n\n### Security\n- Content Security Policy configured\n- Input sanitization\n- XSS protection\n- CSRF considerations\n\n---\n\n## 📖 Documentation\n\n- [Deployment Guide](DEPLOYMENT_GUIDE.md) - Deploy to various platforms\n- [Quick Start](QUICKSTART.md) - Get started quickly\n- [Production Checklist](PRODUCTION_CHECKLIST.md) - Pre-deployment checklist\n- [Supabase Integration](docs/SUPABASE_INTEGRATION.md) - Supabase backend setup\n- [Supabase Migration Status](SUPABASE_MIGRATION_STATUS.md) - Migration analysis and recommendations\n- [API Documentation](convex/README.md) - Convex backend API\n\n---\n\n## 🤝 Contributing\n\nContributions are welcome! Please:\n\n1. Fork the repository\n2. Create a feature branch (`git checkout -b feature/amazing-feature`)\n3. Commit your changes (`git commit -m 'Add amazing feature'`)\n4. Push to the branch (`git push origin feature/amazing-feature`)\n5. Open a Pull Request\n\n### Development Guidelines\n- Write tests for new features\n- Follow TypeScript best practices\n- Maintain accessibility standards\n- Update documentation\n\n---\n\n## 📊 Use Cases\n\n### Clinical Research\n- Extract study metadata from PDFs\n- Track PICO-T criteria\n- Collect baseline demographics\n- Document interventions and outcomes\n\n### Systematic Reviews\n- Standardized data extraction\n- Multiple reviewer workflow\n- Complete audit trail\n- Export for meta-analysis\n\n### General PDF Data Extraction\n- Form filling from PDF documents\n- Data validation and cleaning\n- Multi-format export\n- Traceability requirements\n\n---\n\n## 🔒 Security\n\n- ✅ Client-side processing (no server upload)\n- ✅ Input sanitization\n- ✅ Content Security Policy\n- ✅ XSS protection\n- ✅ Type-safe operations\n\nFor security concerns, please email: [Your security email]\n\n---\n\n## 📝 License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n---\n\n## 🙏 Acknowledgments\n\n- **PDF.js** - Mozilla's PDF rendering library\n- **Vite** - Next generation frontend tooling\n- **Playwright** - End-to-end testing\n- **Convex** - Backend infrastructure\n\n---\n\n## 📧 Contact\n\n- **GitHub**: [@matheus-rech](https://github.com/matheus-rech)\n- **Repository**: [clinical-extraction-system](https://github.com/matheus-rech/clinical-extraction-system)\n- **Issues**: [Report a bug](https://github.com/matheus-rech/clinical-extraction-system/issues)\n\n---\n\n## 🌟 Star History\n\nIf you find this project useful, please consider giving it a star ⭐\n\n[![Star History Chart](https://api.star-history.com/svg?repos=matheus-rech/clinical-extraction-system\u0026type=Date)](https://star-history.com/#matheus-rech/clinical-extraction-system\u0026Date)\n\n---\n\n**Made with ❤️ for clinical researchers and data scientists**","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmatheus-rech%2Fclinical-extraction-system","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmatheus-rech%2Fclinical-extraction-system","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmatheus-rech%2Fclinical-extraction-system/lists"}