{"id":50419052,"url":"https://github.com/michael-borck/document-lens-legacy","last_synced_at":"2026-05-31T07:30:53.129Z","repository":{"id":312625542,"uuid":"1047939732","full_name":"michael-borck/document-lens-legacy","owner":"michael-borck","description":"Analyzes text documents for readability, academic integrity, and linguistic insights via REST API.","archived":false,"fork":false,"pushed_at":"2026-04-28T09:07:18.000Z","size":19414,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-06T15:14:16.732Z","etag":null,"topics":["academic-integrity","api","docker","document-analysis","edtech","microservice","natural-language-processing","nlp","python","readability","rest-api","text-analysis"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/michael-borck.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-08-31T15:15:03.000Z","updated_at":"2026-05-06T06:40:17.000Z","dependencies_parsed_at":null,"dependency_job_id":"3e14650f-0433-4a5c-9e75-61c25cd5ef81","html_url":"https://github.com/michael-borck/document-lens-legacy","commit_stats":null,"previous_names":["michael-borck/document-lens","michaelborck-education/document-lens"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/michael-borck/document-lens-legacy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michael-borck%2Fdocument-lens-legacy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michael-borck%2Fdocument-lens-legacy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michael-borck%2Fdocument-lens-legacy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michael-borck%2Fdocument-lens-legacy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/michael-borck","download_url":"https://codeload.github.com/michael-borck/document-lens-legacy/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michael-borck%2Fdocument-lens-legacy/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33723548,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-31T02:00:06.040Z","response_time":95,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["academic-integrity","api","docker","document-analysis","edtech","microservice","natural-language-processing","nlp","python","readability","rest-api","text-analysis"],"created_at":"2026-05-31T07:30:52.351Z","updated_at":"2026-05-31T07:30:53.123Z","avatar_url":"https://github.com/michael-borck.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DocumentLens\n\n\u003c!-- BADGES:START --\u003e\n[![edtech](https://img.shields.io/badge/-edtech-4caf50?style=flat-square)](https://github.com/topics/edtech) [![academic-integrity](https://img.shields.io/badge/-academic--integrity-blue?style=flat-square)](https://github.com/topics/academic-integrity) [![api](https://img.shields.io/badge/-api-blue?style=flat-square)](https://github.com/topics/api) [![docker](https://img.shields.io/badge/-docker-2496ed?style=flat-square)](https://github.com/topics/docker) [![document-analysis](https://img.shields.io/badge/-document--analysis-blue?style=flat-square)](https://github.com/topics/document-analysis) [![microservice](https://img.shields.io/badge/-microservice-blue?style=flat-square)](https://github.com/topics/microservice) [![natural-language-processing](https://img.shields.io/badge/-natural--language--processing-blue?style=flat-square)](https://github.com/topics/natural-language-processing) [![nlp](https://img.shields.io/badge/-nlp-blue?style=flat-square)](https://github.com/topics/nlp) [![python](https://img.shields.io/badge/-python-3776ab?style=flat-square)](https://github.com/topics/python) [![readability](https://img.shields.io/badge/-readability-blue?style=flat-square)](https://github.com/topics/readability)\n\u003c!-- BADGES:END --\u003e\n\n**Text Analysis \u0026 Academic Intelligence Microservice**\n\nTransform text content into actionable insights through comprehensive linguistic analysis, writing quality assessment, and academic integrity checking.\n\n## 🚀 Quick Start\n\n```bash\n# Docker deployment (recommended)\ndocker-compose up -d\n\n# Or raw deployment\n./deploy.sh\n\n# API available at: http://localhost:8002\n# Documentation: http://localhost:8002/docs\n```\n\n## 📊 API Endpoints\n\n### Core Analysis\n- `GET /health` - Service health check\n- `POST /text` - Text analysis (readability, quality, word frequency)\n- `POST /academic` - Academic analysis (citations, DOI resolution, integrity)\n- `POST /files` - File upload + analysis (PDF, DOCX, TXT, MD)\n\n### Advanced Text Analysis\n- `POST /advanced/ngrams` - N-gram extraction with optional filter terms\n- `POST /advanced/ner` - Named entity recognition\n- `POST /advanced/search/keywords` - Batch keyword search across multiple terms\n\n### Document Intelligence\n- `POST /files/infer-metadata` - Infer year, company, industry, document type from content\n- `POST /text/infer-metadata` - Metadata inference from raw text\n- Page-level text extraction (via `include_extracted_text=true` on `/files`)\n\n### Integration\n- Root endpoint: `GET /` - Service info and available endpoints\n- For presentations: Use [PresentationLens](https://github.com/michael-borck/presentation-lens)\n- For recordings: Use [RecordingLens](https://github.com/michael-borck/recording-lens)\n\n## 🎯 Use Cases\n\n- **Text Analysis**: Readability, writing quality, word frequency for any text content\n- **Academic Analysis**: Citation verification, DOI resolution, AI detection, integrity checking\n- **Document Intelligence**: Extract and analyze text from PDFs and Word documents\n- **Sustainability Research**: Batch keyword analysis for TCFD, GRI, SDGs, SASB frameworks\n- **Corporate Report Analysis**: Auto-detect metadata (year, company, industry) from annual reports\n- **Multi-Service Workflows**: Integrate with specialized analysis services\n\n### Desktop Application Support\nDocumentLens powers the **document-lens-desktop** Electron application for researchers analyzing corporate sustainability reports. Features include:\n- Smart metadata inference (company name, year, industry, document type)\n- Framework keyword analysis (TCFD, GRI, SDGs, SASB)\n- Batch processing with SQLite storage\n- Offline operation via bundled Python backend\n\n## 🏗️ Microservices Ecosystem\n\nDocumentLens is part of a focused microservices architecture:\n\n| Service | Purpose | Repository |\n|---------|---------|------------|\n| **DocumentLens** | Text analysis \u0026 academic intelligence | *This repo* |\n| **PresentationLens** | Presentation design \u0026 structure analysis | [presentation-lens](https://github.com/michael-borck/presentation-lens) |\n| **RecordingLens** | Student recordings (video/audio) analysis | [recording-lens](https://github.com/michael-borck/recording-lens) |\n| **CodeLens** | Source code quality \u0026 analysis | [code-lens](https://github.com/michael-borck/code-lens) |\n| **SubmissionLens** | Student submission router \u0026 frontend | [submission-lens](https://github.com/michael-borck/submission-lens) |\n\n### Integration Pattern\n```mermaid\ngraph LR\n    A[Student Submission] --\u003e B[SubmissionLens Frontend]\n    B --\u003e C{File Type Router}\n    C --\u003e|Text/PDF/DOCX| D[DocumentLens]\n    C --\u003e|PPTX| E[PresentationLens]\n    C --\u003e|Video/Audio| F[RecordingLens]\n    C --\u003e|Source Code| G[CodeLens]\n    E --\u003e D\n    F --\u003e D\n    G --\u003e D\n    D --\u003e H[Combined Feedback]\n    H --\u003e B\n    B --\u003e I[Student Dashboard]\n```\n\n## 🚀 Deployment\n\n### Docker Deployment (Recommended)\n```bash\ngit clone https://github.com/michael-borck/document-lens.git\ncd document-lens\ndocker-compose up -d  # Single container deployment\n```\n\n### Raw/Native Deployment\n```bash\ngit clone https://github.com/michael-borck/document-lens.git\ncd document-lens\n./deploy.sh  # Handles venv, dependencies, and production server\n```\n\n## 🧪 Testing\n\n```bash\n# Install dev dependencies\nuv sync --extra dev\n\n# Run all tests\nuv run pytest tests/ -v\n\n# Run specific test file\nuv run pytest tests/test_files.py -v\n\n# Run only PDF tests\nuv run pytest tests/ -m pdf -v\n\n# Skip slow tests\nuv run pytest tests/ -m \"not slow\" -v\n\n# Run with coverage report\nuv run pytest tests/\n```\n\n### Test Structure\n- `tests/conftest.py` - Shared fixtures and test client setup\n- `tests/test_health.py` - Health/smoke tests\n- `tests/test_text_analysis.py` - Text analysis endpoint tests\n- `tests/test_academic_analysis.py` - Academic analysis endpoint tests\n- `tests/test_files.py` - PDF file upload tests\n\n### Test Data\nPlace test files (PDF, DOCX, etc.) in the `test-data/` directory. The test suite automatically discovers and uses these files for parameterized tests.\n\n## 📚 Documentation\n\n- `DEPLOYMENT.md` - Deployment guide for Docker and raw installations\n- `DOCUMENTLENS_SETUP.md` - Setup and usage instructions\n- `.env.example` - Configuration template\n- `docs/` - Additional architecture and integration documentation\n\n---\n\n*DocumentLens: Pure text intelligence at the heart of content analysis*\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmichael-borck%2Fdocument-lens-legacy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmichael-borck%2Fdocument-lens-legacy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmichael-borck%2Fdocument-lens-legacy/lists"}