{"id":28503160,"url":"https://github.com/excoffierleonard/parser","last_synced_at":"2026-03-01T12:32:27.053Z","repository":{"id":269911076,"uuid":"908828856","full_name":"excoffierleonard/parser","owner":"excoffierleonard","description":"REST API service in Rust that takes in any file and returns its parsed content.","archived":false,"fork":false,"pushed_at":"2026-02-11T03:18:47.000Z","size":5666,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-02-11T08:10:23.476Z","etag":null,"topics":["api-rest","parser","rust"],"latest_commit_sha":null,"homepage":"https://parser.excoffierleonard.com","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/excoffierleonard.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-12-27T04:29:17.000Z","updated_at":"2026-02-11T03:18:49.000Z","dependencies_parsed_at":"2025-03-04T17:28:09.074Z","dependency_job_id":"91f5f7a4-2688-4812-a9b6-c9e61ba8d93a","html_url":"https://github.com/excoffierleonard/parser","commit_stats":null,"previous_names":["excoffierleonard/parser"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/excoffierleonard/parser","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/excoffierleonard%2Fparser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/excoffierleonard%2Fparser/tags","releases_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/repositories/excoffierleonard%2Fparser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/excoffierleonard%2Fparser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/excoffierleonard","download_url":"https://codeload.github.com/excoffierleonard/parser/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/excoffierleonard%2Fparser/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29969243,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-01T11:43:06.159Z","status":"ssl_error","status_checked_at":"2026-03-01T11:43:03.887Z","response_time":124,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api-rest","parser","rust"],"created_at":"2025-06-08T17:06:07.103Z","updated_at":"2026-03-01T12:32:27.030Z","avatar_url":"https://github.com/excoffierleonard.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Parser\n\nA Rust-based document parsing system that extracts text content from various file formats.\n\n[Live Demo](https://parser.excoffierleonard.com) | [API Endpoint](https://parser.excoffierleonard.com/parse)\n\n![Website Preview](website_preview.png)\n\n## 📚 Overview\n\nParser is a modular Rust project that provides comprehensive document parsing capabilities through multiple interfaces:\n\n- 
**Core library**: The foundation providing parsing functionality for various file formats\n- **CLI tool**: Command-line interface for quick file parsing\n- **Web API**: REST service for parsing files via HTTP requests\n- **Web UI**: Simple interface for testing the parser functionality\n\n## 📦 Project Structure\n\nThe project is organized as a Rust workspace with multiple crates:\n\n- **parser-core**: The core parsing engine\n- **parser-cli**: Command-line interface\n- **parser-web**: Web API and frontend\n- **test-utils**: Shared testing utilities\n\n## 📄 Supported File Types\n\n- **Documents**: PDF (`.pdf`), Word (`.docx`), PowerPoint (`.pptx`), Excel (`.xlsx`)\n- **Text**: Plain text (`.txt`), CSV, JSON, YAML, source code, and other text-based formats\n- **Images**: PNG, JPEG, WebP, and other image formats with OCR (Optical Character Recognition)\n\nThe OCR functionality supports English and French languages.\n\n## 🛠️ Getting Started\n\n### Prerequisites\n\n- [Rust](https://www.rust-lang.org/learn/get-started) (latest stable)\n- OCR Dependencies:\n  - Tesseract development libraries\n  - Leptonica development libraries\n  - Clang development libraries\n\n#### Installing OCR Dependencies\n\n**Debian/Ubuntu:**\n\n```bash\nsudo apt install libtesseract-dev libleptonica-dev libclang-dev\n```\n\n**macOS:**\n\n```bash\nbrew install tesseract\n```\n\n**Windows:**\nFollow the instructions at [Tesseract GitHub repository](https://github.com/tesseract-ocr/tesseract).\n\n### Building from Source\n\n```bash\n# Build all crates\ncargo build\n\n# Build in release mode\ncargo build --release\n```\n\n### Using the CLI\n\n```bash\n# Run directly with cargo\ncargo run -p parser-cli -- path/to/file1.pdf path/to/file2.docx\n\n# Or use the built binary\n./target/release/parser-cli path/to/file1.pdf path/to/file2.docx\n```\n\n### Running the Web Server\n\n```bash\n# Run the web server\ncargo run -p parser-web\n\n# With custom port\nPARSER_APP_PORT=9000 cargo run -p parser-web\n\n# 
With file serving enabled (for the frontend)\nENABLE_FILE_SERVING=true cargo run -p parser-web\n```\n\n## 🚀 Deployment\n\nThe easiest way to deploy the service is with Docker Compose:\n\n```bash\ncurl -o compose.yaml https://raw.githubusercontent.com/excoffierleonard/parser/refs/heads/main/compose.yaml \u0026\u0026 \\\ndocker compose up -d\n```\n\n### Environment Variables\n\n- `PARSER_APP_PORT`: The port on which the web service listens (default: 8080)\n- `ENABLE_FILE_SERVING`: Whether to serve the frontend files (default: false)\n\n## 🧪 Development\n\n### Testing\n\n```bash\n# Run all tests\ncargo test --workspace\n\n# Run a specific test\ncargo test test_name\n```\n\n### Benchmarking\n\n```bash\n# Run benchmarks\ncargo bench --workspace\n\n# Run the benchmark script\n./scripts/benchmark.sh\n```\n\n### Code Quality\n\n```bash\n# Run the linter\ncargo clippy --workspace -- -D warnings\n\n# Format code\ncargo fmt --all\n```\n\n### Building with Scripts\n\n```bash\n# Full build script\n./scripts/build.sh\n\n# Deployment tests\n./scripts/deploy-tests.sh\n```\n\n## 📜 License\n\nThis project is licensed under the MIT License; see the [LICENSE](LICENSE) file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fexcoffierleonard%2Fparser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fexcoffierleonard%2Fparser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fexcoffierleonard%2Fparser/lists"}