{"id":28389824,"url":"https://github.com/xvc323/omnidocs","last_synced_at":"2025-06-27T21:32:01.973Z","repository":{"id":294354219,"uuid":"986699506","full_name":"xVc323/omnidocs","owner":"xVc323","description":"Automated documentation crawler that generates LLM-friendly Markdown from any docs site. Export as single or multi-file, ready for AI ingestion.","archived":false,"fork":false,"pushed_at":"2025-05-21T17:54:50.000Z","size":58,"stargazers_count":6,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-31T11:38:21.640Z","etag":null,"topics":["crawler","documentation","llm","markdown"],"latest_commit_sha":null,"homepage":"https://omnidocs.pat.network","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/xVc323.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-20T02:13:51.000Z","updated_at":"2025-05-30T07:08:27.000Z","dependencies_parsed_at":"2025-05-20T20:15:13.596Z","dependency_job_id":null,"html_url":"https://github.com/xVc323/omnidocs","commit_stats":null,"previous_names":["xvc323/omnidocs"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/xVc323/omnidocs","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xVc323%2Fomnidocs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xVc323%2Fomnidocs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xVc323%2Fomnidocs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xVc323%2Fomnidocs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/xVc323","download_url":"https://codeload.github.com/xVc323/omnidocs/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xVc323%2Fomnidocs/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262335120,"owners_count":23295571,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","documentation","llm","markdown"],"created_at":"2025-05-31T02:12:51.873Z","updated_at":"2025-06-27T21:32:01.964Z","avatar_url":"https://github.com/xVc323.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# OmniDocs\n\nA powerful tool for automated documentation site crawling and Markdown conversion. **OmniDocs generates LLM-friendly Markdown files**—perfect for AI ingestion, semantic search, and knowledge base building. OmniDocs intelligently crawls documentation websites and exports them as well-formatted, structured Markdown files ready for use with large language models.\n\n## 🌟 Features\n\n- **Smart Crawling**: Automatically identifies and targets only documentation pages\n- **Structured Conversion**: Preserves document hierarchy and navigation order\n- **LLM-Optimized Output**: Produces clean, consistent Markdown ideal for AI/ML pipelines, RAG, and vector databases\n- **Flexible Output**: Choose between single consolidated Markdown file or multi-file ZIP archive\n- **High-Fidelity Markdown**: Accurately converts tables, code blocks, lists, and more\n- **User-Friendly Interface**: Simple form with advanced options for customization\n- **Responsive Design**: Works on desktop and mobile devices with dark mode support\n- **Real-time Progress**: Live updates during conversion process\n- **Temporary Storage**: Files automatically deleted after 1 hour (users notified)\n\n## 🌐 Live Demo\n\nVisit [omnidocs.pat.network](https://omnidocs.pat.network) to try OmniDocs now!\n\n## 📋 User Guide\n\n### Basic Usage\n\n1. Enter the URL of the documentation site you want to convert\n2. Click \"Convert Site\"\n3. Wait for the conversion to complete (you'll see a progress indicator)\n4. Download your converted documentation as either:\n   - A single Markdown file (all_docs.md)\n   - A ZIP archive containing individual Markdown files\n\n### Advanced Options\n\n- **Path Prefix**: Limit crawling to specific sections of a documentation site\n- **Include/Exclude Patterns**: Fine-tune which pages get crawled using regex patterns\n- **Output Format**: Choose between consolidated Markdown or multi-file ZIP\n\n### Important Notes\n\n- **Download Your Files Promptly**: All converted files are automatically deleted after 1 hour\n- **Large Sites**: Complex documentation sites with many pages may take several minutes to process\n- **Same-Domain Limitation**: OmniDocs only crawls pages within the same domain as the seed URL\n\n## 🛠️ Installation\n\n### Prerequisites\n\n- Python 3.9 or higher\n- Node.js 16 or higher\n- Redis (for Celery task queue)\n- Cloudflare R2 account or compatible S3 storage\n\n### Setup\n\n1. Clone the repository:\n   ```bash\n   git clone https://github.com/xvc323/omnidocs.git\n   cd omnidocs\n   ```\n\n2. Install Python dependencies:\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n3. Install frontend dependencies:\n   ```bash\n   cd frontend\n   npm install\n   cd ..\n   ```\n\n4. Set up environment variables (create a `.env` file based on `env.example`):\n   ```\n   R2_ACCOUNT_ID=your_account_id\n   R2_ACCESS_KEY_ID=your_access_key\n   R2_SECRET_ACCESS_KEY=your_secret_key\n   R2_BUCKET_NAME=your_bucket_name\n   ```\n\n## 🚀 Running Locally\n\nStart all services with the provided script:\n\n```bash\n./start-omnidocs.sh\n```\n\nOr start each component manually:\n\n1. Start Redis (required for Celery):\n   ```bash\n   redis-server\n   ```\n\n2. Start Celery worker:\n   ```bash\n   celery -A celery_app worker --loglevel=info\n   ```\n\n3. Start Celery beat (for scheduled tasks):\n   ```bash\n   celery -A celery_app beat --loglevel=info\n   ```\n\n4. Start the API server:\n   ```bash\n   uvicorn api_main:app --reload\n   ```\n\n5. Start the frontend (in a separate terminal):\n   ```bash\n   cd frontend \u0026\u0026 npm run dev\n   ```\n\n6. Open your browser and navigate to `http://localhost:3000`\n\n## 🐳 Docker Deployment\n\nOmniDocs can be deployed using Docker:\n\n```bash\ndocker-compose up -d\n```\n\nFor Railway deployments, use the provided Railway configuration files in the `railway/` directory.\n\n## 💻 API Endpoints\n\n- `POST /convert` - Start a new conversion job\n- `GET /download/{jobId}` - Download converted file\n- `GET /api/jobs/{jobId}/events` - SSE endpoint for job progress updates\n\n## 📄 License\n\nThis project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.\n\n## 🙏 Acknowledgements\n\n- [FastAPI](https://fastapi.tiangolo.com/) - Web framework\n- [Next.js](https://nextjs.org/) - Frontend framework\n- [Pandoc](https://pandoc.org/) - Document conversion\n- [Celery](https://docs.celeryq.dev/) - Task queue\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxvc323%2Fomnidocs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxvc323%2Fomnidocs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxvc323%2Fomnidocs/lists"}