{"id":30912389,"url":"https://github.com/h1anonymoush1/docify","last_synced_at":"2026-05-15T12:02:10.731Z","repository":{"id":312497456,"uuid":"1047681094","full_name":"h1Anonymoush1/Docify","owner":"h1Anonymoush1","description":null,"archived":false,"fork":false,"pushed_at":"2025-09-07T07:07:43.000Z","size":6281,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-07T09:10:28.148Z","etag":null,"topics":["appwrite","appwrite-sites","appwritehackathon"],"latest_commit_sha":null,"homepage":"https://docify-website.appwrite.network","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/h1Anonymoush1.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-08-31T01:11:17.000Z","updated_at":"2025-09-07T07:07:46.000Z","dependencies_parsed_at":"2025-08-31T04:29:32.641Z","dependency_job_id":null,"html_url":"https://github.com/h1Anonymoush1/Docify","commit_stats":null,"previous_names":["h1anonymoush1/docify"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/h1Anonymoush1/Docify","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/h1Anonymoush1%2FDocify","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/h1Anonymoush1%2FDocify/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/h1Anonymoush1%2FDocify/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/h1Anonymoush1%2FDocify/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/h1Anonymoush1","download_url":"https://codeload.github.com/h1Anonymoush1/Docify/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/h1Anonymoush1%2FDocify/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274367808,"owners_count":25272302,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-09T02:00:10.223Z","response_time":80,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["appwrite","appwrite-sites","appwritehackathon"],"created_at":"2025-09-09T21:51:45.036Z","updated_at":"2026-05-15T12:02:10.724Z","avatar_url":"https://github.com/h1Anonymoush1.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Docify - AI-Powered Document Analysis\n\nDocify is a web application that allows users to analyze any website using AI-powered content extraction and visualization. 
Users can input a URL and instructions, and the system will scrape the content, analyze it with Google's Gemini AI, and present the results in interactive charts and summaries.\n\n## 🎯 Key Features\n\n- **🔒 Raw Content Preservation**: Saves exact browserless HTML without dangerous cleaning\n- **🤖 AI-Generated Titles**: Smart 2-4 word titles using Gemini AI\n- **📝 Readable Summaries**: Human-friendly summaries up to 200 characters\n- **🔗 Format Compatibility**: Same JSON blocks format as original analyzer\n- **🎯 8-Step Linear Process**: Clear, reliable processing pipeline\n- **🛡️ Error Recovery**: Graceful failure handling with status updates\n- **🌐 Universal Web Scraping**: Extract content from any website\n- **📊 Interactive Visualizations**: Automatic generation of Mermaid diagrams and charts\n- **📱 Responsive Design**: Works on all devices with adaptive grid layouts\n\n## 🔄 How It Works\n\n```mermaid\ngraph TD\n    A[User Submits URL] --\u003e B[Create Document Record]\n    B --\u003e C[Trigger Unified Function]\n    C --\u003e D[Extract Document Data]\n    D --\u003e E[Validate Environment]\n    E --\u003e F[Raw Browserless Scraping]\n    F --\u003e G[Save Raw Content]\n    G --\u003e H[Generate AI Title]\n    H --\u003e I[Generate Analysis]\n    I --\u003e J[Create Compatible Blocks]\n    J --\u003e K[Final Save \u0026 Complete]\n    K --\u003e L[Display Results]\n\n    style A fill:#14b8a6,color:#ffffff\n    style K fill:#14b8a6,color:#ffffff\n    style L fill:#14b8a6,color:#ffffff\n```\n\n## 🏗️ Architecture\n\n```\n┌─────────────────┐    ┌─────────────────────┐    ┌─────────────────┐\n│   User Input    │    │   Unified Function   │    │   Results View  │\n│   (Frontend)    │───▶│   (Appwrite)         │───▶│   (Frontend)    │\n│                 │    │   8-Step Process     │    │                 │\n└─────────────────┘    └─────────────────────┘    └─────────────────┘\n         │                       │                       │\n         ▼                       ▼ 
                      ▼\n┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐\n│   Document      │    │   Gemini AI     │    │   Browserless   │\n│   Creation      │    │   (Analysis)    │    │   (Scraping)    │\n│   (Database)    │    │                 │    │                 │\n└─────────────────┘    └─────────────────┘    └─────────────────┘\n```\n\n### 🔄 Processing Flow\n1. **User submits URL** → Document record created\n2. **Unified function triggered** → 8-step processing begins\n3. **Raw content scraped** → Exact HTML preserved\n4. **AI analysis performed** → Gemini generates insights\n5. **Results formatted** → Compatible with existing frontend\n6. **Document updated** → Ready for display\n\n### 🛠️ Technical Stack\n- **Frontend**: SvelteKit with TypeScript\n- **Backend**: Appwrite Functions (Python)\n- **Database**: Appwrite Database (consolidated schema)\n- **AI**: Google Gemini 2.5 Pro\n- **Scraping**: Browserless.io + Requests\n- **Hosting**: Vercel (frontend) + Appwrite Cloud (backend)\n\n## 📋 Prerequisites\n\n- Node.js 18+\n- npm or yarn\n- Python 3.9+ (for function development)\n- Appwrite account and project\n- Google Gemini API key\n- Browserless.io API key (optional, enhances scraping)\n\n## 🛠️ Setup Instructions\n\n### 1. Appwrite Project Setup\n\n1. Create a new project on [Appwrite Cloud](https://cloud.appwrite.io)\n2. Note your Project ID and API Endpoint\n3. Enable the following services:\n   - Databases\n   - Functions\n   - Storage (optional)\n\n### 2. 
Database Configuration\n\nCreate a single consolidated collection in your Appwrite database:\n\n#### Documents Collection (Consolidated)\n```json\n{\n  \"name\": \"documents_table\",\n  \"permissions\": [\"create\", \"read\", \"update\"],\n  \"attributes\": [\n    {\"key\": \"user_id\", \"type\": \"string\", \"size\": 36, \"required\": true},\n    {\"key\": \"title\", \"type\": \"string\", \"size\": 255, \"required\": false},\n    {\"key\": \"url\", \"type\": \"string\", \"required\": true},\n    {\"key\": \"instructions\", \"type\": \"string\", \"size\": 1000, \"required\": true},\n    {\"key\": \"status\", \"type\": \"enum\", \"elements\": [\"pending\", \"scraping\", \"analyzing\", \"completed\", \"failed\"], \"required\": true},\n    {\"key\": \"public\", \"type\": \"boolean\", \"default\": false},\n    {\"key\": \"scraped_content\", \"type\": \"string\", \"size\": 99999, \"required\": false},\n    {\"key\": \"analysis_summary\", \"type\": \"string\", \"size\": 2000, \"required\": false},\n    {\"key\": \"analysis_blocks\", \"type\": \"string\", \"size\": 99999, \"required\": false},\n    {\"key\": \"gemini_tools_used\", \"type\": \"string\", \"size\": 1000, \"required\": false},\n    {\"key\": \"research_context\", \"type\": \"string\", \"size\": 5000, \"required\": false},\n    {\"key\": \"$createdAt\", \"type\": \"datetime\", \"required\": true},\n    {\"key\": \"$updatedAt\", \"type\": \"datetime\", \"required\": true}\n  ]\n}\n```\n\n**Key Changes:**\n- **Single Collection**: All data consolidated into one table\n- **AI-Generated Titles**: `title` field now contains AI-generated 2-4 word titles\n- **Raw Content**: `scraped_content` stores exact browserless HTML\n- **Compatible Format**: `analysis_blocks` maintains same JSON structure as original analyzer\n- **Enhanced Fields**: Added `gemini_tools_used` and `research_context` for tracking\n\n### 3. 
Environment Variables\n\nCreate environment files for both frontend and backend:\n\n#### Frontend (.env.local in docify-website/)\n```env\n# Appwrite Configuration\nNEXT_PUBLIC_APPWRITE_ENDPOINT=https://your-region.cloud.appwrite.io/v1\nNEXT_PUBLIC_APPWRITE_PROJECT_ID=your-project-id\nNEXT_PUBLIC_APPWRITE_DATABASE_ID=your-database-id\nNEXT_PUBLIC_APPWRITE_DOCUMENTS_COLLECTION_ID=documents_table\n\n# OAuth Configuration (if using social login)\nNEXT_PUBLIC_APPWRITE_OAUTH_SUCCESS_URL=http://localhost:5173/auth/success\nNEXT_PUBLIC_APPWRITE_OAUTH_FAILURE_URL=http://localhost:5173/auth/error\n```\n\n#### Backend Function Environment Variables\nSet these in your Appwrite function configuration:\n```env\n# Required\nGEMINI_API_KEY=your-gemini-api-key\nDATABASE_ID=your-database-id\nDOCUMENTS_COLLECTION_ID=documents_table\n\n# Optional (enhances scraping)\nBROWSERLESS_API_KEY=your-browserless-api-key\n```\n\n### 4. Deploy Unified Function\n\nDeploy the unified orchestrator function:\n\n```bash\n# Install Appwrite CLI\nnpm install -g appwrite-cli\n\n# Login to Appwrite\nappwrite login\n\n# Navigate to function directory\ncd functions/docify-unified-orchestrator\n\n# Deploy the unified function\nappwrite functions create-deployment \\\n  --function-id docify-unified-orchestrator \\\n  --activate true \\\n  --code .\n```\n\n**Function Details:**\n- **Name**: Docify Unified Orchestrator v3.0\n- **Runtime**: Python 3.9\n- **Trigger**: Database events on document creation\n- **Timeout**: 500 seconds (for 8-step process)\n- **Memory**: 1024MB\n\n**Note**: The unified function replaces the previous separate scraper and analyzer functions.\n\n### 5. Frontend Setup\n\n```bash\ncd docify-website\nnpm install\nnpm run dev\n```\n\n## 🎯 Usage\n\n### Creating a Document\n\n1. Navigate to the main page of your application\n2. Enter a URL you want to analyze\n3. Provide analysis instructions (e.g., \"Create a visual overview of the API endpoints\")\n4. 
Click \"Create Document\"\n\n### 8-Step Processing Flow\n\nThe unified function executes 8 sequential steps:\n\n1. **📋 Extract Document Data** - Parse request and validate inputs\n2. **📝 Validate Environment** - Check API keys and configuration\n3. **🌐 Raw Browserless Scraping** - Scrape content without modification\n4. **💾 Save Raw Content** - Store exact HTML in database\n5. **🏷️ Generate AI Title** - Create 2-4 word intelligent titles\n6. **📈 Generate Analysis** - Produce comprehensive AI analysis\n7. **🧩 Create Compatible Blocks** - Format blocks for frontend\n8. **✅ Final Save \u0026 Complete** - Update database and mark complete\n\n### Analysis Results\n\nThe system will:\n1. **Preserve** raw HTML content without dangerous cleaning\n2. **Generate** AI-powered 2-4 word titles\n3. **Analyze** content using Google Gemini AI\n4. **Create** multiple content blocks in compatible JSON format:\n   - Summary of the document (≤200 chars)\n   - Mermaid diagrams and flowcharts\n   - Code examples with syntax highlighting\n   - Key points and highlights\n   - API references and guides\n   - Troubleshooting and best practices\n\n### Content Block Types\n\n- **Summary**: High-level overview\n- **Mermaid**: Visual diagrams and flowcharts\n- **Code**: Code examples with syntax highlighting\n- **Key Points**: Important highlights and takeaways\n- **API Reference**: API documentation\n- **Guide**: Step-by-step instructions\n- **Architecture**: System/component diagrams\n- **Best Practices**: Recommendations\n- **Troubleshooting**: Common issues and solutions\n\n## 🔧 Configuration\n\n### Unified Function Environment Variables\n\n#### Required Variables\n- `GEMINI_API_KEY`: Your Google Gemini API key\n- `DATABASE_ID`: Your Appwrite database ID\n- `DOCUMENTS_COLLECTION_ID`: Documents table ID (documents_table)\n\n#### Optional Variables\n- `BROWSERLESS_API_KEY`: Browserless.io API key for enhanced scraping\n\n### Status Tracking\n\nThe function updates document status through 5 
stages:\n- `pending` → Document created, waiting for processing\n- `scraping` → Currently scraping content from URL\n- `analyzing` → Scraping complete, analyzing with Gemini\n- `completed` → Analysis complete, ready for display\n- `failed` → Processing failed (can be retried)\n\n### Customizing the AI Analysis\n\nEdit the analysis prompt in `functions/docify-unified-orchestrator/src/main.py` to customize how Gemini analyzes documents. The prompt includes instructions for generating compatible JSON blocks.\n\n## 📊 API Endpoints\n\n### POST `/functions/docify-unified-orchestrator/executions`\nTriggers the unified document processing pipeline.\n\n**Request Body:**\n```json\n{\n  \"documentId\": \"document-id\",\n  \"url\": \"https://example.com\",\n  \"instructions\": \"Analyze this documentation and create visual diagrams\"\n}\n```\n\n**Response:**\n```json\n{\n  \"success\": true,\n  \"executionId\": \"execution-id\",\n  \"message\": \"Unified processing started - 8 steps will be executed\"\n}\n```\n\n### Database Event Triggers\nThe unified function is automatically triggered when:\n- **Document Creation**: `databases.docify_db.collections.documents_table.documents.*.create`\n- **Status Updates**: Automatic progression through processing stages\n\n## 🐛 Troubleshooting\n\n### Common Issues\n\n1. **Function Deployment Fails**: Ensure Python 3.9+ runtime is selected and all dependencies are installed.\n\n2. **Gemini API Errors**: Check your `GEMINI_API_KEY` and ensure you have API quota remaining.\n\n3. **Browserless Scraping Fails**: Some websites block scraping. Try without `BROWSERLESS_API_KEY` or use different URLs.\n\n4. **Database Connection Issues**: Verify your Appwrite database configuration and collection permissions.\n\n5. **Function Timeouts**: The 8-step process may take time. 
The configured 500s timeout should handle most documents.\n\n### Debug Mode\n\nMonitor function logs through the Appwrite Console:\n```bash\nappwrite functions logs --function-id docify-unified-orchestrator\n```\n\nCheck document status in your database to see processing progress through the 5 stages: `pending` → `scraping` → `analyzing` → `completed`/`failed`.\n\n## 🔒 Security\n\n- OAuth authentication with Google and GitHub\n- User-based data isolation in database\n- API keys stored securely as environment variables\n- Raw content preservation maintains original security context\n- Function execution limited to authorized users only\n\n## 🚀 Deployment\n\n### Production Deployment\n\n1. **Appwrite Setup**:\n   - Create production project on Appwrite Cloud\n   - Set up database with consolidated schema\n   - Configure OAuth providers (Google, GitHub)\n\n2. **Function Deployment**:\n   ```bash\n   cd functions/docify-unified-orchestrator\n   appwrite functions create-deployment --function-id docify-unified-orchestrator --activate true --code .\n   ```\n\n3. **Frontend Deployment**:\n   ```bash\n   cd docify-website\n   npm run build\n   npm run preview  # or deploy to Vercel/Netlify\n   ```\n\n4. **Environment Configuration**:\n   - Set production API keys\n   - Configure production database\n   - Set up monitoring and alerts\n\n### Scaling Considerations\n\n- **Function Limits**: 500s timeout, 1024MB memory for complex analyses\n- **Gemini API**: Monitor usage and costs\n- **Database**: Consolidated schema reduces query complexity\n- **Browserless**: Optional enhancement for difficult sites\n- **Storage**: Raw content preservation requires adequate storage\n\n## 🤝 Contributing\n\n1. Fork the repository\n2. Create a feature branch\n3. Make your changes\n4. Add tests if applicable\n5. 
Submit a pull request\n\n## 📝 License\n\nThis project is licensed under the MIT License - see the LICENSE file for details.\n\n## 🆘 Support\n\nFor support and questions:\n- Check the troubleshooting section above\n- Review the Appwrite documentation\n- Monitor function logs: `appwrite functions logs --function-id docify-unified-orchestrator`\n- Check document status in database for processing progress\n\n## 📈 Key Improvements (v3.0)\n\n### 🔄 Unified Architecture\n- **Single Function**: Replaced separate scraper + analyzer with unified orchestrator\n- **8-Step Process**: Clear, linear processing pipeline\n- **Raw Content**: Preserves exact HTML without dangerous cleaning\n- **AI Titles**: Smart 2-4 word titles using Gemini\n\n### 🤖 Enhanced AI\n- **Google Gemini**: Latest AI model with advanced capabilities\n- **Compatible Format**: Same JSON blocks as original analyzer\n- **Error Recovery**: Graceful failure handling with status updates\n- **Simple Tools**: Clean tracking of AI tool usage\n\n### 🗄️ Database Optimization\n- **Consolidated Schema**: Single table for all document data\n- **Removed Fields**: Cleaned up unused attributes (13/17 used)\n- **Enhanced Fields**: Added tracking for tools and research context\n\n---\n\nBuilt with ❤️ using Appwrite, SvelteKit, Google Gemini, and Python","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fh1anonymoush1%2Fdocify","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fh1anonymoush1%2Fdocify","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fh1anonymoush1%2Fdocify/lists"}