{"id":51026666,"url":"https://github.com/gitstq/documind-ai-pro","last_synced_at":"2026-06-21T20:02:11.245Z","repository":{"id":361857716,"uuid":"1256155482","full_name":"gitstq/documind-ai-pro","owner":"gitstq","description":"🧠 AI-powered document conversion and knowledge extraction tool - Convert documents to Markdown with intelligent analysis","archived":false,"fork":false,"pushed_at":"2026-06-01T14:13:53.000Z","size":31,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-01T16:12:19.434Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gitstq.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-01T14:10:02.000Z","updated_at":"2026-06-01T14:35:16.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/gitstq/documind-ai-pro","commit_stats":null,"previous_names":["gitstq/documind-ai-pro"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/gitstq/documind-ai-pro","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gitstq%2Fdocumind-ai-pro","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gitstq%2Fdocumind-ai-pro/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gitstq%2Fdocumind-ai-pro/releases","manifests_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/repositories/gitstq%2Fdocumind-ai-pro/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gitstq","download_url":"https://codeload.github.com/gitstq/documind-ai-pro/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gitstq%2Fdocumind-ai-pro/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34623906,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-21T02:00:05.568Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-21T20:02:10.498Z","updated_at":"2026-06-21T20:02:11.240Z","avatar_url":"https://github.com/gitstq.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n# 🧠 DocuMind AI\n\n**AI-Powered Document Conversion \u0026 Knowledge Extraction**\n\n[![Python](https://img.shields.io/badge/Python-3.9+-blue.svg)](https://www.python.org/)\n[![License](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)\n[![PyPI](https://img.shields.io/badge/PyPI-Coming%20Soon-orange.svg)](https://pypi.org/)\n\n[English](#english) | [简体中文](#简体中文) | [繁體中文](#繁體中文)\n\n\u003c/div\u003e\n\n---\n\n\u003ca name=\"english\"\u003e\u003c/a\u003e\n## 🇺🇸 English\n\n### 🎉 Introduction\n\nDocuMind AI is an intelligent document processing 
tool that goes beyond simple format conversion. While inspired by tools like `markitdown`, DocuMind AI differentiates itself by leveraging **Large Language Models (LLMs)** to deeply understand document content, extract meaningful insights, and build knowledge graphs.\n\n**Key Differentiators:**\n- 🧠 **AI-Powered Analysis**: Not just conversion—understand your documents\n- 🔗 **Knowledge Graph Extraction**: Visualize relationships between entities\n- 🤖 **Multi-Model Support**: Works with OpenAI, Azure OpenAI, and compatible APIs\n- 📊 **Smart Summarization**: Automatic key points and topic extraction\n- 🌐 **15+ Format Support**: PDF, DOCX, XLSX, PPTX, HTML, and more\n\n### ✨ Core Features\n\n| Feature | Description | Status |\n|---------|-------------|--------|\n| 📄 **Document Conversion** | Convert 15+ formats to clean Markdown | ✅ Ready |\n| 🧠 **AI Analysis** | Summarize, extract entities, analyze sentiment | ✅ Ready |\n| 🔗 **Knowledge Graphs** | Extract entities and relationships | ✅ Ready |\n| ❓ **Q\u0026A Generation** | Auto-generate questions from content | ✅ Ready |\n| ✅ **Action Items** | Extract tasks and todos | ✅ Ready |\n| 🎨 **Rich CLI** | Beautiful terminal interface with progress bars | ✅ Ready |\n| 📦 **Multiple Exports** | JSON, Cypher (Neo4j), RDF formats | ✅ Ready |\n\n### 🚀 Quick Start\n\n#### Installation\n\n```bash\n# Clone the repository\ngit clone https://github.com/gitstq/documind-ai-pro.git\ncd documind-ai-pro\n\n# Install dependencies\npip install -r requirements.txt\n\n# Or install in development mode\npip install -e .\n```\n\n#### Environment Setup\n\n```bash\n# Set your OpenAI API key\nexport OPENAI_API_KEY=\"your-api-key-here\"\n\n# Optional: Set custom API base for Azure or other providers\nexport OPENAI_API_BASE=\"https://api.openai.com/v1\"\n```\n\n#### Basic Usage\n\n```bash\n# Convert a single document\ndocumind convert document.pdf\n\n# Convert with AI analysis\ndocumind convert document.pdf --extract-kg --questions\n\n# Convert 
a directory of documents\ndocumind convert ./documents/ -o ./output/\n\n# Analyze a document only\ndocumind analyze document.pdf\n\n# Extract a knowledge graph\ndocumind extract-kg document.pdf --format cypher\n```\n\n### 📖 Detailed Usage\n\n#### Command: `convert`\n\nConvert documents to Markdown with optional AI analysis.\n\n```bash\ndocumind convert [OPTIONS] INPUT_PATH\n\nOptions:\n  -o, --output PATH       Output directory\n  --no-ai                 Disable AI analysis\n  --model TEXT            AI model to use [default: gpt-4o-mini]\n  --api-key TEXT          OpenAI API key\n  --extract-kg            Extract knowledge graph\n  --questions             Generate questions\n  --actions               Extract action items\n  --max-pages INTEGER     Maximum pages to process\n```\n\n**Examples:**\n\n```bash\n# Basic conversion\ndocumind convert report.pdf\n\n# Full analysis with knowledge extraction\ndocumind convert report.pdf --extract-kg --questions --actions\n\n# Use a specific model\ndocumind convert report.pdf --model gpt-4o\n\n# Batch processing\ndocumind convert ./input/ -o ./output/ --extract-kg\n```\n\n#### Command: `analyze`\n\nAnalyze document content with AI.\n\n```bash\ndocumind analyze [OPTIONS] INPUT_PATH\n\nOptions:\n  --model TEXT    AI model to use\n  --api-key TEXT  OpenAI API key\n```\n\n**Output includes:**\n- 📋 Executive summary\n- 📝 Key points (top 10)\n- 🏷️ Topics and themes\n- 💭 Sentiment analysis\n- 👥 Named entities\n\n#### Command: `extract-kg`\n\nExtract a knowledge graph from a document.\n\n```bash\ndocumind extract-kg [OPTIONS] INPUT_PATH\n\nOptions:\n  -o, --output PATH       Output file path\n  --format [json|cypher|rdf]  Output format [default: json]\n```\n\n**Export Formats:**\n- **JSON**: Standard graph format with nodes and edges\n- **Cypher**: Neo4j query language for graph databases\n- **RDF**: Resource Description Framework for the semantic web\n\n### 💡 Design Philosophy\n\nDocuMind AI was built with three core principles:\n\n1. 
**Intelligence over Conversion**: We don't just convert formats—we understand content\n2. **Developer Experience**: Rich CLI, comprehensive APIs, and extensive documentation\n3. **Extensibility**: Modular architecture for easy customization\n\n### 📦 Supported Formats\n\n| Format | Extension | Conversion | AI Analysis |\n|--------|-----------|------------|-------------|\n| PDF | .pdf | ✅ | ✅ |\n| Word | .docx, .doc | ✅ | ✅ |\n| Excel | .xlsx, .xls | ✅ | ✅ |\n| PowerPoint | .pptx, .ppt | ✅ | ✅ |\n| HTML | .html, .htm | ✅ | ✅ |\n| Markdown | .md | ✅ | ✅ |\n| Text | .txt | ✅ | ✅ |\n| CSV | .csv | ✅ | ✅ |\n| JSON | .json | ✅ | ✅ |\n| XML | .xml | ✅ | ✅ |\n| RTF | .rtf | ✅ | ✅ |\n| OpenDocument | .odt, .ods, .odp | ✅ | ✅ |\n\n### 🤝 Contributing\n\nWe welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.\n\n1. Fork the repository\n2. Create your feature branch (`git checkout -b feature/amazing-feature`)\n3. Commit your changes (`git commit -m 'feat: add amazing feature'`)\n4. Push to the branch (`git push origin feature/amazing-feature`)\n5. 
Open a Pull Request\n\n### 📄 License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n---\n\n\u003ca name=\"简体中文\"\u003e\u003c/a\u003e\n## 🇨🇳 简体中文\n\n### 🎉 项目介绍\n\nDocuMind AI 是一款智能文档处理工具，它超越了简单的格式转换。虽然灵感来源于 `markitdown` 等工具，但 DocuMind AI 通过利用**大语言模型（LLM）**来深度理解文档内容、提取有意义的洞察，并构建知识图谱，从而实现差异化。\n\n**核心差异化亮点：**\n- 🧠 **AI 驱动分析**：不仅是转换，更是理解您的文档\n- 🔗 **知识图谱提取**：可视化实体间的关系\n- 🤖 **多模型支持**：兼容 OpenAI、Azure OpenAI 及兼容 API\n- 📊 **智能摘要**：自动提取关键点和主题\n- 🌐 **15+ 格式支持**：PDF、DOCX、XLSX、PPTX、HTML 等\n\n### ✨ 核心特性\n\n| 特性 | 描述 | 状态 |\n|------|------|------|\n| 📄 **文档转换** | 将 15+ 格式转换为干净的 Markdown | ✅ 就绪 |\n| 🧠 **AI 分析** | 摘要、实体提取、情感分析 | ✅ 就绪 |\n| 🔗 **知识图谱** | 提取实体和关系 | ✅ 就绪 |\n| ❓ **问答生成** | 从内容自动生成问题 | ✅ 就绪 |\n| ✅ **行动项提取** | 提取任务和待办事项 | ✅ 就绪 |\n| 🎨 **精美 CLI** | 带进度条的优雅终端界面 | ✅ 就绪 |\n| 📦 **多格式导出** | JSON、Cypher (Neo4j)、RDF 格式 | ✅ 就绪 |\n\n### 🚀 快速开始\n\n#### 安装\n\n```bash\n# 克隆仓库\ngit clone https://github.com/gitstq/documind-ai-pro.git\ncd documind-ai-pro\n\n# 安装依赖\npip install -r requirements.txt\n\n# 或以开发模式安装\npip install -e .\n```\n\n#### 环境配置\n\n```bash\n# 设置 OpenAI API 密钥\nexport OPENAI_API_KEY=\"your-api-key-here\"\n\n# 可选：为 Azure 或其他提供商设置自定义 API 基础地址\nexport OPENAI_API_BASE=\"https://api.openai.com/v1\"\n```\n\n#### 基本用法\n\n```bash\n# 转换单个文档\ndocumind convert document.pdf\n\n# 带 AI 分析的转换\ndocumind convert document.pdf --extract-kg --questions\n\n# 批量转换目录\ndocumind convert ./documents/ -o ./output/\n\n# 仅分析文档\ndocumind analyze document.pdf\n\n# 提取知识图谱\ndocumind extract-kg document.pdf --format cypher\n```\n\n### 📖 详细使用指南\n\n#### 命令：`convert`\n\n将文档转换为 Markdown，可选 AI 分析。\n\n```bash\ndocumind convert [选项] 输入路径\n\n选项：\n  -o, --output 路径       输出目录\n  --no-ai                 禁用 AI 分析\n  --model 文本            使用的 AI 模型 [默认: gpt-4o-mini]\n  --api-key 文本          OpenAI API 密钥\n  --extract-kg            提取知识图谱\n  --questions             生成问题\n  --actions               提取行动项\n  --max-pages 整数        最大处理页数\n```\n\n**示例：**\n\n```bash\n# 
基础转换\ndocumind convert report.pdf\n\n# 完整分析并提取知识\ndocumind convert report.pdf --extract-kg --questions --actions\n\n# 使用特定模型\ndocumind convert report.pdf --model gpt-4o\n\n# 批量处理\ndocumind convert ./input/ -o ./output/ --extract-kg\n```\n\n#### 命令：`analyze`\n\n使用 AI 分析文档内容。\n\n```bash\ndocumind analyze [选项] 输入路径\n\n选项：\n  --model 文本    使用的 AI 模型\n  --api-key 文本  OpenAI API 密钥\n```\n\n**输出包括：**\n- 📋 执行摘要\n- 📝 关键点（前 10 个）\n- 🏷️ 主题和标签\n- 💭 情感分析\n- 👥 命名实体\n\n#### 命令：`extract-kg`\n\n从文档提取知识图谱。\n\n```bash\ndocumind extract-kg [选项] 输入路径\n\n选项：\n  -o, --output 路径       输出文件路径\n  --format [json|cypher|rdf]  输出格式 [默认: json]\n```\n\n**导出格式：**\n- **JSON**：标准图谱格式，包含节点和边\n- **Cypher**：Neo4j 图数据库查询语言\n- **RDF**：语义网资源描述框架\n\n### 💡 设计理念\n\nDocuMind AI 基于三个核心原则构建：\n\n1. **智能优于转换**：我们不只是转换格式，更是理解内容\n2. **开发者体验**：丰富的 CLI、全面的 API 和详尽的文档\n3. **可扩展性**：模块化架构，易于定制\n\n### 📦 支持的格式\n\n| 格式 | 扩展名 | 转换 | AI 分析 |\n|------|--------|------|---------|\n| PDF | .pdf | ✅ | ✅ |\n| Word | .docx, .doc | ✅ | ✅ |\n| Excel | .xlsx, .xls | ✅ | ✅ |\n| PowerPoint | .pptx, .ppt | ✅ | ✅ |\n| HTML | .html, .htm | ✅ | ✅ |\n| Markdown | .md | ✅ | ✅ |\n| 文本 | .txt | ✅ | ✅ |\n| CSV | .csv | ✅ | ✅ |\n| JSON | .json | ✅ | ✅ |\n| XML | .xml | ✅ | ✅ |\n| RTF | .rtf | ✅ | ✅ |\n| OpenDocument | .odt, .ods, .odp | ✅ | ✅ |\n\n### 🤝 贡献指南\n\n我们欢迎贡献！详情请参阅我们的[贡献指南](CONTRIBUTING.md)。\n\n1. Fork 本仓库\n2. 创建您的功能分支 (`git checkout -b feature/amazing-feature`)\n3. 提交您的更改 (`git commit -m 'feat: add amazing feature'`)\n4. 推送到分支 (`git push origin feature/amazing-feature`)\n5. 
开启 Pull Request\n\n### 📄 开源协议\n\n本项目采用 MIT 协议开源 - 详见 [LICENSE](LICENSE) 文件。\n\n---\n\n\u003ca name=\"繁體中文\"\u003e\u003c/a\u003e\n## 🇹🇼 繁體中文\n\n### 🎉 專案介紹\n\nDocuMind AI 是一款智慧文件處理工具，它超越了簡單的格式轉換。雖然靈感來源於 `markitdown` 等工具，但 DocuMind AI 透過利用**大型語言模型（LLM）**來深度理解文件內容、提取有意義的洞察，並建構知識圖譜，從而實現差異化。\n\n**核心差異化亮點：**\n- 🧠 **AI 驅動分析**：不僅是轉換，更是理解您的文件\n- 🔗 **知識圖譜提取**：可視化實體間的關係\n- 🤖 **多模型支援**：相容 OpenAI、Azure OpenAI 及相容 API\n- 📊 **智慧摘要**：自動提取關鍵點和主題\n- 🌐 **15+ 格式支援**：PDF、DOCX、XLSX、PPTX、HTML 等\n\n### ✨ 核心特性\n\n| 特性 | 描述 | 狀態 |\n|------|------|------|\n| 📄 **文件轉換** | 將 15+ 格式轉換為乾淨的 Markdown | ✅ 就緒 |\n| 🧠 **AI 分析** | 摘要、實體提取、情感分析 | ✅ 就緒 |\n| 🔗 **知識圖譜** | 提取實體和關係 | ✅ 就緒 |\n| ❓ **問答生成** | 從內容自動生成問題 | ✅ 就緒 |\n| ✅ **行動項提取** | 提取任務和待辦事項 | ✅ 就緒 |\n| 🎨 **精美 CLI** | 帶進度條的優雅終端介面 | ✅ 就緒 |\n| 📦 **多格式匯出** | JSON、Cypher (Neo4j)、RDF 格式 | ✅ 就緒 |\n\n### 🚀 快速開始\n\n#### 安裝\n\n```bash\n# 克隆倉庫\ngit clone https://github.com/gitstq/documind-ai-pro.git\ncd documind-ai-pro\n\n# 安裝依賴\npip install -r requirements.txt\n\n# 或以開發模式安裝\npip install -e .\n```\n\n#### 環境配置\n\n```bash\n# 設定 OpenAI API 金鑰\nexport OPENAI_API_KEY=\"your-api-key-here\"\n\n# 可選：為 Azure 或其他提供商設定自定義 API 基礎地址\nexport OPENAI_API_BASE=\"https://api.openai.com/v1\"\n```\n\n#### 基本用法\n\n```bash\n# 轉換單個文件\ndocumind convert document.pdf\n\n# 帶 AI 分析的轉換\ndocumind convert document.pdf --extract-kg --questions\n\n# 批量轉換目錄\ndocumind convert ./documents/ -o ./output/\n\n# 僅分析文件\ndocumind analyze document.pdf\n\n# 提取知識圖譜\ndocumind extract-kg document.pdf --format cypher\n```\n\n### 📖 詳細使用指南\n\n#### 命令：`convert`\n\n將文件轉換為 Markdown，可選 AI 分析。\n\n```bash\ndocumind convert [選項] 輸入路徑\n\n選項：\n  -o, --output 路徑       輸出目錄\n  --no-ai                 禁用 AI 分析\n  --model 文字            使用的 AI 模型 [預設: gpt-4o-mini]\n  --api-key 文字          OpenAI API 金鑰\n  --extract-kg            提取知識圖譜\n  --questions             生成問題\n  --actions               提取行動項\n  --max-pages 整數        最大處理頁數\n```\n\n**範例：**\n\n```bash\n# 基礎轉換\ndocumind convert report.pdf\n\n# 完整分析並提取知識\ndocumind 
convert report.pdf --extract-kg --questions --actions\n\n# 使用特定模型\ndocumind convert report.pdf --model gpt-4o\n\n# 批量處理\ndocumind convert ./input/ -o ./output/ --extract-kg\n```\n\n#### 命令：`analyze`\n\n使用 AI 分析文件內容。\n\n```bash\ndocumind analyze [選項] 輸入路徑\n\n選項：\n  --model 文字    使用的 AI 模型\n  --api-key 文字  OpenAI API 金鑰\n```\n\n**輸出包括：**\n- 📋 執行摘要\n- 📝 關鍵點（前 10 個）\n- 🏷️ 主題和標籤\n- 💭 情感分析\n- 👥 命名實體\n\n#### 命令：`extract-kg`\n\n從文件提取知識圖譜。\n\n```bash\ndocumind extract-kg [選項] 輸入路徑\n\n選項：\n  -o, --output 路徑       輸出檔案路徑\n  --format [json|cypher|rdf]  輸出格式 [預設: json]\n```\n\n**匯出格式：**\n- **JSON**：標準圖譜格式，包含節點和邊\n- **Cypher**：Neo4j 圖資料庫查詢語言\n- **RDF**：語義網資源描述框架\n\n### 💡 設計理念\n\nDocuMind AI 基於三個核心原則構建：\n\n1. **智慧優於轉換**：我們不只是轉換格式，更是理解內容\n2. **開發者體驗**：豐富的 CLI、全面的 API 和詳盡的文件\n3. **可擴展性**：模組化架構，易於定製\n\n### 📦 支援的格式\n\n| 格式 | 副檔名 | 轉換 | AI 分析 |\n|------|--------|------|---------|\n| PDF | .pdf | ✅ | ✅ |\n| Word | .docx, .doc | ✅ | ✅ |\n| Excel | .xlsx, .xls | ✅ | ✅ |\n| PowerPoint | .pptx, .ppt | ✅ | ✅ |\n| HTML | .html, .htm | ✅ | ✅ |\n| Markdown | .md | ✅ | ✅ |\n| 文字 | .txt | ✅ | ✅ |\n| CSV | .csv | ✅ | ✅ |\n| JSON | .json | ✅ | ✅ |\n| XML | .xml | ✅ | ✅ |\n| RTF | .rtf | ✅ | ✅ |\n| OpenDocument | .odt, .ods, .odp | ✅ | ✅ |\n\n### 🤝 貢獻指南\n\n我們歡迎貢獻！詳情請參閱我們的[貢獻指南](CONTRIBUTING.md)。\n\n1. Fork 本倉庫\n2. 建立您的功能分支 (`git checkout -b feature/amazing-feature`)\n3. 提交您的更改 (`git commit -m 'feat: add amazing feature'`)\n4. 推送到分支 (`git push origin feature/amazing-feature`)\n5. 
開啟 Pull Request\n\n### 📄 開源協議\n\n本專案採用 MIT 協議開源 - 詳見 [LICENSE](LICENSE) 檔案。\n\n---\n\n\u003cdiv align=\"center\"\u003e\n\n**Made with ❤️ by the DocuMind AI Team**\n\n[⭐ Star us on GitHub](https://github.com/gitstq/documind-ai-pro) | [🐛 Report Issues](https://github.com/gitstq/documind-ai-pro/issues) | [💡 Request Features](https://github.com/gitstq/documind-ai-pro/discussions)\n\n\u003c/div\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgitstq%2Fdocumind-ai-pro","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgitstq%2Fdocumind-ai-pro","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgitstq%2Fdocumind-ai-pro/lists"}