{"id":51026845,"url":"https://github.com/gitstq/documind-converter-v2","last_synced_at":"2026-06-21T20:02:25.586Z","repository":{"id":363167064,"uuid":"1262186000","full_name":"gitstq/documind-converter-v2","owner":"gitstq","description":"🧠 DocuMind-Converter - 轻量级AI文档智能转换与结构化提取引擎 | Lightweight AI Document Intelligent Conversion \u0026 Structured Extraction Engine - Zero Dependencies","archived":false,"fork":false,"pushed_at":"2026-06-07T17:25:53.000Z","size":35,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-07T19:15:05.798Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gitstq.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-07T17:22:18.000Z","updated_at":"2026-06-07T17:25:22.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/gitstq/documind-converter-v2","commit_stats":null,"previous_names":["gitstq/documind-converter-v2"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/gitstq/documind-converter-v2","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gitstq%2Fdocumind-converter-v2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gitstq%2Fdocumind-converter-v2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gitstq%2Fdocumind-converter-v2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gitstq%2Fdocumind-converter-v2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gitstq","download_url":"https://codeload.github.com/gitstq/documind-converter-v2/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gitstq%2Fdocumind-converter-v2/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34623906,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-21T02:00:05.568Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-21T20:02:24.924Z","updated_at":"2026-06-21T20:02:25.580Z","avatar_url":"https://github.com/gitstq.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n# 🧠 DocuMind-Converter\n\n**轻量级AI文档智能转换与结构化提取引擎**\n\n*Lightweight AI Document Intelligent Conversion \u0026 Structured Extraction Engine*\n\n[![Python](https://img.shields.io/badge/Python-3.10%2B-blue)](https://www.python.org/)\n[![License](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)\n[![Zero Dependencies](https://img.shields.io/badge/Zero-Dependencies-orange)](setup.py)\n[![Tests](https://img.shields.io/badge/Tests-Passing-brightgreen)]()\n\n[English](#english) | [简体中文](#简体中文) | [繁體中文](#繁體中文)\n\n\u003c/div\u003e\n\n---\n\n## 简体中文\n\n### 🎉 项目介绍\n\nDocuMind-Converter 是一款**零依赖**的轻量级AI文档智能转换与结构化提取引擎，专为开发者、内容创作者和数据处理专家设计。\n\n**灵感来源**：本项目受到微软 [markitdown](https://github.com/microsoft/markitdown) 项目的启发，但采用了完全不同的技术路线——我们追求**极致轻量**和**零依赖**，让文档转换不再受困于复杂的依赖链。\n\n**核心价值**：\n- 🚀 **零依赖架构** - 纯Python标准库实现，无需安装任何第三方包\n- 🤖 **AI智能分析** - 内置关键词提取、摘要生成、实体识别等智能功能\n- 🔄 **多格式互转** - 支持 Markdown ↔ HTML ↔ JSON ↔ YAML ↔ Plain 双向转换\n- 📊 **结构化输出** - 不仅转换格式，更提取文档结构、生成目录、识别关键信息\n- 🖥️ **交互式TUI** - 提供美观的终端交互界面，零学习成本\n- 📁 **批量处理** - 支持文件夹批量转换、通配符匹配、并行处理\n\n### ✨ 核心特性\n\n| 特性 | 描述 | 状态 |\n|------|------|------|\n| 📝 **多格式支持** | Markdown/HTML/JSON/YAML/CSV/XML/RST/Org-mode | ✅ 已支持 |\n| 🧠 **智能分析** | 关键词提取、摘要生成、可读性分析 | ✅ 已支持 |\n| 🔍 **实体识别** | 自动识别邮箱、URL、IP、日期、版本号 | ✅ 已支持 |\n| 📑 **目录生成** | 自动生成文档目录(TOC) | ✅ 已支持 |\n| 🎨 **多主题** | Default/Minimal/Fancy 三种输出主题 | ✅ 已支持 |\n| 📊 **增强报告** | 结构化分析报告，包含统计和可读性评分 | ✅ 已支持 |\n| 🖥️ **TUI界面** | 交互式终端界面，菜单驱动 | ✅ 已支持 |\n| 📁 **批量处理** | 文件夹批量转换、并行处理 | ✅ 已支持 |\n| 🔄 **管道模式** | 支持自定义处理管道链 | ✅ 已支持 |\n| 🌐 **中英文支持** | 完整的中英文文档内容处理 | ✅ 已支持 |\n\n### 🚀 快速开始\n\n#### 环境要求\n\n- **Python**: 3.10 或更高版本\n- **操作系统**: Windows / macOS / Linux\n\n#### 安装\n\n```bash\n# 从源码安装\ngit clone https://github.com/gitstq/documind-converter-v2.git\ncd documind-converter-v2\npip install -e .\n\n# 或使用 pip (即将发布)\npip install documind-converter\n```\n\n#### 基本使用\n\n```bash\n# 单文件转换\ndocumind convert input.md -o output.html -f html\n\n# 批量转换\ndocumind batch \"docs/*.md\" -o out/ -f json\n\n# 文档分析\ndocumind analyze document.md -o report.txt\n\n# 查看文档信息\ndocumind info document.md\n\n# 交互式TUI界面\ndocumind-tui\n```\n\n#### Python API\n\n```python\nfrom documind import DocumentConverter, StructureExtractor, BatchPipeline\n\n# 单文件转换\nconverter = DocumentConverter()\nresult = converter.convert('input.md', output_format='html', output_path='output.html')\n\n# 文档分析\nextractor = StructureExtractor()\nanalysis = extractor.analyze_document(open('doc.md').read())\nprint(f\"关键词: {[kw[0] for kw in analysis['keywords'][:5]]}\")\nprint(f\"摘要: {analysis['summary']}\")\n\n# 批量转换\npipeline = BatchPipeline()\nresults = pipeline.batch_convert('docs/*.md', 'output/', 'html')\n```\n\n### 📖 详细使用指南\n\n#### 命令行界面\n\n```bash\n# 转换格式\ndocumind convert input.md -o output.html -f html --theme fancy\n\n# 分析文档\ndocumind analyze paper.md -o analysis.report -f report\n\n# 批量处理\ndocumind batch \"**/*.md\" -o converted/ -f structured -j 8\n\n# 查看帮助\ndocumind --help\ndocumind convert --help\n```\n\n#### 支持的格式\n\n**输入格式**: `.md`, `.markdown`, `.txt`, `.html`, `.htm`, `.json`, `.yaml`, `.yml`, `.csv`, `.xml`, `.rst`, `.org`\n\n**输出格式**: `markdown`, `html`, `json`, `yaml`, `plain`, `structured`\n\n#### 高级配置\n\n```python\nfrom documind import DocumentConverter, OutputFormatter\n\n# 自定义配置\nconfig = {\n    'min_keyword_length': 3,\n    'max_keywords': 30,\n    'format': {\n        'theme': 'fancy',\n        'include_toc': True,\n        'include_stats': True\n    }\n}\n\nconverter = DocumentConverter(config)\nformatter = OutputFormatter(theme='fancy', config=config['format'])\n```\n\n### 💡 设计思路与迭代规划\n\n#### 技术选型原因\n\n- **纯标准库实现**: 消除依赖地狱，确保在任何Python环境中开箱即用\n- **模块化架构**: Converter/Extractor/Formatter/Pipeline 四层分离，易于扩展\n- **规则+统计混合**: 轻量级NLP实现，无需重型ML框架即可实现智能分析\n\n#### 后续迭代计划\n\n- [ ] v1.1.0: 支持 PDF/Word/Excel 解析（基于纯Python实现）\n- [ ] v1.2.0: 集成 LLM API 进行智能摘要和翻译\n- [ ] v1.3.0: 支持插件系统，允许自定义转换器\n- [ ] v2.0.0: Web UI 界面，支持在线文档处理\n\n### 📦 打包与部署\n\n```bash\n# 构建分发包\npython setup.py sdist bdist_wheel\n\n# 本地安装\npip install -e .\n\n# 运行测试\npytest tests/ -v\n\n# 代码格式化\nblack documind/ tests/ --line-length 100\n```\n\n### 🤝 贡献指南\n\n欢迎提交 Issue 和 PR！\n\n- 提交 Issue 请描述清楚问题和复现步骤\n- 提交 PR 请确保通过所有测试\n- 遵循 PEP 8 代码规范\n\n### 📄 开源协议\n\n本项目采用 [MIT 协议](LICENSE) 开源。\n\n---\n\n## English\n\n### 🎉 Introduction\n\nDocuMind-Converter is a **zero-dependency** lightweight AI document intelligent conversion and structured extraction engine, designed for developers, content creators, and data processing professionals.\n\n**Inspiration**: This project is inspired by Microsoft's [markitdown](https://github.com/microsoft/markitdown), but takes a completely different technical approach — we pursue **extreme lightweight** and **zero dependencies**, making document conversion free from complex dependency chains.\n\n**Core Values**:\n- 🚀 **Zero Dependency** - Pure Python standard library, no third-party packages needed\n- 🤖 **AI Smart Analysis** - Built-in keyword extraction, summary generation, entity recognition\n- 🔄 **Multi-format Conversion** - Markdown ↔ HTML ↔ JSON ↔ YAML ↔ Plain bidirectional conversion\n- 📊 **Structured Output** - Not just format conversion, but document structure extraction\n- 🖥️ **Interactive TUI** - Beautiful terminal interface with zero learning curve\n- 📁 **Batch Processing** - Folder batch conversion, wildcard matching, parallel processing\n\n### ✨ Features\n\n| Feature | Description | Status |\n|---------|-------------|--------|\n| 📝 **Multi-format** | Markdown/HTML/JSON/YAML/CSV/XML/RST/Org-mode | ✅ Supported |\n| 🧠 **Smart Analysis** | Keyword extraction, summary generation, readability analysis | ✅ Supported |\n| 🔍 **Entity Recognition** | Auto-detect emails, URLs, IPs, dates, versions | ✅ Supported |\n| 📑 **TOC Generation** | Automatic table of contents generation | ✅ Supported |\n| 🎨 **Themes** | Default/Minimal/Fancy output themes | ✅ Supported |\n| 📊 **Enhanced Reports** | Structured analysis reports with statistics | ✅ Supported |\n| 🖥️ **TUI Interface** | Interactive terminal menu-driven interface | ✅ Supported |\n| 📁 **Batch Processing** | Folder batch conversion with parallel processing | ✅ Supported |\n| 🔄 **Pipeline Mode** | Custom processing pipeline chains | ✅ Supported |\n| 🌐 **Bilingual** | Full Chinese and English content processing | ✅ Supported |\n\n### 🚀 Quick Start\n\n#### Requirements\n\n- **Python**: 3.10 or higher\n- **OS**: Windows / macOS / Linux\n\n#### Installation\n\n```bash\n# Install from source\ngit clone https://github.com/gitstq/documind-converter-v2.git\ncd documind-converter-v2\npip install -e .\n\n# Or use pip (coming soon)\npip install documind-converter\n```\n\n#### Basic Usage\n\n```bash\n# Single file conversion\ndocumind convert input.md -o output.html -f html\n\n# Batch conversion\ndocumind batch \"docs/*.md\" -o out/ -f json\n\n# Document analysis\ndocumind analyze document.md -o report.txt\n\n# View document info\ndocumind info document.md\n\n# Interactive TUI\ndocumind-tui\n```\n\n#### Python API\n\n```python\nfrom documind import DocumentConverter, StructureExtractor, BatchPipeline\n\n# Single file conversion\nconverter = DocumentConverter()\nresult = converter.convert('input.md', output_format='html', output_path='output.html')\n\n# Document analysis\nextractor = StructureExtractor()\nanalysis = extractor.analyze_document(open('doc.md').read())\nprint(f\"Keywords: {[kw[0] for kw in analysis['keywords'][:5]]}\")\nprint(f\"Summary: {analysis['summary']}\")\n\n# Batch conversion\npipeline = BatchPipeline()\nresults = pipeline.batch_convert('docs/*.md', 'output/', 'html')\n```\n\n### 📖 Detailed Guide\n\n#### CLI Commands\n\n```bash\n# Convert format\ndocumind convert input.md -o output.html -f html --theme fancy\n\n# Analyze document\ndocumind analyze paper.md -o analysis.report -f report\n\n# Batch processing\ndocumind batch \"**/*.md\" -o converted/ -f structured -j 8\n\n# View help\ndocumind --help\ndocumind convert --help\n```\n\n#### Supported Formats\n\n**Input**: `.md`, `.markdown`, `.txt`, `.html`, `.htm`, `.json`, `.yaml`, `.yml`, `.csv`, `.xml`, `.rst`, `.org`\n\n**Output**: `markdown`, `html`, `json`, `yaml`, `plain`, `structured`\n\n### 💡 Design \u0026 Roadmap\n\n#### Technical Choices\n\n- **Pure Standard Library**: Eliminate dependency hell, ensure out-of-box experience\n- **Modular Architecture**: Converter/Extractor/Formatter/Pipeline separation\n- **Rule + Statistics Hybrid**: Lightweight NLP without heavy ML frameworks\n\n#### Roadmap\n\n- [ ] v1.1.0: PDF/Word/Excel parsing (pure Python)\n- [ ] v1.2.0: LLM API integration for smart summarization\n- [ ] v1.3.0: Plugin system for custom converters\n- [ ] v2.0.0: Web UI for online document processing\n\n### 📦 Packaging \u0026 Deployment\n\n```bash\n# Build distribution\npython setup.py sdist bdist_wheel\n\n# Local install\npip install -e .\n\n# Run tests\npytest tests/ -v\n\n# Code formatting\nblack documind/ tests/ --line-length 100\n```\n\n### 🤝 Contributing\n\nIssues and PRs are welcome!\n\n- Describe issues clearly with reproduction steps\n- Ensure all tests pass before submitting PR\n- Follow PEP 8 code style\n\n### 📄 License\n\nThis project is open-sourced under the [MIT License](LICENSE).\n\n---\n\n## 繁體中文\n\n### 🎉 項目介紹\n\nDocuMind-Converter 是一款**零依賴**的輕量級AI文檔智能轉換與結構化提取引擎，專為開發者、內容創作者和數據處理專家設計。\n\n**核心價值**：\n- 🚀 **零依賴架構** - 純Python標準庫實現，無需安裝任何第三方包\n- 🤖 **AI智能分析** - 內置關鍵詞提取、摘要生成、實體識別等智能功能\n- 🔄 **多格式互轉** - 支持 Markdown ↔ HTML ↔ JSON ↔ YAML ↔ Plain 雙向轉換\n- 📊 **結構化輸出** - 不僅轉換格式，更提取文檔結構、生成目錄、識別關鍵信息\n- 🖥️ **交互式TUI** - 提供美觀的終端交互界面，零學習成本\n- 📁 **批量處理** - 支持文件夾批量轉換、通配符匹配、並行處理\n\n### ✨ 核心特性\n\n| 特性 | 描述 | 狀態 |\n|------|------|------|\n| 📝 **多格式支持** | Markdown/HTML/JSON/YAML/CSV/XML/RST/Org-mode | ✅ 已支持 |\n| 🧠 **智能分析** | 關鍵詞提取、摘要生成、可讀性分析 | ✅ 已支持 |\n| 🔍 **實體識別** | 自動識別郵箱、URL、IP、日期、版本號 | ✅ 已支持 |\n| 📑 **目錄生成** | 自動生成文檔目錄(TOC) | ✅ 已支持 |\n| 🎨 **多主題** | Default/Minimal/Fancy 三種輸出主題 | ✅ 已支持 |\n| 📊 **增強報告** | 結構化分析報告，包含統計和可讀性評分 | ✅ 已支持 |\n| 🖥️ **TUI界面** | 交互式終端界面，菜單驅動 | ✅ 已支持 |\n| 📁 **批量處理** | 文件夾批量轉換、並行處理 | ✅ 已支持 |\n| 🌐 **中英文支持** | 完整的中英文文檔內容處理 | ✅ 已支持 |\n\n### 🚀 快速開始\n\n#### 環境要求\n\n- **Python**: 3.10 或更高版本\n- **操作系統**: Windows / macOS / Linux\n\n#### 安裝\n\n```bash\n# 從源碼安裝\ngit clone https://github.com/gitstq/documind-converter-v2.git\ncd documind-converter-v2\npip install -e .\n```\n\n#### 基本使用\n\n```bash\n# 單文件轉換\ndocumind convert input.md -o output.html -f html\n\n# 批量轉換\ndocumind batch \"docs/*.md\" -o out/ -f json\n\n# 文檔分析\ndocumind analyze document.md -o report.txt\n\n# 交互式TUI界面\ndocumind-tui\n```\n\n#### Python API\n\n```python\nfrom documind import DocumentConverter, StructureExtractor\n\n# 單文件轉換\nconverter = DocumentConverter()\nresult = converter.convert('input.md', output_format='html')\n\n# 文檔分析\nextractor = StructureExtractor()\nanalysis = extractor.analyze_document(open('doc.md').read())\nprint(f\"關鍵詞: {[kw[0] for kw in analysis['keywords'][:5]]}\")\n```\n\n### 📄 開源協議\n\n本項目採用 [MIT 協議](LICENSE) 開源。\n\n---\n\n\u003cdiv align=\"center\"\u003e\n\n**Made with ❤️ by DocuMind Team**\n\n[GitHub](https://github.com/gitstq/documind-converter-v2) | [Issues](https://github.com/gitstq/documind-converter-v2/issues) | [License](LICENSE)\n\n\u003c/div\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgitstq%2Fdocumind-converter-v2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgitstq%2Fdocumind-converter-v2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgitstq%2Fdocumind-converter-v2/lists"}