{"id":29092221,"url":"https://github.com/lijianqiao/table_data","last_synced_at":"2026-04-29T14:02:13.594Z","repository":{"id":301084147,"uuid":"1008110457","full_name":"lijianqiao/table_data","owner":"lijianqiao","description":"A professional data-table processing tool built on Streamlit and Polars, providing data merging, cleaning, validation, and export.","archived":false,"fork":false,"pushed_at":"2025-06-25T03:50:07.000Z","size":0,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-06-25T04:33:03.178Z","etag":null,"topics":["datatables","polars","python","streamlit"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lijianqiao.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-25T03:42:03.000Z","updated_at":"2025-06-25T03:50:10.000Z","dependencies_parsed_at":"2025-06-25T04:33:09.280Z","dependency_job_id":"4608294a-0f06-40f0-a9ab-c3a52e6ea234","html_url":"https://github.com/lijianqiao/table_data","commit_stats":null,"previous_names":["lijianqiao/table_data"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/lijianqiao/table_data","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lijianqiao%2Ftable_data","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lijianqiao%2Ftable_data/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lijianqiao%2Ftable_data/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lijianqiao%2Ftable_dat
a/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lijianqiao","download_url":"https://codeload.github.com/lijianqiao/table_data/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lijianqiao%2Ftable_data/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32428622,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-29T13:34:34.882Z","status":"ssl_error","status_checked_at":"2026-04-29T13:34:29.830Z","response_time":110,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["datatables","polars","python","streamlit"],"created_at":"2025-06-28T07:04:01.815Z","updated_at":"2026-04-29T14:02:13.564Z","avatar_url":"https://github.com/lijianqiao.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data Table Processing System\n\nA professional data-table processing tool built on Streamlit and Polars, providing data merging, cleaning, validation, and export.\n\n## 🚀 Features\n\n- **📊 Data merging**: Smart merging of multiple CSV/Excel files\n- **🔍 Data preview**: Live data preview with summary statistics\n- **🎯 Field selection**: Flexible field selection and export configuration\n- **⚙️ Data preprocessing**: Preprocessing options such as deduplication and data cleaning\n- **📤 Data export**: High-performance export to Excel\n- **🔧 Modular architecture**: Plugin-style design that is easy to extend\n- **🎨 Clear UI**: App selection in the sidebar, features shown in the main area\n- **🔄 Independent components**: Each app manages its own file upload, preprocessing, and related functions\n\n## 🏗️ System Architecture\n\n```\ntable_data/\n│\n├── app/                        # Core application code\n│   ├── run.py                  # Main app orchestrator (dependency injection, service registration)\n│   ├── base/                   # Base interface definitions\n│   ├── core/                   # 
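Core service layer (container, registry management, global services)\n│   ├── state/                  # State management\n│   ├── handlers/               # Business logic handlers\n│   ├── components/             # Reusable UI components\n│   ├── ui/                     # Main UI (sidebar + main content area)\n│   ├── merge_extract/          # Data merging app (includes file upload and preprocessing)\n│   └── utils/                  # Shared utilities\n├── config/                     # Configuration management\n├── main.py                     # Project entry point\n└── pyproject.toml              # Project configuration\n```\n\n## 🛠️ Tech Stack\n\n- **Frontend framework**: Streamlit - rapid development of data apps\n- **Data processing**: Polars - high-performance data processing library\n- **File handling**: CSV and Excel (.xlsx/.xls) formats\n- **Architecture patterns**: Dependency injection, strategy pattern, component-based design\n\n## 📦 Installation and Running\n\n### Requirements\n\n- Python \u003e= 3.13\n- Windows 10/11 (current configuration)\n\n### Install Dependencies\n\n```bash\n# Install the project and its dependencies\npip install -e .\n\n# Or install the dependencies directly\npip install streamlit polars openpyxl pandas pyarrow\n```\n\n### Start the App\n\n```bash\n# Option 1: run the main entry point directly\nstreamlit run main.py\n\n# Option 2: run Streamlit as a Python module\npython -m streamlit run main.py\n\n# Option 3: launch with uv\nuv run streamlit run main.py\n```\n\n## 💡 Usage Guide\n\n### Basic Workflow\n\n1. **Select an app**: Choose the \"数据合并\" (Data Merging) app in the left sidebar\n2. **Upload files**: On the \"文件上传\" (File Upload) tab, upload one or more CSV/Excel files\n3. **Configure preprocessing**: On the same tab, optionally enable deduplication, data cleaning, and other options\n4. **Preview data**: On the \"数据预览\" (Data Preview) tab, review an overview and summary statistics of the merged data\n5. **Select fields**: On the \"字段选择\" (Field Selection) tab, choose the fields to export\n6. **Export data**: On the \"导出数据\" (Export Data) tab, generate and download the processed result as an Excel file\n\n### UI Overview\n\n- **Sidebar**: App selector, plus project introduction and usage instructions\n- **Main content area**: Shows the interface for the selected app\n- **Tabbed layout**: Each app organizes its functional modules into tabs\n- **Independent operation**: Each app manages the file upload, preprocessing, and other functions it needs\n\n### Supported File Formats\n\n- **CSV files**: .csv\n- **Excel files**: .xlsx, .xls\n- **Multiple worksheets**: All worksheets in an Excel file are read automatically\n\n### Data Processing Features\n\n- **Smart merging**: Automatically merges multiple tables based on shared columns\n- **Data cleaning**: Removes empty rows and trims whitespace in strings\n- **Deduplication**: Removes duplicate rows\n- **Column normalization**: Automatically normalizes column name formatting\n\n## 🔧 Extending\n\n### Adding a New App\n\n1. Create a new app module under the `app/` directory\n2. Subclass the `BaseApp` interface and implement the required methods\n3. 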
Register the new app in `AppOrchestrator`\n\n```python\nfrom app.base.base_app import BaseApp\n\nclass YourNewApp(BaseApp):\n    def get_name(self) -\u003e str:\n        return \"New app name\"\n    \n    def get_description(self) -\u003e str:\n        return \"App description\"\n    \n    def render(self) -\u003e None:\n        # Implement the UI rendering logic\n        pass\n    \n    def validate_input(self, data) -\u003e bool:\n        # Implement the data validation logic\n        return True\n```\n\n### Adding a New Component\n\nCreate new UI components under the `app/components/` directory, following the component-based design principles.\n\n## 🐛 Troubleshooting\n\n### Common Issues\n\n1. **Import errors**: Make sure all dependencies are installed\n2. **File read failures**: Check the file format and encoding\n3. **Out of memory**: For large files, enable data cleaning and deduplication\n\n### Viewing Logs\n\nRuntime log messages are shown in the Streamlit interface to help with debugging and troubleshooting.\n\n## 📄 License\n\nMIT License - see the [LICENSE](LICENSE) file for details\n\n## 👥 Contributing\n\nIssues and pull requests to improve this project are welcome!\n\n---\n\n**Author**: lijianqiao  \n**Email**: lijianqiao2906@live.com  \n**Version**: 1.0.0","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flijianqiao%2Ftable_data","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flijianqiao%2Ftable_data","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flijianqiao%2Ftable_data/lists"}