{"id":17993889,"url":"https://github.com/timandy/table_ocr","last_synced_at":"2025-10-14T12:05:08.132Z","repository":{"id":97064554,"uuid":"153886389","full_name":"timandy/table_ocr","owner":"timandy","description":"表格印刷文字识别","archived":false,"fork":false,"pushed_at":"2018-11-24T03:32:58.000Z","size":3570,"stargazers_count":10,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-21T01:34:25.300Z","etag":null,"topics":["ocr"],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/timandy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-10-20T08:43:23.000Z","updated_at":"2024-12-12T03:23:19.000Z","dependencies_parsed_at":"2023-03-13T16:19:44.997Z","dependency_job_id":null,"html_url":"https://github.com/timandy/table_ocr","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timandy%2Ftable_ocr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timandy%2Ftable_ocr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timandy%2Ftable_ocr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timandy%2Ftable_ocr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/timandy","download_url":"https://codeload.github.com/timandy/table_ocr/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245576522,"owners_count":20638123,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ocr"],"created_at":"2024-10-29T20:13:21.983Z","updated_at":"2025-10-14T12:05:03.079Z","avatar_url":"https://github.com/timandy.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!--自述文件--\u003e\n# 表格 OCR\n识别印刷表格内容\n\n## 编译\n- 集成开发环境为 `IDEA`, 建议升级为最新版本, 低版本可能无法编译启动\n- 如果 `IDEA` 没有自动设置 `jdk`, 需要自己手动设置 `File/Project Structure/Project/Project SDK` 选择 `1.8`\n- 在 `IDEA` 中安装 `lombok` 插件, `File/Settings/Plugins`\n- 启用 `lombok` 插件, `File/Settings/Build, Execution, Deployment/Compiler/Annotation Processors` 勾选 `Enable annotation processing`\n- 设置 `gradle`, `File/Settings/Build, Execution, Deployment/Build Tools/Gradle` 选中 `Use default gradle wrapper (recommended)`\n\n## 辅助线要求\n- 颜色: `RGB(255,0,0)`(红色)\n- 宽度: `[1px-3px]`\n- PhotoShop: 不能开启抗锯齿(打开抗锯齿无法识别)\n- 辅助线可以不横平竖直,但是必须画到边\n\n## 代码运行\n- 运行程序\n- 访问 [传送门](http://localhost:8080/) 上传图片识别\n- 访问 [传送门](http://localhost:8080/demo) 查看内置示例\n\n## 前置条件\n- 原图识别会出现部分列连在一起无法分割的情况\n- 拍摄内容非横平竖直不易区分行列\n- 字过大识别率较低,可能与阿里识别服务未使用较大字体训练有关\n\n## 实现思路\n- 调整图片大小保证字体大小适中\n- 画辅助线,根据辅助线将图片分成大小不等的矩形\n- 调用阿里云文本识别服务,返回文字和所在矩形(**阿里云账号信息为个人账号,请勿用于生产**)\n- 将识别结果派分到划分的矩形中\n- 由于英文逗号会被识别为中文逗号,分派时将中文逗号替换为英文逗号\n- 为了剔除干扰识别结果,分派数据时只分派了面积最大的文本矩形,会导致一个单元格多行的只能识别最长的行\n\n## 阿里云文本识别服务购买地址\n\n\n## 示例图片识别结果\n![ocr](img/ocr.png \"识别结果\")\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftimandy%2Ftable_ocr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftimandy%2Ftable_ocr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftimandy%2Ftable_ocr/lists"}