{"id":16881483,"url":"https://github.com/swhl/textrecmetric","last_synced_at":"2025-07-20T09:33:02.152Z","repository":{"id":233431374,"uuid":"787152719","full_name":"SWHL/TextRecMetric","owner":"SWHL","description":"Compute the metric of text recognition algorithm.","archived":false,"fork":false,"pushed_at":"2024-04-23T13:59:20.000Z","size":27,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-09T21:50:26.933Z","etag":null,"topics":["accuracy-metrics","crnn","text-recognition"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SWHL.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2024-04-16T01:40:05.000Z","updated_at":"2025-05-26T15:46:48.000Z","dependencies_parsed_at":"2024-04-23T14:51:15.000Z","dependency_job_id":null,"html_url":"https://github.com/SWHL/TextRecMetric","commit_stats":null,"previous_names":["swhl/textrecmetric"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/SWHL/TextRecMetric","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SWHL%2FTextRecMetric","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SWHL%2FTextRecMetric/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SWHL%2FTextRecMetric/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SWHL%2FTextRecMetric/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SWHL","download_url":"https://cod
eload.github.com/SWHL/TextRecMetric/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SWHL%2FTextRecMetric/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265203986,"owners_count":23727473,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["accuracy-metrics","crnn","text-recognition"],"created_at":"2024-10-13T16:02:46.328Z","updated_at":"2025-07-20T09:33:02.069Z","avatar_url":"https://github.com/SWHL.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n  \u003cdiv align=\"center\"\u003e\n    \u003ch1\u003e\u003cb\u003eText Recognition Metric\u003c/b\u003e\u003c/h1\u003e\n  \u003c/div\u003e\n\n\u003ca href=\"\"\u003e\u003cimg src=\"https://img.shields.io/badge/OS-Linux%2C%20Win%2C%20Mac-pink.svg\"\u003e\u003c/a\u003e\n\u003ca href=\"\"\u003e\u003cimg src=\"https://img.shields.io/badge/python-\u003e=3.6,\u003c3.12-aff.svg\"\u003e\u003c/a\u003e\n\u003ca href=\"https://pypi.org/project/text_rec_metric/\"\u003e\u003cimg alt=\"PyPI\" src=\"https://img.shields.io/pypi/v/text_rec_metric\"\u003e\u003c/a\u003e\n\u003ca href=\"https://pepy.tech/project/text_rec_metric\"\u003e\u003cimg src=\"https://static.pepy.tech/personalized-badge/text_rec_metric?period=total\u0026units=abbreviation\u0026left_color=grey\u0026right_color=blue\u0026left_text=Downloads \"\u003e\u003c/a\u003e\n\u003ca href=\"https://semver.org/\"\u003e\u003cimg alt=\"SemVer2.0\" 
src=\"https://img.shields.io/badge/SemVer-2.0-brightgreen\"\u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/psf/black\"\u003e\u003cimg src=\"https://img.shields.io/badge/code%20style-black-000000.svg\"\u003e\u003c/a\u003e\n\n\u003c/div\u003e\n\n\n### Introduction\nThis library computes two metrics, `Exact Match` and `Char Match`, for quickly evaluating text recognition algorithms. It is designed to be used together with [text_rec_test_dataset](https://huggingface.co/datasets/SWHL/text_rec_test_dataset).\n\nThe metric computation follows [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/667fda88ed16dd25be2a79723a71846de3f9bb90/ppocr/metrics/rec_metric.py#L22).\n\n### Overall Framework\n```mermaid\nflowchart LR\n\nA([Text Recognition Algorithm]) --get_pred_txt.py--\u003e B([pred_txt])\nB --compute_metric.py--\u003e C([TextRecMetric]) --\u003e D([ExactMatch])\nC --\u003e E([CharMatch])\n```\n\n### Evaluating on a Specified Dataset\nTo evaluate another text recognition algorithm, write its predictions to `pred.txt`, one sample per line, in the format `predicted text\\tground-truth text\\telapsed time`. See [link](./pred.txt) for a complete file. For example:\n```text\n动漫	动漫	0.665647029876709\n上网	上网	0.6647390524546305\n华茂	华茂	0.6621260245641073\n```\n\n### Example (evaluating `rapidocr_onnxruntime==1.3.16`)\n1. Install the runtime environment\n    ```bash\n    pip install rapidocr_onnxruntime==1.3.16\n    pip install datasets\n    pip install text_rec_metric\n    ```\n2. 
Generate the `pred.txt` text file\n    ```python\n    import cv2\n    import numpy as np\n    from datasets import load_dataset\n    from rapidocr_onnxruntime import RapidOCR\n    from tqdm import tqdm\n\n    engine = RapidOCR()\n\n    dataset = load_dataset(\"SWHL/text_rec_test_dataset\")\n    test_data = dataset[\"test\"]\n\n    content = []\n    for one_data in tqdm(test_data):\n        # Convert the RGB image to the BGR channel order\n        img = np.array(one_data.get(\"image\"))\n        img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)\n\n        # Run recognition only (no detection, no direction classification)\n        result, elapse = engine(img, use_det=False, use_cls=False, use_rec=True)\n        if result is None:\n            # Nothing recognized: record an empty prediction\n            rec_res = \"\"\n            elapse = 0\n        else:\n            rec_res, elapse = result[0]\n\n        gt = one_data.get(\"label\", None)\n        content.append(f\"{rec_res}\\t{gt}\\t{elapse}\")\n\n    # One line per sample: prediction, ground truth, elapsed time\n    with open(\"pred.txt\", \"w\", encoding=\"utf-8\") as f:\n        for v in content:\n            f.write(f\"{v}\\n\")\n    ```\n3. Compute the metrics\n    ```python\n    from text_rec_metric import TextRecMetric\n\n    metric = TextRecMetric()\n\n    pred_path = \"pred.txt\"\n    result = metric(pred_path)\n    print(result)\n    ```\n4. 
Get the result\n    ```bash\n    {'ExactMatch': 0.8323, 'CharMatch': 0.9355, 'avg_elapse': 0.6836}\n    ```\n\n\n### Metric Description\n#### Exact Match (exact-match accuracy)\n$$\nExact\\ Match = \\frac{1}{N}\\sum_{i=1}^{N} s(p_{i}, g_{i})\n$$\n\n$$\ns(p_{i}, g_{i})  = \\begin{cases}\n    1 \u0026 \\text{if } p_{i} = g_{i} \\\\\n    0 \u0026 \\text{otherwise}\n\\end{cases}\n$$\n\n\n- $N$: total number of text lines\n- $p_{i}$: recognition result for the $i$-th text line\n- $g_{i}$: ground-truth label for the $i$-th text line\n\n#### Char Match (character-level accuracy)\n$$\nChar\\ Match = \\frac{1}{N} \\sum_{i=1}^{N} s(p_{i}, g_{i})\n$$\n\n$$\ns(p_{i}, g_{i}) = 1 - NL(p_{i}, g_{i})\n$$\n\n$$\nNL(p_{i}, g_{i}) = \\frac{Levenshtein(p_{i}, g_{i})}{\\max \\big(len(p_{i}), len(g_{i}) \\big)}\n$$\n\n- $N$: total number of text lines\n- $p_{i}$: recognition result for the $i$-th text line\n- $g_{i}$: ground-truth label for the $i$-th text line\n- $Levenshtein(x, y)$: edit distance between strings $x$ and $y$\n- $\\max(x, y)$: the larger of $x$ and $y$\n- $len(x)$: length of the string $x$\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fswhl%2Ftextrecmetric","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fswhl%2Ftextrecmetric","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fswhl%2Ftextrecmetric/lists"}