{"id":13448133,"url":"https://github.com/chardet/chardet","last_synced_at":"2026-04-07T21:01:56.695Z","repository":{"id":39674468,"uuid":"5196969","full_name":"chardet/chardet","owner":"chardet","description":"Python character encoding detector","archived":false,"fork":false,"pushed_at":"2026-03-31T03:20:11.000Z","size":25340,"stargazers_count":2560,"open_issues_count":0,"forks_count":291,"subscribers_count":45,"default_branch":"main","last_synced_at":"2026-04-03T00:39:43.342Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"0bsd","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chardet.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"docs/contributing.rst","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":"docs/supported-encodings.rst","governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2012-07-26T20:30:54.000Z","updated_at":"2026-04-02T18:16:18.000Z","dependencies_parsed_at":"2024-04-28T01:57:25.963Z","dependency_job_id":"9c3761d7-1dc9-4ef5-b513-7bda80206f73","html_url":"https://github.com/chardet/chardet","commit_stats":{"total_commits":324,"total_committers":52,"mean_commits":6.230769230769231,"dds":0.6944444444444444,"last_synced_commit":"98b2acd6216e9a0fa4f47940b9f8adabdfd8aa8a"},"previous_names":[],"tags_count":35,"template":false,"template_full_name":null,"purl":"pkg:github/chardet/chardet","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chardet%2Fchardet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chardet%2Fchardet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chardet%2Fchardet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chardet%2Fchardet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chardet","download_url":"https://codeload.github.com/chardet/chardet/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chardet%2Fchardet/sbom","scorecard":{"id":274086,"data":{"date":"2025-08-11","repo":{"name":"github.com/chardet/chardet","commit":"8e8dfcd93c572c2cbe37585e01662a90b16fbab6"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3.5,"checks":[{"name":"Code-Review","score":3,"reason":"Found 8/24 approved changesets -- score normalized to 3","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Token-Permissions","score":0,"reason":"detected GitHub workflow tokens with excessive permissions","details":["Warn: no topLevel permission defined: .github/workflows/test.yml:1","Info: no jobLevel write permissions found"],"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: GNU Lesser General Public License v2.1: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Pinned-Dependencies","score":0,"reason":"dependency not pinned by hash detected -- score normalized to 0","details":["Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/test.yml:13: update your workflow using https://app.stepsecurity.io/secureworkflow/chardet/chardet/test.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/test.yml:15: update your workflow using https://app.stepsecurity.io/secureworkflow/chardet/chardet/test.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/test.yml:19: update your workflow using https://app.stepsecurity.io/secureworkflow/chardet/chardet/test.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/test.yml:35: update your workflow using https://app.stepsecurity.io/secureworkflow/chardet/chardet/test.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/test.yml:37: update your workflow using https://app.stepsecurity.io/secureworkflow/chardet/chardet/test.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/test.yml:42: update your workflow using https://app.stepsecurity.io/secureworkflow/chardet/chardet/test.yml/main?enable=pin","Info:   0 out of   4 GitHub-owned GitHubAction dependencies pinned","Info:   0 out of   2 third-party GitHubAction dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Branch-Protection","score":-1,"reason":"internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration","details":null,"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Vulnerabilities","score":4,"reason":"6 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: GHSA-cpwx-vrp4-4pq7","Warn: Project is vulnerable to: GHSA-gmj6-6f8f-6699","Warn: Project is vulnerable to: GHSA-q2x7-8rv6-6q7h","Warn: Project is vulnerable to: GHSA-9hjg-9r4m-mvj7","Warn: Project is vulnerable to: GHSA-48p4-8xcf-vxj5","Warn: Project is vulnerable to: GHSA-pq67-6m6q-mj2v"],"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 18 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-17T14:02:44.251Z","repository_id":39674468,"created_at":"2025-08-17T14:02:44.251Z","updated_at":"2025-08-17T14:02:44.251Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31373621,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-03T17:53:18.093Z","status":"ssl_error","status_checked_at":"2026-04-03T17:53:17.617Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T05:01:36.607Z","updated_at":"2026-04-07T21:01:56.680Z","avatar_url":"https://github.com/chardet.png","language":"Python","readme":"# chardet\n\nUniversal character encoding detector.\n\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)\n[![Documentation](https://readthedocs.org/projects/chardet/badge/?version=latest)](https://chardet.readthedocs.io)\n[![codecov](https://codecov.io/github/chardet/chardet/branch/main/graph/badge.svg?token=m5ZQrMd3vk)](https://codecov.io/github/chardet/chardet)\n\nchardet 7.0 is a ground-up, MIT-licensed rewrite of [chardet](https://github.com/chardet/chardet).\nSame package name, same public API — drop-in replacement for chardet 5.x/6.x, just much faster and more accurate.\nPython 3.10+, zero runtime dependencies, works on PyPy.\n\n## Why chardet 7.0?\n\n**98.2% accuracy** on 2,510 test files. **46x faster** than chardet 6.0.0\nand **4.3x faster** than\ncharset-normalizer. **Language\ndetection** for every result. **MIT licensed.**\n\n|                        | chardet 7.0.2 (mypyc) | chardet 7.0.2 (pure) | chardet 6.0.0 | [charset-normalizer] |\n| ---------------------- | :--------------------: | :------------------: | :-----------: | :------------------: |\n| Accuracy (2,510 files) |       **98.2%**        |      **98.2%**       |     88.2%     |        84.2%         |\n| Speed                  |    **555 files/s**     |   **370 files/s**    |  12 files/s   |     130 files/s      |\n| Language detection     |       **95.1%**        |      **95.1%**       |     40.0%     |        59.0%         |\n| Peak memory            |     **26.2 MiB**       |    **26.3 MiB**      |   29.5 MiB    |      101.2 MiB       |\n| Streaming detection    |        **yes**         |       **yes**        |      yes      |          no          |\n| Encoding era filtering |        **yes**         |       **yes**        |      no       |          no          |\n| Supported encodings    |          99            |         99           |      84       |          99          |\n| License                |          MIT           |         MIT          |     LGPL      |         MIT          |\n\n[charset-normalizer]: https://github.com/jawah/charset_normalizer\n\n## Installation\n\n```bash\npip install chardet\n```\n\n## Quick Start\n\n```python\nimport chardet\n\n# Plain ASCII is reported as its superset Windows-1252 by default,\n# keeping with WHATWG guidelines for encoding detection.\nchardet.detect(b\"Hello, world!\")\n# {'encoding': 'Windows-1252', 'confidence': 1.0, 'language': 'en'}\n\n# UTF-8 with typographic punctuation\nchardet.detect(\"It\\u2019s a lovely day \\u2014 let\\u2019s grab coffee.\".encode(\"utf-8\"))\n# {'encoding': 'utf-8', 'confidence': 0.99, 'language': 'es'}\n\n# Japanese EUC-JP\nchardet.detect(\"これは日本語のテストです。文字コードの検出を行います。\".encode(\"euc-jp\"))\n# {'encoding': 'euc-jis-2004', 'confidence': 1.0, 'language': 'ja'}\n\n# Get all candidate encodings ranked by confidence\ntext = \"Le café est une boisson très populaire en France et dans le monde entier.\"\nresults = chardet.detect_all(text.encode(\"windows-1252\"))\nfor r in results:\n    print(r[\"encoding\"], r[\"confidence\"])\n# windows-1252 0.44\n# iso-8859-15 0.44\n# mac-roman 0.42\n# cp858 0.42\n```\n\n### Streaming Detection\n\nFor large files or network streams, use `UniversalDetector` to feed data incrementally:\n\n```python\nfrom chardet import UniversalDetector\n\ndetector = UniversalDetector()\nwith open(\"unknown.txt\", \"rb\") as f:\n    for line in f:\n        detector.feed(line)\n        if detector.done:\n            break\nresult = detector.close()\nprint(result)\n```\n\n### Encoding Era Filtering\n\nRestrict detection to specific encoding eras to reduce false positives:\n\n```python\nfrom chardet import detect_all\nfrom chardet.enums import EncodingEra\n\ndata = \"Москва является столицей Российской Федерации и крупнейшим городом страны.\".encode(\"windows-1251\")\n\n# All encoding eras are considered by default — 4 candidates across eras\nfor r in detect_all(data):\n    print(r[\"encoding\"], round(r[\"confidence\"], 2))\n# windows-1251 0.5\n# mac-cyrillic 0.47\n# kz-1048 0.22\n# ptcp154 0.22\n\n# Restrict to modern web encodings — 1 confident result\nfor r in detect_all(data, encoding_era=EncodingEra.MODERN_WEB):\n    print(r[\"encoding\"], round(r[\"confidence\"], 2))\n# windows-1251 0.5\n```\n\n## CLI\n\n```bash\nchardetect somefile.txt\n# somefile.txt: utf-8 with confidence 0.99\n\nchardetect --minimal somefile.txt\n# utf-8\n\n# Pipe from stdin\ncat somefile.txt | chardetect\n```\n\n## What's New in 7.0\n\n- **MIT license** (previous versions were LGPL)\n- **Ground-up rewrite** — 12-stage detection pipeline using BOM detection, structural probing, byte validity filtering, and bigram statistical models\n- **46x faster** than chardet 6.0.0 with mypyc (**31x** pure Python), **4.3x faster** than charset-normalizer\n- **98.2% accuracy** — +10.0pp vs chardet 6.0.0, +14.0pp vs charset-normalizer\n- **Language detection** — 95.1% accuracy across 49 languages, returned with every result\n- **99 encodings** — full coverage including EBCDIC, Mac, DOS, and Baltic/Central European families\n- **`EncodingEra` filtering** — scope detection to modern web encodings, legacy ISO/Mac/DOS, mainframe, or all\n- **Optional mypyc compilation** — 1.42x additional speedup on CPython\n- **Thread-safe** — `detect()` and `detect_all()` are safe to call concurrently; scales on free-threaded Python\n- **Same API** — `detect()`, `detect_all()`, `UniversalDetector`, and the `chardetect` CLI all work as before\n\n## Documentation\n\nFull documentation is available at [chardet.readthedocs.io](https://chardet.readthedocs.io).\n\n## License\n\n[MIT](LICENSE)\n","funding_links":[],"categories":["Python","Text Processing","资源列表","Data Format \u0026 I/O","文本处理","Text Processing [🔝](#readme)","Awesome Python","Text Data","🐍 Python"],"sub_categories":["文本处理","For Python","Text Processing","Useful Python Tools for Data Analysis"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchardet%2Fchardet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchardet%2Fchardet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchardet%2Fchardet/lists"}