{"id":13475057,"url":"https://github.com/huggingface/tokenizers","last_synced_at":"2025-10-14T15:30:03.035Z","repository":{"id":37358254,"uuid":"219035799","full_name":"huggingface/tokenizers","owner":"huggingface","description":"💥 Fast State-of-the-Art Tokenizers optimized for Research and Production","archived":false,"fork":false,"pushed_at":"2025-09-19T09:46:10.000Z","size":14115,"stargazers_count":10110,"open_issues_count":110,"forks_count":970,"subscribers_count":126,"default_branch":"main","last_synced_at":"2025-09-30T18:02:31.864Z","etag":null,"topics":["bert","gpt","language-model","natural-language-processing","natural-language-understanding","nlp","transformers"],"latest_commit_sha":null,"homepage":"https://huggingface.co/docs/tokenizers","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/huggingface.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2019-11-01T17:52:20.000Z","updated_at":"2025-09-30T14:34:48.000Z","dependencies_parsed_at":"2023-02-16T23:55:26.422Z","dependency_job_id":"c55c8980-521b-4cb9-b365-71af96208429","html_url":"https://github.com/huggingface/tokenizers","commit_stats":{"total_commits":1729,"total_committers":112,"mean_commits":15.4375,"dds":0.4598033545401966,"last_synced_commit":"eb4cc86d4ef63b46d15ced15b24b86d9e6fc7dcf"},"previous_names":[],"tags_count":150,"template":false,"template_full_name":null,"purl":"pkg:github/huggingface/tokenizers","repository_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huggingface%2Ftokenizers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huggingface%2Ftokenizers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huggingface%2Ftokenizers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huggingface%2Ftokenizers/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/huggingface","download_url":"https://codeload.github.com/huggingface/tokenizers/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huggingface%2Ftokenizers/sbom","scorecard":{"id":472294,"data":{"date":"2025-08-11","repo":{"name":"github.com/huggingface/tokenizers","commit":"ed2cda51e66546ad1c6c816ea3d412d9d50e2327"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":5.3,"checks":[{"name":"Code-Review","score":7,"reason":"Found 20/26 approved changesets -- score normalized to 7","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Maintained","score":10,"reason":"26 commit(s) and 19 issue activity found in the last 90 days -- score normalized to 10","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices 
Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Token-Permissions","score":0,"reason":"detected GitHub workflow tokens with excessive permissions","details":["Warn: jobLevel 'contents' permission set to 'write': .github/workflows/CI.yml:165","Info: topLevel 'contents' permission set to 'read': .github/workflows/CI.yml:19","Warn: no topLevel permission defined: .github/workflows/build_documentation.yml:1","Warn: no topLevel permission defined: .github/workflows/build_pr_documentation.yml:1","Warn: no topLevel permission defined: .github/workflows/delete_doc_comment.yml:1","Warn: no topLevel permission defined: .github/workflows/delete_doc_comment_trigger.yml:1","Warn: no topLevel permission defined: .github/workflows/docs-check.yml:1","Warn: no topLevel permission defined: 
.github/workflows/node-release.yml:1","Warn: no topLevel permission defined: .github/workflows/node.yml:1","Warn: no topLevel permission defined: .github/workflows/python-release.yml:1","Warn: no topLevel permission defined: .github/workflows/python.yml:1","Warn: no topLevel permission defined: .github/workflows/rust-release.yml:1","Warn: no topLevel permission defined: .github/workflows/rust.yml:1","Warn: no topLevel permission defined: .github/workflows/stale.yml:1","Warn: no topLevel permission defined: .github/workflows/trufflehog.yml:1","Warn: no topLevel permission defined: .github/workflows/upload_pr_documentation.yml:1"],"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: Apache License 2.0: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Pinned-Dependencies","score":0,"reason":"dependency not pinned by hash detected -- score normalized to 0","details":["Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/CI.yml:125: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/CI.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/CI.yml:126: update your workflow using 
https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/CI.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/CI.yml:130: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/CI.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/CI.yml:136: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/CI.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/CI.yml:144: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/CI.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/CI.yml:146: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/CI.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/CI.yml:151: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/CI.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/CI.yml:169: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/CI.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/CI.yml:171: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/CI.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/CI.yml:176: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/CI.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/CI.yml:40: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/CI.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/CI.yml:41: update your workflow using 
https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/CI.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/CI.yml:45: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/CI.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/CI.yml:52: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/CI.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/CI.yml:71: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/CI.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/CI.yml:72: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/CI.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/CI.yml:76: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/CI.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/CI.yml:83: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/CI.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/CI.yml:98: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/CI.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/CI.yml:99: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/CI.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/CI.yml:104: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/CI.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/CI.yml:110: update your workflow using 
https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/CI.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/build_documentation.yml:13: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/build_documentation.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/build_pr_documentation.yml:12: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/build_pr_documentation.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/delete_doc_comment.yml:11: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/delete_doc_comment.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/delete_doc_comment_trigger.yml:10: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/delete_doc_comment_trigger.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/docs-check.yml:14: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/docs-check.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/docs-check.yml:17: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/docs-check.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/docs-check.yml:25: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/docs-check.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/docs-check.yml:36: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/docs-check.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/node-release.yml:29: update your 
workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/node-release.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/node-release.yml:32: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/node-release.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/node-release.yml:39: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/node-release.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/node-release.yml:45: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/node-release.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/node-release.yml:62: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/node-release.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/node-release.yml:66: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/node-release.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/node-release.yml:77: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/node-release.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/node-release.yml:79: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/node-release.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/node-release.yml:89: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/node-release.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/node.yml:18: update your workflow using 
https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/node.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/node.yml:21: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/node.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/node.yml:29: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/node.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/node.yml:35: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/node.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/node.yml:47: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/node.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/node.yml:53: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/node.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/python-release.yml:155: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/python-release.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/python-release.yml:158: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/python-release.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/python-release.yml:163: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/python-release.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/python-release.yml:19: update your workflow using 
https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/python-release.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/python-release.yml:100: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/python-release.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/python-release.yml:103: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/python-release.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/python-release.yml:111: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/python-release.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/python-release.yml:127: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/python-release.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/python-release.yml:136: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/python-release.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/python-release.yml:137: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/python-release.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/python-release.yml:143: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/python-release.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/python.yml:22: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/python.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/python.yml:25: update your workflow using 
https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/python.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/python.yml:37: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/python.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/python.yml:43: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/python.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/python.yml:57: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/python.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/python.yml:61: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/python.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/python.yml:67: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/python.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/python.yml:73: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/python.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/python.yml:80: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/python.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/python.yml:92: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/python.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/python.yml:99: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/python.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by 
hash: .github/workflows/python.yml:108: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/python.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/rust-release.yml:16: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/rust-release.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/rust-release.yml:19: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/rust-release.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/rust-release.yml:22: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/rust-release.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/rust.yml:19: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/rust.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/rust.yml:22: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/rust.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/rust.yml:34: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/rust.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/rust.yml:40: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/rust.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/rust.yml:46: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/rust.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/rust.yml:52: update your workflow using 
https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/rust.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/rust.yml:58: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/rust.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/rust.yml:64: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/rust.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/rust.yml:78: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/rust.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/rust.yml:85: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/rust.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/rust.yml:94: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/rust.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/stale.yml:10: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/stale.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/trufflehog.yml:11: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/trufflehog.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/upload_pr_documentation.yml:11: update your workflow using https://app.stepsecurity.io/secureworkflow/huggingface/tokenizers/upload_pr_documentation.yml/main?enable=pin","Warn: pipCommand not pinned by hash: .github/workflows/docs-check.yml:23","Warn: pipCommand not pinned by hash: .github/workflows/docs-check.yml:30","Warn: pipCommand not pinned by hash: .github/workflows/python-release.yml:109","Warn: 
pipCommand not pinned by hash: .github/workflows/python.yml:118","Warn: pipCommand not pinned by hash: .github/workflows/python.yml:119","Warn: pipCommand not pinned by hash: .github/workflows/python.yml:120","Info:   0 out of  49 GitHub-owned GitHubAction dependencies pinned","Info:   1 out of  37 third-party GitHubAction dependencies pinned","Info:   0 out of   6 pipCommand dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":-1,"reason":"internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration","details":null,"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Packaging","score":10,"reason":"packaging workflow detected","details":["Info: Project packages its releases by way of GitHub Actions.: .github/workflows/rust-release.yml:12"],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 28 are checked with a SAST 
tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Vulnerabilities","score":4,"reason":"6 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: GHSA-968p-4wvh-cqc8","Warn: Project is vulnerable to: GHSA-v6h2-p8h4-qcjw","Warn: Project is vulnerable to: GHSA-grv7-fg5c-xmjg","Warn: Project is vulnerable to: GHSA-3xgq-45jj-v275","Warn: Project is vulnerable to: GHSA-952p-6rrq-rcjv","Warn: Project is vulnerable to: GHSA-cxjh-pqwp-8mfp"],"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}}]},"last_synced_at":"2025-08-19T14:15:42.841Z","repository_id":37358254,"created_at":"2025-08-19T14:15:42.841Z","updated_at":"2025-08-19T14:15:42.841Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279019316,"owners_count":26086711,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-14T02:00:06.444Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert","gpt","language-model","natural-language-processing","natural-language-understanding","nlp","transformers"],"created_at":"2024-07-31T16:01:17.025Z","updated_at":"2025-10-14T1
5:30:03.029Z","avatar_url":"https://github.com/huggingface.png","language":"Rust","readme":"\u003cp align=\"center\"\u003e\n    \u003cbr\u003e\n    \u003cimg src=\"https://huggingface.co/landing/assets/tokenizers/tokenizers-logo.png\" width=\"600\"/\u003e\n    \u003cbr\u003e\n\u003cp\u003e\n\u003cp align=\"center\"\u003e\n    \u003cimg alt=\"Build\" src=\"https://github.com/huggingface/tokenizers/workflows/Rust/badge.svg\"\u003e\n    \u003ca href=\"https://github.com/huggingface/tokenizers/blob/main/LICENSE\"\u003e\n        \u003cimg alt=\"GitHub\" src=\"https://img.shields.io/github/license/huggingface/tokenizers.svg?color=blue\u0026cachedrop\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://pepy.tech/project/tokenizers\"\u003e\n        \u003cimg src=\"https://pepy.tech/badge/tokenizers/week\" /\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n\nProvides an implementation of today's most used tokenizers, with a focus on performance and\nversatility.\n\n## Main features:\n\n - Train new vocabularies and tokenize, using today's most used tokenizers.\n - Extremely fast (both training and tokenization), thanks to the Rust implementation. Takes\n   less than 20 seconds to tokenize a GB of text on a server's CPU.\n - Easy to use, but also extremely versatile.\n - Designed for research and production.\n - Normalization comes with alignments tracking. 
It's always possible to get the part of the\n   original sentence that corresponds to a given token.\n - Does all the pre-processing: Truncate, Pad, add the special tokens your model needs.\n\n## Performance\nPerformance can vary depending on hardware, but running [~/bindings/python/benches/test_tiktoken.py](bindings/python/benches/test_tiktoken.py) should give the following on a g6 AWS instance:\n![image](https://github.com/user-attachments/assets/2b913d4b-e488-4cbc-b542-f90a6c40643d)\n\n\n## Bindings\n\nWe provide bindings to the following languages (more to come!):\n  - [Rust](https://github.com/huggingface/tokenizers/tree/main/tokenizers) (Original implementation)\n  - [Python](https://github.com/huggingface/tokenizers/tree/main/bindings/python)\n  - [Node.js](https://github.com/huggingface/tokenizers/tree/main/bindings/node)\n  - [Ruby](https://github.com/ankane/tokenizers-ruby) (Contributed by @ankane, external repo)\n\n## Installation\n\nYou can install from source using:\n```bash\npip install git+https://github.com/huggingface/tokenizers.git#subdirectory=bindings/python\n```\n\nor install the released versions with\n\n```bash\npip install tokenizers\n```\n \n## Quick example using Python:\n\nChoose your model among Byte-Pair Encoding, WordPiece, or Unigram and instantiate a tokenizer:\n\n```python\nfrom tokenizers import Tokenizer\nfrom tokenizers.models import BPE\n\ntokenizer = Tokenizer(BPE(unk_token=\"[UNK]\"))\n```\n\nYou can customize how pre-tokenization (e.g., splitting into words) is done:\n\n```python\nfrom tokenizers.pre_tokenizers import Whitespace\n\ntokenizer.pre_tokenizer = Whitespace()\n```\n\nThen training your tokenizer on a set of files takes just two lines of code:\n\n```python\nfrom tokenizers.trainers import BpeTrainer\n\ntrainer = BpeTrainer(special_tokens=[\"[UNK]\", \"[CLS]\", \"[SEP]\", \"[PAD]\", \"[MASK]\"])\ntokenizer.train(files=[\"wiki.train.raw\", \"wiki.valid.raw\", \"wiki.test.raw\"], trainer=trainer)\n```\n\nOnce your tokenizer is 
trained, encode any text with just one line:\n```python\noutput = tokenizer.encode(\"Hello, y'all! How are you 😁 ?\")\nprint(output.tokens)\n# [\"Hello\", \",\", \"y\", \"'\", \"all\", \"!\", \"How\", \"are\", \"you\", \"[UNK]\", \"?\"]\n```\n\nCheck the [documentation](https://huggingface.co/docs/tokenizers/index)\nor the [quicktour](https://huggingface.co/docs/tokenizers/quicktour) to learn more!\n","funding_links":[],"categories":["Rust","Libraries","🔧 Supporting Tools","库 Libraries","🤗 Official Libraries","Natural Language Processing","Uncategorized","其他_NLP自然语言处理","Implementations","Tools and Utilities","Other Versions of YOLO","Summary","Serving","Frameworks","Machine Learning","Official Resources","nlp","文本数据和NLP","🔹 **SentencePiece Implementations**","Libraries related to the Natural Language Processing","Machine Learning \u0026 AI"],"sub_categories":["Artificial Intelligence","Unified API \u0026 Cost Management","人工智能 Artificial Intelligence","General Purpose NLP","Uncategorized","其他_文本生成、文本对话","Large Model Serving","General-Purpose Machine Learning","Natural Language Processing"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhuggingface%2Ftokenizers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhuggingface%2Ftokenizers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhuggingface%2Ftokenizers/lists"}