# fastBPE

C++ implementation of [Neural Machine Translation of Rare Words with Subword Units](https://arxiv.org/abs/1508.07909), with a Python API.

## Installation

Compile with:
```
g++ -std=c++11 -pthread -O3 fastBPE/main.cc -IfastBPE -o fast
```

## Usage

### List commands
```
./fast
usage: fastbpe <command> <args>

The commands supported by fastBPE are:

getvocab input1 [input2]             extract the vocabulary from one or two text files
learnbpe nCodes input1 [input2]      learn BPE codes from one or two text files
applybpe output input codes [vocab]  apply BPE codes to a text file
applybpe_stream codes [vocab]        apply BPE codes to stdin and outputs to stdout
```

fastBPE also accepts input from stdin (pass `-` as the input file name). For instance, these two commands are equivalent:
```
./fast getvocab text > vocab
cat text | ./fast getvocab - > vocab
```
However, the first one memory-maps the input file, which can be more than twice as fast as reading from stdin on very large files. Similarly, these two commands are equivalent:
```
./fast applybpe output input codes vocab
cat input | ./fast applybpe_stream codes vocab > output
```
However, the first one is significantly faster on large datasets, because it uses multi-threading to pre-compute the BPE splits of all words in the input file.

### Learn codes
```
./fast learnbpe 40000 train.de train.en > codes
```

### Apply codes to the training data
```
./fast applybpe train.de.40000 train.de codes
./fast applybpe train.en.40000 train.en codes
```

### Get the training vocabulary
```
./fast getvocab train.de.40000 > vocab.de.40000
./fast getvocab train.en.40000 > vocab.en.40000
```

### Apply codes to the validation and test data
```
./fast applybpe valid.de.40000 valid.de codes vocab.de.40000
./fast applybpe valid.en.40000 valid.en codes vocab.en.40000
./fast applybpe test.de.40000  test.de  codes vocab.de.40000
./fast applybpe test.en.40000  test.en  codes vocab.en.40000
```

## Python API

To install the Python API, run:
```bash
python setup.py install
```

**Note:** On macOS, either run `export MACOSX_DEPLOYMENT_TARGET=10.x` (x = 9 or 10, depending on your version) before the install command above, or add `-stdlib=libc++` to the `extra_compile_args` in `setup.py`.

Call the API using:

```python
import fastBPE

codes_path = "codes"           # codes learned with `./fast learnbpe`
vocab_path = "vocab.en.40000"  # vocabulary extracted with `./fast getvocab`

bpe = fastBPE.fastBPE(codes_path, vocab_path)
print(bpe.apply(["Roasted barramundi fish", "Centrally managed over a client-server architecture"]))
# Example output (depends on the learned codes):
# ['Ro@@ asted barr@@ am@@ un@@ di fish', 'Centr@@ ally managed over a cli@@ ent-@@ server architecture']
```
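As a rough illustration of what the `getvocab` command computes, the following Python sketch counts whitespace-separated tokens and sorts them by decreasing frequency. It is not fastBPE's implementation: the function name `get_vocab`, the alphabetical tie-breaking, the in-memory input, and the `word count` output layout are assumptions made for this example.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def get_vocab(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Count whitespace-separated tokens, most frequent first.

    Hypothetical helper: ties are broken alphabetically so the result is
    deterministic; fastBPE's own ordering of ties may differ.
    """
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


corpus = ["the cat sat", "the cat ran", "a dog ran"]
for word, count in get_vocab(corpus):
    # One "word count" pair per line (layout assumed for illustration).
    print(word, count)
```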
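The code learning done by `learnbpe` follows the procedure of the paper cited above: start from individual characters, then repeatedly merge the most frequent adjacent symbol pair, with each pair weighted by the frequency of the words it occurs in. The sketch below is a minimal, unoptimized Python version under stated assumptions: the helper names are hypothetical, the `</w>` end-of-word marker is a common BPE convention assumed here, and ties are broken lexicographically, which may not match fastBPE.

```python
from collections import Counter
from typing import Dict, List, Tuple

Word = Tuple[str, ...]  # a word as a sequence of current symbols


def pair_counts(vocab: Dict[Word, int]) -> Counter:
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair: Tuple[str, str], vocab: Dict[Word, int]) -> Dict[Word, int]:
    """Replace every occurrence of `pair` with the concatenated symbol."""
    merged: Dict[Word, int] = {}
    for symbols, freq in vocab.items():
        out: List[str] = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_counts: Dict[str, int], n_codes: int) -> List[Tuple[str, str]]:
    """Return up to `n_codes` merges, in the order they were learned."""
    # Split words into characters; the last character carries the </w> marker.
    vocab = {tuple(w[:-1]) + (w[-1] + "</w>",): c for w, c in word_counts.items() if w}
    merges: List[Tuple[str, str]] = []
    for _ in range(n_codes):
        pairs = pair_counts(vocab)
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties broken lexicographically (assumption).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges


print(learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 3))
# [('s', 't</w>'), ('e', 'st</w>'), ('l', 'o')]
```

Recounting all pairs after every merge keeps the sketch short but is quadratic in practice; a fast implementation would update pair counts incrementally.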
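Applying codes, as `applybpe` does, can be sketched as follows: split each word into characters, repeatedly merge the adjacent pair that was learned earliest, then join the resulting subwords with the `@@ ` continuation marker seen in the Python API output above. This is a hypothetical, simplified Python version: `apply_bpe_word`, `apply_bpe`, and the toy `MERGES` list are illustrative names, the optional vocabulary argument is ignored, and no claim is made about how fastBPE itself orders, caches, or parallelizes merges.

```python
from typing import Dict, List, Tuple


def apply_bpe_word(word: str, ranks: Dict[Tuple[str, str], int]) -> List[str]:
    """Segment one word by greedily applying the lowest-ranked merge."""
    symbols = list(word[:-1]) + [word[-1] + "</w>"]
    while len(symbols) > 1:
        # Candidate merges as (rank, position); a lower rank was learned earlier.
        candidates = [
            (ranks[(a, b)], i)
            for i, (a, b) in enumerate(zip(symbols, symbols[1:]))
            if (a, b) in ranks
        ]
        if not candidates:
            break
        _, i = min(candidates)
        symbols[i : i + 2] = [symbols[i] + symbols[i + 1]]
    # Drop the end-of-word marker from the final symbol.
    symbols[-1] = symbols[-1][: -len("</w>")]
    return symbols


def apply_bpe(sentence: str, merges: List[Tuple[str, str]]) -> str:
    """Segment every word and mark non-final subwords with '@@'."""
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    return " ".join("@@ ".join(apply_bpe_word(w, ranks)) for w in sentence.split())


# Toy merges, in learned order (hypothetical, not from a real codes file).
MERGES = [("s", "t</w>"), ("e", "st</w>"), ("l", "o")]
print(apply_bpe("low lowest", MERGES))  # lo@@ w lo@@ w@@ est
```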