{"id":20527081,"url":"https://github.com/flashlight/text","last_synced_at":"2025-04-09T13:07:35.156Z","repository":{"id":38328771,"uuid":"471137922","full_name":"flashlight/text","owner":"flashlight","description":"Text utilities, including beam search decoding, tokenizing, and more, built for use in Flashlight.","archived":false,"fork":false,"pushed_at":"2024-03-23T02:26:52.000Z","size":8206,"stargazers_count":61,"open_issues_count":5,"forks_count":13,"subscribers_count":13,"default_branch":"main","last_synced_at":"2024-03-23T03:17:40.022Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/flashlight.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2022-03-17T20:48:32.000Z","updated_at":"2024-04-14T19:35:00.500Z","dependencies_parsed_at":"2024-01-30T19:37:04.924Z","dependency_job_id":"fde13453-9785-4fce-87f7-476a42325690","html_url":"https://github.com/flashlight/text","commit_stats":{"total_commits":280,"total_committers":26,"mean_commits":10.76923076923077,"dds":0.6535714285714286,"last_synced_commit":"8282dc71bb2da531b876557169121ddfaa52f35c"},"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flashlight%2Ftext","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flashlight%2Ftext/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flashlight%2Ftext/releases","manifests_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/flashlight%2Ftext/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/flashlight","download_url":"https://codeload.github.com/flashlight/text/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248045231,"owners_count":21038553,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-15T23:17:13.636Z","updated_at":"2025-04-09T13:07:35.134Z","avatar_url":"https://github.com/flashlight.png","language":"C++","readme":"# Flashlight Text: Fast, Lightweight Utilities for Text\n\n[**Quickstart**](#quickstart)\n| [**Installation**](#building-and-installing)\n| [**Python Documentation**](bindings/python)\n| [**Citing**](#citing)\n\n[![CircleCI](https://circleci.com/gh/flashlight/text.svg?style=shield)](https://app.circleci.com/pipelines/github/flashlight/text) [![Join the chat at https://gitter.im/flashlight-ml/community](https://img.shields.io/gitter/room/flashlight-ml/community)](https://gitter.im/flashlight-ml/community?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge\u0026utm_content=badge) [![PyPI](https://img.shields.io/pypi/v/flashlight-text?color=dark%20green)](https://pypi.org/project/flashlight-text/) [![PyPI - Format](https://img.shields.io/pypi/format/flashlight-text)](https://pypi.org/project/flashlight-text/#files) [![vcpkg](https://img.shields.io/vcpkg/v/flashlight-text)](https://vcpkg.link/ports/flashlight-text) [![Codecov](https://img.shields.io/codecov/c/github/flashlight/text)](https://codecov.io/gh/flashlight/text) 
[![GitHub](https://img.shields.io/github/license/flashlight/text?color=light%20green)](https://github.com/flashlight/text/blob/main/LICENSE)\n\n*Flashlight Text* is a fast, minimal library for text-based operations. It features:\n- a high-performance, unopinionated [beam search decoder](flashlight/lib/text/decoder)\n- a fast [tokenizer](flashlight/lib/text/tokenizer)\n- an efficient [`Dictionary`](flashlight/lib/text/dictionary) abstraction\n\n## Quickstart\n\nThe Flashlight Text Python package containing beam search decoder and Dictionary components is available on PyPI:\n```bash\npip install flashlight-text\n```\nTo enable optional KenLM support in Python with the decoder, KenLM must be installed via pip:\n```bash\npip install git+https://github.com/kpu/kenlm.git\n```\n\nSee the [full Python binding documentation](bindings/python) for examples and more.\n\n## Building and Installing\n[**From Source (C++)**](#building-from-source) | [**With `vcpkg` (C++)**](#with-vcpkg) | [**From Source (Python)**](bindings/python#build-instructions) | [**Adding to Your Own Project (C++)**](#adding-flashlight-text-to-a-c-project)\n\n### Requirements\nAt minimum, C++ compilation requires:\n- A C++ compiler with good C++17 support (e.g. gcc/g++ \u003e= 7)\n- [CMake](https://cmake.org/) — version 3.16 or later, and ``make``\n- A Linux-based operating system.\n\n**KenLM Support:** If building with KenLM support, [KenLM](https://github.com/kpu/kenlm/) is required. To toggle KenLM support use the `FL_TEXT_USE_KENLM` CMake option or the `USE_KENLM` environment variable when building the Python bindings.\n\n**Tests:** If building tests, [Google Test](https://github.com/google/googletest) \u003e= 1.10 is required. 
The `FL_TEXT_BUILD_TESTS` CMake option toggles building tests.\n\nInstructions for building/installing the Python bindings from source [can be found here](bindings/python/README.md).\n\n### Building from Source\n\nBuilding the C++ project from source is simple:\n```bash\ngit clone https://github.com/flashlight/text \u0026\u0026 cd text\ncmake -S . -B build\ncmake --build build --parallel\ncd build \u0026\u0026 ctest \u0026\u0026 cd .. # run tests\ncmake --install build # install at the CMAKE_INSTALL_PREFIX\n```\nTo disable KenLM while building, pass `-DFL_TEXT_USE_KENLM=OFF` to CMake. To disable building tests, pass `-DFL_TEXT_BUILD_TESTS=OFF`.\n\nKenLM can be downloaded and installed automatically if not found on the local system. The `FL_TEXT_BUILD_STANDALONE` option controls this behavior — if disabled, dependencies won't be downloaded and built when building.\n\n#### With [`vcpkg`](https://vcpkg.io/)\n\nFlashlight Text can also be installed and used downstream with the [`vcpkg`](https://vcpkg.io/) package manager. 
The [port](https://github.com/microsoft/vcpkg/blob/master/ports/flashlight-text/) contains an optional feature with which to build and install with KenLM support:\n```bash\nvcpkg install flashlight-text # no dependencies, or:\nvcpkg install \"flashlight-text[kenlm]\" # install with KenLM\n```\n\n### Adding Flashlight Text to a C++ Project\n\nGiven a simple `project.cpp` file that includes and links to Flashlight Text:\n```c++\n#include \u003ciostream\u003e\n\n#include \u003cflashlight/lib/text/dictionary/Dictionary.h\u003e\n\nint main() {\n  fl::lib::text::Dictionary myDict(\"someFile.dict\");\n  std::cout \u003c\u003c \"Dictionary has \" \u003c\u003c myDict.entrySize()\n            \u003c\u003c \" entries.\" \u003c\u003c std::endl;\n  return 0;\n}\n```\n\nThe following CMake configuration links Flashlight Text and sets include directories:\n\n```cmake\ncmake_minimum_required(VERSION 3.10)\nset(CMAKE_CXX_STANDARD 17)\nset(CMAKE_CXX_STANDARD_REQUIRED ON)\n\nadd_executable(myProject project.cpp)\n\nfind_package(flashlight-text CONFIG REQUIRED)\ntarget_link_libraries(myProject PRIVATE flashlight::flashlight-text)\n```\n\nTo link against the library providing KenLM support, use the `flashlight::flashlight-text-kenlm` imported target:\n```cmake\ntarget_link_libraries(myProject\n  PRIVATE\n  flashlight::flashlight-text\n  # transitively links KenLM\n  flashlight::flashlight-text-kenlm\n)\n```\n\n### Contributing and Contact\nContact: jacobkahn@meta.com\n\nFlashlight Text is actively developed. 
See\n[CONTRIBUTING](CONTRIBUTING.md) for more on how to help out.\n\n## Citing\nYou can cite [Flashlight](https://arxiv.org/abs/2201.12465) using:\n```\n@misc{kahn2022flashlight,\n      title={Flashlight: Enabling Innovation in Tools for Machine Learning},\n      author={Jacob Kahn and Vineel Pratap and Tatiana Likhomanenko and Qiantong Xu and Awni Hannun and Jeff Cai and Paden Tomasello and Ann Lee and Edouard Grave and Gilad Avidov and Benoit Steiner and Vitaliy Liptchinsky and Gabriel Synnaeve and Ronan Collobert},\n      year={2022},\n      eprint={2201.12465},\n      archivePrefix={arXiv},\n      primaryClass={cs.LG}\n}\n```\n\n## License\nFlashlight Text is under an MIT license. See [LICENSE](LICENSE) for more information.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fflashlight%2Ftext","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fflashlight%2Ftext","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fflashlight%2Ftext/lists"}