https://github.com/entelecheia/lexikanon
A HyFI plugin for Tokenizers
- Host: GitHub
- URL: https://github.com/entelecheia/lexikanon
- Owner: entelecheia
- License: mit
- Created: 2023-04-20T00:07:16.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-27T18:36:50.000Z (23 days ago)
- Last Synced: 2024-10-27T22:44:26.814Z (23 days ago)
- Topics: hyfi, hyfi-plugins, tokenizer
- Language: Python
- Homepage: https://lexikanon.entelecheia.ai/
- Size: 6.07 MB
- Stars: 1
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Citation: CITATION.cff
# Lexikanon: A HyFI-based library for Tokenizers
[![pypi-image]][pypi-url]
[![version-image]][release-url]
[![release-date-image]][release-url]
[![license-image]][license-url]
[![DOI][zenodo-image]][zenodo-url]
[![codecov][codecov-image]][codecov-url]
[![jupyter-book-image]][docs-url]

A HyFI-based library for the creation, training, and utilization of tokenizers.

- Documentation: [https://lexikanon.entelecheia.ai][docs-url]
- GitHub: [https://github.com/entelecheia/lexikanon][repo-url]
- PyPI: [https://pypi.org/project/lexikanon][pypi-url]

Lexikanon is a high-performance Python library for creating, training, and using tokenizers, a fundamental component of natural language processing (NLP) and artificial intelligence (AI) systems. Its name combines the Greek words λέξη ("word") and κάνων ("maker"), reflecting its primary purpose: helping users build robust tokenizers tailored to different languages and tasks. Because it is built on the [Hydra Fast Interface (HyFI)](https://hyfi.entelecheia.ai) framework, Lexikanon plugs seamlessly into any HyFI-based project, but it can also be used as a standalone library.
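Lexikanon's own API is covered in its documentation and is not reproduced here. As a rough illustration of what *training* and *using* a subword tokenizer involve, the following is a minimal, standard-library-only sketch of byte-pair encoding (BPE), one widely used tokenizer-training algorithm. The names `train_bpe`, `encode`, and `_merge` are hypothetical. This is not Lexikanon code, and the README does not say which algorithms Lexikanon implements.

```python
# Hypothetical illustration of BPE training and encoding; not Lexikanon's API.
from collections import Counter

END = "</w>"  # end-of-word marker so merges never cross word boundaries


def _merge(symbols: tuple[str, ...], pair: tuple[str, str]) -> tuple[str, ...]:
    """Replace each non-overlapping occurrence of `pair` with one joined symbol."""
    out: list[str] = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(corpus: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` merge rules from whitespace-separated text."""
    # Each distinct word becomes a tuple of characters plus END, with its count.
    words = Counter(tuple(w) + (END,) for line in corpus for w in line.split())
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs: Counter = Counter()
        for symbols, freq in words.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair wins; ties go to the pair seen first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged: Counter = Counter()
        for symbols, freq in words.items():
            merged[_merge(symbols, best)] += freq
        words = merged
    return merges


def encode(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Segment one word by replaying the learned merges in training order."""
    symbols = tuple(word) + (END,)
    for pair in merges:
        symbols = _merge(symbols, pair)
    return list(symbols)


merges = train_bpe(["low low low lower lowest"], num_merges=3)
print(merges)                    # [('l', 'o'), ('lo', 'w'), ('low', '</w>')]
print(encode("lowest", merges))  # ['low', 'e', 's', 't', '</w>']
```

Training repeatedly merges the most frequent adjacent pair; encoding replays those merges in order, so an unseen word such as `slow` still splits into known pieces (`s`, `low</w>`). Production tokenizer libraries typically add special tokens, byte-level fallbacks, and much faster data structures.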
## Citation
```tex
@software{lee_2023_8248118,
author = {Young Joon Lee},
title = {Lexikanon: A HyFI-based library for Tokenizers},
month = aug,
year = 2023,
publisher = {Zenodo},
version = {v0.6.2},
doi = {10.5281/zenodo.8248117},
url = {https://doi.org/10.5281/zenodo.8248117}
}
```

```tex
@software{lee_2023_hyfi,
author = {Young Joon Lee},
title = {Lexikanon: A HyFI-based library for Tokenizers},
year = 2023,
publisher = {GitHub},
url = {https://github.com/entelecheia/lexikanon}
}
```

## Changelog
See the [CHANGELOG] for more information.
## Contributing
Contributions are welcome! Please see the [contributing guidelines] for more information.
## License
This project is released under the [MIT License][license-url].
[zenodo-image]: https://zenodo.org/badge/DOI/10.5281/zenodo.8248117.svg
[zenodo-url]: https://doi.org/10.5281/zenodo.8248117
[codecov-image]: https://codecov.io/gh/entelecheia/lexikanon/branch/main/graph/badge.svg?token=KGST5XVW3F
[codecov-url]: https://codecov.io/gh/entelecheia/lexikanon
[pypi-image]: https://img.shields.io/pypi/v/lexikanon
[license-image]: https://img.shields.io/github/license/entelecheia/lexikanon
[license-url]: https://github.com/entelecheia/lexikanon/blob/main/LICENSE
[version-image]: https://img.shields.io/github/v/release/entelecheia/lexikanon?sort=semver
[release-date-image]: https://img.shields.io/github/release-date/entelecheia/lexikanon
[release-url]: https://github.com/entelecheia/lexikanon/releases
[jupyter-book-image]: https://jupyterbook.org/en/stable/_images/badge.svg
[repo-url]: https://github.com/entelecheia/lexikanon
[pypi-url]: https://pypi.org/project/lexikanon
[docs-url]: https://lexikanon.entelecheia.ai
[changelog]: https://github.com/entelecheia/lexikanon/blob/main/CHANGELOG.md
[contributing guidelines]: https://github.com/entelecheia/lexikanon/blob/main/CONTRIBUTING.md