https://github.com/teragrep/blf_01
Tokenizer for Teragrep
java teragrep tokenization tokenizer unstructured-data
- Host: GitHub
- URL: https://github.com/teragrep/blf_01
- Owner: teragrep
- License: agpl-3.0
- Created: 2023-01-17T10:30:13.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2024-10-02T14:33:31.000Z (over 1 year ago)
- Last Synced: 2025-04-22T12:12:10.788Z (about 1 year ago)
- Topics: java, teragrep, tokenization, tokenizer, unstructured-data
- Language: Java
- Homepage: https://teragrep.com
- Size: 9.17 MB
- Stars: 0
- Watchers: 2
- Forks: 4
- Open Issues: 4
Metadata Files:
- Readme: README.adoc
- License: LICENSE
README
= BLF_01
A tokenizer that splits extremely large inputs into major and minor tokens using preset delimiters (splitters).
== Features
* Fast tokenization of large inputs
* Tokenization splits input into major and minor tokens
* Permutations are generated from major tokens
* Configurable delimiters for major and minor tokens (character or pattern)
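The two-level split described in the features above can be illustrated with a minimal sketch. It does not use the BLF_01 API: the class name `TwoLevelSplitSketch`, the `split` helper, and both delimiter sets are hypothetical and chosen only for the example. The sketch assumes that major tokens are produced by splitting on one delimiter set and that each major token is then split again on a second set to produce minor tokens. Permutation generation is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: this is NOT the BLF_01 API.
// The class name and both delimiter sets are assumptions made for this example.
public class TwoLevelSplitSketch {

    // Hypothetical major delimiters: whitespace and comma.
    public static final Set<Character> MAJOR =
            new HashSet<>(Arrays.asList(' ', '\t', '\n', ','));

    // Hypothetical minor delimiters: common punctuation inside a major token.
    public static final Set<Character> MINOR =
            new HashSet<>(Arrays.asList('.', '-', '/', ':', '='));

    // Splits text on any character in the delimiter set and drops empty tokens.
    public static List<String> split(String text, Set<Character> delimiters) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (delimiters.contains(c)) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "GET /index.html host=10.0.0.1";
        // First pass yields major tokens; second pass splits each into minor tokens.
        for (String major : split(input, MAJOR)) {
            System.out.println(major + " -> " + split(major, MINOR));
        }
        // Prints:
        // GET -> [GET]
        // /index.html -> [index, html]
        // host=10.0.0.1 -> [host, 10, 0, 0, 1]
    }
}
```

The sketch holds the whole input in a `String` for brevity; it is meant to show the major/minor relationship, not how a tokenizer for very large inputs would manage memory.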
== Documentation
See the official documentation on https://docs.teragrep.com[docs.teragrep.com].
== Limitations
Requires Java 1.8; other versions might not work correctly.
Expects an `InputStream` as input for tokenization.
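Taking an `InputStream` rather than a `String` lets a tokenizer process input that does not fit in memory. The sketch below shows that general idea using only the Java 8 standard library. It is not the BLF_01 API: `StreamSplitSketch`, `forEachToken`, and the single-byte delimiter are hypothetical names and simplifications introduced for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Illustrative only: this is NOT the BLF_01 API.
// Class and method names are assumptions made for this example.
public class StreamSplitSketch {

    // Reads the stream byte by byte and passes each token to the sink as soon as
    // a delimiter is seen, so only the current token is buffered at any time.
    // The delimiter is assumed to be a single ASCII byte.
    public static void forEachToken(InputStream in, int delimiter, Consumer<String> sink)
            throws IOException {
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == delimiter) {
                emit(current, sink);
            } else {
                current.write(b);
            }
        }
        emit(current, sink); // flush the last token, if any
    }

    // Hands a non-empty buffered token to the sink and clears the buffer.
    private static void emit(ByteArrayOutputStream current, Consumer<String> sink) {
        if (current.size() > 0) {
            sink.accept(new String(current.toByteArray(), StandardCharsets.UTF_8));
            current.reset();
        }
    }

    public static void main(String[] args) throws IOException {
        // A String wrapped as an InputStream; a file or socket stream works the same way.
        InputStream in = new ByteArrayInputStream(
                "alpha beta  gamma".getBytes(StandardCharsets.UTF_8));
        forEachToken(in, ' ', System.out::println);
        // Prints alpha, beta and gamma on separate lines.
    }
}
```

For file or network sources, wrapping the stream in a `java.io.BufferedInputStream` avoids one system call per byte.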
== How to use
Import the `Tokenizer` class.
See the tests for usage examples.
== Contributing
You can involve yourself with our project by https://github.com/teragrep/blf_01/issues/new/choose[opening an issue] or submitting a pull request.
Contribution requirements:
. *All changes must be accompanied by a new or changed test.* If you think testing is not required in your pull request, include a sufficient explanation as to why you think so.
. Security checks must pass.
. Pull requests must align with the principles and http://www.extremeprogramming.org/values.html[values] of extreme programming.
. Pull requests must follow the principles of Object Thinking and Elegant Objects (EO).
Read more in our https://github.com/teragrep/teragrep/blob/main/contributing.adoc[Contributing Guideline].
=== Contributor License Agreement
Contributors must sign the https://github.com/teragrep/teragrep/blob/main/cla.adoc[Teragrep Contributor License Agreement] before a pull request is accepted into the organization's repositories.
You need to submit the CLA only once. After submitting the CLA, you can contribute to all of Teragrep's repositories.