Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/anki-code/tokenize-output
Get identifiers, names, paths, URLs and words from the terminal command output.
- Host: GitHub
- URL: https://github.com/anki-code/tokenize-output
- Owner: anki-code
- License: bsd-2-clause
- Created: 2020-05-22T19:39:27.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2023-04-12T09:37:39.000Z (over 1 year ago)
- Last Synced: 2024-09-17T15:34:03.193Z (2 months ago)
- Topics: console, output, shell, terminal, tokenization, tokenizer
- Language: Python
- Homepage:
- Size: 38.1 KB
- Stars: 6
- Watchers: 2
- Forks: 3
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
Get identifiers, names, paths, URLs and words from the command output.
The xontrib-output-search extension for the xonsh shell uses this library.
If you like the idea, click ⭐ on the repo and stay tuned by watching releases.

## Install
```shell script
pip install -U tokenize-output
```

## Usage
You can use the `tokenize-output` command as well as import the tokenizers in Python:
```python
from tokenize_output.tokenize_output import *
tokenizer_split("Hello world!")
# {'final': set(), 'new': {'Hello', 'world!'}}
```

#### Words tokenizing
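Conceptually, the word-splitting step just breaks text on whitespace and hands every piece back as `new` for further tokenizing. A rough self-contained sketch (illustrative name and code, not the library's actual `tokenizer_split`):

```python
# Illustrative sketch of a split-style tokenizer: break on whitespace
# and hand every piece back as "new" for further tokenizing.
def split_words(text):
    return {"final": set(), "new": set(text.split())}

print(sorted(split_words("Try https://github.com/xxh/xxh")["new"]))
# → ['Try', 'https://github.com/xxh/xxh']
```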
```shell script
echo "Try https://github.com/xxh/xxh" | tokenize-output -p
# Try
# https://github.com/xxh/xxh
```

#### JSON, Python dict and JavaScript object tokenizing
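For dict-like input, keys and string values become tokens, and each string value is additionally handed back for word-splitting. A rough self-contained sketch of that behavior for a flat JSON object (illustrative, not the library's actual code):

```python
import json

# Illustrative sketch: for a flat JSON object, keys and string values
# become tokens, and each string value is also split into its words.
def dict_tokens(text):
    tokens = set()
    for key, value in json.loads(text).items():
        tokens.add(key)
        if isinstance(value, str):
            tokens.add(value)             # the whole value: 'xonsh shell'
            tokens.update(value.split())  # its words: 'xonsh', 'shell'
    return tokens

print(sorted(dict_tokens('{"Try": "xonsh shell"}')))
# → ['Try', 'shell', 'xonsh', 'xonsh shell']
```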
```shell script
echo '{"Try": "xonsh shell"}' | tokenize-output -p
# Try
# shell
# xonsh
# xonsh shell
```

#### env tokenizing
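An env-style tokenizer matches `NAME=VALUE` lines and yields the name, the whole value, and every colon-separated part of the value. A self-contained sketch (illustrative; the library's real implementation lives in `tokenize_output.py`):

```python
import re

# Illustrative env-style tokenizer sketch: a NAME=VALUE line yields the
# name, the whole value, and every colon-separated part of the value.
def env_tokens(text):
    match = re.fullmatch(r"([A-Za-z_][A-Za-z0-9_]*)=(\S+)", text.strip())
    if not match:
        return set()
    name, value = match.groups()
    return {name, value, *value.split(":")}

print(sorted(env_tokens("PATH=/one/two:/three/four")))
# → ['/one/two', '/one/two:/three/four', '/three/four', 'PATH']
```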
```shell script
echo 'PATH=/one/two:/three/four' | tokenize-output -p
# /one/two
# /one/two:/three/four
# /three/four
# PATH
```

## Development
### Tokenizers
A tokenizer is a function that extracts tokens from text.

| Priority | Tokenizer | Text example | Tokens |
| ---------| ---------- | ----- | ------ |
| 1 | **dict** | `{"key": "val as str"}` | `key`, `val as str` |
| 2 | **env** | `PATH=/bin:/etc` | `PATH`, `/bin:/etc`, `/bin`, `/etc` |
| 3 | **split** | `Split me \n now!` | `Split`, `me`, `now!` |
| 4 | **strip** | `{Hello}!.` | `Hello` |

You can create your own tokenizer and add it to `tokenizers_all` in `tokenize_output.py`.
Tokenizing is a recursive process in which every tokenizer returns `final` and `new` tokens.
The `final` tokens go directly to the resulting list of tokens. The `new` tokens are fed to all
tokenizers again to find further tokens. As a result, if the output contains a mix of JSON and
env data, both will be found and tokenized appropriately.

### How to add a tokenizer
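Any tokenizer you add must follow the `final`/`new` contract described above. A minimal self-contained sketch of that recursion, with hypothetical names and simplified logic (not the library's actual internals):

```python
# Hypothetical split tokenizer: a single word cannot be split further,
# so it is returned as "final"; otherwise the pieces come back as "new".
def tokenizer_split_sketch(text):
    parts = set(text.split())
    if len(parts) <= 1:
        return {"final": parts, "new": set()}
    return {"final": set(), "new": parts}

# Simplified recursive driver: "final" tokens go straight to the result,
# "new" tokens are run through all tokenizers again.
def tokenize(text, tokenizers):
    result, queue, seen = set(), {text}, set()
    while queue:
        current = queue.pop()
        seen.add(current)
        for tokenizer in tokenizers:
            out = tokenizer(current)
            result |= out["final"]
            queue |= out["new"] - seen
    return result

print(sorted(tokenize("Split me  now!", [tokenizer_split_sketch])))
# → ['Split', 'me', 'now!']
```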
You can start from `env` tokenizer:
1. [Prepare regexp](https://github.com/anki-code/tokenize-output/blob/25b930cfadf8291e72a72144962e411e47d28139/tokenize_output/tokenize_output.py#L10)
2. [Prepare tokenizer function](https://github.com/anki-code/tokenize-output/blob/25b930cfadf8291e72a72144962e411e47d28139/tokenize_output/tokenize_output.py#L57-L70)
3. [Add the function to the list](https://github.com/anki-code/tokenize-output/blob/25b930cfadf8291e72a72144962e411e47d28139/tokenize_output/tokenize_output.py#L139-L144) and [to the preset](https://github.com/anki-code/tokenize-output/blob/25b930cfadf8291e72a72144962e411e47d28139/tokenize_output/tokenize_output.py#L147).
4. [Add test](https://github.com/anki-code/tokenize-output/blob/25b930cfadf8291e72a72144962e411e47d28139/tests/test_tokenize.py#L34-L35).
5. Now you can test and debug (see below).

### Test and debug
Run tests:
```shell script
cd ~
git clone https://github.com/anki-code/tokenize-output
cd tokenize-output
python -m pytest tests/
```
To debug the tokenizer:
```shell script
echo "Hello world" | ./tokenize-output -p
```

## Related projects
* [xontrib-output-search][XONTRIB_OUTPUT_SEARCH] for [xonsh shell][XONSH]

[XONTRIB_OUTPUT_SEARCH]: https://github.com/anki-code/xontrib-output-search
[XONSH]: https://xon.sh/