https://github.com/guuzaa/tc
A simple and efficient token count program written in Rust!
rust tokencounter unicode-characters wordcounter
- Host: GitHub
- URL: https://github.com/guuzaa/tc
- Owner: guuzaa
- License: mit
- Created: 2024-09-08T12:16:38.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-09-14T13:18:41.000Z (almost 2 years ago)
- Last Synced: 2024-09-14T23:37:17.249Z (almost 2 years ago)
- Topics: rust, tokencounter, unicode-characters, wordcounter
- Language: Rust
- Homepage:
- Size: 38.1 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Token Count (tc)
A simple and efficient token count program written in Rust!
English | [简体中文](docs/README-zh-CN.md) | [繁體中文](docs/README-zh-TW.md) | [日本語](docs/README-ja-JP.md) | [한국어](docs/README-ko-KR.md) | [Deutsch](docs/README-de-DE.md)
## Description
This Rust implementation of the classic `wc` (word count) command-line tool counts lines, words, characters, and tokens in text files or from standard input. It's fast, reliable, and supports Unicode!
## Features
- Count lines
- Count words
- Count characters (including multi-byte Unicode characters)
- Count tokens using various tokenizer models
- Process multiple files
- Read from standard input
- Supports various languages (English, Korean, Japanese, and more!)
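The line, word, and character counts listed above can be sketched with the Rust standard library alone. This is a minimal illustration, not the code tc actually ships: `Counts` and `count_text` are hypothetical names, and it assumes `wc`-style conventions (lines are newline characters, words are runs separated by Unicode whitespace, characters are Unicode scalar values).

```rust
/// Line, word, and character counts for one input.
/// Hypothetical type for illustration; tokens are left out because
/// they depend on a tokenizer model.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Counts {
    pub lines: usize,
    pub words: usize,
    pub chars: usize,
}

/// Hypothetical helper, not taken from the tc source.
pub fn count_text(text: &str) -> Counts {
    Counts {
        // Assumption: like `wc -l`, count newline characters.
        lines: text.bytes().filter(|&b| b == b'\n').count(),
        // Words are runs of non-whitespace separated by Unicode whitespace.
        words: text.split_whitespace().count(),
        // `chars()` yields Unicode scalar values, so a multi-byte
        // character such as '한' counts once, not once per byte.
        chars: text.chars().count(),
    }
}
```

With these conventions, `count_text("안녕하세요 세계")` reports 2 words and 8 characters even though the string occupies 22 bytes. Text written without spaces between words, as is common in Japanese, would count as a single word under this whitespace-based rule.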
## Installation
There are two ways to install tc:
### Option 1: Install from source
1. Make sure you have Rust installed on your system. If not, get it from [rust-lang.org](https://www.rust-lang.org/tools/install)
2. Clone this repository:
```
git clone https://github.com/guuzaa/tc.git
cd tc
```
3. Build the project:
```
cargo build --release
```
4. The executable will be available at `target/release/tc`
### Option 2: Install pre-built binaries
1. Go to the [Releases page](https://github.com/guuzaa/tc/releases) of the tc repository.
2. Download the latest release for your operating system and architecture.
3. Extract the downloaded archive.
4. Move the `tc` executable to a directory in your system's PATH (e.g., `/usr/local/bin` on Unix-like systems).
5. You can now use tc from anywhere in your terminal!
## Usage
### Options:
- `-l, --lines`: Show line count
- `-w, --words`: Show word count
- `-c, --chars`: Show character count
- `-t, --tokens`: Show token count
- `--model <MODEL>`: Choose tokenizer model (default: gpt3)
Available models:
- `gpt3`: r50k_base
- `edit`: p50k_edit
- `code`: p50k_base
- `chatgpt`: cl100k_base
- `gpt4o`: o200k_base
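The alias table above amounts to a lookup from a `--model` value to an encoding name. A minimal sketch of that lookup follows; `encoding_for` is a hypothetical function written for illustration, and the real tc source may organize this differently (for example with an enum).

```rust
/// Maps a `--model` alias to the encoding name listed for it.
/// Hypothetical helper, not taken from the tc source.
pub fn encoding_for(model: &str) -> Option<&'static str> {
    match model {
        "gpt3" => Some("r50k_base"),
        "edit" => Some("p50k_edit"),
        "code" => Some("p50k_base"),
        "chatgpt" => Some("cl100k_base"),
        "gpt4o" => Some("o200k_base"),
        // Unknown aliases are reported to the caller instead of guessed.
        _ => None,
    }
}
```

Returning `Option` lets the caller decide how to report an unrecognized alias rather than silently falling back to a default.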
If no options are specified, all counts (lines, words, characters, and tokens) will be shown.
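The "no options means all counts" rule can be sketched as a small normalization step applied after argument parsing. `Selection` and `resolve` are hypothetical names used only for this illustration; they are not taken from the tc source.

```rust
/// Which counts to print. Field names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct Selection {
    pub lines: bool,
    pub words: bool,
    pub chars: bool,
    pub tokens: bool,
}

/// If no flag was given, enable every count; otherwise keep the user's choice.
pub fn resolve(sel: Selection) -> Selection {
    let any = sel.lines || sel.words || sel.chars || sel.tokens;
    if any {
        sel
    } else {
        Selection { lines: true, words: true, chars: true, tokens: true }
    }
}
```

Under this sketch, `tc example.txt` would resolve to all four counts, while `tc -w example.txt` would keep only the word count.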
### Examples:
1. Count lines, words, characters, and tokens in a file:
```
tc example.txt
```
2. Count only words in multiple files:
```
tc -w file1.txt file2.txt file3.txt
```
3. Count lines and characters from standard input:
```
echo "Hello, World!" | tc -lc
```
4. Count tokens using the ChatGPT tokenizer:
```
tc -t --model chatgpt example.txt
```
5. Count everything in files with different languages:
```
tc english.txt korean.txt japanese.txt
```
## Contributing
Contributions are welcome! Feel free to submit issues or pull requests.
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
## Acknowledgements
- The Rust community for their amazing tools and support
- The original Unix `wc` command for inspiration
- The editor Cursor
Happy counting!