https://github.com/schneiderfelipe/chat-splitter
Split chat messages by maximum chat completion token count
- Host: GitHub
- URL: https://github.com/schneiderfelipe/chat-splitter
- Owner: schneiderfelipe
- License: mit
- Created: 2023-07-11T18:17:45.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-04-20T23:48:52.000Z (about 2 years ago)
- Last Synced: 2025-05-31T04:12:05.121Z (about 1 year ago)
- Topics: ai, artificial-intelligence, chat, chatgpt, gpt-4, nlp, openai, split, text, tiktoken, tokenizer
- Language: Rust
- Homepage: https://schneiderfelipe.github.io/posts/chat-splitter-first-release/
- Size: 21.5 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# chat-splitter
[![Build Status]][actions]
[![Latest Version]][crates.io]
[![Documentation]][docs.rs]
[Build Status]: https://github.com/schneiderfelipe/chat-splitter/actions/workflows/rust.yml/badge.svg
[actions]: https://github.com/schneiderfelipe/chat-splitter/actions/workflows/rust.yml
[Latest Version]: https://img.shields.io/crates/v/chat_splitter.svg
[crates.io]: https://crates.io/crates/chat_splitter
[Documentation]: https://img.shields.io/docsrs/chat-splitter
[docs.rs]: https://docs.rs/chat-splitter
> For more information,
> please refer to the [blog announcement](https://schneiderfelipe.github.io/posts/chat-splitter-first-release/).
When using the [`async_openai`](https://github.com/64bit/async-openai) [Rust](https://www.rust-lang.org/) crate,
you must make sure that a request does not exceed
the [maximum number of tokens](https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them) allowed by [OpenAI](https://openai.com/)'s [chat models](https://platform.openai.com/docs/api-reference/chat).
[`chat-splitter`](https://crates.io/crates/chat_splitter) categorizes chat messages into 'outdated' and 'recent' messages,
allowing you to split them based on both the maximum
message count and the maximum chat completion token count.
The token counting functionality is provided by
[`tiktoken_rs`](https://github.com/zurawiki/tiktoken-rs).
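To make the idea of such a split concrete, here is a minimal, dependency-free sketch. It is not the crate's implementation: `Message`, `estimate_tokens`, and `split_recent` are hypothetical names, and the word-based token estimate is only a stand-in for what `tiktoken_rs` would compute.

```rust
/// Hypothetical stand-in for a chat message; the real crate works with
/// `async_openai` message types instead.
#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: String,
    content: String,
}

/// Crude token estimate (assumption): one token per whitespace-separated
/// word plus a fixed per-message overhead. A real counter would use
/// `tiktoken_rs`.
fn estimate_tokens(message: &Message) -> usize {
    const PER_MESSAGE_OVERHEAD: usize = 4;
    PER_MESSAGE_OVERHEAD + message.content.split_whitespace().count()
}

/// Split `messages` into `(outdated, recent)`, where `recent` is the longest
/// suffix holding at most `max_messages` messages and at most `max_tokens`
/// estimated tokens.
fn split_recent(
    messages: &[Message],
    max_messages: usize,
    max_tokens: usize,
) -> (&[Message], &[Message]) {
    let mut start = messages.len();
    let mut tokens = 0;
    // Walk backwards from the newest message, growing `recent` while
    // both limits still hold.
    for (index, message) in messages.iter().enumerate().rev() {
        let cost = estimate_tokens(message);
        if messages.len() - index > max_messages || tokens + cost > max_tokens {
            break;
        }
        tokens += cost;
        start = index;
    }
    messages.split_at(start)
}
```

Calling `split_recent(&messages, 10, 11)` on three messages with estimated costs 7, 6, and 5 would leave the oldest message in `outdated` and the two newest in `recent`. Returning two borrowed slices keeps the sketch allocation-free, and the caller decides what to do with the outdated part (for example, summarize or drop it).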
## Usage
Here's a basic example:
```rust
// Assumes `ChatSplitter` is exported at the root of the `chat_splitter` crate.
use chat_splitter::ChatSplitter;

// Get all your previously stored chat messages...
let mut stored_messages = /* get_stored_messages()? */;

// ...and split into 'outdated' and 'recent',
// where 'recent' always fits the context size.
let (outdated_messages, recent_messages) =
    ChatSplitter::default().split(&stored_messages);
```
For a more detailed example,
see [`examples/chat.rs`](https://github.com/schneiderfelipe/chat-splitter/blob/main/examples/chat.rs).
## Contributing
Contributions to `chat-splitter` are welcome!
If you find a bug or have a feature request,
please [submit an issue](https://github.com/schneiderfelipe/chat-splitter/issues).
If you'd like to contribute code,
please feel free to [submit a pull request](https://github.com/schneiderfelipe/chat-splitter/pulls).
License: MIT