https://github.com/corani/go-semantic-splitter
- Host: GitHub
- URL: https://github.com/corani/go-semantic-splitter
- Owner: corani
- Created: 2024-09-10T09:17:26.000Z (9 months ago)
- Default Branch: master
- Last Pushed: 2024-09-10T09:32:27.000Z (9 months ago)
- Last Synced: 2025-02-02T05:41:33.617Z (4 months ago)
- Language: Go
- Size: 11.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# go-semantic-splitter
This is a Go implementation of the rolling-window semantic splitter algorithm. The algorithm splits
a document into sentences, then calculates an embedding vector for each one. It calculates the
similarity between each sentence and the mean embedding of the preceding window (of size 5), then
uses these similarity scores, together with the min/max chunk size, to do the final chunking.

## Usage
```bash
./build.sh
./bin/splitter
```

## Resources
- [aurelio-labs/semantic-router](https://github.com/aurelio-labs/semantic-router/blob/main/semantic_router/splitters/rolling_window.py)