https://github.com/mathsgod/semantic-splitter-php
A simple tool that splits text into groups of sentences by semantic meaning. Sentences with similar meaning are grouped together. The input text must contain one sentence per line, separated by newline characters.
- Host: GitHub
- URL: https://github.com/mathsgod/semantic-splitter-php
- Owner: mathsgod
- License: mit
- Created: 2024-07-15T04:42:31.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-07-17T08:48:12.000Z (almost 2 years ago)
- Last Synced: 2025-10-27T16:54:44.228Z (8 months ago)
- Language: PHP
- Homepage:
- Size: 16.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Semantic Text Splitter
A simple tool that splits text into groups of sentences by semantic meaning. Sentences with similar meaning are grouped together. The input text must contain one sentence per line, separated by newline characters.
## Installation
```bash
composer require mathsgod/semantic-splitter-php
```
## Usage
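```php
// Input: one sentence per line. The Chinese lines mean, in order:
// "This is a Chinese sentence." / "This is another Chinese sentence." /
// "If sentences are close in meaning, this tool puts them together." /
// "If sentences are not close in meaning, this tool separates them." /
// "Author: 陳大文"
$splitter = new TextSplitter\SemanticTextSplitter(new MyEmbeddingRetriever());
$sentences = $splitter->split("I am a sentence.
I am another sentence.
I am a sentence that is a question?
這是一個中文句子。
這是另一個中文句子。
如果句子意思接近, 這個工具會把他們放在一起。
如果句子意思不接近, 這個工具會把他們分開。
作者: 陳大文");
print_r($sentences);
```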
Output:
```
Array
(
    [0] => I am a sentence.
I am another sentence.
I am a sentence that is a question?
    [1] => 這是一個中文句子。
這是另一個中文句子。
如果句子意思接近, 這個工具會把他們放在一起。
如果句子意思不接近, 這個工具會把他們分開。
作者: 陳大文
)
```
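To make the grouping idea concrete, here is a minimal sketch of one common approach: embed each line, then start a new group whenever the cosine similarity to the previous line drops below a threshold. This is an illustration under stated assumptions, not necessarily how `SemanticTextSplitter` works internally. The function names `cosineSimilarity` and `groupBySimilarity`, the `0.8` threshold, and the toy embedding function are all hypothetical.

```php
<?php
declare(strict_types=1);

/** Cosine similarity of two equal-length vectors; 0.0 if either is all zeros. */
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot   += $value * $b[$i];
        $normA += $value * $value;
        $normB += $b[$i] * $b[$i];
    }
    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0;
    }
    return $dot / (sqrt($normA) * sqrt($normB));
}

/**
 * Groups consecutive lines. A line joins the current group when its embedding
 * is at least $threshold similar to the previous line's; otherwise it starts
 * a new group. $embed is any callable mapping a string to a list of floats.
 */
function groupBySimilarity(string $text, callable $embed, float $threshold = 0.8): array
{
    $lines = array_filter(
        array_map('trim', explode("\n", $text)),
        fn(string $line): bool => $line !== ''
    );
    $groups = [];
    $previous = null;
    foreach ($lines as $line) {
        $current = $embed($line);
        if ($previous !== null && cosineSimilarity($previous, $current) >= $threshold) {
            $groups[count($groups) - 1] .= "\n" . $line;
        } else {
            $groups[] = $line;
        }
        $previous = $current;
    }
    return $groups;
}

// Toy embedding (an assumption for the demo, not a semantic model):
// [count of ASCII letters, count of non-ASCII bytes].
$toyEmbed = fn(string $s): array => [
    (float) preg_match_all('/[A-Za-z]/', $s),
    (float) preg_match_all('/[^\x00-\x7F]/', $s),
];

$groups = groupBySimilarity(
    "I am a sentence.\nI am another sentence.\n這是一個中文句子。\n這是另一個中文句子。",
    $toyEmbed
);
print_r($groups);
```

With the toy embedding, the two English lines land in one group and the two Chinese lines in another, because the vectors only capture which script a line uses. A real embedding model would be needed to separate lines by meaning.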
### Embedding retriever
Semantic Text Splitter requires an embedding retriever, an object that turns a piece of text into a numeric vector. You can supply your own by implementing the `TextSplitter\EmbeddingRetrieverInterface` interface.
```php
class MyEmbeddingRetriever implements TextSplitter\EmbeddingRetrieverInterface
{
    public function getEmbedding(string $text): array
    {
        // Implement your own embedding retriever here.
        // For example, you can use OpenAI to get the embedding of the text.
        // The fixed vector below is only a placeholder.
        return [0.1, 0.2, 0.3];
    }
}
```
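For trying things out offline, a deterministic stand-in can replace a real embedding service. The sketch below hashes character bigrams into a fixed-size vector and normalizes it to unit length. It is not a semantic embedding and will not group sentences by meaning; it only shows the shape of a working `getEmbedding()` implementation. The class name `HashingEmbeddingRetriever` and the `dimensions` parameter are hypothetical, and the class is declared standalone so the snippet runs without the package installed. To pass it to the splitter, add `implements TextSplitter\EmbeddingRetrieverInterface` to the class declaration. The sketch assumes PHP 8.0 or later on a 64-bit build, where `crc32()` always returns a non-negative integer.

```php
<?php
declare(strict_types=1);

final class HashingEmbeddingRetriever
{
    public function __construct(private int $dimensions = 64)
    {
    }

    /** Returns a unit-length vector built from hashed character bigrams. */
    public function getEmbedding(string $text): array
    {
        $vector = array_fill(0, $this->dimensions, 0.0);

        // Split into UTF-8 characters; fall back to an empty list on invalid input.
        $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];

        // Count each adjacent character pair in the bucket chosen by its hash.
        for ($i = 0; $i + 1 < count($chars); $i++) {
            $bucket = crc32($chars[$i] . $chars[$i + 1]) % $this->dimensions;
            $vector[$bucket] += 1.0;
        }

        // Normalize so that vector length does not depend on text length.
        $norm = sqrt(array_sum(array_map(fn(float $v): float => $v * $v, $vector)));
        if ($norm == 0.0) {
            return $vector;
        }
        return array_map(fn(float $v): float => $v / $norm, $vector);
    }
}

$retriever = new HashingEmbeddingRetriever(16);
$embedding = $retriever->getEmbedding("I am a sentence.");
echo count($embedding), PHP_EOL;
```

Because the vector depends only on the input text, repeated calls with the same string return the same embedding, which keeps experiments reproducible.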