https://github.com/IngestAI/embedditor

⚡ GUI for editing LLM vector embeddings. No more blind chunking. Upload content in any file extension, join and split chunks, edit metadata and embedding tokens + remove stop-words and punctuation with one click, add images, and download in .veml to share it with your team.

datapreprocessing datascience embedding-vectors embeddings genai laravel llm markup-language ml nlp nltk php vector-database vector-search vectorization veml

Embedditor


Embedditor is the open-source MS Word equivalent for embedding that helps you get the most out of your vector search.

[![PHP version](https://img.shields.io/badge/PHP%208.2-brightgreen)](https://www.php.net)
[![Laravel version](https://img.shields.io/badge/Laravel%2010.x-green.svg)](https://laravel.com)


Website
Discord
Twitter
Documentation
Try demo on IngestAI

# Get the most out of your vector search

Embedditor is an open-source embedding pre-processing editor that helps you edit GPT / LLM embeddings as if they were a Microsoft Word document, so you can get the most out of your vector search while significantly reducing the costs of embedding and vector storage.

# Join Our Community



[![Stargazers repo roster for @embedditor/embedditor](https://reporoster.com/stars/embedditor/embedditor)](https://github.com/embedditor/embedditor/stargazers)

# Features
**Rich editor Interface**

- ⚡ Join and split one or multiple chunks with a few clicks
- ⚡ Edit embedding metadata and tokens
- ⚡ Exclude words, sentences, or even parts of chunks from embedding
- ⚡ Select the parts of a chunk you want to be embedded
- ⚡ Add additional information to your embeddings, like URL links or images
- ⚡ Get nice-looking HTML markup for your AI search results
- ⚡ Save your pre-processed embedding files in .veml or .json formats

**Pre-processing automation**
- ⚡ Filter most of the 'noise', such as punctuation and stop-words, out of vectorization
- ⚡ Remove insignificant, frequently used words from embedding with the TF-IDF algorithm
- ⚡ Normalize your embedding tokens before vectorization
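
To make the pre-processing steps above concrete, here is a minimal Python sketch of the general technique: normalizing tokens, stripping punctuation and stop-words, then dropping tokens with a low TF-IDF score. It is an illustration only, not Embedditor's actual implementation. The `STOP_WORDS` set, the function names `normalize` and `tfidf_filter`, and the `min_score` threshold are all assumptions made for this example.

```python
from __future__ import annotations

import math
import string
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration;
# a real pipeline would use a fuller list (for example, one from NLTK).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "or", "for", "with"}


def normalize(chunk: str) -> list[str]:
    """Lowercase a chunk, strip ASCII punctuation, split on whitespace, drop stop-words."""
    table = str.maketrans("", "", string.punctuation)
    tokens = chunk.lower().translate(table).split()
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_filter(chunks: list[list[str]], min_score: float = 0.05) -> list[list[str]]:
    """Keep only tokens whose TF-IDF score within their chunk is at least min_score.

    TF is the token's share of its chunk; IDF is log(N / document frequency),
    so a token that appears in every chunk scores 0 and is removed.
    """
    n = len(chunks)
    doc_freq = Counter(tok for chunk in chunks for tok in set(chunk))
    filtered = []
    for chunk in chunks:
        counts = Counter(chunk)
        total = len(chunk) or 1
        keep = []
        for tok in chunk:
            score = (counts[tok] / total) * math.log(n / doc_freq[tok])
            if score >= min_score:
                keep.append(tok)
        filtered.append(keep)
    return filtered


raw_chunks = [
    "The vector database stores embeddings.",
    "The vector search returns embeddings, fast!",
    "Chunking splits the document for the vector index.",
]
tokenized = [normalize(c) for c in raw_chunks]
print(tfidf_filter(tokenized))
```

In this toy corpus the word "vector" occurs in every chunk, so its IDF is zero and it is filtered out, while rarer words survive. The threshold is a tuning knob: a higher `min_score` removes more tokens at the risk of discarding useful context.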

# Benefits
**Rich Spreadsheet Interface**

- ⚡ Optimized relevance of the content retrieved from a vector database
- ⚡ Improved efficiency and accuracy in your AI / LLM-related applications
- ⚡ Better-looking search results with images, URL links, etc.
- ⚡ Increased cost-efficiency, with up to 30% cost reduction on embedding and vector storage
- ⚡ Full control over your data: deploy Embedditor locally on your PC or in a dedicated environment
- ⚡ Save your pre-processed or ready embeddings in .json or .veml format to use them in LangChain, Chroma, or any other vector DB
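
As a rough way to reason about the cost-reduction figure above, the sketch below computes the percentage of tokens removed by pre-processing. It assumes embedding cost scales linearly with token count, which is typical of per-token pricing but is an assumption here, and it does not model vector storage. The function name `embedding_savings` and the token counts are hypothetical, chosen only to illustrate the arithmetic.

```python
def embedding_savings(tokens_before: int, tokens_after: int) -> float:
    """Return the percentage of tokens removed by pre-processing.

    Under linear per-token pricing (an assumption), this is also the
    approximate percentage saved on embedding cost.
    """
    if tokens_before <= 0:
        raise ValueError("tokens_before must be positive")
    return 100.0 * (tokens_before - tokens_after) / tokens_before


# Hypothetical counts: 1,000 tokens before cleanup, 700 after.
print(embedding_savings(1000, 700))  # 30.0
```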

## Quick try
**Sign up for free and try it in [IngestAI](https://ingestai.io/signup).**

# GUI

Once the project is installed and running, access the dashboard at: [http://localhost:8080/](http://localhost:8080/)

# Screenshots

![1](https://embedditor.ai/images/embedditor_ui_01.png)
![2](https://embedditor.ai/images/embedditor_ui_02.png)
![3](https://embedditor.ai/images/embedditor_ui_03.png)
![4](https://embedditor.ai/images/embedditor_ui_04.png)

## Installation

1. Copy `.env.example` to `.env`

2. Set the following setting in `.env`

`OPENAI_API_KEY=`

3. Set up the project

- `php artisan migrate`
- `php artisan db:seed`
- `php artisan storage:link`