{"id":13490315,"url":"https://github.com/IngestAI/embedditor","last_synced_at":"2025-03-28T06:30:46.075Z","repository":{"id":168558285,"uuid":"642841279","full_name":"IngestAI/embedditor","owner":"IngestAI","description":"⚡ GUI for editing LLM vector embeddings. No more blind chunking. Upload content in any file extension, join and split chunks, edit metadata and embedding tokens +  remove stop-words and punctuation with one click,  add images, and download in .veml to share it with your team.","archived":false,"fork":false,"pushed_at":"2023-11-21T16:28:56.000Z","size":1826,"stargazers_count":221,"open_issues_count":2,"forks_count":15,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-10-31T03:35:34.401Z","etag":null,"topics":["datapreprocessing","datascience","embedding-vectors","embeddings","genai","laravel","llm","markup-language","ml","nlp","nltk","php","vector-database","vector-search","vectorization","veml"],"latest_commit_sha":null,"homepage":"https://embedditor.ai","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/IngestAI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-19T13:17:04.000Z","updated_at":"2024-10-21T10:53:52.000Z","dependencies_parsed_at":"2024-10-31T03:41:59.355Z","dependency_job_id":null,"html_url":"https://github.com/IngestAI/embedditor","commit_stats":null,"previous_names":["embedditor/embedditor","ingestai/embedditor"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IngestAI%
2Fembedditor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IngestAI%2Fembedditor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IngestAI%2Fembedditor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IngestAI%2Fembedditor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/IngestAI","download_url":"https://codeload.github.com/IngestAI/embedditor/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245984257,"owners_count":20704787,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["datapreprocessing","datascience","embedding-vectors","embeddings","genai","laravel","llm","markup-language","ml","nlp","nltk","php","vector-database","vector-search","vectorization","veml"],"created_at":"2024-07-31T19:00:44.664Z","updated_at":"2025-03-28T06:30:44.543Z","avatar_url":"https://github.com/IngestAI.png","language":"PHP","readme":"\u003ch1 align=\"center\" style=\"border-bottom: none\"\u003e\n    \u003cdiv\u003e\n        \u003ca href=\"https://embedditor.ai\"\u003e\n            \u003cimg src=\"https://embedditor.ingestai.co/images/logo.jpg\" width=\"80\" /\u003e\n            \u003cbr\u003e\n            Embedditor\n        \u003c/a\u003e\n    \u003c/div\u003e\n\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003eEmbedditor is the open-source MS Word equivalent for embedding that helps you get the most out of your vector search.\u003c/p\u003e\n\n\u003cdiv align=\"center\"\u003e\n\n[![PHP 
version](https://img.shields.io/badge/PHP%208.2-brightgreen)](http://php.org)\n[![Laravel version](https://img.shields.io/badge/Laravel%2010.x-green.svg)](https://conventionalcommits.org)\n\n\u003c/div\u003e\n\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"https://embedditor.ai\"\u003e\u003cb\u003eWebsite\u003c/b\u003e\u003c/a\u003e •\n    \u003ca href=\"https://discord.gg/7gF8dVv86E\"\u003e\u003cb\u003eDiscord\u003c/b\u003e\u003c/a\u003e •\n    \u003ca href=\"https://twitter.com/embedditor\"\u003e\u003cb\u003eTwitter\u003c/b\u003e\u003c/a\u003e •\n    \u003ca href=\"https://github.com/embedditor/embedditor/wiki/Embedditor-Docs\"\u003e\u003cb\u003eDocumentation\u003c/b\u003e\u003c/a\u003e •\n    \u003ca href=\"https://ingestai.io\"\u003e\u003cb\u003eTry demo on IngestAI\u003c/b\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n# Get the most out of your vector search\n\nEmbedditor is an open-source embedding pre-processing editor that helps you edit GPT / LLM embeddings as if they were a Microsoft Word document, so you can get the most out of your vector search while significantly reducing the cost of embedding and vector storage.\n\n# Join Our Community\n\n\u003ca href=\"https://discord.gg/7gF8dVv86E\" target=\"_blank\"\u003e\n\u003cimg src=\"https://discordapp.com/api/guilds/978672019442905198/widget.png?style=banner3\" alt=\"\"\u003e\n\u003c/a\u003e\n\n[![Stargazers repo roster for @embedditor/embedditor](https://reporoster.com/stars/embedditor/embedditor)](https://github.com/embedditor/embedditor/stargazers)\n\n# Features\n**Rich Editor Interface**\n\n- ⚡ Join and split one or multiple chunks with a few clicks\n- ⚡ Edit embedding metadata and tokens\n- ⚡ Exclude words, sentences, or even parts of chunks from embedding\n- ⚡ Select the parts of a chunk you want to be embedded\n- ⚡ Add additional information to your embeddings, like URL links or images\n- ⚡ Get nice-looking HTML markup for your AI search results\n- ⚡ Save your pre-processed embedding files in .veml or 
.json formats\n\n**Pre-processing automation**\n- ⚡ Filter most of the 'noise', like punctuation or stop-words, out of vectorization\n- ⚡ Remove insignificant, frequently used words from embedding with the TF-IDF algorithm\n- ⚡ Normalize your embedding tokens before vectorization\n\n# Benefits\n**Rich Spreadsheet Interface**\n\n- ⚡ Optimized relevance of the content retrieved from a vector database\n- ⚡ Improved efficiency and accuracy in your AI / LLM-related applications\n- ⚡ Better-looking search results with images, URL links, etc.\n- ⚡ Increased cost-efficiency with up to 30% cost reduction on embedding and vector storage\n- ⚡ Full control over your data by effortlessly deploying Embedditor locally on your PC or in a dedicated environment\n- ⚡ Save your pre-processed or ready embeddings in .json or .veml format to use them in LangChain, Chroma or any other Vector DB\n\n\n## Quick try\n**Sign up for free and try it in [IngestAI](https://ingestai.io/signup).**\n\n# GUI\n\nAccess Dashboard using: [http://localhost:8080/](http://localhost:8080/)\n\n# Screenshots\n\n![1](https://embedditor.ai/images/embedditor_ui_01.png)\n![2](https://embedditor.ai/images/embedditor_ui_02.png)\n![3](https://embedditor.ai/images/embedditor_ui_03.png)\n![4](https://embedditor.ai/images/embedditor_ui_04.png)\n\n\u003c!-- # Rich Spreadsheet Interface\n\n- ⚡ **Basic Operations**: Create, Read, Update and Delete Tables, Columns, and Rows\n- ⚡ **Fields Operations**: Sort, Filter, Hide / Unhide Columns\n- ⚡ **Multiple Views Types**: Grid (By default), Gallery, Form View, and Kanban View\n- ⚡ **View Permissions Types**: Collaborative Views, \u0026 Locked Views\n- ⚡ **Share Bases / Views**: either Public or Private (with Password Protected)\n- ⚡ **Variant Cell Types**: ID, LinkToAnotherRecord, Lookup, Rollup, SingleLineText, Attachment, Currency, Formula, etc\n- ⚡ **Access Control with Roles**: Fine-grained Access Control at different levels\n- ⚡ **and more** --\u003e\n\n\u003c!-- ### 
FAQ\n\n**What is embedding (vectorization)?**\n\n**What are embeddings?**\n\n**What is vector search?**\n\n**What is embedding metadata?**\n\n**What are embedding tokens?**\n\n**What are void embedding tokens?**\nVoid (embedding) tokens are words in your content (embedding metadata) that will appear in your vector search results but are filtered out of embedding and so won’t be found with vector search.\n\n**What is a hidden embedding token?**\nA hidden embedding token is a token that will be embedded for vector storage but doesn’t appear in your metadata – the content you will retrieve using vector search.\n\n**How large are embeddings?**\nEmbedding your content to vector space increases its size, requiring up to 10X more storage space than your raw content. That is why filtering out unnecessary and low-relevance tokens not only improves your vector search but also helps you reduce the cost of embedding and storage. --\u003e\n\n\n## Installation\n\n1. Copy .env.example to .env\n\n2. Set the following settings in the .env\n\n\n    `OPENAI_API_KEY=`\n\n\n3. Set up the project\n\n- `php artisan migrate`\n- `php artisan db:seed`\n- `php artisan storage:link`\n","funding_links":[],"categories":["PHP"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FIngestAI%2Fembedditor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FIngestAI%2Fembedditor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FIngestAI%2Fembedditor/lists"}