{"id":50525565,"url":"https://github.com/b7s/embedding.cpp","last_synced_at":"2026-06-03T07:31:32.618Z","repository":{"id":360745745,"uuid":"1251255016","full_name":"b7s/embedding.cpp","owner":"b7s","description":null,"archived":false,"fork":false,"pushed_at":"2026-05-27T17:26:44.000Z","size":140,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-27T19:12:26.770Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/b7s.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-27T11:54:07.000Z","updated_at":"2026-05-27T17:35:23.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/b7s/embedding.cpp","commit_stats":null,"previous_names":["b7s/embedding.cpp"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/b7s/embedding.cpp","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/b7s%2Fembedding.cpp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/b7s%2Fembedding.cpp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/b7s%2Fembedding.cpp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/b7s%2Fembedding.cpp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/b7s","download_url":"https
://codeload.github.com/b7s/embedding.cpp/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/b7s%2Fembedding.cpp/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33853998,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-03T02:00:06.370Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-03T07:31:29.146Z","updated_at":"2026-06-03T07:31:32.604Z","avatar_url":"https://github.com/b7s.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Embedding.cpp\n\nText embedding tool via `BERT` models upon [ggml](https://github.com/ggerganov/ggml), with critical bug fixes and improvements over the upstream.\n\n## Improvements Over Upstream\n\nThis fork includes three critical bug fixes that make the library actually functional:\n\n### 1. Fix SIGILL on tokenizer load (`tokenizer.cpp`)\n\n`bert_tokenizer::load()` declared a `bool` return type but had no `return` statement. The compiler placed a `ud2` (undefined instruction) after the function body, causing an immediate **SIGILL (exit code 132)** on every call. This made the library completely unusable.\n\n**Fix:** Added `return true;` at the end of `bert_tokenizer::load()`.\n\n### 2. 
Fix SIGSEGV from use-after-free (`bert.cpp`)\n\nIn `bert_eval_batch()`, `ggml_free(ctx0)` was called *before* reading `gf-\u003enodes[]` and `ggml_used_mem(ctx0)`. In release builds (`-O2`), the optimizer reuses the freed memory, causing a **SIGSEGV** crash.\n\n**Fix:** Moved `ggml_free(ctx0)` to after all reads from `gf` and `ctx0`.\n\n### 3. Fix garbage embeddings from wrong graph node (`bert.cpp`)\n\n`bert_eval_batch()` read `gf-\u003enodes[n_nodes - 2]`, which is an intermediate `ggml_div` node producing the scalar `1.0f / length`, **not** the embedding vector. The actual normalized embedding is `gf-\u003enodes[n_nodes - 1]` (the final `ggml_scale` output). This caused garbage embeddings with magnitude ~6.7e22 and mostly zero values.\n\n**Fix:** Changed to `embeddings_tensor = gf-\u003enodes[gf-\u003en_nodes - 1]`.\n\n---\n\n## Feature (Origin)\n\n* Plain C/C++ implementation without dependencies\n* Inherits support for various architectures from ggml (x86 with AVX2, ARM, etc.)\n* Choose your model size from 32/16/4 bits per model weight\n* all-MiniLM-L6-v2 with 4-bit quantization is only 14 MB. Inference RAM usage depends on the length of the input\n* Sample C++ server over a TCP socket and a Python test client\n* Benchmarks to validate correctness and speed of inference\n\n## Feature (Improve)\n\n* Builds the tokenizer with [tokenizers-cpp](https://github.com/mlc-ai/tokenizers-cpp).\n* Correctly handles Asian scripts (CJK and so on).\n* Handles cased and uncased text according to the original config in `tokenizer.json`.\n* Uses the [GGUF](https://github.com/philpax/ggml/blob/gguf-spec/docs/gguf.md) model file format, which is easy to extend while staying compatible.\n* **Critical bug fixes** listed above; without these, the upstream code does not produce usable embeddings.\n\n\u003e With the above, embedding.cpp can run more models, such as [m3e](), [e5]() and so on.\n\n## Limitation\n\n* Only supports BERT base models for embedding. 
Other architectures, such as SGPT, are not supported.\n* Runs on CPU only.\n* All outputs are mean pooled and normalized.\n* Batching support is WIP.\n* Without real batching, this library is slower than it could be in use cases with multiple sentences.\n\n## Usage\n\n### Check out submodules\n\n```sh\ngit submodule update --init --recursive\n```\n\n### Build\n\nBy default, it builds both\n- the native binaries, like the example server, with static libraries;\n- and the dynamic library for usage from e.g. Python.\n\n```sh\nmkdir build\ncd build\ncmake .. -DCMAKE_BUILD_TYPE=Release\nmake\ncd ..\n```\n\n\u003e Rust must be installed. See [rust](https://www.rust-lang.org/tools/install) or run `curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh`\n\n### Converting models to GGUF format\n\nConverting models is similar to llama.cpp. Use `models/convert-to-gguf.py` to convert Hugging Face models into either f32 or f16 GGUF models.\nThen use `./build/bin/quantize` to turn those into Q4_0 models (4 bits per weight).\n\nThere is also `models/run_conversions.sh`, which creates all 4 versions (f32, f16, Q4_0, Q4_1) at once.\n\n```sh\npip install -r requirements.txt\ncd models\n# Clone a model from Hugging Face\ngit clone https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2\n# Run conversions to all 4 formats (f32, f16, Q4_0, Q4_1)\nsh run_conversions.sh all-MiniLM-L6-v2\n```\n\n## Acknowledgments\n\nThis project is a fork of [embedding.cpp](https://github.com/FFengIll/embedding.cpp) by FFengIll, which itself is a fork of [bert.cpp](https://github.com/skeskinen/bert.cpp) by skeskinen. Thank you to the original authors and contributors for the foundational work that made this possible.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fb7s%2Fembedding.cpp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fb7s%2Fembedding.cpp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fb7s%2Fembedding.cpp/lists"}