{"id":47593842,"url":"https://github.com/dariasmyr/fts-engine","last_synced_at":"2026-04-01T17:50:20.762Z","repository":{"id":274866358,"uuid":"924324494","full_name":"dariasmyr/fts-engine","owner":"dariasmyr","description":"A modular full-text search engine in Go with instant indexing, pluggable indexers, and configurable pre-search filters.","archived":false,"fork":false,"pushed_at":"2026-03-19T08:59:15.000Z","size":75993,"stargazers_count":11,"open_issues_count":1,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2026-03-19T11:39:51.503Z","etag":null,"topics":["fulltext-search","fuzzy-search","ngram-analysis","ngrams","stemming","trie"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dariasmyr.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-01-29T19:54:15.000Z","updated_at":"2026-03-19T08:59:18.000Z","dependencies_parsed_at":"2025-07-25T14:40:12.193Z","dependency_job_id":null,"html_url":"https://github.com/dariasmyr/fts-engine","commit_stats":null,"previous_names":["dariasmyr/fts-hw","dariasmyr/fulltextsearch-engine","dariasmyr/fts-engine"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/dariasmyr/fts-engine","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dariasmyr%2Ffts-engine","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dariasmyr%2Ffts-engine/tags","releases_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dariasmyr%2Ffts-engine/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dariasmyr%2Ffts-engine/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dariasmyr","download_url":"https://codeload.github.com/dariasmyr/fts-engine/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dariasmyr%2Ffts-engine/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31290625,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-01T13:12:26.723Z","status":"ssl_error","status_checked_at":"2026-04-01T13:12:25.102Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fulltext-search","fuzzy-search","ngram-analysis","ngrams","stemming","trie"],"created_at":"2026-04-01T17:50:19.837Z","updated_at":"2026-04-01T17:50:20.737Z","avatar_url":"https://github.com/dariasmyr.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Full-Text Search Test Engine\n\nReusable full-text search engine in Go with configurable indexes, token pipeline, and snapshot support.\n\n![Demo](docs/demo.gif)\n\n## What this repository provides\n\n- Public library API in `pkg/fts`.\n- Public index implementations in `pkg/index/*`:\n  - `radix`\n  - `slicedradix`\n  - `trigram`\n  - 
`hamt`\n  - `hamtpointered`\n- Public text processing pipeline in `pkg/textproc`.\n- Public key generators in `pkg/keygen`.\n- Public probabilistic filters in `pkg/filter`.\n- CLI entrypoint in `cmd/fts` with:\n  - `prod` mode (run with configurable filters and interactive CUI)\n  - `experiment` mode (collect indexing metrics)\n\n## Library usage\n\n### 1) Install\n\n```bash\ngo get github.com/dariasmyr/fts-engine@latest\n```\n\nIf you test against local source:\n\n```go\nreplace github.com/dariasmyr/fts-engine =\u003e /absolute/path/to/fts-engine\n```\n\n### 2) Quickstart\n\n```go\npackage main\n\nimport (\n\t\"context\"\n\t\"fmt\"\n\n\t\"github.com/dariasmyr/fts-engine/pkg/fts\"\n\t\"github.com/dariasmyr/fts-engine/pkg/index/radix\"\n\t\"github.com/dariasmyr/fts-engine/pkg/keygen\"\n)\n\nfunc main() {\n\tengine := fts.New(radix.New(), keygen.Word)\n\n\t_ = engine.IndexDocument(context.Background(), \"doc-1\", \"Wikipedia: Rosa is a French hotel barge\")\n\tres, _ := engine.SearchDocuments(context.Background(), \"french hotel\", 10)\n\n\tfmt.Println(res.TotalResultsCount)\n}\n```\n\n### 3) Snapshots\n\n#### Simple file snapshot (index + filter in separate files)\n\nUse `pkg/ftsbuiltin` for built-in name-based codecs (`radix`, `bloom`, etc.) 
without manual codec registry wiring.\n\nFor a more advanced in-memory `io.Writer`/`io.Reader` example with one combined payload, see `examples/client-library/snapshot-buffer-filter/main.go`.\n\nSave index + filter snapshots:\n\n```go\npackage main\n\nimport (\n\t\"context\"\n\t\"os\"\n\n\t\"github.com/dariasmyr/fts-engine/pkg/fts\"\n\t\"github.com/dariasmyr/fts-engine/pkg/ftsbuiltin\"\n\t\"github.com/dariasmyr/fts-engine/pkg/keygen\"\n)\n\nfunc main() {\n\topts := ftsbuiltin.FilterOptions{BloomExpectedItems: 1_000_000, BloomBitsPerItem: 10, BloomK: 7}\n\n\tidx, _ := ftsbuiltin.BuildIndex(\"radix\")\n\tflt, _ := ftsbuiltin.BuildFilter(\"bloom\", opts)\n\tsvc := fts.New(idx, keygen.Word, fts.WithFilter(flt))\n\t_ = svc.IndexDocument(context.Background(), \"doc-1\", \"file snapshot demo\")\n\n\tindexOut, _ := os.Create(\"./data/segments/default.fidx\")\n\tdefer indexOut.Close()\n\t_ = ftsbuiltin.SaveIndexSnapshot(indexOut, \"radix\", idx)\n\n\tfilterOut, _ := os.Create(\"./data/segments/default.fflt\")\n\tdefer filterOut.Close()\n\t_ = ftsbuiltin.SaveFilterSnapshot(filterOut, \"bloom\", flt)\n}\n```\n\nLoad snapshots from files:\n\n```go\npackage main\n\nimport (\n\t\"context\"\n\t\"fmt\"\n\t\"os\"\n\n\t\"github.com/dariasmyr/fts-engine/pkg/fts\"\n\t\"github.com/dariasmyr/fts-engine/pkg/ftsbuiltin\"\n\t\"github.com/dariasmyr/fts-engine/pkg/keygen\"\n)\n\nfunc main() {\n\tindexIn, _ := os.Open(\"./data/segments/default.fidx\")\n\tdefer indexIn.Close()\n\tloadedIndex, _ := ftsbuiltin.LoadIndexSnapshot(indexIn)\n\n\tfilterIn, _ := os.Open(\"./data/segments/default.fflt\")\n\tdefer filterIn.Close()\n\tloadedFilter, _ := ftsbuiltin.LoadFilterSnapshot(filterIn)\n\n\trestored := fts.New(loadedIndex.Index, keygen.Word, fts.WithFilter(loadedFilter.Filter))\n\tres, _ := restored.SearchDocuments(context.Background(), \"snapshot\", 10)\n\tfmt.Println(res.TotalResultsCount)\n}\n```\n\n### 4) Custom pipeline and language presets\n\nDefault preset shortcut:\n\n```go\nengine := 
fts.New(radix.New(), keygen.Word, ftspreset.English())\n```\n\nAvailable presets:\n\n- `textproc.DefaultEnglishPipeline()`\n- `textproc.DefaultRussianPipeline()`\n- `textproc.DefaultMultilingualPipeline()`\n- `ftspreset.English()` / `ftspreset.Russian()` / `ftspreset.Multilingual()`\n\nCustom pipeline:\n\n```go\npipe := textproc.NewPipeline(\n\ttextproc.AlnumTokenizer{},\n\ttextproc.LowercaseFilter{},\n\ttextproc.MinLengthOrNumericFilter{MinLength: 2},\n\ttextproc.EnglishStopwordFilter{},\n\ttextproc.EnglishStemFilter{},\n)\n\nengine := fts.New(radix.New(), keygen.Word, fts.WithPipeline(pipe))\n```\n\n## Run main app (local testing via config)\n\nUse this only when you want to test the repository app itself (`cmd/fts`), not when embedding the library into your service.\n\n1) Create config from template:\n\n```bash\ncp ./config/config_local_example.yaml ./config/config_local.yaml\n```\n\n2) Run with config:\n\n```bash\ngo run ./cmd/fts --config=./config/config_local.yaml\n```\n\nImportant config fields:\n\n```yaml\nfts:\n  engine: \"trie\"       # trie|kv\n  index: \"radix\"       # radix|slicedradix|trigram|hamt|hamtpointered\n  keygen: \"word\"       # word|trigram\n  filter: \"none\"       # none|bloom|cuckoo|ribbon\n  snapshot:\n    enabled: true\n    path: \"./data/segments/default.fidx\"\n    split_files: true\n    index_path: \"./data/segments/local.index.fidx\"\n    filter_path: \"./data/segments/local.filter.fidx\"\n    load_on_start: true\n    save_on_build: true\n    buffer_size: 1048576\n    flush_threshold: 262144\n    sync_file: true\n  bloom:\n    expected_items: 1000000\n    bits_per_item: 10\n    k: 7\n  cuckoo:\n    bucket_count: 262144\n    bucket_size: 4\n    max_kicks: 500\n  ribbon:\n    expected_items: 1000000\n    extra_cells: 250000\n    window_size: 24      # 1..32\n    seed: 0\n    max_attempts: 5\n  pipeline:\n    lowercase: true\n    stopwords_en: true\n    stopwords_ru: false\n    stem_en: true\n    stem_ru: false\n    min_length: 
3\nmode:\n  type: \"prod\"        # prod|experiment\n```\n\nSnapshot fields (`fts.snapshot`):\n\n- `enabled`: enable snapshot persistence flow in CLI prod mode.\n- `path`: final snapshot artifact path.\n- `split_files`: if true, save/load index and filter in separate files.\n- `index_path`: optional explicit path for index snapshot file in split mode.\n- `filter_path`: optional explicit path for filter snapshot file in split mode.\n- `load_on_start`: if true and snapshot exists, load it and skip rebuild.\n- `save_on_build`: if true, save snapshot after indexing finishes.\n- `buffer_size`: writer buffer size used during save.\n- `flush_threshold`: buffered flush threshold used by the built-in save helper.\n- `sync_file`: fsync temp file before atomic rename.\n\n## CLI modes\n\n- `prod`:\n  - runs engine with configurable pipeline and interactive CUI search,\n  - if `fts.snapshot.enabled=true` and `load_on_start=true` and snapshot exists: loads snapshot and skips re-index,\n  - otherwise indexes documents and (if `save_on_build=true`) persists snapshot atomically.\n- `experiment`:\n  - always indexes current input and prints memory/index stats,\n  - does not run CUI snapshot restore flow.\n\n## Ribbon filter usage\n\nRibbon is a static filter. 
In `fts` it is used via `BufferedStaticFilter`.\n\nBuild ribbon from file with a custom parser:\n\n```go\nopts := ftsbuiltin.FilterOptions{\n\tRibbonExpectedItems: 1_000_000, // estimated unique keys\n\tRibbonExtraCells:    250_000,   \n\tRibbonWindowSize:    16,       \n\tRibbonSeed:          0,       \n\tRibbonMaxAttempts:   5,\n}\n\nrf, _ := filter.NewRibbonFilter(\n\topts.RibbonExpectedItems,\n\topts.RibbonExtraCells,\n\topts.RibbonWindowSize,\n\topts.RibbonSeed,\n)\n\n_ = rf.BuildWithRetriesFromFileWithParser(\"./data/keys.txt\", parseKeysFile, opts.RibbonMaxAttempts)\n\nout, _ := os.Create(\"./data/segments/ribbon.filter.fidx\")\ndefer out.Close()\n_ = rf.Serialize(out)\n```\n\nMinimal parser example (line-by-line keys):\n\n```go\nfunc parseKeysFile(path string, emit func([]byte) bool) error {\n\tf, err := os.Open(path)\n\tif err != nil {\n\t\treturn err\n\t}\n\tdefer f.Close()\n\n\ts := bufio.NewScanner(f)\n\tfor s.Scan() {\n\t\tkey := strings.TrimSpace(s.Text())\n\t\tif key == \"\" {\n\t\t\tcontinue\n\t\t}\n\t\tif !emit([]byte(key)) {\n\t\t\tbreak\n\t\t}\n\t}\n\n\treturn s.Err()\n}\n```\n\nLoad ribbon filter from file:\n\n```go\nin, _ := os.Open(\"./data/segments/ribbon.filter.fidx\")\ndefer in.Close()\n\nribbonFilter, _ := filter.LoadRibbonFilter(in)\n\nfmt.Println(ribbonFilter.Contains([]byte(\"market\")))\n```\n\nFull runnable example (default parser save, custom parser save, load from file, normalized `Contains`) is in `examples/client-library/ribbon-file/main.go`.\n\n### Standalone filter `Contains` with normalization\n\nUse this when you store normalized keys in filter and later want to check a raw user word.\n\nExample: indexed key is `beauty`, user enters `beautiful`.\nWith stemming, both become `beauti`, so normalized check returns `true`.\n\n```go\npipe := textproc.NewPipeline(\n\ttextproc.AlnumTokenizer{},\n\ttextproc.LowercaseFilter{},\n\ttextproc.EnglishStemFilter{},\n)\n\nindexedTerms := []string{\"beauty\", \"hotel\"}\nnormalizedKeys := 
make([]string, 0, len(indexedTerms))\nfor _, term := range indexedTerms {\n\tkeys, _ := fts.NormalizeToKeys(term, pipe, keygen.Word)\n\tnormalizedKeys = append(normalizedKeys, keys...)\n}\n\nrf, _ := filter.NewRibbonFilter(uint32(len(normalizedKeys)), 32, 24, 0)\nstream := func(emit func([]byte) bool) error {\n\tfor _, key := range normalizedKeys {\n\t\tif !emit([]byte(key)) {\n\t\t\tbreak\n\t\t}\n\t}\n\treturn nil\n}\n\n_ = rf.BuildWithRetriesFromKeyStream(stream, 5)\n\nraw := rf.Contains([]byte(\"beautiful\")) // false: filter stores normalized keys\n\nnormalized, _ := fts.ContainsNormalized(rf, \"beautiful\", pipe, keygen.Word)\n\nfmt.Println(\"raw\", raw, \"normalized\", normalized) // raw=false normalized=true\n```\n\n`ContainsNormalized` applies pipeline + keygen and checks all normalized keys via `Contains`.\n\n## Tests\n\nRun all tests:\n\n```bash\ngo test ./...\n```\n\nRun only public packages:\n\n```bash\ngo test ./pkg/...\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdariasmyr%2Ffts-engine","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdariasmyr%2Ffts-engine","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdariasmyr%2Ffts-engine/lists"}