{"id":47593842,"url":"https://github.com/dariasmyr/fts-engine","last_synced_at":"2026-07-04T20:00:24.550Z","repository":{"id":274866358,"uuid":"924324494","full_name":"dariasmyr/fts-engine","owner":"dariasmyr","description":"Modular full-text search engine in Go with pluggable indexes, filters, and customizable text processing pipelines. You can instantly index your docs (trie, n-grams, HAMT), apply probabilistic filters, and experiment with search performance via an interactive CUI.","archived":false,"fork":false,"pushed_at":"2026-06-28T13:39:21.000Z","size":10286,"stargazers_count":18,"open_issues_count":2,"forks_count":3,"subscribers_count":3,"default_branch":"master","last_synced_at":"2026-06-28T15:20:51.817Z","etag":null,"topics":["approximate-matching","data-structures","experimental","fts","golang","hamt","indexing","information-retrieval","n-grams","prefix-search","radix","radix-tree","search","search-algorithms","search-engine","text-processing","trie"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dariasmyr.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-01-29T19:54:15.000Z","updated_at":"2026-06-15T21:47:17.000Z","dependencies_parsed_at":"2025-07-25T14:40:12.193Z","dependency_job_id":null,"html_url":"https://github.com/dariasmyr/fts-engine","commit_stats":null,"previous_names":["dariasmyr/fts-hw","dariasmyr/fulltextsearch-engine","dariasmyr/fts-engine"],"tags_co
unt":11,"template":false,"template_full_name":null,"purl":"pkg:github/dariasmyr/fts-engine","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dariasmyr%2Ffts-engine","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dariasmyr%2Ffts-engine/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dariasmyr%2Ffts-engine/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dariasmyr%2Ffts-engine/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dariasmyr","download_url":"https://codeload.github.com/dariasmyr/fts-engine/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dariasmyr%2Ffts-engine/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":35133834,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-07-04T02:00:05.987Z","response_time":113,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["approximate-matching","data-structures","experimental","fts","golang","hamt","indexing","information-retrieval","n-grams","prefix-search","radix","radix-tree","search","search-algorithms","search-engine","text-processing","trie"],"created_at":"2026-04-01T17:50:19.837Z","updated_at":"2026-07-04T20:00:24.527Z","avatar_url":"https://github.com/dariasmyr.png","languag
e":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Fast Turtle Search Engine\n\nReusable full-text search library for Go.\n\nIt provides:\n\n- mutable in-memory search via `pkg/fts`\n- built-in indexes in `pkg/index/slicedradix` and `pkg/index/hamt`\n- query-string, phrase, boolean, field-scoped, and prefix search\n- optional pipelines, stemming, and language presets via `pkg/textproc` and `pkg/ftspreset`\n- mutable snapshots and sealed read-only segments via `pkg/ftspersist`\n- per-request diagnostics and aggregated search stats via `pkg/ftsstats`\n\n## Public API Surface\n\nFor external integrations, prefer these public packages:\n\n- `pkg/fts` - core engine, document model, query API\n- `pkg/index/slicedradix` - exact, positional, and prefix index\n- `pkg/index/hamt` - exact and positional index\n- `pkg/keygen` - token-to-key generators\n- `pkg/ftspersist` - recommended snapshot and segment persistence API\n- `pkg/segment` - lower-level sealed segment API\n- `pkg/textproc` - tokenizers and filters\n- `pkg/ftspreset` - ready-to-use pipeline presets\n- `pkg/filter` - bloom, cuckoo, and ribbon filters\n- `pkg/ftsstats` - aggregated search observability\n\n`cmd/*`, `internal/*`, and `benchmarks/*` are repository-owned tooling, not the main library surface.\n\n## Requirements\n\n- Go `1.25+`\n\n## Install\n\n```bash\ngo get github.com/dariasmyr/fts-engine@latest\n```\n\n## Quickstart\n\n```go\npackage main\n\nimport (\n\t\"context\"\n\t\"fmt\"\n\n\t\"github.com/dariasmyr/fts-engine/pkg/fts\"\n\t\"github.com/dariasmyr/fts-engine/pkg/index/slicedradix\"\n\t\"github.com/dariasmyr/fts-engine/pkg/keygen\"\n)\n\nfunc main() {\n\tengine := fts.New(slicedradix.New(), keygen.Word)\n\n\t_ = engine.Index(context.Background(), fts.Document{ID: \"doc-1\", Fields: map[string]fts.Field{fts.DefaultField: {Value: \"Wikipedia: Rosa is a French hotel barge\"}}})\n\t_ = engine.Index(context.Background(), fts.Document{ID: \"doc-2\", Fields: 
map[string]fts.Field{fts.DefaultField: {Value: \"Rosa runs hotel operations in France\"}}})\n\n\tres, err := engine.SearchDocuments(context.Background(), \"french hotel\", 10)\n\tif err != nil {\n\t\tpanic(err)\n\t}\n\n\tfmt.Printf(\"results=%d\\n\", res.TotalResultsCount)\n\tfor _, item := range res.Results {\n\t\tfmt.Printf(\"id=%s unique=%d total=%d\\n\", item.ID, item.UniqueMatches, item.TotalMatches)\n\t}\n}\n```\n\nNotes:\n\n- `fts.New(...)` creates a single-field service backed by `fts.DefaultField`\n- in practice that means the regular single-field index uses the field name `_default`\n- if you do not set a pipeline, the default behavior is alphanumeric tokenization plus lowercasing\n- add `fts.WithScorer(fts.BM25())` or `fts.WithScorer(fts.TFIDF())` when you want score-based ranking\n\n## Choosing an Index\n\n| Index | Capabilities | When to use |\n| --- | --- | --- |\n| `slicedradix` | exact, positional, prefix | best default if you need prefix queries |\n| `hamt` | exact, positional | use when you do not need prefix queries |\n\nPrefix queries require an index that implements `fts.PrefixIndex`. 
Among the built-in mutable indexes, that means `slicedradix`.\n\n## Pipelines and Presets\n\nUse a preset when the defaults fit your language mix:\n\n```go\nengine := fts.New(slicedradix.New(), keygen.Word, ftspreset.Multilingual())\n```\n\nAvailable presets:\n\n- `ftspreset.English()`\n- `ftspreset.Russian()`\n- `ftspreset.Multilingual()`\n\nUse `pkg/textproc` when you want an explicit pipeline:\n\n```go\npipe := textproc.NewPipeline(\n\ttextproc.AlnumTokenizer{},\n\ttextproc.LowercaseFilter{},\n\ttextproc.MinLengthOrNumericFilter{MinLength: 2},\n\ttextproc.EnglishStopwordFilter{},\n\ttextproc.EnglishStemFilter{},\n)\n\nengine := fts.New(slicedradix.New(), keygen.Word, fts.WithPipeline(pipe))\n```\n\nEach `fts.Field` can also override the service-level pipeline with its own `Field.Pipeline`.\n\n## Search API\n\nUse:\n\n- `SearchDocuments(...)` for query-string parsing\n- `SearchPlainText(...)` for bag-of-words input without query syntax\n- `SearchField(...)`, `SearchFields(...)` for field-scoped search\n- `SearchPhrase(...)`, `SearchPhraseNear(...)` and field variants for phrase queries\n- `SearchFieldClauses(...)` when different fields need different subqueries\n\nSupported query-string syntax:\n\n- `hotel`\n- `french hotel`\n- `\"hotel barge\"`\n- `+hotel -market`\n- `title:hotel`\n- `title:\"hotel barge\"`\n- `bar*`\n- `+(title:hotel title:french) -market`\n\nProgrammatic queries are available through the AST types in `pkg/fts` such as `TermQuery`, `PhraseQuery`, `PrefixQuery`, and `BooleanQuery`.\n\nField behavior summary:\n\n- with `fts.New(...)`, documents are indexed only into `_default`\n- with `fts.NewMultiField(...)`, the service keeps a separate index per field name\n- field indexes in multi-field mode are created lazily on first indexing of that field\n- searching a field that has no index does not return an error; it returns zero matches\n\nWhat that means for different search entry points:\n\n- `SearchDocuments(...)` on a single-field service 
searches only `_default`\n- `SearchDocuments(...)` on a multi-field service searches across the currently existing field indexes\n- `SearchField(...)`, `SearchPhraseField(...)`, `SearchPhraseNearField(...)`, and field-scoped query syntax like `title:hotel` return zero matches when that field has never been indexed\n- `SearchFields(...)`, `SearchPhraseFields(...)`, `SearchPhraseNearFields(...)`, and `SearchQueryFields(...)` search only the provided fields that currently exist; missing fields are ignored\n- `SearchFieldClauses(...)` behaves the same way per clause: a clause targeting a missing field contributes no matches\n- prefix search behaves the same with one extra rule: if the field exists but its index does not support prefix search, that field contributes no prefix matches\n\n## Multi-Field Services\n\nUse `fts.NewMultiField(...)` when documents have separate searchable fields:\n\n```go\nfactory := func(string) (fts.Index, error) {\n\treturn slicedradix.New(), nil\n}\n\nengine := fts.NewMultiField(factory, keygen.Word)\n\n_ = engine.Index(context.Background(), fts.Document{\n\tID: \"doc-1\",\n\tFields: map[string]fts.Field{\n\t\t\"title\": {Value: \"French hotel\"},\n\t\t\"body\":  {Value: \"Rosa runs hotel operations in France\"},\n\t},\n})\n\nres, _ := engine.SearchField(context.Background(), \"title\", \"hotel\", 10)\nfmt.Println(res.TotalResultsCount)\n```\n\nIn this mode, you usually create one index per field through the factory. 
The engine calls the factory the first time a field needs to be indexed and then reuses that index for future documents in the same field.\n\n## Persistence\n\nThe recommended persistence surface for library consumers is `pkg/ftspersist`.\n\n| Mode | Writable after load | Recommended API | Notes |\n| --- | --- | --- | --- |\n| snapshot | yes | `SaveSnapshot`, `LoadSnapshot` | restores a mutable service |\n| segment | no | `SaveSegment`, `LoadSegment` | restores a sealed read-only service |\n\nImportant details:\n\n- snapshot and segment formats are different and not interchangeable\n- `mmap` is available only for segments via `ftspersist.SegmentLoadOptions{Access: ftspersist.AccessMmap}`\n- `pkg/segment` is a lower-level API for raw segment files; prefer `pkg/ftspersist` unless you need direct segment access\n- if you persist built-in indexes through snapshots, or built-in filters through snapshots or segments, call `ftsbuiltin.RegisterSnapshotCodecs()` once at startup\n\nCurrent working persistence examples:\n\n- `examples/client-library/snapshot-save-files/main.go`\n- `examples/client-library/snapshot-load-files/main.go`\n- `examples/client-library/snapshot-load-files-low-level/main.go`\n- `examples/client-library/segment-save-files/main.go`\n- `examples/client-library/segment-load-files/main.go`\n- `examples/client-library/segment-load-files-low-level/main.go`\n- `examples/client-library/segment-load-mmap/main.go`\n\nSee `examples/client-library/README.md` for the exact run order. 
The load examples expect artifacts created by the corresponding save examples.\n\n## Diagnostics and Stats\n\nPer-request diagnostics are opt-in:\n\n```go\nctx := fts.WithDiagnostics(context.Background())\nres, _ := engine.SearchDocuments(ctx, \"postgres checkpoint\", 10)\n\nfmt.Println(res.Diagnostics.LogicalQueryType)\nfmt.Println(res.Diagnostics.ExecutionStrategy)\nfmt.Println(res.Diagnostics.Timings.Total)\n```\n\nFor aggregated observability across many requests, use `pkg/ftsstats`:\n\n```go\nstats := ftsstats.NewSearchStats(64)\nstats.ObserveResult(\"postgres checkpoint\", res, nil)\nsnap := stats.Snapshot()\nfmt.Println(len(snap.ByStrategy))\n```\n\n## Client Examples\n\n`examples/client-library` contains the examples that match the current public API.\n\n- `default` - minimal in-memory usage\n- `preset` - preset pipeline via `pkg/ftspreset`\n- `custom-options` - custom pipeline and filter\n- `snapshot-*` - mutable snapshot save and restore\n- `segment-*` - sealed segment save and restore, including `mmap`\n\nAll of these examples currently build and run from repository root.\n\n## Repository Tooling\n\nThis repository also contains project-specific tooling:\n\n- `demo/` - demo app module\n- `benchmarks/` - benchmark suite and reports\n\nIf you need those flows, use their local docs instead of treating them as the main library entry point.\n\n## Tests\n\nRun public-package tests:\n\n```bash\ngo test ./pkg/...\n```\n\nRun all tests:\n\n```bash\ngo test ./...\n```\n\nThe repository uses a root `go.work` workspace. Run multi-module commands from repository root or from a submodule directory inside this workspace. 
The child modules also keep a local `replace ../` fallback so current Go tooling can resolve the root library module consistently during module-local commands.\n\nRun the demo module tests:\n\n```bash\n(cd demo \u0026\u0026 go test ./...)\n```\n\nRun the benchmarks module tests:\n\n```bash\n(cd benchmarks \u0026\u0026 go test ./...)\n```\n\nAfter Go build/test checks pass, run repository dependency policy checks:\n\n```bash\ngo run ./tools/depcheck\n```\n\n`depcheck` is a post-toolchain architecture check. It validates only the repository's stable architecture boundaries:\n\n- `pkg/*` may depend only on `pkg/*` inside this repository\n- `examples/*` may import only public `pkg/*`\n- `demo/*` may import only public `pkg/*` and `demo/internal/*`\n- `benchmarks/*` may import only public `pkg/*`, `benchmarks/internal/*`, and `benchmarks/adapters/*`\n\n`depcheck` does not try to validate every possible external dependency or historical path name. Its scope is the permanent internal module and package boundary policy.\n\nIt also does not duplicate `internal` import restrictions that the Go toolchain already enforces during `go build` and `go test`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdariasmyr%2Ffts-engine","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdariasmyr%2Ffts-engine","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdariasmyr%2Ffts-engine/lists"}