{"id":50727115,"url":"https://github.com/backloghq/termlog","last_synced_at":"2026-06-10T05:01:26.815Z","repository":{"id":355779294,"uuid":"1229553197","full_name":"backloghq/termlog","owner":"backloghq","description":"Log-structured full-text search index for TypeScript — segment-based posting lists with LSM compaction, BM25 ranking, zero native dependencies","archived":false,"fork":false,"pushed_at":"2026-05-30T12:39:45.000Z","size":358,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-07T16:24:42.465Z","etag":null,"topics":["aws-s3","bm25","embedded-database","fts","full-text-search","indexing","log-structured","lsm","search-index","typescript"],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/backloghq.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-05T06:51:20.000Z","updated_at":"2026-05-30T12:11:46.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/backloghq/termlog","commit_stats":null,"previous_names":["backloghq/termlog"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/backloghq/termlog","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/backloghq%2Ftermlog","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/backloghq%2Ftermlog/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/backloghq%2Ftermlog/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/backloghq%2Ftermlog/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/backloghq","download_url":"https://codeload.github.com/backloghq/termlog/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/backloghq%2Ftermlog/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34137570,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-10T02:00:07.152Z","response_time":89,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws-s3","bm25","embedded-database","fts","full-text-search","indexing","log-structured","lsm","search-index","typescript"],"created_at":"2026-06-10T05:01:26.007Z","updated_at":"2026-06-10T05:01:26.809Z","avatar_url":"https://github.com/backloghq.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# @backloghq/termlog\n\nLog-structured full-text search index — segment-based posting lists with LSM compaction, BM25 ranking, zero native dependencies.\n\n## Install\n\n```\nnpm install @backloghq/termlog\n```\n\n## Usage\n\n```ts\nimport { TermLog } from \"@backloghq/termlog\";\n\nconst index = await TermLog.open({ dir: \"./my-index\" });\n\nawait index.add(\"doc-1\", \"the quick brown fox\");\nawait index.add(\"doc-2\", \"the lazy dog\");\nawait index.flush();\n\nconst results = await index.search(\"fox\", { limit: 10 });\n// [{ docId: \"doc-1\", score: 0.655... }]  (BM25 — exact value depends on corpus)\n\nawait index.remove(\"doc-1\");\nawait index.close();\n```\n\n## Why\n\nExisting FTS engines (Lucene, Tantivy) require native deps or a JVM. Most pure-JS alternatives serialize the index to a single in-memory blob — fine for small corpora, but they hit per-file size cliffs in the tens of thousands of documents. Termlog uses immutable on-disk segments with LSM compaction so the corpus scales without those ceilings.\n\n## Architecture\n\n- **Posting lists** — `term → [docId, tf]`, compressed with VByte / delta encoding. (Positions reserved for a future release.)\n- **Term dictionary** — sorted on disk; binary search for lookup.\n- **Segments** — self-contained immutable files (term dict + postings). New writes create a new segment. Compaction merges N segments into 1.\n- **Query execution** — boolean (AND/OR) via posting iterators (zigzag merge for AND, union scan for OR), BM25 scoring on top.\n- **Storage** — abstracted via `StorageBackend`; local FS by default, S3 via [@backloghq/termlog-s3](https://github.com/backloghq/termlog-s3).\n\n## S3 backend\n\nS3 support is provided by the companion package [@backloghq/termlog-s3](https://github.com/backloghq/termlog-s3):\n\n```bash\nnpm install @backloghq/termlog @backloghq/termlog-s3\n```\n\n```ts\nimport { TermLog } from \"@backloghq/termlog\";\nimport { S3Backend } from \"@backloghq/termlog-s3\";\nimport { S3Client } from \"@aws-sdk/client-s3\";\n\nconst index = await TermLog.open({\n  dir: \"my-index\",\n  backend: new S3Backend({\n    client: new S3Client({ region: \"us-east-1\" }),\n    bucket: \"my-bucket\",\n    prefix: \"my-index/\",\n  }),\n});\n```\n\nSee the [termlog-s3 README](https://github.com/backloghq/termlog-s3) for IAM permissions, lifecycle rules, and MinIO/LocalStack usage.\n\n## Options\n\n| Option | Default | Description |\n|---|---|---|\n| `fanout` | 4 | Same-tier segment count that triggers a merge (size-tiered LSM) |\n| `flushThreshold` | 1000 | Docs in write buffer before auto-flush |\n| `k1` | 1.2 | BM25 term-saturation parameter |\n| `b` | 0.75 | BM25 length-normalization parameter |\n\n## Errors\n\n| Class | When thrown |\n|---|---|\n| `ManifestCorruptionError` | manifest.json contains invalid JSON |\n| `ManifestVersionError` | manifest version is outside the supported range |\n| `SegmentCorruptionError` | CRC32 mismatch or missing segment file (`.region` tells you which) |\n| `MappingCorruptionError` | docids.snap or docids.log is corrupt |\n| `TokenizerMismatchError` | reopening an index with a different tokenizer config |\n| `IndexLockedError` | another process holds the advisory `.lock` file |\n| `WriteStreamError` | base class for streaming write failures (S3 multipart, etc.) |\n\n## Stats\n\n| Method | Returns | Description |\n|---|---|---|\n| `docCount()` | `number` | Documents indexed across all flushed segments |\n| `segmentCount()` | `number` | Number of active on-disk segments |\n| `estimatedBytes()` | `number` | Approximate in-memory footprint (postings buffers + sidecar arrays + Maps); lower-bound estimate for memory-budget callers |\n\n## Multi-writer / S3 safety\n\nTermlog is **single-writer per index directory**. On local FS an advisory `.lock` file prevents concurrent opens in the same process group. On S3 (or any shared storage) there is no distributed lock — you must ensure at most one writer per index path.\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbackloghq%2Ftermlog","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbackloghq%2Ftermlog","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbackloghq%2Ftermlog/lists"}