{"id":19252472,"url":"https://github.com/zigtools/trigram-bench","last_synced_at":"2025-09-07T23:33:30.508Z","repository":{"id":238015226,"uuid":"795690063","full_name":"zigtools/trigram-bench","owner":"zigtools","description":"For use in workspace symbols (WIP)","archived":false,"fork":false,"pushed_at":"2024-05-04T18:53:22.000Z","size":67,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-06-10T14:34:57.932Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Zig","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zigtools.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"zigtools","open_collective":"zigtools"}},"created_at":"2024-05-03T20:22:44.000Z","updated_at":"2024-05-05T19:38:40.000Z","dependencies_parsed_at":"2024-05-03T21:50:55.035Z","dependency_job_id":null,"html_url":"https://github.com/zigtools/trigram-bench","commit_stats":null,"previous_names":["zigtools/trigram-bench"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/zigtools/trigram-bench","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zigtools%2Ftrigram-bench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zigtools%2Ftrigram-bench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zigtools%2Ftrigram-bench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zigtools%2Ftrigram-bench/manifests","owner_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/owners/zigtools","download_url":"https://codeload.github.com/zigtools/trigram-bench/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zigtools%2Ftrigram-bench/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259907227,"owners_count":22930155,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-09T18:27:06.860Z","updated_at":"2025-06-23T17:33:06.667Z","avatar_url":"https://github.com/zigtools.png","language":"Zig","readme":"# Trigram Bench\n\nFor use in ZLS workspace symbols.\n\n## Problem Statement\n\nDefinitions:\n- Document: A Zig file.\n- Declaration: A `u32` that represents a declaration. Only a name is exposed. The rest is considered a ZLS implementation detail for practical purposes.\n- Symbol / Declaration Name: A `[]const u8` that is the name of a declaration.\n- Index: Preprocessed state representing a document's symbols used to perform a search.\n- Query / Search: A search query that will be matched with a trigram search described below.\n- Trigram: A window over a string with window size 3 and stride 1. Our trigrams are Unicode-based rather than byte-based, and are case-sensitive.\n  Example: `Counter` is composed of the trigrams `Cou`, `oun`, `unt`, `nte`, and `ter`.\n\n### Indexing\n\nWe begin by obtaining 10,000 symbols extracted from `zigwin32`'s `everything.zig` found in `symbols.txt`. Each symbol is given a declaration, from `0` to `9_999`. 
A list `declarations : Declaration (u32) -\u003e Symbol ([]const u8)` exists and is used for checking the correctness of a search.\n\nEach symbol is then split into its constituent trigrams, and a mapping `trigram_to_decls : Trigram -\u003e []const Declaration` is created. This maps trigrams to the declarations whose names contain the trigram.\n\nCurrently there is one indexing method in `common.zig`. If your query method requires a different kind of indexing, let me (Auguste) know.\n\n### Querying\n\nWe begin by splitting our query into its constituent trigrams. Then we look up each trigram in `trigram_to_decls` and take the intersection of the resulting declaration lists.\n\nLet's walk through a made-up example. We're searching for `Alloc`, which is composed of the trigrams `All`, `llo`, and `loc`.\n\nWe look up each trigram in the `trigram_to_decls` mapping and obtain the following declaration lists:\n```\nAll -\u003e { 0, 1, 2, 3, 4, 7, 8 }\nllo -\u003e { 2, 3, 5, 6, 9 }\nloc -\u003e { 0, 2, 3, 10 }\n```\n\nWe perform the intersection and obtain:\n```\nAll ∩ llo ∩ loc -\u003e { 2, 3 }\n```\n\nTo check if this result makes sense, we can access `declarations`:\n```\n2 -\u003e Allocator\n3 -\u003e ArenaAllocator\n```\n\nSuccess!\n\n### Challenge\n\nCan you come up with the most effective way of indexing/querying this data? Anything is permitted as long as it replicates the test results with a reasonable balance of memory usage and performance. 
If a trade-off between indexing and query time appears, we'd rather make indexing faster to prevent delays for users who rarely or never use workspace symbols.\n\nPlease use a `std.AutoArrayHashMapUnmanaged` as it's required for the binary fuse filter (this bench only operates on a single document, but ZLS will operate on thousands, so the filter is used to prevent unnecessary accesses).\n\nThe current solutions are:\n- `merge.zig`\n- `hashmap.zig`\n\n## Useful Stats\n\nOur `trigram_to_decls` mapping has 10505 elements with an average of ~21 declarations per trigram (a declaration can be repeated).\n","funding_links":["https://github.com/sponsors/zigtools","https://opencollective.com/zigtools"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzigtools%2Ftrigram-bench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzigtools%2Ftrigram-bench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzigtools%2Ftrigram-bench/lists"}