{"id":13414769,"url":"https://github.com/akade/Akade.IndexedSet","last_synced_at":"2025-03-14T22:32:10.995Z","repository":{"id":39790505,"uuid":"447369112","full_name":"akade/Akade.IndexedSet","owner":"akade","description":"A convenient data structure supporting efficient in-memory indexing and querying, including range queries and fuzzy string matching.","archived":false,"fork":false,"pushed_at":"2024-10-23T18:03:46.000Z","size":292,"stargazers_count":61,"open_issues_count":1,"forks_count":6,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-10-24T01:24:42.516Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"C#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/akade.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"akade"}},"created_at":"2022-01-12T20:59:02.000Z","updated_at":"2024-10-23T18:03:38.000Z","dependencies_parsed_at":"2023-12-24T19:42:16.027Z","dependency_job_id":"1ef6db41-9aa3-438d-a35f-55b55b14157a","html_url":"https://github.com/akade/Akade.IndexedSet","commit_stats":null,"previous_names":[],"tags_count":14,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akade%2FAkade.IndexedSet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akade%2FAkade.IndexedSet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akade%2FAkade.IndexedSet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akade%2FAkade.Indexe
dSet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/akade","download_url":"https://codeload.github.com/akade/Akade.IndexedSet/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":221513932,"owners_count":16835746,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-30T21:00:36.192Z","updated_at":"2024-10-26T08:30:27.046Z","avatar_url":"https://github.com/akade.png","language":"C#","readme":"﻿# Akade.IndexedSet\n\n![.Net Version](https://img.shields.io/badge/dynamic/xml?color=%23512bd4\u0026label=version\u0026query=%2F%2FTargetFrameworks%5B1%5D\u0026url=https://raw.githubusercontent.com/akade/Akade.IndexedSet/main/Akade.IndexedSet/Akade.IndexedSet.csproj\u0026logo=.net)\n[![CI Build](https://github.com/akade/Akade.IndexedSet/actions/workflows/ci-build.yml/badge.svg?branch=master)](https://github.com/akade/Akade.IndexedSet/actions/workflows/ci-build.yml)\n[![NuGet version (Akade.IndexedSet)](https://img.shields.io/nuget/v/Akade.IndexedSet.svg)](https://www.nuget.org/packages/Akade.IndexedSet/)\n[![MIT](https://img.shields.io/badge/license-MIT-green.svg)](https://github.com/akade/Akade.IndexedSet#readme)\n[![Static Badge](https://img.shields.io/badge/API%20Docs-DNDocs-43bc00?logo=readme\u0026logoColor=white)](https://www.robiniadocs.com/d/akadeinde/api/Akade.IndexedSet.IndexedSet-1.html)\n\n\nA convenient data structure supporting efficient in-memory indexing and querying, including range queries and fuzzy string matching.\nIn a nutshell, it allows you to write LINQ-like 
queries *without* enumerating through the entire list. If you are currently completely enumerating\nthrough your data, expect huge [speedups](docs/Benchmarks.md) and much better scalability!\n\n\u003c!--TOC--\u003e\n  - [Overview](#overview)\n    - [Design Goals](#design-goals)\n    - [Performance and Operation-Support of the different indices:](#performance-and-operation-support-of-the-different-indices)\n      - [General queries](#general-queries)\n      - [String queries](#string-queries)\n  - [Features](#features)\n    - [Unique index](#unique-index)\n    - [Non-unique index](#non-unique-index)\n    - [Range index](#range-index)\n    - [String indices and fuzzy matching](#string-indices-and-fuzzy-matching)\n    - [Multi-key indices: All indices can be used with multiple keys](#multi-key-indices-all-indices-can-be-used-with-multiple-keys)\n    - [Computed or compound key](#computed-or-compound-key)\n    - [Concurrency and Thread-Safety](#concurrency-and-thread-safety)\n    - [No reflection and no expressions - convention-based index naming](#no-reflection-and-no-expressions-convention-based-index-naming)\n  - [FAQs](#faqs)\n    - [How do I use multiple index types for the same property?](#how-do-i-use-multiple-index-types-for-the-same-property)\n    - [How do I update key values if the elements are already in the set?](#how-do-i-update-key-values-if-the-elements-are-already-in-the-set)\n    - [How do I do case-insensitive (fuzzy) string matching (Prefix, FullTextIndex)?](#how-do-i-do-case-insensitve-fuzzy-string-matching-prefix-fulltextindex)\n  - [Roadmap](#roadmap)\n\u003c!--/TOC--\u003e\n\n## Overview\n\nA sample showing different queries as you might want to do for a report:\n\n```csharp\n// typically, you would query this from the db\nvar data = new Purchase[] {\n        new(Id: 1, ProductId: 1, Amount: 1, UnitPrice: 5),\n        new(Id: 2, ProductId: 1, Amount: 2, UnitPrice: 5),\n        new(Id: 6, ProductId: 4, Amount: 3, UnitPrice: 12),\n        new(Id: 7, 
ProductId: 4, Amount: 8, UnitPrice: 10) // discounted price\n        };\n\nIndexedSet\u003cint, Purchase\u003e set = data.ToIndexedSet(x =\u003e x.Id)\n                                    .WithIndex(x =\u003e x.ProductId)\n                                    .WithRangeIndex(x =\u003e x.Amount)\n                                    .WithRangeIndex(x =\u003e x.UnitPrice)\n                                    .WithRangeIndex(x =\u003e x.Amount * x.UnitPrice)\n                                    .WithIndex(x =\u003e (x.ProductId, x.UnitPrice))\n                                    .Build();\n\n// efficient queries on configured indices\n// in contrast to standard LINQ, they do not enumerate the entire list!\n_ = set.Where(x =\u003e x.ProductId, 4);\n_ = set.Range(x =\u003e x.Amount, 1, 3, inclusiveStart: true, inclusiveEnd: true); \n_ = set.GreaterThanOrEqual(x =\u003e x.UnitPrice, 10);\n_ = set.MaxBy(x =\u003e x.Amount * x.UnitPrice);\n_ = set.Where(x =\u003e (x.ProductId, x.UnitPrice), (4, 10));\n```\n\n### Design Goals\n- Much faster solution than (naive) LINQ-based full-enumeration\n- Syntax close to LINQ-Queries\n- Easy to use with a fluent builder API\n- Reflection \u0026 Expression-free to be AOT \u0026 Trimming friendly (for example for Blazor/WebASM)\n- It's not a db - in-memory only\n\n### Performance and Operation-Support of the different indices:\n\nBelow, you find runtime complexities. 
Benchmarks can be found [here](docs/Benchmarks.md)\n\n#### General queries\n\n- n: total number of elements\n- m: number of elements in the return set\n- ✔: Supported\n- ⚠: Supported but throws if not exactly 1 item was found\n- ❌: Not-supported\n\n| Query     | Unique-Index | NonUnique-Index | Range-Index     |\n| --------- | ------------ | --------------- | --------------- |\n| Single    | ⚠ O(1)      | ⚠ O(1)         | ⚠ O(log n)    |\n| Where     | ✔ O(1)       | ✔ O(m)         | ✔ O(log n + m) |\n| Range     | ❌           | ❌             | ✔ O(log n + m)  |\n| \u003c / \u003c=    | ❌           | ❌             | ✔ O(log n + m)  |\n| \u003e / \u003e=    | ❌           | ❌             | ✔ O(log n + m)  |\n| OrderBy   | ❌           | ❌             | ✔ O(m)          |\n| Max/Min   | ❌           | ❌             | ✔ O(1)          |\n\n#### String queries\n\n- w: length of query word\n- D: maximum distance in fuzzy query\n- r: number of items in result set\n\n| Query           | Prefix-Index | FullText-Index |\n| ----------------| ------------ | ---------------|\n| StartWith       | ✔ O(w+r)      | ✔ O(w+r)       |\n| Contains        | ❌           | ✔ O(w+r)        |\n| Fuzzy StartWith | ✔ O(w+D+r)    | ✔ O(w+D+r)     |\n| Fuzzy Contains  | ❌           | ✔ O(w+D+r)      |\n\n\u003e ℹ FullText indices use a lot more memory than prefix indices and are more expensive to construct. 
Only\nuse FullText indices if you really require them.\n\n\n## Features\n\n### Unique index\nDictionary-based O(1) access on keys:\n\n```csharp\nIndexedSet\u003cint, Data\u003e set = IndexedSetBuilder\u003cData\u003e.Create(a =\u003e a.PrimaryKey)\n                                                   .WithUniqueIndex(x =\u003e x.SecondaryKey)\n                                                   .Build();\n\n_ = set.Add(new(PrimaryKey: 1, SecondaryKey: 5));\n\n// fast access via primary key\nData data = set[1];\n\n// fast access via secondary key\ndata = set.Single(x =\u003e x.SecondaryKey, 5);\n```\n\n\u003e ℹ Entities do not require a primary key. `IndexedSet\u003cTPrimaryKey, TData\u003e` inherits from `IndexedSet\u003cTData\u003e`\nbut provides convenient access to the automatically added unique index: `set[primaryKey]` instead \nof `set.Single(x =\u003e x.PrimaryKey, primaryKey)`.\n\n\n### Non-unique index\nDictionary-based O(1) access on keys, where a single key can map to multiple elements:\n\n```csharp\nIndexedSet\u003cint, Data\u003e set = new Data[] { new(PrimaryKey: 1, SecondaryKey: 5), new(PrimaryKey: 2, SecondaryKey: 5) }\n        .ToIndexedSet(x =\u003e x.PrimaryKey)\n        .WithIndex(x =\u003e x.SecondaryKey)\n        .Build();\n\n// fast access via secondary key\nIEnumerable\u003cData\u003e data = set.Where(x =\u003e x.SecondaryKey, 5);\n```\n\n### Range index\nBinary-heap based O(log(n)) access for range queries, less-than (or equal) and greater-than (or equal) queries, and order-by queries. 
It is also useful for paging sorted on exactly one index.\n\n```csharp\nIndexedSet\u003cData\u003e set = IndexedSetBuilder.Create(new Data[] { new(1, SecondaryKey: 3), new(2, SecondaryKey: 4) })\n                                        .WithRangeIndex(x =\u003e x.SecondaryKey)\n                                        .Build();\n\n// fast access via range query\nIEnumerable\u003cData\u003e data = set.Range(x =\u003e x.SecondaryKey, 1, 5);\n\n// fast max \u0026 min key value or elements\nint maxKey = set.Max(x =\u003e x.SecondaryKey);\ndata = set.MaxBy(x =\u003e x.SecondaryKey);\n\n// fast larger or smaller than\ndata = set.LessThan(x =\u003e x.SecondaryKey, 4);\n\n// fast ordering \u0026 paging\ndata = set.OrderBy(x =\u003e x.SecondaryKey, skip: 10).Take(10); // second page of 10 elements\n```\n\n### String indices and fuzzy matching\nPrefix- \u0026 Suffix-Trie based indices for efficient StartsWith \u0026 String-Contains queries, including support\nfor fuzzy matching.\n\n```csharp\nIndexedSet\u003cType\u003e data = typeof(object).Assembly.GetTypes()\n                                               .ToIndexedSet()\n                                               .WithPrefixIndex(x =\u003e x.Name)\n                                               .WithFullTextIndex(x =\u003e x.FullName)\n                                               .Build();\n\n// fast prefix or contains queries via indices\n_ = data.StartsWith(x =\u003e x.Name, \"Int\");\n_ = data.Contains(x =\u003e x.FullName, \"Int\");\n\n// fuzzy searching is supported by prefix and full text indices\n// the following will also match \"String\"\n_ = data.FuzzyStartsWith(x =\u003e x.Name, \"Strang\", 1);\n_ = data.FuzzyContains(x =\u003e x.FullName, \"Strang\", 1);\n```\n\n### Multi-key indices: All indices can be used with multiple keys\nThere are overloads for all indices that allow multiple keys per element. 
\n\nYou can have a unique index where each element can have multiple keys:\n\n```csharp\n\nIndexedSet\u003cint, Data\u003e set = IndexedSetBuilder\u003cData\u003e.Create(a =\u003e a.PrimaryKey)\n\t\t\t\t\t\t\t\t\t\t\t\t   .WithUniqueIndex(x =\u003e x.AlternativeKeys) // Where AlternativeKeys returns an IEnumerable\u003cint\u003e\n\t\t\t\t\t\t\t\t\t\t\t\t   .Build();\n\n_ = set.Add(new(PrimaryKey: 1, AlternativeKeys: new[] { 3, 4 }));\nset.Single(x =\u003e x.AlternativeKeys, 3); // returns above element\n```\n\nThe same applies to all other index types, for example to non-unique indices:\n\n```csharp\nIndexedSet\u003cint, GraphNode\u003e set = IndexedSetBuilder\u003cGraphNode\u003e.Create(a =\u003e a.Id)\n                                                             .WithIndex(x =\u003e x.ConnectsTo) // Where ConnectsTo returns an IEnumerable\u003cint\u003e\n                                                             .Build();\n\n//   1   2\n//   |\\ /\n//   | 3\n//    \\|\n//     4\n\n_ = set.Add(new(Id: 1, ConnectsTo: new[] { 3, 4 }));\n_ = set.Add(new(Id: 2, ConnectsTo: new[] { 3 }));\n_ = set.Add(new(Id: 3, ConnectsTo: new[] { 1, 2, 4 }));\n_ = set.Add(new(Id: 4, ConnectsTo: new[] { 1, 3 }));\n\n// For readability, it is recommended to write the name for the parameter contains\nIEnumerable\u003cGraphNode\u003e nodesThatConnectTo1 = set.Where(x =\u003e x.ConnectsTo, contains: 1); // returns nodes 3 \u0026 4\nIEnumerable\u003cGraphNode\u003e nodesThatConnectTo3 = set.Where(x =\u003e x.ConnectsTo, contains: 3); // returns nodes 1 \u0026 2 \u0026 4\n\n// Non-optimized Where(x =\u003e x.Contains(...)) query:\nnodesThatConnectTo1 = set.FullScan().Where(x =\u003e x.ConnectsTo.Contains(1)); // returns nodes 3 \u0026 4, but enumerates through the entire set\n```\n\n\u003e :information_source: For range queries, this introduces a small overhead as the results are filtered to be distinct: \n\u003e i.e. 
`O(log n + m log m)` instead of `O(log n + m)`.\n\n\u003e :information_source: Multi-key string indices are marked experimental. Read more at [Experimental Features](docs/ExperimentalFeatures.md#AkadeIndexedSetEXP0001)\n\n\n### Computed or compound key\n\nThe data structure also supports computed or compound keys:\n\n```csharp\nvar data = new RangeData[] { new(Start: 2, End: 10) };\nIndexedSet\u003cRangeData\u003e set = data.ToIndexedSet()\n                                .WithIndex(x =\u003e (x.Start, x.End))\n                                .WithIndex(x =\u003e x.End - x.Start)\n                                .WithIndex(ComputedKey.SomeStaticMethod)\n                                .Build();\n// fast access via indices\nIEnumerable\u003cRangeData\u003e result = set.Where(x =\u003e (x.Start, x.End), (2, 10));\nresult = set.Where(x =\u003e x.End - x.Start, 8);\nresult = set.Where(ComputedKey.SomeStaticMethod, 42);\n```\n\u003e ℹ For more samples, take a look at the unit tests.\n\n### Concurrency and Thread-Safety\n\nThe \"normal\" `IndexedSet` is not thread-safe; however, a ReaderWriterLock-based implementation is available.\nJust call `BuildConcurrent()` instead of `Build()`:\n\n```csharp\nConcurrentIndexedSet\u003cRangeData\u003e set = data.ToIndexedSet()\n                                          .WithIndex(x =\u003e (x.Start, x.End))\n                                          .BuildConcurrent();\n```\n\n\u003e ⚠ The concurrent implementation needs to materialize all query results.\u003cbr /\u003e\n\u003e `OrderBy` and `OrderByDescending` take an additional `count` parameter to avoid unnecessary materialization.\n\u003e You can judge the overhead [here](docs/Benchmarks.md#ConcurrentSet)\n\n### No reflection and no expressions - convention-based index naming\n\nWe are using the [CallerArgumentExpression](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.callerargumentexpressionattribute) feature \nof .NET 6/C# 10 to provide 
convention-based naming of the indices:\n- `set.Where(x =\u003e (x.Prop1, x.Prop2), (1, 2))` tries to use an index named `\"x =\u003e (x.Prop1, x.Prop2)\"`\n- `set.Where(ComputedKeys.NumberOfDays, 5)` tries to use an index named `\"ComputedKeys.NumberOfDays\"`\n- **Hence, be careful what you pass in.**\n\n\u003e :information_source: The following naming conventions are recommended:\n\u003e - Use `x` as the parameter name in any lambda that determines an index name.\n\u003e - Do not use parentheses in any lambda that determines an index name.\n\u003e - Do not use a block body in any lambda that determines an index name.\n\u003e - For complex indices, use a static method.\n\u003e\n\u003e [C# Analyzers](./Analyzers/Readme.md) are shipped with the package to spot incorrect index names.\n\nReasons:\n- Simple and yet effective:\n  - Allows computed, compound, custom values etc. to be indexed without adding complexity...\n- Performance: No reflection at work and no (runtime) code-gen necessary\n- AOT-friendly including full trimming support\n\n## FAQs\n\n### How do I use multiple index types for the same property?\n\nUse \"named\" indices by using static methods:\n\n```csharp\nrecord Data(int PrimaryKey, int SecondaryKey);\n\nIndexedSet\u003cint, Data\u003e set = IndexedSetBuilder\u003cData\u003e.Create(x =\u003e x.PrimaryKey)\n                                                   .WithUniqueIndex(DataIndices.UniqueIndex)\n                                                   .WithRangeIndex(x =\u003e x.SecondaryKey)\n                                                   .Build();\n_ = set.Add(new(1, 4));\n// querying unique index:\nData data = set.Single(DataIndices.UniqueIndex, 4); // Uses the unique index\nData data2 = set.Single(x =\u003e x.SecondaryKey, 4); // Uses the range index\nIEnumerable\u003cData\u003e inRange = set.Range(x =\u003e x.SecondaryKey, 1, 10); // Uses the range index\n```\n\n\u003e ℹ We recommend using the lambda syntax for \"simple\" properties and static methods for more 
complicated ones. It's easy to read, resembles \"normal\" LINQ queries, and all the magic strings are compiler-generated.\n\n### How do I update key values if the elements are already in the set?\n**The implementation requires that the key values of an element never change while the element is in the set**.\nYou can manually remove, update and add an object. However, there are helper methods for that, which are especially\nuseful for the concurrent variant because they provide thread-safe, serialized access.\n\n```csharp\n// updating a mutable property\n_ = set.Update(dataElement, e =\u003e e.MutableProperty = 7);\n// updating an immutable property\n_ = set.Update(dataElement, e =\u003e e with { SecondaryKey = 12 });\n// be careful: the dataElement still refers to the \"old\" record after the update method\n_ = set.Update(dataElement, e =\u003e e with { SecondaryKey = 12 });\n\n// updating in a concurrent set\nconcurrentSet.Update(set =\u003e\n{\n    // serialized access to the inner IndexedSet, where you can safely use above update methods\n    // in a multi-threaded environment\n});\n```\n\n### How do I do case-insensitve (fuzzy) string matching (Prefix, FullTextIndex)?\nRemember that you can index whatever you want, including computed properties. 
This also applies to fuzzy matching:\n\n```csharp\nIndexedSet\u003cData\u003e set = IndexedSetBuilder\u003cData\u003e.Create(x =\u003e x.PrimaryKey)\n                                              .WithFullTextIndex(x =\u003e x.Text.ToLowerInvariant())\n                                              .Build();\nIEnumerable\u003cData\u003e matches = set.FuzzyContains(x =\u003e x.Text.ToLowerInvariant(), \"Search\", maxDistance: 2);\n```\n\n## Roadmap\nPotential features (not ordered):\n- [x] Thread-safe version\n- [x] Easier updating of keys\n- [x] More index types (Trie)\n- [x] Range insertion and corresponding `.ToIndexedSet().WithIndex(x =\u003e ...).[...].Build()`\n- [x] Refactoring to allow a primarykey-less set: this was an artificial restriction that is not necessary\n- [x] Benchmarks\n- [x] Simplification of string indices, i.e. Span/String based overloads to avoid `AsMemory()`...\n- [x] Analyzers to help with best practices\n- [x] Multi-key everything: All index types can be used with multiple keys per element.\n- [ ] Tree-based range index for better insertion performance\n- [ ] Aggregates (i.e. sum or average: interface based on state \u0026 add/removal state update functions)\n- [ ] Custom (equality) comparer for indices\n- [ ] Helper functions for search scenarios (Searching in multiple properties, text-reprocessing \u0026 result merging)\n- [ ] Benchmark vs Elasticsearch\n\nIf you have a suggestion or find a bug / unexpected behavior, open an issue! I will also review PRs and integrate them if they fit the project.\n","funding_links":["https://github.com/sponsors/akade"],"categories":["Algorithms and Data structures","others"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fakade%2FAkade.IndexedSet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fakade%2FAkade.IndexedSet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fakade%2FAkade.IndexedSet/lists"}