{"id":15029274,"url":"https://github.com/attaswift/btree","last_synced_at":"2025-05-16T05:00:19.392Z","repository":{"id":41526929,"uuid":"48014723","full_name":"attaswift/BTree","owner":"attaswift","description":"Fast sorted collections for Swift using in-memory B-trees","archived":false,"fork":false,"pushed_at":"2022-02-23T10:17:30.000Z","size":2131,"stargazers_count":1319,"open_issues_count":16,"forks_count":75,"subscribers_count":31,"default_branch":"master","last_synced_at":"2025-05-08T15:18:35.494Z","etag":null,"topics":["btree","collection","orderedcollection","search-trees","swift"],"latest_commit_sha":null,"homepage":"","language":"Swift","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/attaswift.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-12-15T02:32:26.000Z","updated_at":"2025-04-12T16:09:01.000Z","dependencies_parsed_at":"2022-09-12T18:40:48.853Z","dependency_job_id":null,"html_url":"https://github.com/attaswift/BTree","commit_stats":null,"previous_names":[],"tags_count":11,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/attaswift%2FBTree","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/attaswift%2FBTree/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/attaswift%2FBTree/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/attaswift%2FBTree/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/attaswift","download_url":"https://codeload.github.com/attaswift/BTree/tar.gz/refs/heads/master","host":{"name":"GitHub",
"url":"https://github.com","kind":"github","repositories_count":254471028,"owners_count":22076582,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["btree","collection","orderedcollection","search-trees","swift"],"created_at":"2024-09-24T20:10:10.704Z","updated_at":"2025-05-16T05:00:17.718Z","avatar_url":"https://github.com/attaswift.png","language":"Swift","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Fast Sorted Collections for Swift\u003cbr\u003eUsing In-Memory B-Trees\n\n[![Swift 4.0](https://img.shields.io/badge/Swift-4.0-blue.svg)](https://swift.org)\n[![License](https://img.shields.io/badge/licence-MIT-blue.svg)](https://github.com/attaswift/BTree/blob/master/LICENSE.md)\n[![Platform](https://img.shields.io/badge/platforms-macOS%20∙%20iOS%20∙%20watchOS%20∙%20tvOS-blue.svg)](https://developer.apple.com/platforms/)\n\n[![Build Status](https://travis-ci.org/attaswift/BTree.svg?branch=master)](https://travis-ci.org/attaswift/BTree)\n[![Code Coverage](https://codecov.io/github/attaswift/BTree/coverage.svg?branch=master)](https://codecov.io/github/attaswift/BTree?branch=master)\n\n[![Carthage compatible](https://img.shields.io/badge/Carthage-compatible-4BC51D.svg)](https://github.com/Carthage/Carthage)\n[![CocoaPod Version](https://img.shields.io/cocoapods/v/BTree.svg)](http://cocoapods.org/pods/BTree)\n\n* [Overview](#overview)\n* [Reference Documentation](#api)\n* [Optimizing Collections: The Book](#book)\n* [What Are B-Trees?](#what)\n* [Why In-Memory B-Trees?](#why)\n* [Laundry List of Issues with Standard Collection Types](#boo)\n* [B-Trees to the 
Rescue!](#yay)\n* [Implementation Notes](#notes)\n* [Remark on Performance of Imported Generics](#generics)\n\n### \u003ca name=\"overview\"\u003eOverview\u003c/a\u003e\n\nThis project provides an efficient in-memory B-tree implementation in pure Swift, and several useful\nsorted collection types that use B-trees for their underlying storage.\n\n-   [`Map\u003cKey, Value\u003e`][Map] implements a sorted mapping from unique comparable keys to arbitrary values.\n    It is like `Dictionary` in the standard library, but it does not require keys to be hashable, \n    it has strong guarantees on worst-case performance, and it maintains its elements in a well-defined\n    order.\n\n-   [`List\u003cElement\u003e`][List] implements a random-access collection of arbitrary elements. \n    It is like `Array` in the standard library, but lookup, insertion and removal of elements at\n    any index have logarithmic complexity. \n    (`Array` has O(1) lookup, but insertion and removal at an arbitrary index cost O(*n*).)\n    Concatenation of two lists of any size, inserting a list into another list at any position,\n    removal of any subrange of elements, or extraction of an arbitrary sub-list are also\n    operations with O(log(*n*)) complexity.\n\n-   [`SortedSet\u003cElement\u003e`][SortedSet] implements a sorted collection of unique comparable elements.\n    It is like `Set` in the standard library, but lookup, insertion and removal of any element\n    have logarithmic complexity. Elements in a `SortedSet` are kept sorted in ascending order.\n    Operations working on full sets (such as taking the union, intersection or difference) \n    can take as little as O(log(*n*)) time if the elements in the source sets aren't interleaved.\n\n-   [`SortedBag\u003cElement\u003e`][SortedBag] implements a sorted [multiset][multiset] with\n    comparable elements. 
This is a generalization of a set that allows multiple instances of the same value.\n    (The standard library does not include such a collection, although you can use a dictionary to emulate one \n    by storing the multiplicities of the keys as values.)\n    The implementation provided in this package stores each duplicate element separately, which may be useful\n    if your elements are reference types with identities or you have some other means to distinguish between equal elements.\n    `SortedBag` operations have the same time complexities as the equivalent operations in `SortedSet`.\n\n-   [`BTree\u003cKey, Value\u003e`][BTree] is the underlying primitive collection that serves as base storage\n    for all of the above collections. It is a general sorted key-value store with full support\n    for elements with duplicate keys; it provides the union of all operations individually provided\n    by the higher-level abstractions above (and more!).\n\n    The `BTree` type is public; you may want to use it if you need a collection flavor that \n    isn't provided by default (such as a multimap) \n    or if you need to use an operation that isn't exposed by the wrappers.\n    \nAll of these collections are structs and they implement the same copy-on-write value semantics as\nstandard Swift collection types like `Array` and `Dictionary`. (In fact, copy-on-write works even\nbetter with these than with standard collections; continue reading to find out why!)\n\n[Map]: http://attaswift.github.io/BTree/api/Structs/Map.html\n[List]: http://attaswift.github.io/BTree/api/Structs/List.html\n[SortedSet]: http://attaswift.github.io/BTree/api/Structs/SortedSet.html\n[SortedBag]: http://attaswift.github.io/BTree/api/Structs/SortedBag.html\n[multiset]: https://en.wikipedia.org/wiki/Set_(abstract_data_type)#Multiset\n\nThe latest version of `BTree` requires Swift 4. 
(The last release supporting Swift 3 was 4.0.2.)\n\n### \u003ca name=\"api\"\u003e[Reference Documentation][doc]\u003c/a\u003e\n\nThe project includes [a nicely formatted reference document][doc] generated from the documentation comments\nembedded in its source code.\n\n[doc]: http://attaswift.github.io/BTree/api\n\n### \u003ca name=\"book\"\u003e[Optimizing Collections: The Book][OptimizingCollections]\u003c/a\u003e\n\nIf you want to learn more about how this package works, the book\n[Optimizing Collections][OptimizingCollections] includes detailed explanations of\nmany of the algorithms and optimization tricks implemented by this package – and so, so much more.\nIt is written by the same author, and published by the fine folks at objc.io.\nBuying a copy of the book is not only a nice way to support this project, it also gets you something quite interesting to read.\nWin-win!\n\n[![Optimizing Collections (eBook)](docs/images/OptimizingCollections.png)][OptimizingCollections]\n\n[OptimizingCollections]: https://www.objc.io/books/optimizing-collections/\n\n### \u003ca name=\"what\"\u003eWhat Are B-Trees?\u003c/a\u003e\n\n[B-trees][B-tree wiki] are search trees that provide a sorted key-value store with excellent performance\ncharacteristics.  In essence, each node maintains a sorted array of its own elements, and\nanother array for its children.  The tree is kept balanced by three constraints: \n\n1. Only the root node is allowed to be less than half full.\n2. No node may be larger than the maximum size.\n3. The leaf nodes are all at the same level.\n\nCompared to other popular search trees such as [red-black trees][red-black tree] or [AVL trees][avl wiki], \nB-trees have huge nodes: nodes often contain hundreds (or even thousands) of key-value pairs and children.\n\nThis module implements a \"vanilla\" B-tree where every node contains full key-value pairs. 
\n(The other popular type is the [B+-tree][b-plus tree] where only leaf nodes contain values; \ninternal nodes contain only copies of keys.\nThis often makes more sense on an external storage device with a fixed block size, but it seems less useful for\nan in-memory implementation.)\n\nEach node in the tree also maintains the count of all elements under it. \nThis makes the tree an [order statistic tree], where efficient positional lookup is possible.\n\n[B-tree wiki]: https://en.wikipedia.org/wiki/B-tree\n[red-black tree]: https://github.com/attaswift/RedBlackTree\n[avl wiki]: https://en.wikipedia.org/wiki/AVL_tree\n[order statistic tree]: https://en.wikipedia.org/wiki/Order_statistic_tree\n[b-plus tree]: https://en.wikipedia.org/wiki/B%2B_tree\n\n### \u003ca name=\"why\"\u003eWhy In-Memory B-Trees?\u003c/a\u003e\n\nThe Swift standard library offers heavily optimized arrays and hash tables, but omits linked lists and\ntree-based data structures. This is a result of the Swift engineering team spending resources \n(effort, code size) on the abstractions that provide the biggest bang for the buck. \n\n\u003e Indeed, the library lacks even a basic [double-ended queue][deque] construct -- \n\u003e although Cocoa's `Foundation` framework does include one in `NSArray`.\n\n[deque]: https://github.com/attaswift/Deque\n\nHowever, some problems call for a wider variety of data structures. \n\nIn the past, linked lists and low-order search trees such as red-black trees were frequently employed;\nhowever, the performance of these constructs on modern hardware is greatly limited\nby their heavy use of pointers.\n\n[B-trees][B-tree wiki] were originally invented in the 1970s as a data structure for slow external storage\ndevices. 
As such, they are strongly optimized for locality of reference: \nthey prefer to keep data in long contiguous buffers and they keep pointer dereferencing to a minimum.\n(Dereferencing a pointer in a B-tree usually meant reading another block of data from the spinning hard drive,\nwhich is a glacially slow device compared to the main memory.)\n\nToday's computers have multi-tiered memory architectures; they rely on caching to keep the system\nperformant. This means that locality of reference has become a hugely important property for in-memory\ndata structures, too.\n\nArrays are the epitome of reference locality, so the Swift stdlib's heavy emphasis on `Array` as the\nuniversal collection type is well justified.\n\nFor example, using a single array to hold a sorted list of items has quite horrible (quadratic) asymptotic\ncomplexity when there are many elements. However, up to a certain maximum size, a simple array is in fact \nthe most efficient way to represent a sorted list.\n\n![Typical benchmark results for sorted collections](docs/images/Sorted%20Collections%20in%20Swift.png)\n\nThe benchmark above demonstrates this really well: insertion of *n* elements into a sorted array \ncosts O(n^2) when there are many items, but for many reasonably sized data sets, it is still much faster \nthan creating a red-black tree with its fancypants O(n * log(n)) complexity. \n\nNear the beginning of the curve, up to about *eighteen thousand items*, a sorted array implementation\nimported from an external module is very consistently about 6-7 times faster than a red-black tree, with a\nslope that is indistinguishable from O(n * log(n)).\n\nEven after it catches up to quadratic complexity, in this particular benchmark, \nit takes about a *hundred thousand items* for the sorted\narray to become slower than the red-black tree! \n\n\u003e The exact cutoff point depends on the type/size of elements that you work with, and the capabilities \n\u003e of the compiler. 
This benchmark used tiny 8-byte integer elements, hence the huge number.\n\n\u003e The benchmark is based on [my own red-black tree implementation][red-black tree] that uses a single flat array to store\n\u003e node data. A [more typical implementation][airspeed-velocity] would store each node in a separately allocated object, so\n\u003e it would likely be even slower.\n\n[airspeed-velocity]: http://airspeedvelocity.net/2015/07/22/a-persistent-tree-using-indirect-enums-in-swift/\n\n\u003e The chart above is a [log-log plot][loglog] which makes it easy to compare the polynomial exponents of \n\u003e the complexity curves of competing algorithms at a glance. The slope of a quadratic algorithm on a log-log chart\n\u003e (like insertion into a sorted array---the green curves) is twice that of a \n\u003e linear algorithm (like appending *n* items to an unsorted array---light blue curve) or a quasilinear one \n\u003e (like inserting into a red-black tree, red curve).\n\n[loglog]: https://en.wikipedia.org/wiki/Log–log_plot\n\n\u003e Note that the big gap between collections imported from\n\u003e stdlib and those imported from external modules is caused by a [limitation in the current Swift compiler/ABI](#perf):\n\u003e when this limitation is lifted, the gap will narrow considerably, which will reduce the element count\n\u003e at which you'll be able to reap the benefits of lower asymptotic complexity.\n\n\u003e (This effect is already visible (albeit in reverse) on the benchmark for the \"inlined\" sorted array (light green), \n\u003e which is essentially the same code as the regular one (dark green) except it was implemented\n\u003e in the same module as the benchmarking loop, so the compiler has more options to optimize away\n\u003e witness tables and other levels of abstraction. That line starts curving up much sooner, at about 2000 \n\u003e items--imagine having a B-tree implementation that's equally fast! 
Or better, try it yourself and report your\n\u003e results. Producing benchmarks like this takes a lot of time and effort.) :-)\n\n\nThis remarkable result is due in large part to the vast number of (to a CPU, random-looking) memory \nreferences that are needed to operate on red-black trees. \nTheir [intricate ballet of tree rotations][rbtree-animation] looks mighty impressive \nto us mere humans, but to the delicate caches of your poor CPU, \nit looks more like a drunken elephant [moshing at a thrash metal concert][moshing].\n\n[rbtree-animation]: https://youtu.be/m9tse9Gr2pE?t=209\n[moshing]: https://en.wikipedia.org/wiki/Moshing\n\nMeanwhile, the humble `Array` does the only thing it knows: sliding around\nlong contiguous memory regions. It does this over and over, ad nauseam. It doesn't look impressive,\nbut (up to a point) it fits well with how computers work.\n\nSo a small `Array` is perfect for maintaining a sorted list. But what if the list gets too long?\nThe B-tree's answer is to simply cut the array in half, and to create a new index tree node on top to allow \nit to quickly find its way around this more complex list structure. \nThese internal index nodes can also consist of arrays of elements and node references, \ncreating a nice recursive data structure.\n\nBecause their fanout number is so high, B-trees are extremely shallow: for a B-tree with order 100 (which\nis actually rather on the low end), you can fit a billion items into a tree that's not more than five levels deep.\n\nOnce you accept that small arrays are fast, it is easy to see why B-trees work so well: unless it holds more\nelements than its order, a B-tree quite literally **is** just an `Array`. 
\nSo it has the same performance behavior as an `Array` for a small number of elements, \nand when it grows larger it prevents a quadratic upswing by never allowing its arrays to get too large.\nThe yellow curve on the benchmark above demonstrates this behavior well.\n\nConsider that each node in a typical B-tree can hold about *ten full levels of a red-black tree* \n(or AVL trees or whatever binary tree you like). \nLooking up an item in a B-tree node still requires a binary search of the node\narray, but this search works on a contiguous memory region, while the conventional search tree\nis fiddling around with loading pointer values and dereferencing them.\n\nSo it makes perfect sense to employ B-trees as an in-memory data structure.\n\nThink about this, though: how many times do you need to work with a hundred thousand\nsorted items in a typical app? Or even twenty thousand? Or even just two thousand? The most interesting\nbenefits of B-trees often occur at element counts well over a hundred thousand.\nHowever, B-trees are not much slower than arrays for low element counts (remember, they *are* arrays in that\ncase), so it makes sense to use them when there's even a slight chance that the count will get large.\n\n### \u003ca name=\"boo\"\u003eLaundry List of Issues with Standard Collection Types\u003c/a\u003e\n\nThe data structures implemented by `Array`, `Dictionary` and `Set` are remarkably versatile:\na huge class of problems is easily and efficiently solved by simple combinations of these abstractions.\nHowever, they aren't without drawbacks: you have probably run into cases when the standard collections\nexhibit suboptimal behavior:\n\n1.  Insertion and removal in the middle of an `Array` can be slow when there are many items. (Keep the previous section in mind, though.)\n\n2.  
The all-or-nothing [copy-on-write behavior][cow] of `Array`, `Dictionary` and `Set` can lead to performance problems\n    that are hard to detect and fix.\n    If the underlying storage buffer is being shared by multiple collection instances, the modification of a single element \n    in any of the instances requires creating a full copy of every element. \n    \n    It is not at all obvious from the code when this happens, and it is even harder to reliably check for. \n    You can't (easily) write unit tests to check against accidental copying of items with value semantics!\n\n3.  With standard collection types, you often need to think about memory management.\n\n    Arrays and dictionaries never release memory until they're entirely deallocated; \n    a long-lived collection may hold onto a large piece of memory due to an earlier, temporary spike in the \n    number of its elements. This is a form of subtle resource leak that can be hard to detect.\n    On memory-constrained systems, wasting too much space may cause abrupt process termination.\n\n    Appending a new element to an array, or inserting a new element into a dictionary or a set are \n    usually constant time operations, but they sometimes take O(*n*) time when the collection exhausts its allocated capacity.\n    These spikes in execution time are often undesired, but preventing them requires careful size analysis.  \n    If you reserve too little space, you'll still get spikes; if you reserve too much, you're wasting memory.\n    \n4.  The order of elements in a `Dictionary` or a `Set` is undefined, and it isn't even stable:\n    it may change after seemingly simple mutations. Two collections with the exact same set of elements may store\n    them in wildly different order.\n\n5.  Hashing collections require their keys to be `Hashable`. If you want to use your own type as the key, \n    you need to write a hash function yourself. 
It is annoyingly hard to write a good hash function, and \n    it is even harder to test that it doesn't produce too many collisions for the sets of values your code \n    will typically use.\n\n6.  The possibility of hash collisions makes `Dictionary` and `Set` badly suited for tasks which require\n    guaranteed worst-case performance. (E.g. server code may face low-bandwidth denial of service attacks due to\n    [artificial hash collisions][hash dos].)\n\n7.  Array concatenation takes O(*n*) time, because it needs to put a copy of every element from both arrays \n    into a new contiguous buffer.\n\n8.  Merging dictionaries or taking the union/intersection etc. of two sets are all costly\n    O(*n*) operations, even if the elements aren't interleaved at all.\n\n9.  Creating an independently editable sub-dictionary or subset requires elementwise iteration over either\n    the entire collection, or the entire set of potential target items. This is often impractical, especially\n    when the collection is large but sparse.\n    \n    Getting an independently editable sub-array out of an array takes time that is linear in the size of the result. \n    (`ArraySlice` is often helpful, but it is most effective as a short-lived read-only view in temporary local variables.)\n\n\nThese issues don't always matter. In fact, lots of interesting problems can be solved without \nrunning into any of them. When they do occur, the problems they cause are often insignificant.\nEven when they cause significant problems, it is usually straightforward to work around them by choosing a\nslightly different algorithm. 
\n\nBut sometimes you run into a case where the standard collection types are too slow, \nand it would be too painful to work around them.\n    \n[hash dos]: http://arstechnica.com/business/2011/12/huge-portions-of-web-vulnerable-to-hashing-denial-of-service-attack/\n[cow]: https://en.wikipedia.org/wiki/Copy-on-write\n\n\n### \u003ca name=\"yay\"\u003eB-Trees to the Rescue!\u003c/a\u003e\n\nB-trees solve all of the issues above. \n(Of course, they come with a set of different issues of their own. Life is hard.)\n\nLet's enumerate:\n\n1.  Insertion or removal from any position in a B-tree-based data structure takes O(log(*n*)) time, no matter what.\n\n2.  Like standard collection types, B-trees implement full copy-on-write value semantics.\n    Copying a B-tree into another variable takes O(1) time; mutations of a copy do not affect the original instance.\n    \n    However, B-trees implement a greatly improved version of copy-on-write that is not all-or-nothing: \n    each node in the tree may be independently shared with other trees. \n    \n    If you need to insert/remove/update a single element, B-trees will copy at most O(log(*n*)) elements to satisfy\n    value semantics, even if the tree was entirely shared before the mutation.\n\n3.  Storage management in B-trees is granular; you do not need to reserve space for a B-tree in advance, and\n    it never allocates more memory than it needs to store the actual number of elements it contains.\n    \n    Storage is gradually allocated and released in small increments as the tree grows and shrinks.\n    Storage is only copied when mutating shared elements, and even then it is done in small batches.\n    \n    The performance of B-trees is extremely stable, with no irregular spikes ever.\n    \n    (Note that there is a bit of leeway in allocations to make it easy to balance the tree. \n    In the worst case, a B-tree may only fill 50% of the space it allocates. 
The ratio is typically \n    much higher than that, though.)\n\n4.  B-trees always keep their items sorted in ascending key order, and they provide efficient positional lookups.\n    You can get the *i*th smallest/largest item in a tree in O(log(*n*)) time.\n\n5.  Keys of a B-tree need to be `Comparable`, not `Hashable`. It is often significantly easier to \n    write comparison operators than hash functions; it is also much easier to verify that the implementation works \n    correctly. A buggy `\u003c` operator will typically lead to obvious issues that are relatively easy to catch; \n    a hash function that collides badly may go undetected for years.\n\n6.  Adversaries (or blind chance) will never produce a set of elements for which B-trees behave especially badly.\n    The performance of B-trees only depends on the size of the tree, not its contents. \n    (Provided that key comparison also behaves uniformly, of course. \n    If you allow multi-megabyte strings as keys, you're gonna have a bad time.)\n\n7.  Concatenation of any two B-trees takes O(log(*n*)) time. For trees that aren't of a trivial size, the result \n    will share some of its nodes with the input trees, deferring most copying until the time the tree needs to be modified.\n    (Which may never happen.) Copy-on-write really shines with B-trees!\n    \n8.  Merging the contents of two B-trees into a single tree takes O(*n*) time in the worst case, but\n    if the elements aren't too badly interleaved, it can often finish in O(log(*n*)) time by linking entire subtrees\n    into the result in one go.\n    \n    Set operations on the keys of a B-tree (such as calculating the intersection set, subtraction set, \n    symmetric difference, etc.) also exploit the same trick for a huge performance boost.\n    If the input trees are mutated versions of the same original tree, these operations are also able \n    to skip elementwise processing of entire subtrees that are shared between the inputs.\n\n9.  
The `SubSequence` of a B-tree is also a B-tree. You can slice and dice B-trees any way you like:\n    getting a fully independent copy of any prefix, suffix or subrange in a tree only takes O(log(*n*)) time.\n    You can then take the subtree you extracted and insert it into another tree; this also costs O(log(*n*)), \n    no matter where in the tree you want to put it. (You do need to keep the order of keys correct, though.)\n\n\n### \u003ca name=\"notes\"\u003eImplementation Notes\u003c/a\u003e\n\n-   [`BTree`][BTree] is a generic struct with copy-on-write value semantics.  Internally, it stores its data in\n    nodes with a fixed maximum size, arranged in a tree.  `BTree` type provides a full set of hand-tuned \n    high-level operations to work with elements of a B-tree.\n    \n    Nodes are represented by instances of a [reference type][BTreeNode] that is not exported as public API.\n    (Low-level access to individual tree nodes would be tricky to get right, and it would prevent\n    future optimizations, such as moving node counts up to parent nodes.)\n\n-   By default, the tree order (a.k.a., the fanout, or the maximum number of children) is set such\n    that [each node stores about 16KiB data][bTreeNodeSize]. Larger node sizes make lookups faster, while\n    insertion/removal becomes slower -- 16KiB is a good enough approximation of the optimal node size\n    on most modern systems.  (But you can also set a custom node size if you know better. Note though\n    that you cannot mix-n-match trees of different orders.)  Thus, on a 64-bit system, a B-tree\n    holding `Int` elements will store about 2047 elements per node. Wow!\n\n[bTreeNodeSize]: https://github.com/attaswift/BTree/blob/master/Sources/BTreeNode.swift#L23\n\n-   Individual B-tree nodes may be independently shared between multiple B-trees.  
When mutating a\n    (partially or fully) shared tree, copy-on-write is restricted to only clone the nodes whose subtree is\n    actually affected by the mutation. This has the following consequences:\n  \n    - Nodes cannot contain a reference to their parent node, because it is not necessarily unique. \n    \n    - Mutations of shared trees are typically much cheaper than copying the entire collection at once, \n      which is what standard collection types do.\n      \n    - The root node is never shared between trees that are not equal.\n\n-   [`BTree`][BTree] allows elements with duplicate keys to be stored in the tree. \n    (In fact, `List` works by using the same (empty) key for all elements.) \n\n    All methods that take a key to find an element [let you (optionally) specify][BTreeKeySelector] if you\n    want to work with the first or last matching element, or if you're happy with any match. The latter\n    option is sometimes faster as it often allows the search to stop at the topmost matching element. There\n    is also a selector that looks for the element *after* the specified key -- this can be nice to determine\n    the position of the end of a range of matching items.\n\n-   Each node keeps track of the number of items in its entire subtree, so \n    [efficient positional lookup][BTree.elementAtOffset]\n    is possible.  For any *i*, you can get, set, remove or insert the *i*th item in the tree in log(n) time.\n\n-   There is a [`BTreeIterator`][BTreeIterator] and a [`BTreeIndex`][BTreeIndex] that provide the\n    usual generator/indexing semantics. While individual element lookup usually takes O(log(n))\n    operations, iterating over all elements via these interfaces requires linear time. Using the\n    generator is faster than indexing, so you should prefer using it whenever possible. 
\n    There are methods to start an iterator from the middle of the tree: \n    from any offset, any index, or any key.\n    \n-   Note that [`forEach`][BTree.forEach] has a specialized recursive implementation, \n    which makes it the fastest way to iterate over B-trees. There is even a variant that allows you\n    to stop the iteration when you have seen enough items and want to get off the carousel.\n\n-   [`BTreeCursor`][BTreeCursor] is an easy-to-use, general-purpose batch editing facility that allows you to\n    manipulate the elements of a B-tree conveniently and highly efficiently. You can use a cursor to\n    walk over the contents of a tree, modifying/inserting/removing elements as needed without a\n    per-element log(n) lookup overhead. If you need to insert or remove a bunch of consecutive elements,\n    it is better to use the provided bulk removal/insertion methods than to process them individually. \n    (Range operations have O(log(*n*)) complexity, whereas elementwise processing takes O(*k* * log(n)).)\n    \n-   Internally, navigation in a B-Tree is based on abstract primitives that maintain a path to a particular\n    position in the tree, as described by the [`BTreePath`][BTreePath] protocol. The methods directly\n    provided by this protocol are too low-level for convenient use, but the protocol has extension methods\n    built on top of these that support familiar concepts like moving back and forth step by step, jumping to\n    a specific offset in the tree, or looking up a particular key.\n    \n    Indexes, generators and cursors use their particular implementation of `BTreePath` to represent their\n    own path flavors. 
All three of them maintain a path of nodes from the root of the tree to a particular\n    slot of a particular node, but the details are very different:\n    \n    - A [`BTreeIndex`][BTreeIndex] may not hold a strong reference to its tree, because that would \n      interfere with copy-on-write when you want to mutate the tree at a certain index. Thus, indices\n      are wrappers around a [`BTreeWeakPath`][BTreeWeakPath], which uses weak references, and \n      needs to tread very carefully in order to detect when one of its references gets out of date.\n      \n    - Meanwhile a [`BTreeIterator`][BTreeIterator] is supposed to support standalone iteration over the\n      contents of the tree, so it must contain strong references. It uses a\n      [`BTreeStrongPath`][BTreeStrongPath] to represent the path of its next element. While an iterator only\n      needs to be able to move one step forward, `BTreeStrongPath` supports the full tree navigation API,\n      making it very useful elsewhere in the codebase whenever we need a kind of read-only cursor into a\n      tree. For example, the tree merging algorithm uses strong paths to represent its current positions in\n      its input trees.\n      \n    - Finally, a [`BTreeCursor`][BTreeCursor] needs to maintain a path where each node is uniquely\n      held by the cursor, ready for mutation. (A cursor owns its own copy of the tree, and does\n      not share it with the outside world until it is finished.) \n      This special path flavor is implemented by [`BTreeCursorPath`][BTreeCursorPath].\n      To speed things up, this struct intentionally breaks the node counts on its current path, \n      to allow for super speedy elementwise insertions and removals. 
The counts are carefully recalculated\n      whenever the path moves off a node's branch in the tree.\n          \n[BTreePath]: https://github.com/attaswift/BTree/blob/master/Sources/BTreePath.swift\n[BTreeWeakPath]: https://github.com/attaswift/BTree/blob/master/Sources/BTreeIndex.swift#L87\n[BTreeStrongPath]: https://github.com/attaswift/BTree/blob/master/Sources/BTreeIterator.swift#L74\n[BTreeCursorPath]: https://github.com/attaswift/BTree/blob/master/Sources/BTreeCursor.swift#L96\n\n-   It would be overkill to create an explicit path to look up or modify a single element in the tree\n    on its own, so `BTree` also provides a [set of recursive methods][BTree-lookups] that \n    implement the same sort of lookups and simple mutations. \n    They are faster when you need to retrieve a single item, but they aren't efficient when called repeatedly.\n    \n[BTree-lookups]: https://github.com/attaswift/BTree/blob/master/Sources/BTree.swift#L280-L419\n\n-   `BTree` includes a [bulk loading algorithm][BTree.bulkLoad] that efficiently initializes fully loaded\n    trees from any sorted sequence. You can also specify a fill factor that's less than 100% if you expect to\n    insert data into the middle of the tree later; leaving some space available may reduce the work needed to keep the\n    tree balanced. The bulk loader can optionally filter out duplicate keys for you. It verifies that the\n    elements are in the correct order and traps if they aren't.\n    \n    The bulk loader is based on a general [`BTreeBuilder`][BTreeBuilder] struct that specializes in\n    appending elements to a newly created tree. Besides individual elements, it also supports efficiently \n    appending entire B-trees. 
This comes in handy in optimized tree merging algorithms.\n\n[BTree.bulkLoad]: http://attaswift.github.io/BTree/api/Structs/BTree.html#/s:FV5BTree5BTreecuRd__s8SequenceWd__8Iterator7Element_zTxq__rFT14sortedElementsqd__14dropDuplicatesSb5orderSi10fillFactorSd_GS0_xq__\n[BTreeBuilder]: https://github.com/attaswift/BTree/blob/master/Sources/BTreeBuilder.swift\n    \n-   [Constructing a B-tree from an unsorted sequence of elements][BTree.unsorted-load] inserts the elements into the tree one by\n    one; no buffer is allocated to sort elements before loading them into the tree. This is done more\n    efficiently than calling [an insertion method][BTree.insert] with each element one by one, but it is likely still slower than\n    a quicksort. (So sort elements on your own if you can spare the extra memory.)\n\n[BTree.insert]: http://attaswift.github.io/BTree/api/Structs/BTree.html#/Insertion\n[BTree.unsorted-load]: http://attaswift.github.io/BTree/api/Structs/BTree.html#/s:FV5BTree5BTreecuRd__s8SequenceWd__8Iterator7Element_zTxq__rFTqd__14dropDuplicatesSb5orderSi_GS0_xq__\n\n-   The package contains O(log(n)) methods to [extract a range of elements as a new B-tree][BTree.subtree]\n    and to [insert a B-tree into another B-tree][BTreeCursor.insertTree]. (Keys need to remain sorted\n    correctly, though.)\n    \n-   Merge operations (such as [`BTree.union`][BTree.union] and [`BTree.symmetricDifference`][BTree.symmetricDifference])\n    are highly tuned to detect when they can skip over entire subtrees on their input, linking them into the result or \n    skipping their contents as required. For input trees that contain long runs of distinct elements, these operations\n    can finish in as little as O(log(*n*)) time. 
These algorithms are expressed on top of a general\n    tree merging construct called [`BTreeMerger`][BTreeMerger].\n\n[BTree]: http://attaswift.github.io/BTree/api/Structs/BTree.html\n[BTreeNode]: https://github.com/attaswift/BTree/blob/master/Sources/BTreeNode.swift\n[BTreeKeySelector]: http://attaswift.github.io/BTree/api/Enums/BTreeKeySelector.html\n[BTreeIterator]: http://attaswift.github.io/BTree/api/Structs/BTreeIterator.html\n[BTreeIndex]: http://attaswift.github.io/BTree/api/Structs/BTreeIndex.html\n[BTreeCursor]: http://attaswift.github.io/BTree/api/Classes/BTreeCursor.html\n[BTree.elementAtOffset]: http://attaswift.github.io/BTree/api/Structs/BTree.html#/s:FV5BTree5BTree7elementFT8atOffsetSi_Txq__\n[BTree.forEach]: http://attaswift.github.io/BTree/api/Structs/BTree.html#/s:FV5BTree5BTree7forEachFzFzTxq__T_T_\n[BTreeCursor.insertTree]: http://attaswift.github.io/BTree/api/Classes/BTreeCursor.html#/s:FC5BTree11BTreeCursor6insertFGVS_5BTreexq__T_\n[BTree.subtree]: http://attaswift.github.io/BTree/api/Structs/BTree.html#/s:FV5BTree5BTree7subtreeFT4fromx2tox_GS0_xq__\n[BTree.union]: http://attaswift.github.io/BTree/api/Structs/BTree.html#/s:FV5BTree5BTree5unionFTGS0_xq__2byOS_21BTreeMatchingStrategy_GS0_xq__\n[BTree.symmetricDifference]: http://attaswift.github.io/BTree/api/Structs/BTree.html#/s:FV5BTree5BTree19symmetricDifferenceFTGS0_xq__2byOS_21BTreeMatchingStrategy_GS0_xq__\n[BTreeMerger]: https://github.com/attaswift/BTree/blob/master/Sources/BTreeMerger.swift#L318\n\n### \u003ca name=\"generics\"\u003eRemark on Performance of Imported Generics\u003c/a\u003e\n\u003ca name=\"perf\"\u003e\u003c/a\u003e\n\nCurrent versions of the Swift compiler are unable to specialize generic types that are imported from \nexternal modules other than the standard library. 
(In fact, it is not entirely incorrect to say that \nthe standard library works as if it were compiled each time anew as part of every Swift module rather than linked in \nas an opaque external binary.)\n\nThis limitation puts a considerable limit on the raw performance achievable by collection types imported\nfrom external modules, especially if they are parameterized with simple, extremely optimizable \nvalue types such as `Int` or even `String`.\nRelying on `import` will incur a *10-200x slowdown* when your collection is holding these most basic \nvalue types. (The effect is much reduced for reference types, though.)\n\nWithout access to the full source code of the collection, the compiler is unable to optimize away abstractions\nlike virtual dispatch tables, function calls and the rest of the *fluff* we've learned to mostly ignore\ninside a module. In cross-module generics, even retrieving a single `Int` will necessarily go through \nat least one lookup to a virtual table. This is because the code that implements the unspecialized generic also executes \nfor type parameters that contain reference types, whose reference count needs to be maintained.\n\nIf raw performance is essential, currently the only way out of this pit is to put the collection's code inside \nyour module. (Other than hacking stdlib to include these extra types, of course -- but that is a bad idea\nfor a thousand obvious reasons.) However, having each module maintain its own set of collections would smell \nhorrible, plus it would make it hard or impossible to transfer collection instances across module boundaries.\nPlus, if this strategy were used across many modules, it would lead to a C++ templates-style (or worse) code explosion.\nA better (but still rather unsatisfactory) workaround is to compile the collection code with the single module \nthat benefits most from specialization. 
The rest of the modules will still have access to it, if in a much slower way.\n\nThe Swift compiler team has plans to address this issue in future compiler versions, e.g., by allowing library authors \nto manually specialize generics for a predetermined set of type parameters.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fattaswift%2Fbtree","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fattaswift%2Fbtree","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fattaswift%2Fbtree/lists"}