{"id":13613216,"url":"https://github.com/timtadh/fs2","last_synced_at":"2025-06-29T18:05:10.387Z","repository":{"id":26340624,"uuid":"29789408","full_name":"timtadh/fs2","owner":"timtadh","description":"B+ Tree - List - File Structures 2 - Memory Mapped File Structures for Go","archived":false,"fork":false,"pushed_at":"2019-03-29T14:49:15.000Z","size":322,"stargazers_count":409,"open_issues_count":5,"forks_count":36,"subscribers_count":19,"default_branch":"master","last_synced_at":"2025-04-13T15:42:01.519Z","etag":null,"topics":["btree","go","golang","list","mmap"],"latest_commit_sha":null,"homepage":"http://hackthology.com/fs2/","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/timtadh.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-01-24T19:53:13.000Z","updated_at":"2025-04-05T06:52:05.000Z","dependencies_parsed_at":"2022-07-27T08:02:13.902Z","dependency_job_id":null,"html_url":"https://github.com/timtadh/fs2","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/timtadh/fs2","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timtadh%2Ffs2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timtadh%2Ffs2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timtadh%2Ffs2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timtadh%2Ffs2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/timtadh","download_url":"https://codeload.github.com/timtadh/fs2/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/repositories/timtadh%2Ffs2/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262642953,"owners_count":23341817,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["btree","go","golang","list","mmap"],"created_at":"2024-08-01T20:00:41.804Z","updated_at":"2025-06-29T18:05:10.042Z","avatar_url":"https://github.com/timtadh.png","language":"Go","funding_links":[],"categories":["Go","Repositories"],"sub_categories":[],"readme":"# fs2 - File Structures 2\n\nby Tim Henderson (tadh@case.edu)\n\nLicensed under the GNU GPL version 3 or at your option any later version. If you\nneed another licensing option please contact me directly.\n\n### What is this?\n\n1. A [B+ Tree](#b-tree) implementation\n2. A [list](#mmlist) implementation supporting O(1) Append, Pop, Get and Set operations.\n3. A [command](#fs2-generic) to generate type specific wrappers around the above\n   structures. It's generic, in Go, kinda.\n4. A [platform](#fmap) for implementing memory mapped high performance file\n   structures in Go.\n\n### Why did you make this?\n\nIn my academic research some of the algorithms I work on (such as frequent\nsubgraph mining) have exponential characteristics in their memory usage. In\norder to run these algorithms on larger data sets they need to be able to\ntransparently cache less popular parts of the data to disk. However, in general\nit is best to keep as much data in memory as possible.\n\nOf course, there are many options for such data stores. 
I could have used an off\nthe shelf database; however, I also want to explore ways to achieve higher\nperformance than those solutions offer.\n\n#### Have you worked on this type of system before?\n\nI have! In the golang world I believe I was the first to implement a disk backed\nB+ Tree. Here is an [early\ncommit](https://github.com/timtadh/file-structures/commit/aedff4a077e16eb87e2d0f8ed4bc676debf7c572)\nfrom my [file-structures\nrepository](https://github.com/timtadh/file-structures). Note the date: February\n21, 2010. Go was made public in November of 2009 (the first weekly was November\n06, 2009). I started work on the B+ Tree in January of 2010.\n\nThis particular experiment is a follow up to my work in the file-structures\nrepository. I have used those structures successfully many times but I want to\nexperiment with new ways of doing things to achieve better results. Besides my\ndisk backed versions of these structures you can also find good implementations\nof in memory versions in my\n[data-structures repository](https://github.com/timtadh/data-structures).\n\n#### Limitations\n\nCurrently, fs2 is only available for Linux. I am looking for assistance making a\ndarwin (Mac OS) port and a Windows port. Please get in touch if you are\ninterested and have a Mac or Windows PC.\n\n## B+ Tree\n\n[docs](https://godoc.org/github.com/timtadh/fs2/bptree)\n\nThis is a disk backed low level \"key-value store\". The closest thing to\nwhat it offers is [Bolt DB](https://github.com/boltdb/bolt). My [blog\npost](http://hackthology.com/lessons-learned-while-implementing-a-btree.html)\nis a great place to start to learn more about the ubiquitous B+ Tree.\n\n### Features\n\n1. Variable length keys or fixed sized keys. Fixed sized keys should be kept\n   relatively short, less than 1024 bytes (the shorter the better). Variable\n   length keys can be up to 2^31 - 1 bytes long.\n\n2. Variable length values or fixed sized values. 
Fixed sized values should also\n   be kept short, less than 1024 bytes. Variable length values can be up to\n   2^31 - 1 bytes long.\n\n3. Duplicate key support. Duplicates are kept out of the index and only occur in\n   the leaves.\n\n4. Data is only written to disk when you tell it to (or when needed due to OS level\n   page management).\n\n5. Simple (but low level) interface.\n\n6. Can operate in either an anonymous memory map or in a file backed memory map.\n   If you plan to have a very large tree (even one that never needs to be\n   persisted) it is recommended that you use a file backed memory map. The OS treats\n   pages in the file cache differently than pages which are not backed by files.\n\n7. The command `fs2-generic` can generate a wrapper specialized to your data\n   type. Typing saved! To use it, run `go install github.com/timtadh/fs2/fs2-generic`.\n   Get help with `fs2-generic --help`.\n\n### Limitations\n\n1. Not thread safe, and therefore no transactions (which you only need with\n   multiple threads).\n\n2. Maximum fixed key/value size is ~1350 bytes.\n\n3. Maximum variable length key/value size is 2^31 - 1 bytes.\n\n4. This is not a database. You could make it into a database or build a database\n   on top of it.\n\n### Quick Start\n\n[usage docs on godoc](https://godoc.org/github.com/timtadh/fs2/bptree#BpTree)\n\nImporting\n\n```go\nimport (\n\t\"github.com/timtadh/fs2/bptree\"\n\t\"github.com/timtadh/fs2/fmap\"\n)\n```\n\nCreating a new B+ Tree (fixed key size, variable length value size).\n\n```go\nbf, err := fmap.CreateBlockFile(\"/path/to/file\")\nif err != nil {\n\tlog.Fatal(err)\n}\ndefer bf.Close()\nbpt, err := bptree.New(bf, 8, -1)\nif err != nil {\n\tlog.Fatal(err)\n}\n// do stuff with bpt\n```\n\nOpening a B+ Tree\n\n```go\nbf, err := fmap.OpenBlockFile(\"/path/to/file\")\nif err != nil {\n\tlog.Fatal(err)\n}\ndefer bf.Close()\nbpt, err := bptree.Open(bf)\nif err != nil {\n\tlog.Fatal(err)\n}\n// do stuff with bpt\n```\n\nAdd a key/value pair. 
Note, since this is low level you have to serialize your\nkeys and values. The length of the []byte representing the key must exactly\nmatch the key size of the B+ Tree. You can find out what that was set to by\ncalling `bpt.KeySize()`.\n\n```go\nimport (\n\t\"encoding/binary\"\n)\n\nvar key uint64 = 12\nvalue := \"hello world\"\nkBytes := make([]byte, 8)\n// fixed width big endian encoding fills all 8 bytes of the key\nbinary.BigEndian.PutUint64(kBytes, key)\nerr := bpt.Add(kBytes, []byte(value))\nif err != nil {\n\tlog.Fatal(err)\n}\n```\n\nAs you can see it can be a little verbose to serialize and de-serialize your\nkeys and values. So be sure to wrap that up in utility functions or even to wrap\nthe interface of the B+ Tree so that client code does not have to think about\nit.\n\nA B+Tree is a \"multi-map\", meaning there may be more than one value per\nkey, so there is no \"Get\" method. To retrieve the values associated with a key use\nthe `Find` method.\n\n```go\n{\n\tvar key, value []byte\n\tkvi, err := bpt.Find(kBytes)\n\tif err != nil {\n\t\tlog.Fatal(err)\n\t}\n\tfor key, value, err, kvi = kvi(); kvi != nil; key, value, err, kvi = kvi() {\n\t\t// do stuff with the keys and values\n\t}\n\tif err != nil {\n\t\tlog.Fatal(err)\n\t}\n}\n```\n\nThat interface is easy to misuse if you do not check the error values as shown in\nthe example above. An easier interface, called the Do interface, is provided for all of the iterators\n(Range, Find, Keys, Values, Iterate).\n\n```go\nerr = bpt.DoFind(kBytes, func(key, value []byte) error {\n\t// do stuff with the keys and values\n\treturn nil\n})\nif err != nil {\n\tlog.Fatal(err)\n}\n```\n\nIt is recommended that you always use the Do\\* interfaces. The other is provided\nfor cases where the cost of extra method calls is too high.\n\nRemoval is also slightly more complicated due to the duplicate keys. 
This\nexample will remove all key/value pairs associated with the given key:\n\n```go\nerr = bpt.Remove(kBytes, func(value []byte) bool {\n\treturn true\n})\nif err != nil {\n\tlog.Fatal(err)\n}\n```\n\nTo remove just the one I added earlier, do:\n\n```go\nerr = bpt.Remove(kBytes, func(v []byte) bool {\n\treturn bytes.Equal(v, []byte(value))\n})\nif err != nil {\n\tlog.Fatal(err)\n}\n```\n\nThat wraps up the basic usage. If you want to ensure that the bytes you have\nwritten are in fact on disk you have 2 options:\n\n1. Call `bf.Sync()`. Note this uses the async mmap interface under the hood. The\n   bytes are not guaranteed to hit the disk after this returns but they will go\n   there soon.\n\n2. Call `bf.Close()`.\n\n\n## MMList\n\n[docs](https://godoc.org/github.com/timtadh/fs2/mmlist)\n\nA Memory Mapped List. This list works more like a stack and less like a queue.\nIt is not a good thing to build a job queue on. It is a good way to build a\nlarge set of items which can be efficiently randomly sampled. It uses the same\n`varchar` system that the B+Tree uses so it can store variably sized items up to\n2^31 - 1 bytes long.\n\nOperations\n\n1. `Size` O(1)\n2. `Append` O(1)\n3. `Pop` O(1)\n4. `Get` O(1)\n5. `Set` O(1)\n6. `Swap` O(1)\n7. `SwapDelete` O(1)\n\nI will consider implementing a `Delete` method. However, it will be `O(n)` since\nthis is implemented a bit like an `ArrayList` under the hood. It actually\nworks by using a B+Tree which indexes the list index blocks. The list index\nblocks hold pointers (511 of them) to varchar locations. I considered having a\nrestricted 2 level index but that would have limited the size of the list to a\nmaximum of ~1 billion items, which was uncomfortably small to me. 
In the future\nthe implementation may change to use something more like an ISAM index which\nwill be a bit more compact for this use case.\n\n### Quickstart\n\n```go\npackage main\n\nimport (\n\t\"bytes\"\n\t\"log\"\n)\n\nimport (\n\t\"github.com/timtadh/fs2/fmap\"\n\t\"github.com/timtadh/fs2/mmlist\"\n)\n\nfunc main() {\n\tfile, err := fmap.CreateBlockFile(\"/tmp/file\")\n\tif err != nil {\n\t\tlog.Fatal(err)\n\t}\n\tdefer file.Close()\n\tlist, err := mmlist.New(file)\n\tif err != nil {\n\t\tlog.Fatal(err)\n\t}\n\tidx, err := list.Append([]byte(\"hello\"))\n\tif err != nil {\n\t\tlog.Fatal(err)\n\t}\n\tif d, err := list.Get(idx); err != nil {\n\t\tlog.Fatal(err)\n\t} else if !bytes.Equal([]byte(\"hello\"), d) {\n\t\tlog.Fatal(\"bytes were not hello\")\n\t}\n\tif err := list.Set(idx, []byte(\"bye!\")); err != nil {\n\t\tlog.Fatal(err)\n\t}\n\tif d, err := list.Get(idx); err != nil {\n\t\tlog.Fatal(err)\n\t} else if !bytes.Equal([]byte(\"bye!\"), d) {\n\t\tlog.Fatal(\"bytes were not bye!\")\n\t}\n\tif d, err := list.Pop(); err != nil {\n\t\tlog.Fatal(err)\n\t} else if !bytes.Equal([]byte(\"bye!\"), d) {\n\t\tlog.Fatal(\"bytes were not bye!\")\n\t}\n}\n```\n\n## `fs2-generic`\n\nA command to generate type specialized wrappers around fs2 structures.\n\nSince Go does not support generics and is not going to support generics anytime\nsoon, this program will produce a wrapper specialized to the supplied types. It\nis essentially manually implementing type specialized generics in a very limited\nform. All fs2 structures operate on sequences of bytes, aka `[]byte`, because\nthey are memory mapped and file backed structures. 
Therefore, the supplied types\nmust be serializable to be used as keys and values in an fs2 structure.\n\n### How to install\n\nAssuming you already have the code downloaded and in your GOPATH, just run:\n\n    $ go install github.com/timtadh/fs2/fs2-generic\n\n#### How to generate a wrapper for the B+ Tree\n\n    $ fs2-generic \\\n        --output=src/output/package/file.go \\\n        --package-name=package \\\n        bptree \\\n            --key-type=my/package/name/Type \\\n            --key-serializer=my/package/name/Func \\\n            --key-deserializer=my/package/name/Func \\\n            --value-type=my/package/name/Type \\\n            --value-serializer=my/package/name/Func \\\n            --value-deserializer=my/package/name/Func\n\n#### How to generate a wrapper for the MMList\n\n    $ fs2-generic \\\n        --output=src/output/package/file.go \\\n        --package-name=package \\\n        mmlist \\\n            --item-type=my/package/name/Type \\\n            --item-serializer=my/package/name/Func \\\n            --item-deserializer=my/package/name/Func\n\n#### Variations\n\nSupplying a pointer type:\n\n    --key-type=*my/package/name/Type\n    --value-type=*my/package/name/Type\n\nSerializer Type Signature (let T be a type parameter)\n\n    func(T) ([]byte)\n\nDeserializer Type Signature (let T be a type parameter)\n\n    func([]byte) T\n\nFixed sized types can have their sizes specified with\n\n    --key-size=\u003c# of bytes\u003e\n    --value-size=\u003c# of bytes\u003e\n\nIf the generated file is going into the same package that the types and\nfunctions are declared in, one should drop the package specifiers:\n\n    $ fs2-generic \\\n        --output=src/output/package/file.go \\\n        --package-name=package \\\n        bptree \\\n            --key-type=KeyType \\\n            --key-serializer=SerializeKey \\\n            --key-deserializer=DeserializeKey \\\n            --value-type=ValueType \\\n            --value-serializer=SerializeValue 
\\\n            --value-deserializer=DeserializeValue\n\nIf `nil` is not a valid \"empty\" value for your type (for instance it is an\ninteger, a float, or a struct value) then you must supply a valid \"empty\"\nvalue. Here is an example of a tree with int32 keys and float64 values:\n\n    $ fs2-generic \\\n        --output=src/output/package/file.go \\\n        --package-name=package \\\n        bptree \\\n            --key-type=int32 \\\n            --key-size=4 \\\n            --key-empty=0 \\\n            --key-serializer=SerializeInt32 \\\n            --key-deserializer=DeserializeInt32 \\\n            --value-type=float64 \\\n            --value-size=8 \\\n            --value-empty=0.0 \\\n            --value-serializer=SerializeFloat64 \\\n            --value-deserializer=DeserializeFloat64\n\n### Using with `go generate`\n\nThe fs2-generic command can be used in conjunction with `go generate`. To do so\nsimply create a `.go` file in the package where the generated code should live.\nFor example, let's pretend that we want to create a B+Tree with 3 dimensional\ninteger points as keys and float64s as values. Let's create a package structure\nfor that (assuming you are in the root of your $GOPATH):\n\n    mkdir ./src/edu/cwru/eecs/pointbptree/\n    touch ./src/edu/cwru/eecs/pointbptree/types.go\n\n`types.go` should then have the point type defined, plus functions for serialization.\nBelow is the full example. Note the top line specifies how to generate the file\n`./src/edu/cwru/eecs/pointbptree/wrapper.go`. 
To generate it run `go generate\nedu/cwru/eecs/pointbptree`.\n\n```go\n//go:generate fs2-generic --output=wrapper.go --package-name=pointbptree bptree --key-type=*Point --key-size=12 --key-empty=nil --key-serializer=SerializePoint --key-deserializer=DeserializePoint --value-type=float64 --value-size=8 --value-empty=0.0 --value-serializer=SerializeFloat64 --value-deserializer=DeserializeFloat64\npackage pointbptree\n\nimport (\n\t\"encoding/binary\"\n\t\"math\"\n)\n\ntype Point struct {\n\tX, Y, Z int32\n}\n\nfunc SerializePoint(p *Point) []byte {\n\tbytes := make([]byte, 4*3)\n\tbinary.BigEndian.PutUint32(bytes[0:04], uint32(p.X))\n\tbinary.BigEndian.PutUint32(bytes[4:08], uint32(p.Y))\n\tbinary.BigEndian.PutUint32(bytes[8:12], uint32(p.Z))\n\treturn bytes\n}\n\nfunc DeserializePoint(bytes []byte) *Point {\n\treturn \u0026Point{\n\t\tX: int32(binary.BigEndian.Uint32(bytes[0:04])),\n\t\tY: int32(binary.BigEndian.Uint32(bytes[4:08])),\n\t\tZ: int32(binary.BigEndian.Uint32(bytes[8:12])),\n\t}\n}\n\nfunc SerializeFloat64(f float64) []byte {\n\tbytes := make([]byte, 8)\n\tbinary.BigEndian.PutUint64(bytes, math.Float64bits(f))\n\treturn bytes\n}\n\nfunc DeserializeFloat64(bytes []byte) float64 {\n\treturn math.Float64frombits(binary.BigEndian.Uint64(bytes))\n}\n```\n\n\n## FMap\n\n[docs](https://godoc.org/github.com/timtadh/fs2/fmap)\n\nFMap provides a block oriented interface for implementing memory mapped file\nstructures. It is block oriented because memory mapped structures **should** be\nblock aligned. By making the interface block oriented, the programmer is forced\nto write the structures in a block oriented fashion. I use it with\n[fs2/slice](https://godoc.org/github.com/timtadh/fs2/slice) which provides a\nsimple way to cast []byte to other types of pointers. 
You can accomplish a\nsimilar thing using just the `reflect` package but you might find\n`fs2/slice` more convenient.\n\nFMap provides an interface for creating both anonymous and file backed memory\nmaps. It supports resizing the memory maps dynamically via allocation and free\nmethods. Note, when an allocation occurs the underlying file and memory map\n**may** resize using `mremap` with the flag `MREMAP_MAYMOVE`. So don't let\npointers escape your memory map! Keep everything as file offsets and be happy!\n\n## Memory Mapped IO versus Read/Write\n\nA key motivation of this work is to explore memory mapped IO versus a read/write\ninterface in the context of Go. I have two hypotheses:\n\n1. The operating system is good at page management generally. While we know\n   more about how to manage the structure of B+Trees, VarChar stores, and Linear\n   Hash tables than the OS, there is no indication that from Go you can achieve\n   better performance. Therefore, I hypothesize that leaving it to the OS will\n   lead to a smaller working set and a faster data structure in general.\n\n2. You can make Memory Mapping performant in Go. There are many challenges here.\n   The biggest of which is that there are no dynamically sized array TYPES in Go.\n   The size of the array is part of the type, so you have to use slices. This\n   creates complications when hooking up structures which contain slices to mmap\n   allocated blocks of memory. I hypothesize that this repository can achieve\n   good (enough) performance here.\n\nIn my past experience using the read/write interface I have encountered two\nchallenges:\n\n1. When using the read/write interface one needs to do block and cache management.\n   In theory databases which bypass the OS cache management get better\n   performance. In practice, there are challenges achieving this from a garbage\n   collected language.\n\n2. Buffer management is a related problem. 
In the past I have relied on Go's\n   built in memory management scheme. This often becomes a bottleneck. To solve\n   this problem, one must implement custom allocators and buffer management\n   subsystems.\n\nMemory mapped IO avoids both of these problems by delegating them to the\noperating system. If the OS does a good job, then this system will perform well.\nIf it does a bad job, it will perform poorly. The reason why systems such as\nOracle circumvent all OS level functions for page management is that the designers\nbelieve: a) they can do it better, and b) it provides consistent performance\nacross platforms.\n\nMemory mapped IO in Go has several challenges.\n\n1. You have to subvert type and memory safety.\n\n2. There are no dynamically sized arrays. Therefore, everything has to use\n   slices. This means that you can't just point a `struct` at a memory mapped\n   block and expect it to work if it has slices in it. Instead, some book keeping\n   needs to be done to hook up the slices properly. This adds overhead.\n\nThe results so far:\n\n1. It can be done.\n\n2. Integrating (partial) runtime checking for safety can be achieved through the\n   use of the \"do\" interface.\n\n3. The performance numbers look like they are as good as or better than the\n   Linear Hash table I implemented in my file-structures repository.\n\n## Related Projects\n\n1. [file-structures](https://github.com/timtadh/file-structures) - A collection\n   of file structures. Includes: B+Tree, BTree, Linear Hash Table, VarChar Store.\n2. [data-structures](https://github.com/timtadh/data-structures) - A collection\n   of in memory data structures. Includes a B+Tree.\n3. [boltdb](https://github.com/boltdb/bolt) - a mmap'ed b+ tree based key/value\n   store.\n4. [goleveldb](https://github.com/syndtr/goleveldb) - another database written\n   in Go\n5. [cznic/b](https://github.com/cznic/b) - an in memory b+ tree\n6. [xiang90/bplustree](https://github.com/xiang90/bplustree) - an in memory b+\n   tree\n7. 
your project here.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftimtadh%2Ffs2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftimtadh%2Ffs2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftimtadh%2Ffs2/lists"}