{"id":13411873,"url":"https://github.com/kelindar/column","last_synced_at":"2025-05-15T18:00:22.357Z","repository":{"id":37140008,"uuid":"371172777","full_name":"kelindar/column","owner":"kelindar","description":"High-performance, columnar, in-memory store with bitmap indexing in Go","archived":false,"fork":false,"pushed_at":"2024-01-15T06:14:26.000Z","size":783,"stargazers_count":1465,"open_issues_count":26,"forks_count":61,"subscribers_count":24,"default_branch":"main","last_synced_at":"2025-04-07T22:11:08.384Z","etag":null,"topics":["bitmap","columnar-storage","data-oriented","db","indexing","soa"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kelindar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["kelindar"]}},"created_at":"2021-05-26T21:27:45.000Z","updated_at":"2025-04-02T11:18:04.000Z","dependencies_parsed_at":"2024-06-18T14:42:49.214Z","dependency_job_id":null,"html_url":"https://github.com/kelindar/column","commit_stats":{"total_commits":136,"total_committers":6,"mean_commits":"22.666666666666668","dds":0.08823529411764708,"last_synced_commit":"0af9c372db72a60ff7f32bde50350c6fdbf3498c"},"previous_names":["kelindar/columnar"],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kelindar%2Fcolumn","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kelindar%2Fcolumn/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories
/kelindar%2Fcolumn/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kelindar%2Fcolumn/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kelindar","download_url":"https://codeload.github.com/kelindar/column/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254394718,"owners_count":22063984,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bitmap","columnar-storage","data-oriented","db","indexing","soa"],"created_at":"2024-07-30T20:01:17.837Z","updated_at":"2025-05-15T18:00:22.318Z","avatar_url":"https://github.com/kelindar.png","language":"Go","readme":"\u003cp align=\"center\"\u003e\r\n\u003cimg width=\"330\" height=\"110\" src=\".github/logo.png\" border=\"0\" alt=\"kelindar/column\"\u003e\r\n\u003cbr\u003e\r\n\u003cimg src=\"https://img.shields.io/github/go-mod/go-version/kelindar/column\" alt=\"Go Version\"\u003e\r\n\u003ca href=\"https://pkg.go.dev/github.com/kelindar/column\"\u003e\u003cimg src=\"https://pkg.go.dev/badge/github.com/kelindar/column\" alt=\"PkgGoDev\"\u003e\u003c/a\u003e\r\n\u003ca href=\"https://goreportcard.com/report/github.com/kelindar/column\"\u003e\u003cimg src=\"https://goreportcard.com/badge/github.com/kelindar/column\" alt=\"Go Report Card\"\u003e\u003c/a\u003e\r\n\u003ca href=\"https://opensource.org/licenses/MIT\"\u003e\u003cimg src=\"https://img.shields.io/badge/License-MIT-blue.svg\" alt=\"License\"\u003e\u003c/a\u003e\r\n\u003ca href=\"https://coveralls.io/github/kelindar/column\"\u003e\u003cimg 
src=\"https://coveralls.io/repos/github/kelindar/column/badge.svg\" alt=\"Coverage\"\u003e\u003c/a\u003e\r\n\u003c/p\u003e\r\n\r\n## Columnar In-Memory Store with Bitmap Indexing\r\n\r\nThis package contains a **high-performance, columnar, in-memory storage engine** that supports fast querying, updates and iteration with zero allocations and bitmap indexing.\r\n\r\n## Features\r\n\r\n- Optimized, cache-friendly **columnar data layout** that minimizes cache-misses.\r\n- Optimized for **zero heap allocation** during querying (see benchmarks below).\r\n- Optimized **batch updates/deletes**, an update during a transaction takes around `12ns`.\r\n- Support for **SIMD-enabled aggregate functions** such as \"sum\", \"avg\", \"min\" and \"max\".\r\n- Support for **SIMD-enabled filtering** (i.e. \"where\" clause) by leveraging [bitmap indexing](https://github.com/kelindar/bitmap).\r\n- Support for **columnar projection** (i.e. \"select\" clause) for fast retrieval.\r\n- Support for **computed indexes** that are dynamically calculated based on a provided predicate.\r\n- Support for **concurrent updates** using sharded latches to keep things fast.\r\n- Support for **transaction isolation**, allowing you to create transactions and commit/rollback.\r\n- Support for **expiration** of rows based on a time-to-live or an expiration column.\r\n- Support for **atomic merging** of any values, transactionally.\r\n- Support for **primary keys** for use-cases where an offset can't be used.\r\n- Support for a **change data stream** that streams all commits consistently.\r\n- Support for **concurrent snapshotting**, allowing you to store the entire collection into a file.\r\n\r\n## Documentation\r\n\r\nThe general idea is to leverage cache-friendly ways of organizing data in [structures of arrays (SoA)](https://en.wikipedia.org/wiki/AoS_and_SoA), otherwise known as \"columnar\" storage in database design. This, in turn, allows us to iterate and filter over columns very efficiently. 
On top of that, this package also adds [bitmap indexing](https://en.wikipedia.org/wiki/Bitmap_index) to the columnar storage, allowing us to build filter queries using binary `and`, `and not`, `or` and `xor` (see [kelindar/bitmap](https://github.com/kelindar/bitmap) with SIMD support).\r\n\r\n- [Collection and Columns](#collection-and-columns)\r\n- [Querying and Indexing](#querying-and-indexing)\r\n- [Iterating over Results](#iterating-over-results)\r\n- [Sorted Indexes](#sorted-indexes)\r\n- [Updating Values](#updating-values)\r\n- [Expiring Values](#expiring-values)\r\n- [Transaction Commit and Rollback](#transaction-commit-and-rollback)\r\n- [Using Primary Keys](#using-primary-keys)\r\n- [Storing Binary Records](#storing-binary-records)\r\n- [Streaming Changes](#streaming-changes)\r\n- [Snapshot and Restore](#snapshot-and-restore)\r\n- [Examples](#examples)\r\n- [Benchmarks](#benchmarks)\r\n- [Contributing](#contributing)\r\n\r\n## Collection and Columns\r\n\r\nIn order to get data into the store, you'll need to first create a `Collection` by calling the `NewCollection()` function. Each collection requires a schema, which can be specified by calling `CreateColumn()` multiple times or inferred automatically from an object by calling the `CreateColumnsOf()` function. In the example below we create a new collection with several columns.\r\n\r\n```go\r\n// Create a new collection with some columns\r\nplayers := column.NewCollection()\r\nplayers.CreateColumn(\"name\", column.ForString())\r\nplayers.CreateColumn(\"class\", column.ForString())\r\nplayers.CreateColumn(\"balance\", column.ForFloat64())\r\nplayers.CreateColumn(\"age\", column.ForInt16())\r\n```\r\n\r\nNow that we have created a collection, we can insert a single record by using the `Insert()` method on the collection. In this example we're inserting a single row and manually specifying values. 
Note that this function returns an `index` that indicates the row index of the inserted row.\r\n\r\n```go\r\nindex, err := players.Insert(func(r column.Row) error {\r\n\tr.SetString(\"name\", \"merlin\")\r\n\tr.SetString(\"class\", \"mage\")\r\n\tr.SetFloat64(\"balance\", 99.95)\r\n\tr.SetInt16(\"age\", 107)\r\n\treturn nil\r\n})\r\n```\r\n\r\nWhile the previous example demonstrated how to insert a single row, inserting multiple rows this way is rather inefficient. This is due to the fact that each `Insert()` call directly on the collection initiates a separate transaction and there's a small performance cost associated with it. If you want to insert many values faster, you can do a bulk insert by calling `Insert()` on a transaction, as demonstrated in the example below. Note that the only difference is instantiating a transaction by calling the `Query()` method and calling the `txn.Insert()` method on the transaction instead of the one on the collection.\r\n\r\n```go\r\nplayers.Query(func(txn *column.Txn) error {\r\n\tfor _, v := range myRawData {\r\n\t\ttxn.Insert(...)\r\n\t}\r\n\treturn nil // Commit\r\n})\r\n```\r\n\r\n## Querying and Indexing\r\n\r\nThe store allows you to query the data based on the presence of certain attributes or their values. In the example below we are querying our collection and applying a _filtering_ operation by using the `WithValue()` method on the transaction. This method scans the values and checks whether a certain predicate evaluates to `true`. In this case, we're scanning through all of the players and looking up their `class`; if the class is equal to \"rogue\", we'll take it. 
At the end, we're calling the `Count()` method that simply counts the result set.\r\n\r\n```go\r\n// This query performs a full scan of the \"class\" column\r\nplayers.Query(func(txn *column.Txn) error {\r\n\tcount := txn.WithValue(\"class\", func(v interface{}) bool {\r\n\t\treturn v == \"rogue\"\r\n\t}).Count()\r\n\treturn nil\r\n})\r\n```\r\n\r\nNow, what if we need to do this query very often? It is possible to simply _create an index_ with the same predicate and have this computation applied every time (a) an object is inserted into the collection and (b) a value of the dependent column is updated. Let's look at the example below: we're first creating a `rogue` index which depends on the \"class\" column. This index applies the same predicate which only returns `true` if a class is \"rogue\". We can then query this by simply calling the `With()` method and providing the index name.\r\n\r\nAn index is essentially akin to a boolean column, so you could technically also select its value when querying it. Now, in this example the query would be around `10-100x` faster to execute as behind the scenes it uses [bitmap indexing](https://github.com/kelindar/bitmap) for the \"rogue\" index and performs a simple logical `AND` operation on two bitmaps when querying. This avoids scanning the column and applying the predicate during the `Query`.\r\n\r\n```go\r\n// Create the index \"rogue\" in advance\r\nplayers.CreateIndex(\"rogue\", \"class\", func(v interface{}) bool {\r\n\treturn v == \"rogue\"\r\n})\r\n\r\n// This returns the same result as the query before, but much faster\r\nplayers.Query(func(txn *column.Txn) error {\r\n\tcount := txn.With(\"rogue\").Count()\r\n\treturn nil\r\n})\r\n```\r\n\r\nThe query can be further expanded as it allows indexed `intersection`, `difference` and `union` operations. This allows you to ask more complex questions of a collection. 
In the examples below let's assume we have a bunch of indexes on the `class` column and we want to ask different questions.\r\n\r\nFirst, let's try to merge two queries by applying a union with the `Union()` method. Here, we first select only rogues but then merge them together with mages, resulting in a selection containing both rogues and mages.\r\n\r\n```go\r\n// How many rogues and mages?\r\nplayers.Query(func(txn *column.Txn) error {\r\n\ttxn.With(\"rogue\").Union(\"mage\").Count()\r\n\treturn nil\r\n})\r\n```\r\n\r\nNext, let's count everyone who isn't a rogue. For that, we can use the `Without()` method which performs a difference (i.e. binary `AND NOT` operation) on the collection. This will result in a count of all players in the collection except the rogues.\r\n\r\n```go\r\n// How many players are not rogues?\r\nplayers.Query(func(txn *column.Txn) error {\r\n\ttxn.Without(\"rogue\").Count()\r\n\treturn nil\r\n})\r\n```\r\n\r\nNow, you can combine all of the methods and keep building more complex queries. When querying indexed and non-indexed fields together it is important to know that every scan applies only to the current selection, speeding up the query. So if you have a filter on a specific index that selects 50% of players and then you perform a scan on that (e.g. `WithValue()`), it will only scan 50% of players and hence will be 2x faster.\r\n\r\n```go\r\n// How many rogues that are over 30 years old?\r\nplayers.Query(func(txn *column.Txn) error {\r\n\ttxn.With(\"rogue\").WithFloat(\"age\", func(v float64) bool {\r\n\t\treturn v \u003e= 30\r\n\t}).Count()\r\n\treturn nil\r\n})\r\n```\r\n\r\n## Iterating over Results\r\n\r\nIn all of the previous examples, we've only been doing a `Count()` operation which counts the number of elements in the result set. In this section we'll look at how we can iterate over the result set.\r\n\r\nAs before, a transaction needs to be started using the `Query()` method on the collection. 
After which, we can call the `txn.Range()` method which allows us to iterate over the result set in the transaction. Note that it can be chained right after `With..()` methods, as expected.\r\n\r\nIn order to access the results of the iteration, prior to calling the `Range()` method, we first need to **load the column reader(s)** we are going to need, using methods such as `txn.String()`, `txn.Float64()`, etc. These prepare read/write buffers necessary to perform efficient lookups while iterating.\r\n\r\nIn the example below we select all of the rogues from our collection and print out their name by using the `Range()` method and accessing the \"name\" column using a column reader which is created by calling the `txn.String(\"name\")` method.\r\n\r\n```go\r\nplayers.Query(func(txn *column.Txn) error {\r\n\tnames := txn.String(\"name\") // Create a column reader\r\n\r\n\treturn txn.With(\"rogue\").Range(func(i uint32) {\r\n\t\tname, _ := names.Get()\r\n\t\tprintln(\"rogue name\", name)\r\n\t})\r\n})\r\n```\r\n\r\nSimilarly, if you need to access more columns, you can simply create the appropriate column reader(s) and use them as shown in the example before.\r\n\r\n```go\r\nplayers.Query(func(txn *column.Txn) error {\r\n\tnames := txn.String(\"name\")\r\n\tages  := txn.Int64(\"age\")\r\n\r\n\treturn txn.With(\"rogue\").Range(func(i uint32) {\r\n\t\tname, _ := names.Get()\r\n\t\tage,  _ := ages.Get()\r\n\r\n\t\tprintln(\"rogue name\", name)\r\n\t\tprintln(\"rogue age\", age)\r\n\t})\r\n})\r\n```\r\n\r\nTaking the `Sum()` of a (numeric) column reader will take into account a transaction's current filtering index.\r\n\r\n```go\r\nplayers.Query(func(txn *column.Txn) error {\r\n\ttotalAge := txn.With(\"rogue\").Int64(\"age\").Sum()\r\n\ttotalRogues := int64(txn.Count())\r\n\r\n\tavgAge := totalAge / totalRogues\r\n\r\n\ttxn.WithInt(\"age\", func(v int64) bool {\r\n\t\treturn v \u003c avgAge\r\n\t})\r\n\r\n\t// get total balance for 'all rogues younger than the average 
rogue'\r\n\tbalance := txn.Float64(\"balance\").Sum()\r\n\treturn nil\r\n})\r\n```\r\n\r\n## Sorted Indexes\r\n\r\nAlong with bitmap indexing, collections support consistently sorted indexes. These indexes are transient and must be recreated when a collection loads a snapshot.\r\n\r\nIn the example below, we create a sorted index and use it to sort filtered records in a transaction.\r\n\r\n```go\r\n// Create the sorted index \"richest\" in advance\r\nplayers.CreateSortIndex(\"richest\", \"balance\")\r\n\r\n// This filters the transaction with the `rogue` index before\r\n// ranging through the remaining balances in ascending order\r\nplayers.Query(func(txn *column.Txn) error {\r\n\tname    := txn.String(\"name\")\r\n\tbalance := txn.Float64(\"balance\")\r\n\r\n\ttxn.With(\"rogue\").Ascend(\"richest\", func(i uint32) {\r\n\t\t// save or do something with the sorted record\r\n\t\tcurName, _ := name.Get()\r\n\t\tbalance.Set(newBalance(curName))\r\n\t})\r\n\treturn nil\r\n})\r\n```\r\n\r\n## Updating Values\r\n\r\nIn order to update certain items in the collection, you can simply call the `Range()` method and use the column accessor's `Set()` or `Add()` methods to update a value of a certain column atomically. The updates won't be reflected instantly, given that our store supports transactions. Only when the transaction is committed will the updates be applied to the collection, allowing for isolation and rollbacks.\r\n\r\nIn the example below we're selecting all of the rogues and updating both their balance and age to certain values. 
The transaction returns `nil`, hence it will be automatically committed when the `Query()` method returns.\r\n\r\n```go\r\nplayers.Query(func(txn *column.Txn) error {\r\n\tbalance := txn.Float64(\"balance\")\r\n\tage     := txn.Int64(\"age\")\r\n\r\n\treturn txn.With(\"rogue\").Range(func(i uint32) {\r\n\t\tbalance.Set(10.0) // Update the \"balance\" to 10.0\r\n\t\tage.Set(50)       // Update the \"age\" to 50\r\n\t})\r\n})\r\n```\r\n\r\nIn certain cases, you might want to atomically increment or decrement numerical values. In order to accomplish this you can use the provided `Merge()` operation. Note that the indexes will also be updated accordingly and the predicates re-evaluated with the most up-to-date values. In the example below we're incrementing the balance of all our rogues by _500_ atomically.\r\n\r\n```go\r\nplayers.Query(func(txn *column.Txn) error {\r\n\tbalance := txn.Float64(\"balance\")\r\n\r\n\treturn txn.With(\"rogue\").Range(func(i uint32) {\r\n\t\tbalance.Merge(500.0) // Increment the \"balance\" by 500\r\n\t})\r\n})\r\n```\r\n\r\nWhile atomic increment/decrement for numerical values is relatively straightforward, this `Merge()` behavior can be customized using the `WithMerge()` option and also used for other data types, such as strings. 
In the example below we are creating a merge function that concatenates two strings together; when `MergeString()` is called, the new string gets appended automatically.\r\n\r\n```go\r\n// A merging function that simply concatenates 2 strings together\r\nconcat := func(value, delta string) string {\r\n\tif len(value) \u003e 0 {\r\n\t\tvalue += \", \"\r\n\t}\r\n\treturn value + delta\r\n}\r\n\r\n// Create a column with a specified merge function\r\ndb := column.NewCollection()\r\ndb.CreateColumn(\"alphabet\", column.ForString(column.WithMerge(concat)))\r\n\r\n// Insert letter \"A\"\r\ndb.Insert(func(r column.Row) error {\r\n\tr.SetString(\"alphabet\", \"A\") // now contains \"A\"\r\n\treturn nil\r\n})\r\n\r\n// Insert letter \"B\"\r\ndb.QueryAt(0, func(r column.Row) error {\r\n\tr.MergeString(\"alphabet\", \"B\") // now contains \"A, B\"\r\n\treturn nil\r\n})\r\n```\r\n\r\n## Expiring Values\r\n\r\nSometimes, it is useful to automatically delete certain rows when you do not need them anymore. In order to do this, the library automatically adds an `expire` column to each new collection and asynchronously starts a cleanup goroutine that runs periodically and cleans up the expired objects. In order to set this, you can simply use the `Insert...()` method on the collection that allows you to insert an object with a time-to-live duration defined.\r\n\r\nIn the example below we are inserting an object into the collection and setting the time-to-live to _5 seconds_ from the current time. After this time, the object will be automatically evicted from the collection and its space can be reclaimed.\r\n\r\n```go\r\nplayers.Insert(func(r column.Row) error {\r\n\tr.SetString(\"name\", \"Merlin\")\r\n\tr.SetString(\"class\", \"mage\")\r\n\tr.SetTTL(5 * time.Second) // time-to-live of 5 seconds\r\n\treturn nil\r\n})\r\n```\r\n\r\nInterestingly, since the `expire` column which is automatically added to each collection is a normal column, you can query and even update it. 
In the example below we query and extend the time-to-live by 1 hour using the `Extend()` method.\r\n\r\n```go\r\nplayers.Query(func(txn *column.Txn) error {\r\n\tttl := txn.TTL()\r\n\treturn txn.Range(func(i uint32) {\r\n\t\tttl.Extend(1 * time.Hour) // Add some time\r\n\t})\r\n})\r\n```\r\n\r\n## Transaction Commit and Rollback\r\n\r\nTransactions allow for isolation between two concurrent operations. In fact, all of the batch queries must go through a transaction in this library. The `Query` method requires a function which takes in a `column.Txn` pointer which contains various helper methods that support querying. In the example below we're trying to iterate over all of the players and update their balance by setting it to `10.0`. The `Query` method automatically calls `txn.Commit()` if the function returns without any error. On the flip side, if the provided function returns an error, the query will automatically call `txn.Rollback()` so none of the changes will be applied.\r\n\r\n```go\r\n// Range over all of the players and successfully update their balance\r\nplayers.Query(func(txn *column.Txn) error {\r\n\tbalance := txn.Float64(\"balance\")\r\n\ttxn.Range(func(i uint32) {\r\n\t\tbalance.Set(10.0) // Update the \"balance\" to 10.0\r\n\t})\r\n\r\n\t// No error, transaction will be committed\r\n\treturn nil\r\n})\r\n```\r\n\r\nNow, in this example, we try to update the balance but the query callback returns an error, in which case none of the updates will be actually reflected in the underlying collection.\r\n\r\n```go\r\n// Range over all of the players and attempt to update their balance\r\nplayers.Query(func(txn *column.Txn) error {\r\n\tbalance := txn.Float64(\"balance\")\r\n\ttxn.Range(func(i uint32) {\r\n\t\tbalance.Set(10.0) // Update the \"balance\" to 10.0\r\n\t})\r\n\r\n\t// Returns an error, transaction will be rolled back\r\n\treturn fmt.Errorf(\"bug\")\r\n})\r\n```\r\n\r\n## Using Primary Keys\r\n\r\nIn certain cases it is useful to access a specific row by 
its primary key instead of an index which is generated internally by the collection. For such use-cases, the library provides a `Key` column type that enables a seamless lookup by a user-defined _primary key_. In the example below we create a collection with a primary key `name` using the `CreateColumn()` method with the `ForKey()` column type. Then, we use the `InsertKey()` method to insert a value.\r\n\r\n```go\r\nplayers := column.NewCollection()\r\nplayers.CreateColumn(\"name\", column.ForKey())     // Create \"name\" as a primary key\r\nplayers.CreateColumn(\"class\", column.ForString()) // .. and some other columns\r\n\r\n// Insert a player with \"merlin\" as its primary key\r\nplayers.InsertKey(\"merlin\", func(r column.Row) error {\r\n\tr.SetString(\"class\", \"mage\")\r\n\treturn nil\r\n})\r\n```\r\n\r\nSimilarly, you can use the primary key to query the data directly, without knowing the exact offset. Do note that using primary keys has some overhead, as it requires an additional step of looking up the offset using a hash table managed internally.\r\n\r\n```go\r\n// Query merlin's class\r\nplayers.QueryKey(\"merlin\", func(r column.Row) error {\r\n\tclass, _ := r.String(\"class\")\r\n\treturn nil\r\n})\r\n```\r\n\r\n## Storing Binary Records\r\n\r\nIf you find yourself in need of encoding a more complex structure as a single column, you may do so by using the `column.ForRecord()` function. This allows you to specify a `BinaryMarshaler` / `BinaryUnmarshaler` type that will get automatically encoded as a single column. 
In the example below we are creating a `Location` type that implements the required methods.\r\n\r\n```go\r\ntype Location struct {\r\n\tX float64 `json:\"x\"`\r\n\tY float64 `json:\"y\"`\r\n}\r\n\r\nfunc (l Location) MarshalBinary() ([]byte, error) {\r\n\treturn json.Marshal(l)\r\n}\r\n\r\nfunc (l *Location) UnmarshalBinary(b []byte) error {\r\n\treturn json.Unmarshal(b, l)\r\n}\r\n```\r\n\r\nNow that we have a record implementation, we can create a column for this struct by using the `ForRecord()` function as shown below.\r\n\r\n```go\r\nplayers.CreateColumn(\"location\", column.ForRecord(func() *Location {\r\n\treturn new(Location)\r\n}))\r\n```\r\n\r\nIn order to manipulate the record, we can use the `Record()` and `SetRecord()` methods of the `Row`, similarly to other column types.\r\n\r\n```go\r\n// Insert a new location\r\nidx, _ := players.Insert(func(r column.Row) error {\r\n\tr.SetRecord(\"location\", \u0026Location{X: 1, Y: 2})\r\n\treturn nil\r\n})\r\n\r\n// Read the location back\r\nplayers.QueryAt(idx, func(r column.Row) error {\r\n\tlocation, ok := r.Record(\"location\")\r\n\treturn nil\r\n})\r\n```\r\n\r\n## Streaming Changes\r\n\r\nThis library also supports streaming out all transaction commits consistently, as they happen. This allows you to implement your own change data capture (CDC) listeners, stream data into Kafka or into a remote database for durability. In order to enable it, you can simply provide an implementation of a `commit.Logger` interface during the creation of the collection.\r\n\r\nIn the example below we take advantage of the `commit.Channel` implementation of a `commit.Logger` which simply publishes the commits into a Go channel. 
Here we create a buffered channel and keep consuming the commits with a separate goroutine, allowing us to view transactions as they happen in the store.\r\n\r\n```go\r\n// Create a new commit writer (simple channel) and a new collection\r\nwriter  := make(commit.Channel, 1024)\r\nplayers := column.NewCollection(column.Options{\r\n\tWriter: \u0026writer,\r\n})\r\n\r\n// Read the changes from the channel\r\ngo func() {\r\n\tfor commit := range writer {\r\n\t\tfmt.Printf(\"commit %v\\n\", commit.ID)\r\n\t}\r\n}()\r\n\r\n// ... insert, update or delete\r\n```\r\n\r\nOn a separate note, this change stream is guaranteed to be consistent and serialized. This means that you can also replicate those changes on another database and synchronize both. In fact, this library also provides a `Replay()` method on the collection that allows you to do just that. In the example below we create two collections `primary` and `replica` and asynchronously replicate all of the commits from the `primary` to the `replica` using the `Replay()` method together with the change stream.\r\n\r\n```go\r\n// Create a primary collection\r\nwriter  := make(commit.Channel, 1024)\r\nprimary := column.NewCollection(column.Options{\r\n\tWriter: \u0026writer,\r\n})\r\nprimary.CreateColumnsOf(object)\r\n\r\n// Replica with the same schema\r\nreplica := column.NewCollection()\r\nreplica.CreateColumnsOf(object)\r\n\r\n// Keep 2 collections in sync\r\ngo func() {\r\n\tfor change := range writer {\r\n\t\treplica.Replay(change)\r\n\t}\r\n}()\r\n```\r\n\r\n## Snapshot and Restore\r\n\r\nThe collection can also be saved in a single binary format while the transactions are running. 
This can allow you to periodically schedule backups or make sure all of the data is persisted when your application terminates.\r\n\r\nIn order to take a snapshot, you must first create a valid `io.Writer` destination and then call the `Snapshot()` method on the collection, as demonstrated in the example below.\r\n\r\n```go\r\ndst, err := os.Create(\"snapshot.bin\")\r\nif err != nil {\r\n\tpanic(err)\r\n}\r\n\r\n// Write a snapshot into the dst\r\nerr = players.Snapshot(dst)\r\n```\r\n\r\nConversely, in order to restore an existing snapshot, you need to first open an `io.Reader` and then call the `Restore()` method on the collection. Note that the collection and its schema must be already initialized, as our snapshots do not carry this information within themselves.\r\n\r\n```go\r\nsrc, err := os.Open(\"snapshot.bin\")\r\nif err != nil {\r\n\tpanic(err)\r\n}\r\n\r\n// Restore from an existing snapshot\r\nerr = players.Restore(src)\r\n```\r\n\r\n## Examples\r\n\r\nMultiple complete usage examples of this library can be found in the [examples](https://github.com/kelindar/column/tree/main/examples) directory in this repository.\r\n\r\n## Benchmarks\r\n\r\nThe benchmarks below were run on a collection of **100,000 items** containing a dozen columns. 
Feel free to explore the benchmarks but I strongly recommend testing it on your actual dataset.\r\n\r\n```\r\ncpu: Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz\r\nBenchmarkCollection/insert-8            2523     469481 ns/op    24356 B/op    500 allocs/op\r\nBenchmarkCollection/select-at-8     22194190      54.23 ns/op        0 B/op      0 allocs/op\r\nBenchmarkCollection/scan-8              2068     568953 ns/op      122 B/op      0 allocs/op\r\nBenchmarkCollection/count-8           571449       2057 ns/op        0 B/op      0 allocs/op\r\nBenchmarkCollection/range-8            28660      41695 ns/op        3 B/op      0 allocs/op\r\nBenchmarkCollection/update-at-8      5911978      202.8 ns/op        0 B/op      0 allocs/op\r\nBenchmarkCollection/update-all-8        1280     946272 ns/op     3726 B/op      0 allocs/op\r\nBenchmarkCollection/delete-at-8      6405852      188.9 ns/op        0 B/op      0 allocs/op\r\nBenchmarkCollection/delete-all-8     2073188      562.6 ns/op        0 B/op      0 allocs/op\r\n```\r\n\r\nWhen testing for larger collections, I added a small example (see `examples` folder) and ran it with **20 million rows** inserted, each entry has **12 columns and 4 indexes** that need to be calculated, and a few queries and scans around them.\r\n\r\n```\r\nrunning insert of 20000000 rows...\r\n-\u003e insert took 20.4538183s\r\n\r\nrunning snapshot of 20000000 rows...\r\n-\u003e snapshot took 2.57960038s\r\n\r\nrunning full scan of age \u003e= 30...\r\n-\u003e result = 10200000\r\n-\u003e full scan took 61.611822ms\r\n\r\nrunning full scan of class == \"rogue\"...\r\n-\u003e result = 7160000\r\n-\u003e full scan took 81.389954ms\r\n\r\nrunning indexed query of human mages...\r\n-\u003e result = 1360000\r\n-\u003e indexed query took 608.51µs\r\n\r\nrunning indexed query of human female mages...\r\n-\u003e result = 640000\r\n-\u003e indexed query took 794.49µs\r\n\r\nrunning update of balance of everyone...\r\n-\u003e updated 20000000 rows\r\n-\u003e 
update took 214.182216ms\r\n\r\nrunning update of age of mages...\r\n-\u003e updated 6040000 rows\r\n-\u003e update took 81.292378ms\r\n```\r\n\r\n## Contributing\r\n\r\nWe are open to contributions; feel free to submit a pull request and we'll review it as quickly as we can. This library is maintained by [Roman Atachiants](https://www.linkedin.com/in/atachiants/).\r\n\r\n## License\r\n\r\nThis library is licensed under the [MIT License](LICENSE.md).\r\n","funding_links":["https://github.com/sponsors/kelindar"],"categories":["Database","Go","Databases","Key-Value Store","数据库","Data Integration Frameworks","Libraries","Uncategorized","Generators"],"sub_categories":["Databases Implemented in Go","Go中实现的数据库","Advanced Console UIs"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkelindar%2Fcolumn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkelindar%2Fcolumn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkelindar%2Fcolumn/lists"}