{"id":13444624,"url":"https://github.com/tobgu/qframe","last_synced_at":"2025-04-04T10:05:33.564Z","repository":{"id":47465984,"uuid":"115053884","full_name":"tobgu/qframe","owner":"tobgu","description":"Immutable data frame for Go","archived":false,"fork":false,"pushed_at":"2024-07-02T03:52:37.000Z","size":3730,"stargazers_count":407,"open_issues_count":13,"forks_count":33,"subscribers_count":16,"default_branch":"master","last_synced_at":"2025-03-28T09:03:48.379Z","etag":null,"topics":["data-frame","data-science","dataframe","go","golang","immutable"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tobgu.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-12-21T22:48:16.000Z","updated_at":"2025-03-24T07:32:30.000Z","dependencies_parsed_at":"2024-12-06T23:15:35.365Z","dependency_job_id":null,"html_url":"https://github.com/tobgu/qframe","commit_stats":{"total_commits":356,"total_committers":12,"mean_commits":"29.666666666666668","dds":0.1713483146067416,"last_synced_commit":"edb23855dc466ccea4b89d245aa06adc94cba431"},"previous_names":[],"tags_count":12,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tobgu%2Fqframe","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tobgu%2Fqframe/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tobgu%2Fqframe/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tobgu%2Fqframe/mani
fests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tobgu","download_url":"https://codeload.github.com/tobgu/qframe/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247153892,"owners_count":20892752,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-frame","data-science","dataframe","go","golang","immutable"],"created_at":"2024-07-31T04:00:32.452Z","updated_at":"2025-04-04T10:05:33.535Z","avatar_url":"https://github.com/tobgu.png","language":"Go","funding_links":[],"categories":["DataFrames"],"sub_categories":["Vector Database"],"readme":"[![CI Status](https://github.com/tobgu/qframe/actions/workflows/ci.yaml/badge.svg)](https://github.com/tobgu/qframe/actions/workflows/ci.yaml)\n[![Go Coverage](https://github.com/tobgu/qframe/wiki/coverage.svg)](https://raw.githack.com/wiki/tobgu/qframe/coverage.html)\n[![Go Report Card](https://goreportcard.com/badge/github.com/tobgu/qframe)](https://goreportcard.com/report/github.com/tobgu/qframe)\n[![GoDoc](https://godoc.org/github.com/tobgu/qframe?status.svg)](https://godoc.org/github.com/tobgu/qframe)\n\nQFrame is an immutable data frame that supports filtering, aggregation\nand data manipulation. Any operation on a QFrame results in\na new QFrame; the original QFrame remains unchanged. 
This can be done\nfairly efficiently since much of the underlying data will be shared\nbetween the two frames.\n\nThe design of QFrame has mainly been driven by the requirements from\n[qocache](https://github.com/tobgu/qocache) but it is in many respects\na general-purpose data frame. Any suggestions for added/improved\nfunctionality to support a wider scope are always of interest as long\nas they don't conflict with the requirements from qocache!\nSee [Contribute](#contribute).\n\n## Installation\n`go get github.com/tobgu/qframe`\n\n## Usage\nBelow are some examples of common use cases. The list is not exhaustive\nin any way. For a complete description of all operations, including more\nexamples, see the [docs](https://godoc.org/github.com/tobgu/qframe).\n\n### IO\nQFrames can currently be read from and written to CSV, record-oriented\nJSON, and any SQL database supported by a Go `database/sql`\ndriver.\n\n#### CSV Data\n\nRead CSV data:\n```go\ninput := `COL1,COL2\na,1.5\nb,2.25\nc,3.0`\n\nf := qframe.ReadCSV(strings.NewReader(input))\nfmt.Println(f)\n```\nOutput:\n```\nCOL1(s) COL2(f)\n------- -------\n      a     1.5\n      b    2.25\n      c       3\n\nDims = 2 x 3\n```\n\n#### SQL Data\n\nQFrame supports reading and writing data through standard library `database/sql`\ndrivers. It has been tested with [SQLite](https://github.com/mattn/go-sqlite3), [Postgres](https://github.com/lib/pq), and [MariaDB](https://github.com/go-sql-driver/mysql).\n\n##### SQLite Example\n\nLoad data to and from an in-memory SQLite database. 
Note\nthat this example requires you to have [go-sqlite3](https://github.com/mattn/go-sqlite3) installed\nprior to running.\n\n```go\npackage main\n\nimport (\n\t\"database/sql\"\n\t\"fmt\"\n\n\t_ \"github.com/mattn/go-sqlite3\"\n\t\"github.com/tobgu/qframe\"\n\tqsql \"github.com/tobgu/qframe/config/sql\"\n)\n\nfunc main() {\n\t// Create a new in-memory SQLite database.\n\tdb, _ := sql.Open(\"sqlite3\", \":memory:\")\n\t// Add a new table.\n\tdb.Exec(`\n\tCREATE TABLE test (\n\t\tCOL1 INT,\n\t\tCOL2 REAL,\n\t\tCOL3 TEXT,\n\t\tCOL4 BOOL\n\t);`)\n\t// Create a new QFrame to populate our table with.\n\tqf := qframe.New(map[string]interface{}{\n\t\t\"COL1\": []int{1, 2, 3},\n\t\t\"COL2\": []float64{1.1, 2.2, 3.3},\n\t\t\"COL3\": []string{\"one\", \"two\", \"three\"},\n\t\t\"COL4\": []bool{true, true, true},\n\t})\n\tfmt.Println(qf)\n\t// Start a new SQL Transaction.\n\ttx, _ := db.Begin()\n\t// Write the QFrame to the database.\n\tqf.ToSQL(tx,\n\t\t// Write only to the test table\n\t\tqsql.Table(\"test\"),\n\t\t// Explicitly set SQLite compatibility.\n\t\tqsql.SQLite(),\n\t)\n\t// Create a new QFrame from SQL.\n\tnewQf := qframe.ReadSQL(tx,\n\t\t// A query must return at least one column. 
In this \n\t\t// case it will return all of the columns we created above.\n\t\tqsql.Query(\"SELECT * FROM test\"),\n\t\t// SQLite stores boolean values as integers, so we\n\t\t// can coerce them back to bools with the CoercePair option.\n\t\tqsql.Coerce(qsql.CoercePair{Column: \"COL4\", Type: qsql.Int64ToBool}),\n\t\tqsql.SQLite(),\n\t)\n\tfmt.Println(newQf)\n\tfmt.Println(newQf.Equals(qf))\n}\n```\n\nOutput:\n\n```\nCOL1(i) COL2(f) COL3(s) COL4(b)\n------- ------- ------- -------\n      1     1.1     one    true\n      2     2.2     two    true\n      3     3.3   three    true\n\nDims = 4 x 3\ntrue \n```\n\n### Filtering\nFiltering can be done either by applying individual filters\nto the QFrame or by combining filters using AND and OR.\n\nFilter with OR-clause:\n```go\nf := qframe.New(map[string]interface{}{\"COL1\": []int{1, 2, 3}, \"COL2\": []string{\"a\", \"b\", \"c\"}})\nnewF := f.Filter(qframe.Or(\n    qframe.Filter{Column: \"COL1\", Comparator: \"\u003e\", Arg: 2},\n    qframe.Filter{Column: \"COL2\", Comparator: \"=\", Arg: \"a\"}))\nfmt.Println(newF)\n```\n\nOutput:\n```\nCOL1(i) COL2(s)\n------- -------\n      1       a\n      3       c\n\nDims = 2 x 2\n```\n\n### Grouping and aggregation\nGrouping and aggregation is done in two distinct steps. The function\nused in the aggregation step takes a slice of elements and\nreturns an element. 
For floats this function signature matches\nmany of the statistical functions in [Gonum](https://github.com/gonum/gonum),\nthese can hence be applied directly.\n\n```go\nintSum := func(xx []int) int {\n    result := 0\n    for _, x := range xx {\n        result += x\n    }\n    return result\n}\n\nf := qframe.New(map[string]interface{}{\"COL1\": []int{1, 2, 2, 3, 3}, \"COL2\": []string{\"a\", \"b\", \"c\", \"a\", \"b\"}})\nf = f.GroupBy(groupby.Columns(\"COL2\")).Aggregate(qframe.Aggregation{Fn: intSum, Column: \"COL1\"})\nfmt.Println(f.Sort(qframe.Order{Column: \"COL2\"}))\n```\n\nOutput:\n```\nCOL2(s) COL1(i)\n------- -------\n      a       4\n      b       5\n      c       2\n\nDims = 2 x 3\n```\n\n### Data manipulation\nThere are two different functions by which data can be manipulated,\n`Apply` and `Eval`.\n`Eval` is slightly more high level and takes a more data driven approach\nbut basically boils down to a bunch of `Apply` in the end.\n\nExample using `Apply` to string concatenate two columns:\n```go\nf := qframe.New(map[string]interface{}{\"COL1\": []int{1, 2, 3}, \"COL2\": []string{\"a\", \"b\", \"c\"}})\nf = f.Apply(\n    qframe.Instruction{Fn: function.StrI, DstCol: \"COL1\", SrcCol1: \"COL1\"},\n    qframe.Instruction{Fn: function.ConcatS, DstCol: \"COL3\", SrcCol1: \"COL1\", SrcCol2: \"COL2\"})\nfmt.Println(f.Select(\"COL3\"))\n```\n\nOutput:\n```\nCOL3(s)\n-------\n     1a\n     2b\n     3c\n\nDims = 1 x 3\n```\n\nThe same example using `Eval` instead:\n```go\nf := qframe.New(map[string]interface{}{\"COL1\": []int{1, 2, 3}, \"COL2\": []string{\"a\", \"b\", \"c\"}})\nf = f.Eval(\"COL3\", qframe.Expr(\"+\", qframe.Expr(\"str\", types.ColumnName(\"COL1\")), types.ColumnName(\"COL2\")))\nfmt.Println(f.Select(\"COL3\"))\n```\n\n## More usage examples\nExamples of the most common operations are available in the\n[docs](https://godoc.org/github.com/tobgu/qframe).\n\n## Error handling\nAll operations that may result in errors will set the `Err` variable\non 
the returned QFrame to indicate that an error occurred.\nThe presence of an error on the QFrame will prevent any future operations\nfrom being executed on the frame (i.e. it follows a monad-like pattern).\nThis allows for smooth chaining of multiple operations without having\nto explicitly check errors between operations.\n\n## Configuration parameters\nAPI functions that require configuration parameters make use of\n[functional options](https://dave.cheney.net/2014/10/17/functional-options-for-friendly-apis)\nto allow more options to be easily added in the future in a\nbackwards-compatible way.\n\n## Design goals\n* Performance\n  - Speed should be on par with, or better than, Python Pandas for corresponding operations.\n  - No or very little memory overhead per data element.\n  - Performance impact of operations should be straightforward to reason about.\n* API\n  - Should be reasonably small and low ceremony.\n  - Should allow custom, user-provided functions to be used for data processing.\n  - Should provide built-in functions for most common operations.\n\n## High level design\nA QFrame is a collection of columns which can be of type int, float,\nstring, bool or enum. 
For more information about the data types see the\n[types docs](https://godoc.org/github.com/tobgu/qframe/types).\n\nIn addition to the columns there is also an index which controls\nwhich rows in the columns are part of the QFrame and the\nsort order of these columns.\nMany operations on QFrames only affect the index; the underlying\ndata remains the same.\n\nMany functions and methods in qframe take the empty interface as a parameter,\nfor example for functions to be applied or string references to internal\nfunctions.\nThese always correspond to a union/sum type with a fixed set of valid types\nthat are checked at runtime through type switches (there's hardly any\nreflection applied in QFrame for performance reasons).\nWhich types are valid depends on the function called and the column type\nthat is affected. Modelling this statically is hard/impossible in Go,\nhence the dynamic approach. If you plan to use QFrame with datasets\nwith fixed layout and types it should be a small task to write tiny\nwrappers for the types you are using to regain static type safety.\n\n## Limitations\n* The API cannot yet be considered stable.\n* The maximum number of rows in a QFrame is 4294967296 (2^32).\n* The CSV parser only handles ASCII characters as separators.\n* Individual strings cannot be longer than 268 MB (2^28 bytes).\n* A string column cannot contain more than a total of 34 GB (2^35 bytes).\n* At the moment you cannot rely on any of the errors returned to\n  fulfill anything other than the `Error` interface. 
In the future\n  this will hopefully be improved to provide more help in identifying\n  the root cause of errors.\n\n## Performance/benchmarks\nThere are a number of benchmarks in [qbench](https://github.com/tobgu/qbench)\ncomparing QFrame to Pandas and Gota where applicable.\n\n## Other data frames\nThe work on QFrame has been inspired by [Python Pandas](https://pandas.pydata.org/)\nand [Gota](https://github.com/kniren/gota).\n\n## Contribute\nWant to contribute? Great! Open an issue on GitHub and let the discussions\nbegin! Below are some instructions for working with the QFrame repo.\n\n### Ideas for further work\nBelow are some ideas of areas where contributions would be welcome.\n\n* Support for more input and output formats.\n* Support for additional column formats.\n* Support for using the [Arrow](https://github.com/apache/arrow) format for columns.\n* General CPU and memory optimizations.\n* Improve documentation.\n* More analytical functionality.\n* Dataset joins.\n* Improved interoperability with other libraries in the Go data science ecosystem.\n* Improve string representation of QFrames.\n\n### Install dependencies\n`make dev-deps`\n\n### Tests\nPlease contribute tests together with any code. The tests should be\nwritten against the public API to avoid lockdown of the implementation\nand internal structure which would make it more difficult to change in\nthe future.\n\nRun tests:\n`make test`\n\nThis will also trigger code to be regenerated.\n\n### Code generation\nThe codebase contains some generated code to reduce the amount of\nduplication required for similar functionality across different column\ntypes. 
Generated code is recognized by file names ending with `_gen.go`.\nThese files must never be edited directly.\n\nTo trigger code generation:\n`make generate`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftobgu%2Fqframe","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftobgu%2Fqframe","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftobgu%2Fqframe/lists"}