{"id":18840232,"url":"https://github.com/maxim2266/csvplus","last_synced_at":"2025-04-14T07:07:14.831Z","repository":{"id":57481396,"uuid":"68197197","full_name":"maxim2266/csvplus","owner":"maxim2266","description":"csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.","archived":false,"fork":false,"pushed_at":"2021-07-22T21:42:11.000Z","size":88,"stargazers_count":67,"open_issues_count":0,"forks_count":3,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-04-14T07:07:08.001Z","etag":null,"topics":["csv","csv-format","etl","etl-framework","etl-pipeline","fluent-interface","go","go-csv","stream-processing"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/maxim2266.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-09-14T10:31:22.000Z","updated_at":"2024-11-14T15:10:31.000Z","dependencies_parsed_at":"2022-09-26T17:50:30.355Z","dependency_job_id":null,"html_url":"https://github.com/maxim2266/csvplus","commit_stats":null,"previous_names":[],"tags_count":11,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxim2266%2Fcsvplus","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxim2266%2Fcsvplus/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxim2266%2Fcsvplus/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxim2266%2Fcsvplus/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/maxim2266","dow
nload_url":"https://codeload.github.com/maxim2266/csvplus/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248837278,"owners_count":21169374,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv","csv-format","etl","etl-framework","etl-pipeline","fluent-interface","go","go-csv","stream-processing"],"created_at":"2024-11-08T02:46:55.737Z","updated_at":"2025-04-14T07:07:14.804Z","avatar_url":"https://github.com/maxim2266.png","language":"Go","readme":"# csvplus\n\n[![GoDoc](https://godoc.org/github.com/maxim2266/csvplus?status.svg)](https://pkg.go.dev/github.com/maxim2266/csvplus)\n[![Go Report Card](https://goreportcard.com/badge/github.com/maxim2266/csvplus)](https://goreportcard.com/report/github.com/maxim2266/csvplus)\n[![License: BSD 3-Clause](https://img.shields.io/badge/License-BSD_3--Clause-yellow.svg)](https://opensource.org/licenses/BSD-3-Clause)\n\nPackage `csvplus` extends the standard Go [encoding/csv](https://golang.org/pkg/encoding/csv/)\npackage with a fluent interface, lazy stream processing operations, indices and joins.\n\nThe library is primarily designed for [ETL](https://en.wikipedia.org/wiki/Extract,_transform,_load)-like processes.\nIt is mostly useful in places where the more advanced searching/joining capabilities of a fully-featured SQL\ndatabase are not required, but at the same time the data transformations needed still include SQL-like operations.\n\n##### License: BSD\n\n### Examples\n\nSimple sequential processing:\n```Go\npeople := 
csvplus.FromFile(\"people.csv\").SelectColumns(\"name\", \"surname\", \"id\")\n\nerr := csvplus.Take(people).\n\tFilter(csvplus.Like(csvplus.Row{\"name\": \"Amelia\"})).\n\tMap(func(row csvplus.Row) csvplus.Row { row[\"name\"] = \"Julia\"; return row }).\n\tToCsvFile(\"out.csv\", \"name\", \"surname\")\n\nif err != nil {\n\treturn err\n}\n```\n\nMore involved example:\n```Go\ncustomers := csvplus.FromFile(\"people.csv\").SelectColumns(\"id\", \"name\", \"surname\")\ncustIndex, err := csvplus.Take(customers).UniqueIndexOn(\"id\")\n\nif err != nil {\n\treturn err\n}\n\nproducts := csvplus.FromFile(\"stock.csv\").SelectColumns(\"prod_id\", \"product\", \"price\")\nprodIndex, err := csvplus.Take(products).UniqueIndexOn(\"prod_id\")\n\nif err != nil {\n\treturn err\n}\n\norders := csvplus.FromFile(\"orders.csv\").SelectColumns(\"cust_id\", \"prod_id\", \"qty\", \"ts\")\niter := csvplus.Take(orders).Join(custIndex, \"cust_id\").Join(prodIndex)\n\nreturn iter(func(row csvplus.Row) error {\n\t// prints lines like:\n\t//\tJohn Doe bought 38 oranges for £0.03 each on 2016-09-14T08:48:22+01:00\n\t_, e := fmt.Printf(\"%s %s bought %s %ss for £%s each on %s\\n\",\n\t\trow[\"name\"], row[\"surname\"], row[\"qty\"], row[\"product\"], row[\"price\"], row[\"ts\"])\n\treturn e\n})\n```\n\n### Design principles\n\nThe package functionality is based on the operations on the following entities:\n- type `Row`\n- type `DataSource`\n- type `Index`\n\n#### Type `Row`\n`Row` represents one row from a `DataSource`. It is a map from column names\nto the string values under those columns on the current row. The package expects a unique name\nassigned to every column at source. Compared to using integer indices, this provides more\nconvenience when complex transformations get applied to each row during processing.\n\n#### Type `DataSource`\nType `DataSource` represents any source of zero or more rows, like a `.csv` file. 
This is a function\nthat when invoked feeds the given callback with the data from its source, one `Row` at a time.\nThe type also has a number of operations defined on it that provide for easy composition of the\noperations on the `DataSource`, forming a so-called [fluent interface](https://en.wikipedia.org/wiki/Fluent_interface).\nAll these operations are 'lazy', i.e. they are not performed immediately, but instead each of them\nreturns a new `DataSource`.\n\nThere are also a number of convenience operations that actually invoke\nthe `DataSource` function to produce a specific type of output:\n- `IndexOn` to build an index on the specified column(s);\n- `UniqueIndexOn` to build a unique index on the specified column(s);\n- `ToCsv` to serialise the `DataSource` to the given `io.Writer` in `.csv` format;\n- `ToCsvFile` to store the `DataSource` in the specified file in `.csv` format;\n- `ToJSON` to serialise the `DataSource` to the given `io.Writer` in JSON format;\n- `ToJSONFile` to store the `DataSource` in the specified file in JSON format;\n- `ToRows` to convert the `DataSource` to a slice of `Row`s.\n\n#### Type `Index`\n`Index` is a sorted collection of rows. The sorting is performed on the columns specified when the index\nis created. Iteration over an index yields a sorted sequence of rows. An `Index` can be joined with\na `DataSource`. The type has operations for finding rows and creating sub-indices in O(log(n)) time.\nAnother useful operation is resolving duplicates. Building an index takes O(n*log(n)) time. It should\nbe noted that the `Index` building operation requires the entire dataset to be read into\nmemory, so care should be taken when indexing huge datasets. An index can also be\nstored to, or loaded from, a disk file.\n\nFor more details see the [documentation](https://godoc.org/github.com/maxim2266/csvplus).\n\n### Project status\nThe project is in a usable state usually called \"beta\". 
Tested on Linux Mint 18.3 using Go version 1.10.2.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxim2266%2Fcsvplus","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmaxim2266%2Fcsvplus","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxim2266%2Fcsvplus/lists"}