{"id":13507718,"url":"https://github.com/beatrichartz/csv","last_synced_at":"2025-04-29T14:29:23.745Z","repository":{"id":27940978,"uuid":"31433548","full_name":"beatrichartz/csv","owner":"beatrichartz","description":"CSV Decoding and Encoding for Elixir","archived":false,"fork":false,"pushed_at":"2025-01-02T10:31:40.000Z","size":420,"stargazers_count":508,"open_issues_count":7,"forks_count":94,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-04-22T09:48:19.240Z","etag":null,"topics":["csv","decoder","decoding","elixir","encoder","encoding","hex","parser","parsing","rfc-4180","stream"],"latest_commit_sha":null,"homepage":"","language":"Elixir","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/beatrichartz.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-02-27T18:39:55.000Z","updated_at":"2025-03-12T14:14:27.000Z","dependencies_parsed_at":"2023-11-26T05:19:20.876Z","dependency_job_id":"a94323f9-796b-4cf0-9f31-7fb1181018e8","html_url":"https://github.com/beatrichartz/csv","commit_stats":{"total_commits":394,"total_committers":40,"mean_commits":9.85,"dds":"0.20812182741116747","last_synced_commit":"1a9f852a14740ff6a37160f1e80636d231143a42"},"previous_names":[],"tags_count":45,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/beatrichartz%2Fcsv","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/beatrichartz%2Fcsv/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/beatrichartz%2Fcsv/rel
eases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/beatrichartz%2Fcsv/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/beatrichartz","download_url":"https://codeload.github.com/beatrichartz/csv/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251518387,"owners_count":21602143,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv","decoder","decoding","elixir","encoder","encoding","hex","parser","parsing","rfc-4180","stream"],"created_at":"2024-08-01T02:00:38.129Z","updated_at":"2025-04-29T14:29:23.729Z","avatar_url":"https://github.com/beatrichartz.png","language":"Elixir","readme":"# CSV [![Build Status](https://github.com/beatrichartz/csv/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/beatrichartz/csv) [![Coverage Status](https://coveralls.io/repos/github/beatrichartz/csv/badge.svg?branch=main)](https://coveralls.io/github/beatrichartz/csv?branch=main) [![Codacy Badge](https://app.codacy.com/project/badge/Grade/2b1154889a3f4d1681bf40a89834271c)](https://www.codacy.com/gh/beatrichartz/csv/dashboard?utm_source=github.com\u0026amp;utm_medium=referral\u0026amp;utm_content=beatrichartz/csv\u0026amp;utm_campaign=Badge_Grade) [![Hex pm](http://img.shields.io/hexpm/v/csv.svg?style=flat)](https://hex.pm/packages/csv) [![Hex Docs](https://img.shields.io/badge/hex-docs-lightgreen.svg)](https://hexdocs.pm/csv/) [![License](https://img.shields.io/hexpm/l/csv.svg)](https://github.com/beatrichartz/csv/blob/main/LICENSE) 
[![Downloads](https://img.shields.io/hexpm/dw/csv.svg?style=flat)](https://hex.pm/packages/csv)\n\n[RFC 4180](http://tools.ietf.org/html/rfc4180) compliant, composable CSV parsing and encoding for Elixir.\n\n## Installation\n\nAdd\n```elixir\n{:csv, \"~\u003e 3.2\"}\n```\nto your deps in `mix.exs` like so:\n\n```elixir\ndefp deps do\n  [{:csv, \"~\u003e 3.2\"}]\nend\n```\n\n## Getting all correctly formatted rows\nCSV is a notoriously fickle format, with many implementations and files interpreting it differently.\n\nFor that reason, `CSV` implements a normal mode `CSV.decode` that will return a stream of `ok: [\"field1\", \"field2\"]`\nand `error: \"Message\"` tuples. It will also **reparse lines after a previous line has opened an unterminated escape sequence**,\nensuring you get all correctly formatted rows.\n\nThe goal of this library is to let you extract all correctly formatted rows, while displaying descriptive errors for \nincorrectly formatted rows.\n\nIn strict mode using `CSV.decode!` the library will raise an exception when it encounters the first error, aborting the\noperation.\n\n## Performance\nThis library uses fast binary matching and is able to parse about half a million rows of a moderately complex CSV file \nper second in a single process on a small cloud instance (2 vCPU, 2 GB memory). CSV parsing is unlikely to become a \nbottleneck in your data pipeline.\n\nIf you are reading from a large file, `CSV` will perform best when streaming with `:read_ahead` in byte mode:\n\n```elixir\nFile.stream!(\"data.csv\", [read_ahead: 100_000], 1000) |\u003e CSV.decode()\n```\n\nWhile `1000` is usually a good default number of bytes to stream, you should measure performance and fine-tune\nbyte size according to your environment.\n\n## Upgrading from 2.x\nThe main goal for 3.x has been to streamline the library API and leverage binary matching. 
\n\n#### Upgrading should require few to no changes in most cases\n\n- **Parallelism has been removed**, alongside its options `:num_workers` and `:worker_work_ratio`. You can safely remove them.\n- `CSV` now expects line breaks to be present in the data. If you used to parse strings by applying `String.split/2` before \n  passing them to decode, you can get the same result now by feeding in\n  the string as a single item of a list:\n  ```elixir\n  [\"a,b,c\\nd,e,f\"] |\u003e CSV.decode()\n  ```\n- **`StrayQuoteError` is now `StrayEscapeCharacterError`**. If you catch this error in your code, you need to rename it.\n- **The `:strip_fields` option needs to be replaced** with the `:field_transform` option:\n  ```elixir\n  File.stream!(\"data.csv\") |\u003e CSV.decode(field_transform: \u0026String.trim/1)\n  ```\n- **`:validate_row_length` now defaults to `false`**. This option produces an error for rows of differing length. Set it\n  to `true` to get the same behaviour as in 2.x.\n- **`:escape_formulas` is now `:unescape_formulas` for `decode` and `decode!`.** It is still `:escape_formulas` for\n  `encode`. Change `:escape_formulas` to `:unescape_formulas` in `decode` calls to get the same behaviour as in 2.x.\n- **`:escape_max_lines` now defaults to `10`** instead of `1000`. To get the same behaviour as in 2.x, use:\n  ```elixir\n  File.stream!(\"data.csv\") |\u003e CSV.decode(escape_max_lines: 1000)\n  ```\n- **`:replace` has been removed**. `CSV` will now return fields with incorrect encoding as-is. \n  You can use the new `:field_transform` option to provide a function transforming fields while they are being parsed. \n  This allows you to e.g. 
replace incorrect encoding:\n  ```elixir\n  defp replace_bad_encoding(field) do\n    if String.valid?(field) do\n      field\n    else\n      field\n      |\u003e String.codepoints()\n      |\u003e Enum.map(fn codepoint -\u003e if String.valid?(codepoint), do: codepoint, else: \"?\" end)\n      |\u003e Enum.join()\n    end\n  end\n\n  File.stream!(\"data.csv\") |\u003e CSV.decode(field_transform: \u0026replace_bad_encoding/1)\n  ```\n\n**That's it!** Please open an issue if you see any other backward-incompatible behaviour so it can be documented.\n\n### Elixir version requirements\n* Elixir `1.5.0` is required for all versions above `2.5.0`.\n* Elixir `1.1.0` is required for all versions above `1.1.5`.\n\n## Design Goals\nThis library aims to solve concerns related to CSV parsing in data pipelines, following the UNIX philosophy:\nIt consumes streams or enumerables, producing streams of lists, maps or tuples depending on configuration. This simplifies\nusing it in data pipelines, where CSV encoding or decoding is only one of the processing steps.\n\n## Usage\n`CSV` can decode and encode from and to a stream of bytes or lines.\n\n### Decoding\n\nDo this to decode data:\n\n````elixir\n# Decode file line by line\nFile.stream!(\"data.csv\")\n  |\u003e CSV.decode()\n\n# Decode a UTF-16 file with BOM\nFile.stream!(\"data.csv\", [:trim_bom, encoding: {:utf16, :little}])\n  |\u003e CSV.decode()\n\n# Decode file in chunks of 1000 bytes\nFile.stream!(\"data.csv\", [], 1000) \n  |\u003e CSV.decode()\n\n# Decode a csv formatted string\n[\"long,csv,string\\nwith,multiple,lines\"] \n  |\u003e CSV.decode()\n\n# Decode a list of arbitrarily chunked csv data\n[\"list,\", \"of,arbitrarily\", \"\\nchun\", \"ked,csv,data\\n\"] \n  |\u003e CSV.decode()\n````\n\nAnd you'll get a stream of row tuples:\n````elixir\n[ok: [\"a\", \"b\"], ok: [\"c\", \"d\"]]\n````\n\nAnd, potentially, error tuples:\n````elixir\n[error: \"\", ok: [\"c\", \"d\"]]\n````\n\nUse strict mode `decode!` to get a 
two-dimensional list, raising errors as they\noccur, aborting the operation:\n````elixir\nFile.stream!(\"data.csv\") |\u003e CSV.decode!\n````\n\nUnredact source data in exceptions that `decode!` raises:\n```elixir\nFile.stream!(\"data.csv\") |\u003e CSV.decode!(unredact_exceptions: true)\n```\n\n\n#### Options\n\nFor all available options, [check the docs on `decode`](https://hexdocs.pm/csv/CSV.html#decode/2)\n[and `decode!`](https://hexdocs.pm/csv/CSV.html#decode!/2).\n\nSpecify a semicolon separator:\n\n````elixir\nstream |\u003e CSV.decode(separator: ?;)\n````\n\nSpecify a custom escape character:\n\n````elixir\nstream |\u003e CSV.decode(escape_character: ?@)\n````\n\nApply a transformation to a field when parsed, e.g. trimming the field:\n\n````elixir\nstream |\u003e CSV.decode(field_transform: \u0026String.trim/1)\n````\n\nUnescape formulas that have been escaped:\n\n````elixir\nstream |\u003e CSV.decode(unescape_formulas: true)\n````\n\nRedact source data in error tuples produced by decode:\n\n````elixir\nstream |\u003e CSV.decode(redact_errors: true)\n````\n\n\n### Encoding\n\nDo this to encode a table (two-dimensional list):\n\n````elixir\ntable_data |\u003e CSV.encode\n````\n\nAnd you'll get a stream of lines ready to be written to an IO.\nSo, this is writing to a file:\n\n````elixir\nfile = File.open!(\"test.csv\", [:write, :utf8])\ntable_data |\u003e CSV.encode |\u003e Enum.each(\u0026IO.write(file, \u00261))\nFile.close(file)\n````\n\n#### Options\n\nUse a semicolon separator:\n\n````elixir\nyour_data |\u003e CSV.encode(separator: ?;)\n````\n\nUse a specific escape character:\n\n````elixir\nyour_data |\u003e CSV.encode(escape_character: ?@)\n````\n\nYou can also specify headers when encoding, which will encode map values into\nthe right place:\n\n````elixir\n[%{\"a\" =\u003e \"value!\"}] |\u003e CSV.encode(headers: [\"z\", \"a\"])\n# [\"z,a\\r\\n\", \",value!\\r\\n\"]\n````\n\nYou can also specify a keyword list; the keys of the list will be used as the keys for 
the rows, \nbut the values will be used as the header row names in the CSV output:\n\n````elixir\n[%{a: \"value!\"}] |\u003e CSV.encode(headers: [a: \"x\", b: \"y\"])\n# [\"x,y\\r\\n\", \"value!,\\r\\n\"]\n````\n\nYou'll surely appreciate some [more info on `encode`](https://hexdocs.pm/csv/CSV.html#encode/2).\n\n#### Polymorphic encoding\n\nMake sure your data gets encoded the way you want - implement the `CSV.Encode`\nprotocol for whatever you wish to encode:\n\n````elixir\ndefimpl CSV.Encode, for: MyData do\n  def encode(%MyData{has: fun}, env \\\\ []) do\n    \"so much #{fun}\" |\u003e CSV.Encode.encode(env)\n  end\nend\n````\n\nOr similar.\n\n#### Ensure performant encoding\n\nThe encoding protocol implements a fallback to Any for types where a simple call\nto `to_string` will provide unambiguous results. Protocol dispatch for the\nfallback to Any is *very* slow when protocols are not consolidated, so make sure\nyou [have `consolidate_protocols: true`](http://blog.plataformatec.com.br/2015/04/build-embedded-and-start-permanent-in-elixir-1-0-4/)\nin your `mix.exs` or you consolidate protocols manually for production in order\nto get good performance.\n\nThere is more to know about everything :tm: - [Check the docs](http://hexdocs.pm/csv/)\n\n## Contributions \u0026 Bugfixes are most welcome!\n\nPlease make sure to add tests. I will not look at PRs that are\neither failing or lowering coverage. Also, solve one problem at\na time.\n\n## Copyright and License\n\nCopyright (c) 2022 Beat Richartz\n\nCSV source code is licensed under the [MIT License](https://github.com/beatrichartz/csv/blob/main/LICENSE).\n","funding_links":[],"categories":["CSV","Elixir"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbeatrichartz%2Fcsv","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbeatrichartz%2Fcsv","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbeatrichartz%2Fcsv/lists"}