{"id":22035518,"url":"https://github.com/amperity/separator","last_synced_at":"2025-05-07T19:42:56.737Z","repository":{"id":57750600,"uuid":"521696578","full_name":"amperity/separator","owner":"amperity","description":"An efficient and defensive codec for CSV and other delimiter-separated value formats","archived":false,"fork":false,"pushed_at":"2023-05-08T20:51:43.000Z","size":75,"stargazers_count":18,"open_issues_count":0,"forks_count":0,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-04-13T19:27:50.545Z","etag":null,"topics":["clojure","codec","csv","tsv"],"latest_commit_sha":null,"homepage":"","language":"Clojure","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/amperity.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-08-05T15:54:44.000Z","updated_at":"2024-06-14T21:17:08.000Z","dependencies_parsed_at":"2024-12-01T07:48:33.859Z","dependency_job_id":null,"html_url":"https://github.com/amperity/separator","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amperity%2Fseparator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amperity%2Fseparator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amperity%2Fseparator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amperity%2Fseparator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/amperity","download_ur
l":"https://codeload.github.com/amperity/separator/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252945810,"owners_count":21829662,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clojure","codec","csv","tsv"],"created_at":"2024-11-30T10:25:11.251Z","updated_at":"2025-05-07T19:42:56.709Z","avatar_url":"https://github.com/amperity.png","language":"Clojure","funding_links":[],"categories":[],"sub_categories":[],"readme":"Separator\n=========\n\n[![CircleCI](https://circleci.com/gh/amperity/separator.svg?style=shield\u0026circle-token=1b358576395c3758b3a88b5d265862ca91b0fa2b)](https://circleci.com/gh/amperity/separator)\n[![codecov](https://codecov.io/gh/amperity/separator/branch/main/graph/badge.svg)](https://codecov.io/gh/amperity/separator)\n[![cljdoc](https://cljdoc.org/badge/com.amperity/separator)](https://cljdoc.org/d/com.amperity/separator/CURRENT)\n\nA Clojure library for working with [Delimiter-Separated Value](https://en.wikipedia.org/wiki/Delimiter-separated_values)\ndata. 
This includes a customizable defensive parser and a simple writer.\n\nYou might be interested in using this instead of the common\n[clojure.data.csv](https://github.com/clojure/data.csv) or a more mainstream\ncodec like [Jackson](https://github.com/FasterXML/jackson-dataformats-text/tree/master/csv)\nbecause [CSV is a terrible format](http://fuckcsv.com) and you'll often need to\ndeal with messy, malformed, and downright bizarre data files.\n\n\n## Usage\n\nReleases are published on Clojars; to use the latest version with Leiningen,\nadd the following to your project dependencies:\n\n[![Clojars Project](http://clojars.org/com.amperity/separator/latest-version.svg)](http://clojars.org/com.amperity/separator)\n\nThe main entry-point namespace is `separator.io`, which contains both the\nreading and writing interfaces.\n\n```clojure\n=\u003e (require '[separator.io :as separator])\n```\n\n### Reading\n\nOne of the most significant features of this library is its set of safety\nvalves for dealing with bad input data during parsing. The parser does its best\nto recover from errors and to give the consumer meaningful data about each\nproblem. The safety valves include limits on the maximum cell size and the\nmaximum row width.\n\nTo parse data into a sequence of rows, use the `read-rows` function. 
This\naccepts many kinds of inputs, including directly reading string data:\n\n```clojure\n=\u003e (vec (separator/read-rows \"A,B,C\\nD,E,F\\nG,H,I\\n\"))\n[[\"A\" \"B\" \"C\"] [\"D\" \"E\" \"F\"] [\"G\" \"H\" \"I\"]]\n\n;; quoted cells can embed newlines\n=\u003e (vec (separator/read-rows \"A,B,C\\nD,E,\\\"F\\nG\\\",H,I\\n\"))\n[[\"A\" \"B\" \"C\"] [\"D\" \"E\" \"F\\nG\" \"H\" \"I\"]]\n\n;; parse errors are included in the sequence by default\n=\u003e (vec (separator/read-rows \"A,B,C\\nD,\\\"\\\"E,F\\nG,H,I\\n\"))\n[[\"A\" \"B\" \"C\"] #\u003cseparator.io.ParseException@34b69fbe :malformed-quote 2:4\u003e [\"G\" \"H\" \"I\"]]\n\n;; the error mode can also omit them\n=\u003e (vec (separator/read-rows \"A,B,C\\nD,\\\"\\\"E,F\\nG,H,I\\n\" :error-mode :ignore))\n[[\"A\" \"B\" \"C\"] [\"G\" \"H\" \"I\"]]\n\n;; ...or throw them\n=\u003e (vec (separator/read-rows \"A,B,C\\nD,\\\"\\\"E,F\\nG,H,I\\n\" :error-mode :throw))\n;; Execution error (ParseException) at separator.io.Parser/parseError (Parser.java:87).\n;; Unexpected character following quote: E\n\n;; the errors carry data:\n=\u003e (ex-data *e)\n{:column 4,\n :line 2,\n :message \"Unexpected character following quote: E\",\n :partial-cell \"\",\n :partial-row [\"D\"],\n :skipped-text \"E...F\",\n :type :malformed-quote}\n```\n\nThe parser also supports customizable quote, separator, and escape characters.\nEscapes are not part of the CSV standard but show up often in practice, so the\nparser has to handle them.\n\n```clojure\n=\u003e (vec (separator/read-rows \"A|B|C\\nD|E|^F\\nG^|H|I\\n\" :separator \\| :quote \\^))\n[[\"A\" \"B\" \"C\"] [\"D\" \"E\" \"F\\nG\" \"H\" \"I\"]]\n\n=\u003e (vec (separator/read-rows \"A,B,C\\\\\\nD,E,F\\nG,H,I\\n\" :escape \\\\))\n[[\"A\" \"B\" \"C\\\\nD\" \"E\" \"F\"] [\"G\" \"H\" \"I\"]]\n```\n\nAdditionally, the `read-records` function is a convenience wrapper that uses the\n`zip-headers` transducer to read a sequence of map records instead, treating the\nfirst row as headers:\n\n```clojure\n=\u003e (vec 
(separator/read-records \"name,age,role\\nPhilip Fry,26,Delivery Boy\\nTuranga Leela,28,Ship Pilot\\nHubert Farnsworth,160,Professor\\n\"))\n[{\"age\" \"26\", \"name\" \"Philip Fry\", \"role\" \"Delivery Boy\"}\n {\"age\" \"28\", \"name\" \"Turanga Leela\", \"role\" \"Ship Pilot\"}\n {\"age\" \"160\", \"name\" \"Hubert Farnsworth\", \"role\" \"Professor\"}]\n```\n\n### Writing\n\nThe library also provides tools for writing delimiter-separated data from a\nsequence of rows using the `write-rows` function. It takes a `Writer` to print the\ndata to, the rows to write, and a set of options similar to the reader's to control\nthe output format, and returns the number of rows written:\n\n```clojure\n=\u003e (separator/write-rows *out* [[\"A\" \"B\" \"C\"] [\"D\" \"E\" \"F\"] [\"G\" \"H\" \"I\"]])\n;; A,B,C\n;; D,E,F\n;; G,H,I\n3\n\n;; cells containing the quote or separator character are automatically quoted\n=\u003e (separator/write-rows *out* [[\"A\" \"B,B\" \"C\"] [\"D\" \"E\" \"F\\\"F\"]])\n;; A,\"B,B\",C\n;; D,E,\"F\"\"F\"\n2\n\n;; you can also force quoting for all cells\n=\u003e (separator/write-rows *out* [[\"A\" \"B\" \"C\"] [\"D\" \"E\" \"F\"] [\"G\" \"H\" \"I\"]] :quote? true)\n;; \"A\",\"B\",\"C\"\n;; \"D\",\"E\",\"F\"\n;; \"G\",\"H\",\"I\"\n3\n\n;; or provide a predicate to control quoting\n=\u003e (separator/write-rows *out* [[\"A\" \"B\" \"C\"] [\"D\" \"E\" \"F\"] [\"G\" \"H\" \"I\"]] :quote? #{\"E\"})\n;; A,B,C\n;; D,\"E\",F\n;; G,H,I\n3\n```\n\n\n## Performance\n\nSeparator prioritizes defensiveness over speed, but aims to be as performant as\npossible within those constraints. In the benchmark below, it reads a large file\nabout 1.8 times faster than `data.csv`, but takes about 2.4 times as long as Jackson:\n\n```clojure\n=\u003e (crit/quick-bench (consume! 
(separator/read-rows test-file)))\n;; Evaluation count : 6 in 6 samples of 1 calls.\n;;              Execution time mean : 5.544234 sec\n;;     Execution time std-deviation : 78.630488 ms\n;;    Execution time lower quantile : 5.481820 sec ( 2.5%)\n;;    Execution time upper quantile : 5.667485 sec (97.5%)\n;;                    Overhead used : 6.824396 ns\n\n=\u003e (crit/quick-bench (consume! (data-csv-read test-file)))\n;; Evaluation count : 6 in 6 samples of 1 calls.\n;;              Execution time mean : 10.253641 sec\n;;     Execution time std-deviation : 121.221011 ms\n;;    Execution time lower quantile : 10.146078 sec ( 2.5%)\n;;    Execution time upper quantile : 10.436205 sec (97.5%)\n;;                    Overhead used : 6.943926 ns\n\n=\u003e (crit/quick-bench (consume! (jackson-read test-file)))\n;; Evaluation count : 6 in 6 samples of 1 calls.\n;;              Execution time mean : 2.325301 sec\n;;     Execution time std-deviation : 40.611328 ms\n;;    Execution time lower quantile : 2.296693 sec ( 2.5%)\n;;    Execution time upper quantile : 2.390772 sec (97.5%)\n;;                    Overhead used : 6.824396 ns\n```\n\nThese benchmarks were run with Criterium's `quick-bench` on a 2021 MacBook Pro,\nusing `data.csv` version 1.0.1 and `jackson-dataformat-csv` version 2.13.0, on a\n330 MB CSV file with 12.4 million rows. The `consume!`, `data-csv-read`, and\n`jackson-read` benchmark helpers are not shown here.\n\nOf course, all the speed in the world won't save you from a misplaced quote. Given\nthe malformed file below, Separator returns the bad row as a parse error and keeps\nreading, while both `data.csv` and Jackson abort:\n\n```clojure\n=\u003e (spit \"simple-err.csv\" \"A,B,C\\nD,\\\"\\\"E,F\\nG,H,I\\n\")\n\n=\u003e (consume! (separator/read-rows (io/file \"simple-err.csv\")))\n3\n\n=\u003e (consume! (data-csv-read (io/file \"simple-err.csv\")))\n;; Execution error at clojure.data.csv/read-quoted-cell (csv.clj:37).\n;; CSV error (unexpected character: E)\n\n=\u003e (consume! 
(jackson-read (io/file \"simple-err.csv\")))\n;; Execution error (JsonParseException) at com.fasterxml.jackson.core.JsonParser/_constructError (JsonParser.java:2337).\n;; Unexpected character ('E' (code 69)): Expected column separator character (',' (code 44)) or end-of-line\n;;  at [Source: (com.fasterxml.jackson.dataformat.csv.impl.UTF8Reader); line: 2, column: 6]\n```\n\n\n## License\n\nCopyright © 2022 Amperity, Inc.\n\nDistributed under the MIT License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famperity%2Fseparator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Famperity%2Fseparator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famperity%2Fseparator/lists"}