{"id":31285691,"url":"https://github.com/igrishaev/pg-bin","last_synced_at":"2026-05-18T09:07:33.374Z","repository":{"id":313965470,"uuid":"1052831314","full_name":"igrishaev/pg-bin","owner":"igrishaev","description":"Parse binary Postgres COPY output","archived":false,"fork":false,"pushed_at":"2025-09-15T16:14:13.000Z","size":70,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-12-13T05:00:19.744Z","etag":null,"topics":["binary","clojure","copy","postgres"],"latest_commit_sha":null,"homepage":"https://github.com/igrishaev/pg-bin","language":"Clojure","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"unlicense","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/igrishaev.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-08T15:56:43.000Z","updated_at":"2025-09-24T03:29:12.000Z","dependencies_parsed_at":"2025-09-13T17:51:19.563Z","dependency_job_id":null,"html_url":"https://github.com/igrishaev/pg-bin","commit_stats":null,"previous_names":["igrishaev/pg-copy","igrishaev/pg-bin"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/igrishaev/pg-bin","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/igrishaev%2Fpg-bin","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/igrishaev%2Fpg-bin/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/igrishaev%2Fpg-bin/releases","manifests_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/repositories/igrishaev%2Fpg-bin/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/igrishaev","download_url":"https://codeload.github.com/igrishaev/pg-bin/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/igrishaev%2Fpg-bin/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33172173,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-18T05:43:36.989Z","status":"ssl_error","status_checked_at":"2026-05-18T05:43:19.133Z","response_time":71,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["binary","clojure","copy","postgres"],"created_at":"2025-09-24T08:12:07.453Z","updated_at":"2026-05-18T09:07:33.368Z","avatar_url":"https://github.com/igrishaev.png","language":"Clojure","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PG.bin\n\nA library to parse Postgres COPY dumps made in binary format.\n\nPostgres has a great API for transferring data into and out of a database, called\nCOPY. What is special about it is that it supports three different formats: CSV,\ntext and binary. Both CSV and text are trivial: values are passed using their\ntext representation. Only quoting rules and separating characters differ.\n\nThe binary format is special in that values are not text. They're\npassed exactly as they're stored in Postgres. 
Thus, the binary format is more\ncompact: it's 30% smaller than CSV or text. The same applies to\nperformance: COPY-ing binary data back and forth takes about 15-25% less time.\n\nTo parse a binary dump, one must know its structure. This is what the library\ndoes: it knows how to parse such dumps. It supports most of the built-in\nPostgres types including JSON(b). The API is simple and extensible.\n\n## Installation\n\nAdd this to your project:\n\n~~~clojure\n;; lein\n[com.github.igrishaev/pg-bin \"0.1.0\"]\n\n;; deps\ncom.github.igrishaev/pg-bin {:mvn/version \"0.1.0\"}\n~~~\n\n## Usage\n\nLet's prepare a binary dump as follows:\n\n~~~sql\ncreate temp table test(\n    f_01 int2,\n    f_02 int4,\n    f_03 int8,\n    f_04 boolean,\n    f_05 float4,\n    f_06 float8,\n    f_07 text,\n    f_08 varchar(12),\n    f_09 time,\n    f_10 timetz,\n    f_11 date,\n    f_12 timestamp,\n    f_13 timestamptz,\n    f_14 bytea,\n    f_15 json,\n    f_16 jsonb,\n    f_17 uuid,\n    f_18 numeric(12,3),\n    f_19 text null,\n    f_20 decimal\n);\n\ninsert into test values (\n    1,\n    2,\n    3,\n    true,\n    123.456,\n    654.321,\n    'hello',\n    'world',\n    '10:42:35',\n    '10:42:35+0030',\n    '2025-11-30',\n    '2025-11-30 10:42:35',\n    '2025-11-30 10:42:35.123567+0030',\n    '\\xDEADBEEF',\n    '{\"foo\": [1, 2, 3, {\"kek\": [true, false, null]}]}',\n    '{\"foo\": [1, 2, 3, {\"kek\": [true, false, null]}]}',\n    '4bda6037-1c37-4051-9898-13b82f1bd712',\n    '123456.123456',\n    null,\n    '123999.999100500'\n);\n\n\\copy test to '/Users/ivan/dump.bin' with (format binary);\n~~~\n\nLet's peek at what's inside:\n\n~~~text\nxxd -d /Users/ivan/dump.bin\n\n00000000: 5047 434f 5059 0aff 0d0a 0000 0000 0000  PGCOPY..........\n00000016: 0000 0000 1400 0000 0200 0100 0000 0400  ................\n00000032: 0000 0200 0000 0800 0000 0000 0000 0300  ................\n00000048: 0000 0101 0000 0004 42f6 e979 0000 0008  ........B..y....\n00000064: 4084 7291 6872 b021 0000 0005 6865 
6c6c  @.r.hr.!....hell\n00000080: 6f00 0000 0577 6f72 6c64 0000 0008 0000  o....world......\n00000096: 0008 fa0e 9cc0 0000 000c 0000 0008 fa0e  ................\n00000112: 9cc0 ffff f8f8 0000 0004 0000 24f9 0000  ............$...\n00000128: 0008 0002 e7cc 4a0a fcc0 0000 0008 0002  ......J.........\n00000144: e7cb dec3 0d6f 0000 0004 dead beef 0000  .....o..........\n00000160: 0030 7b22 666f 6f22 3a20 5b31 2c20 322c  .0{\"foo\": [1, 2,\n00000176: 2033 2c20 7b22 6b65 6b22 3a20 5b74 7275   3, {\"kek\": [tru\n00000192: 652c 2066 616c 7365 2c20 6e75 6c6c 5d7d  e, false, null]}\n00000208: 5d7d 0000 0031 017b 2266 6f6f 223a 205b  ]}...1.{\"foo\": [\n00000224: 312c 2032 2c20 332c 207b 226b 656b 223a  1, 2, 3, {\"kek\":\n00000240: 205b 7472 7565 2c20 6661 6c73 652c 206e   [true, false, n\n00000256: 756c 6c5d 7d5d 7d00 0000 104b da60 371c  ull]}]}....K.`7.\n00000272: 3740 5198 9813 b82f 1bd7 1200 0000 0e00  7@Q..../........\n00000288: 0300 0100 0000 0300 0c0d 8004 ceff ffff  ................\n00000304: ff00 0000 1000 0400 0100 0000 0900 0c0f  ................\n00000320: 9f27 0700 32ff ff                        .'..2..\n~~~\n\nNow the library comes into play:\n\n~~~clojure\n(ns some.ns\n  (:require\n   [clojure.java.io :as io]\n   [pg-bin.core :as copy]\n   taggie.core))\n\n(def FIELDS\n  [:int2\n   :int4\n   :int8\n   :boolean\n   :float4\n   :float8\n   :text\n   :varchar\n   :time\n   :timetz\n   :date\n   :timestamp\n   :timestamptz\n   :bytea\n   :json\n   :jsonb\n   :uuid\n   :numeric\n   :text\n   :decimal])\n\n(copy/parse \"/Users/ivan/dump.bin\" FIELDS)\n\n[[1\n  2\n  3\n  true\n  (float 123.456)\n  654.321\n  \"hello\"\n  \"world\"\n  #LocalTime \"10:42:35\"\n  #OffsetTime \"10:42:35+00:30\"\n  #LocalDate \"2025-11-30\"\n  #LocalDateTime \"2025-11-30T10:42:35\"\n  #OffsetDateTime \"2025-11-30T10:12:35.123567Z\"\n  (=bytes [-34, -83, -66, -17])\n  \"{\\\"foo\\\": [1, 2, 3, {\\\"kek\\\": [true, false, null]}]}\"\n  \"{\\\"foo\\\": [1, 2, 3, {\\\"kek\\\": [true, false, 
null]}]}\"\n  #uuid \"4bda6037-1c37-4051-9898-13b82f1bd712\"\n  123456.123M\n  nil\n  123999.999100500M]]\n~~~\n\n[taggie]: https://github.com/igrishaev/taggie\n\nHere and below: I use [Taggie][taggie] to render complex values like date \u0026\ntime, byte arrays and so on. Really useful!\n\nThis is what is going on here: we parse a source pointing to a dump using the\n`parse` function. A source might be a file, a byte array, an input stream and so\non -- anything that can be coerced to an input stream using the\n`clojure.java.io/input-stream` function.\n\nBinary files produced by Postgres don't know their structure. Unfortunately,\nthere is no information about types, only data. One should help the library\ntraverse a binary dump by specifying a vector of types. The `FIELDS` variable\ndeclares the structure of the file. See below what types are supported.\n\n## API\n\nThere are two functions to parse, namely:\n\n- `pg-bin.core/parse` accepts any source and returns a vector of parsed\n  lines. This function is eager meaning it consumes the whole source and\n  accumulates lines in a vector.\n\n- `pg-bin.core/parse-seq` accepts an `InputStream` and returns a lazy sequence\n  of parsed lines. It must be called under the `with-open` macro as follows:\n\n~~~clojure\n(with-open [in (io/input-stream \"/Users/ivan/dump.bin\")]\n  (let [lines (copy/parse-seq in FIELDS)]\n    (doseq [line lines]\n      ...)))\n~~~\n\nBoth functions accept a list of fields as the second argument.\n\n## Skipping fields\n\nWhen parsing, it's likely that you don't need all fields to be parsed. 
You may\nkeep only the leading ones:\n\n~~~clojure\n(copy/parse DUMP_PATH [:int2 :int4 :int8])\n[[1 2 3]]\n~~~\n\nTo skip fields located in the middle, use either `:skip` or an underscore:\n\n~~~clojure\n(copy/parse DUMP_PATH [:int2 :skip :_ :boolean])\n[[1 true]]\n~~~\n\n## Raw fields\n\nIf, for any reason, you have a type in your dump that the library is not aware\nof, or you'd like to examine its binary representation, specify `:raw` or\n`:bytes`. Each value will be a byte array then. It's up to you how to deal with\nthose bytes:\n\n~~~clojure\n(copy/parse DUMP_PATH [:raw :raw :bytes])\n[[#bytes [0, 1]\n  #bytes [0, 0, 0, 2]\n  #bytes [0, 0, 0, 0, 0, 0, 0, 3]]]\n~~~\n\n## Handling JSON\n\nPostgres is well-known for its vast JSON capabilities, and sometimes tables that\nwe dump have json(b) columns. Above, you saw that by default, they're parsed as\nplain strings. This is because there is no built-in JSON parser in Java and I\ndon't want to tie this library to a certain JSON implementation.\n\nBut the library provides a number of macros to extend the underlying\nmultimethods. With a line of code, you can enable parsing json(b) types with\nCheshire, Jsonista, clojure.data.json, Charred, and JSam. This is how to do it:\n\n~~~clojure\n(ns some.ns\n  (:require\n   [pg-bin.core :as copy]\n   [pg-bin.json :as json]))\n\n(json/set-cheshire keyword) ;; overrides multimethods\n\n(copy/parse DUMP_PATH FIELDS)\n\n[[...\n  {:foo [1 2 3 {:kek [true false nil]}]}\n  {:foo [1 2 3 {:kek [true false nil]}]}\n  ...]]\n~~~\n\nThe `set-cheshire` macro extends multimethods assuming you have Cheshire\ninstalled. 
Now the `parse` function, when it encounters json(b) types, will decode them\nproperly.\n\nThe `pg-bin.json` namespace provides the following macros:\n\n- `set-string`: parse json(b) types as strings again;\n- `set-cheshire`: parse using Cheshire;\n- `set-data-json`: parse using clojure.data.json;\n- `set-jsonista`: parse using Jsonista;\n- `set-charred`: parse using Charred;\n- `set-jsam`: parse using JSam.\n\nAll of them accept optional parameters that are passed to the underlying\nparsing function.\n\nPG.bin doesn't introduce any JSON-related dependencies. Each macro assumes you\nhave added the required library to the classpath.\n\n## Metadata\n\nEach parsed line tracks its length in bytes, offset from the beginning of a file\n(or a stream) and a unique index:\n\n~~~clojure\n(-\u003e (copy/parse DUMP_PATH FIELDS)\n    first\n    meta)\n\n#:pg{:length 306, :index 0, :offset 19}\n~~~\n\nKnowing these values might help when reading a dump in chunks.\n\n## Supported types\n\n- `:raw :bytea :bytes` for raw access and `bytea`\n- `:skip :_ nil` to skip a certain field\n- `:uuid` to parse UUIDs\n- `:int2 :short :smallint :smallserial` 2-byte integer (short)\n- `:int4 :int :integer :oid :serial` 4-byte integer (integer)\n- `:int8 :bigint :long :bigserial` 8-byte integer (long)\n- `:numeric :decimal` numeric type (becomes `BigDecimal`)\n- `:float4 :float :real` 4-byte float (float)\n- `:float8 :double :double-precision` 8-byte float (double)\n- `:boolean :bool` boolean\n- `:text :varchar :enum :name :string` text values\n- `:date` becomes `java.time.LocalDate`\n- `:time :time-without-time-zone` becomes `java.time.LocalTime`\n- `:timetz :time-with-time-zone` becomes `java.time.OffsetTime`\n- `:timestamp :timestamp-without-time-zone` becomes `java.time.LocalDateTime`\n- `:timestamptz :timestamp-with-time-zone` becomes `java.time.OffsetDateTime`\n\nPing me for more types, if needed.\n\n## On Writing\n\nAt the moment, the library only parses binary dumps. 
Writing them is possible\nbut requires extra work. Ping me if you really need to write binary files.\n\n## Scenarios\n\nWhy use this library at all? Imagine you have to fetch a mas-s-s-ive chunk of\nrows from a database, say 2-3 million, to build a report. That might be an issue:\nyou don't want to saturate memory, nor do you want to paginate using\nLIMIT/OFFSET as it's slow. A simple solution would be to dump the data you need\ninto a file and process it. You won't keep the database constantly busy as\nyou're working with a dump! Here is a small demo:\n\n~~~clojure\n(ns some.ns\n  (:require\n   [clojure.java.io :as io]\n   ;; assumption: the original doesn't say which library `jdbc` is;\n   ;; next.jdbc is used here as it provides get-connection\n   [next.jdbc :as jdbc]\n   [pg-bin.core :as copy]\n   [pg-bin.json :as json])\n  (:import\n   (java.io PipedInputStream PipedOutputStream)\n   (java.sql Connection)\n   (org.postgresql.copy CopyManager)\n   (org.postgresql.core BaseConnection)))\n\n(defn make-copy-manager\n  \"\n  Build an instance of CopyManager from a connection.\n  \"\n  ^CopyManager [^Connection conn]\n  (new CopyManager (.unwrap conn BaseConnection)))\n\n(let [conn (jdbc/get-connection data-source)\n      mgr (make-copy-manager conn)\n      sql \"copy table_name(col1, col2...) to stdout with (format binary)\"\n      ;; you can use a query without parameters as well\n      sql \"copy (select... from... where...) to stdout with (format binary)\"\n      ]\n  (with-open [out (io/output-stream \"/path/to/dump.bin\")]\n    (.copyOut mgr sql out)))\n\n(with-open [in (io/input-stream \"/path/to/dump.bin\")]\n  (let [lines (copy/parse-seq in [:int2 :text ...])]\n    (doseq [line lines]\n      ...)))\n~~~\n\nAbove, we dump the data into a file and then process it. There is a way to\nprocess lines on the fly using another thread. The second demo:\n\n~~~clojure\n(let [conn\n      (jdbc/get-connection data-source)\n\n      mgr\n      (make-copy-manager conn)\n\n      sql\n      \"copy table_name(col1, col2...) to stdout with (format binary)\"\n\n      in\n      (new PipedInputStream)\n\n      started? (promise)\n\n      fut ;; a future to process the output\n      (future\n        (with-open [_ in] ;; must close it afterward\n          (deliver started? 
true) ;; must report we have started\n          (let [lines (copy/parse-seq in [:int2 :text ...])]\n            (doseq [line lines] ;; process on the fly\n              ;; without touching the disk\n              ...))))]\n\n  ;; ensure the future has started\n  @started?\n\n  ;; drain down to the piped output stream\n  (with-open [out (new PipedOutputStream in)]\n    (.copyOut mgr sql out))\n\n  @fut ;; wait for the future to complete\n  )\n~~~\n\n## Misc\n\n~~~\n©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©\nIvan Grishaev, 2025. © UNLICENSE ©\n©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©\n~~~\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Figrishaev%2Fpg-bin","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Figrishaev%2Fpg-bin","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Figrishaev%2Fpg-bin/lists"}