{"id":16641864,"url":"https://github.com/samthor/parq","last_synced_at":"2025-10-30T11:31:22.914Z","repository":{"id":216734500,"uuid":"633668773","full_name":"samthor/parq","owner":"samthor","description":"Parquet reader in JS","archived":false,"fork":false,"pushed_at":"2024-01-14T23:07:43.000Z","size":1502,"stargazers_count":7,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-02-02T08:11:21.407Z","etag":null,"topics":["javascript","parquet"],"latest_commit_sha":null,"homepage":"https://samthor.github.io/parq/","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/samthor.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-04-28T02:43:23.000Z","updated_at":"2024-05-29T05:13:29.000Z","dependencies_parsed_at":"2024-01-12T11:29:28.531Z","dependency_job_id":"3649916f-757b-45c3-be2b-ce6a53787224","html_url":"https://github.com/samthor/parq","commit_stats":null,"previous_names":["samthor/parq"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samthor%2Fparq","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samthor%2Fparq/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samthor%2Fparq/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samthor%2Fparq/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/samthor","download_url":"https://codeload.github.com/samthor/parq/tar.gz/refs/heads/main","host":{"name":"GitHub","url
":"https://github.com","kind":"github","repositories_count":238960280,"owners_count":19559235,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["javascript","parquet"],"created_at":"2024-10-12T07:48:07.216Z","updated_at":"2025-10-30T11:31:22.451Z","avatar_url":"https://github.com/samthor.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"\nparq is a Parquet reader in JavaScript.\n[Install from NPM via \"parq\"](https://www.npmjs.com/package/parq).\n[Demo here](https://samthor.github.io/parq/).\n\n## Usage\n\nYou can build a reader and then iterate over its contents, yielding a `Uint8Array` for each value:\n\n```js\nimport { buildReader, flatIterate } from 'parq';\n\nconst bytes = /* Uint8Array from somewhere */;\nconst pr = await buildReader(bytes);\n\n// iterate over the data in rows 100-200 of column zero\nconst it = flatIterate(pr, 0, 100, 200);\n\nlet i = 100;\nfor await (const value of it) {\n  console.info(`col0 row${i}=`, value);\n  ++i;\n}\n```\n\nIt's a bit awkward to receive a `Uint8Array` per-value (you can use `DataView` to read its contents), but it matches how Parquet works: it has a variety of primitive data types _as well as_ the `BYTE_ARRAY` type which has variable length.\nThis type is usually used for UTF-8 encoded strings.\n\nTo find out what type is used per-column, check `pr.info().columns` for their name, type, and so on, before indexing.\n\n### Advanced Usage\n\nYou can access the low-level methods on `ParquetReader` to read raw page data directly.\nThese need a little bit of work to eventually render, but 
this means you can process the data more efficiently.\n\nYou can also pass a `Reader` implementation to `buildReader` instead of raw bytes.\nThis is a method which reads bytes in a specific range, useful if you are processing large files and don't want to read them from disk or network all at once.\n\n## Support\n\nThis is missing support for Parquet files that use:\n\n- data pages v2\n- compression codecs `LZO`, `BROTLI`, `LZ4`, `LZ4_RAW`\n- possibly complex nested schemas.\n\nIt supports the compression codecs `SNAPPY` and `GZIP`, as well as `ZSTD` _via_ a dynamic import of the [zstddec](https://www.npmjs.com/package/zstddec) package.\nIf you need `ZSTD`, install \"zstddec\" and instruct your bundler to use it.\n(I can see adding [brotli-wasm](https://www.npmjs.com/package/brotli-wasm) for `BROTLI` in the same way if it's needed.)\n\n## Demo\n\nThere's a simple demo [on GitHub Pages](https://samthor.github.io/parq/), with the source in [demo](./demo).\nThis uses a `Worker` to process Parquet data remotely, which means that this code can trivially handle files of a gigabyte or more.\nIt implements a remote `ParquetReader` that connects to the worker.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsamthor%2Fparq","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsamthor%2Fparq","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsamthor%2Fparq/lists"}