{"id":13423646,"url":"https://github.com/hannes/miniparquet","last_synced_at":"2025-03-15T17:32:03.743Z","repository":{"id":146813393,"uuid":"200883921","full_name":"hannes/miniparquet","owner":"hannes","description":"Library to read a subset of Parquet files","archived":true,"fork":false,"pushed_at":"2020-02-13T19:59:08.000Z","size":497,"stargazers_count":43,"open_issues_count":6,"forks_count":7,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-05-20T06:09:31.001Z","etag":null,"topics":["cpp","cpp11","dependency-free","parquet","parquet-cpp","parquet-files"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hannes.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-08-06T16:03:34.000Z","updated_at":"2024-04-21T13:54:13.000Z","dependencies_parsed_at":null,"dependency_job_id":"ffb9e90b-bf2f-47c5-9452-c322bab424e8","html_url":"https://github.com/hannes/miniparquet","commit_stats":null,"previous_names":["hannesmuehleisen/miniparquet"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hannes%2Fminiparquet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hannes%2Fminiparquet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hannes%2Fminiparquet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hannes%2Fminiparquet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hannes","download_url":"https://codeload.github.com/hannes/miniparquet/ta
r.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243767077,"owners_count":20344868,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cpp","cpp11","dependency-free","parquet","parquet-cpp","parquet-files"],"created_at":"2024-07-31T00:00:39.693Z","updated_at":"2025-03-15T17:32:03.219Z","avatar_url":"https://github.com/hannes.png","language":"C++","funding_links":[],"categories":["C++"],"sub_categories":[],"readme":"# miniparquet\n[![Travis](https://api.travis-ci.org/hannesmuehleisen/miniparquet.svg?branch=master)](https://travis-ci.org/hannesmuehleisen/miniparquet)\n[![CRAN\nstatus](https://www.r-pkg.org/badges/version/miniparquet)](https://cran.r-project.org/package=miniparquet)\n[![](http://cranlogs.r-pkg.org/badges/miniparquet)](https://dgrtwo.shinyapps.io/cranview/)\n\n`miniparquet` is a reader for a common subset of Parquet files. miniparquet supports only rectangular data structures (no nested tables) and only the Snappy compression scheme. miniparquet has no (zero, none, 0) [external dependencies](https://research.swtch.com/deps) and is very lightweight. It compiles in seconds to a binary size of under 1 MB. \n\n## Installation\nMiniparquet comes as a C++ library, a Python package, and an R package. 
Install the R package like so:\n\n`devtools::install_github(\"hannesmuehleisen/miniparquet\")` \n\nThe C++ library can be built by typing `make`.\n\nThe Python package is installed using `python setup.py install`.\n\n\n## Usage\nUse the R package like so: `df \u003c- miniparquet::parquet_read(\"example.parquet\")` \n\nFolders of similarly structured Parquet files (e.g. produced by Spark) can be read like this: \n\n`df \u003c- data.table::rbindlist(lapply(Sys.glob(\"some-folder/part-*.parquet\"), miniparquet::parquet_read))`\n\nIf you find a file that should be supported but isn't, please open an issue here with a link to the file. \n\nUse the Python package like so: `miniparquet.read('example.parquet')`. You can convert the result to a Pandas dataframe like so: `pandas.DataFrame.from_dict(miniparquet.read('example.parquet'))`\n\n\n## Performance\n`miniparquet` is quite fast: on my laptop (I7-4578U) it can read compressed Parquet files at over 200 MB/s using only a single thread. Previously, there was a comparison with the arrow package here, but its results turned out to be caused by a bug that has since been fixed.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhannes%2Fminiparquet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhannes%2Fminiparquet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhannes%2Fminiparquet/lists"}