{"id":14066627,"url":"https://github.com/ddotta/parquetize","last_synced_at":"2025-12-30T01:00:21.170Z","repository":{"id":63412669,"uuid":"562876945","full_name":"ddotta/parquetize","owner":"ddotta","description":"R package that allows to convert databases of different formats to parquet format","archived":false,"fork":false,"pushed_at":"2024-10-22T10:20:49.000Z","size":7918,"stargazers_count":73,"open_issues_count":6,"forks_count":3,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-06-20T07:53:33.757Z","etag":null,"topics":["conversion","convert","converter","csv","parquet","r","r-package","sas","spss","sqlite","stata"],"latest_commit_sha":null,"homepage":"https://ddotta.github.io/parquetize/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ddotta.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS.md","contributing":"CONTRIBUTING.md","funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-11-07T12:54:53.000Z","updated_at":"2025-05-30T14:25:16.000Z","dependencies_parsed_at":"2024-02-19T18:36:19.812Z","dependency_job_id":"85802e5f-c23c-407b-bf7b-76b6f6d95236","html_url":"https://github.com/ddotta/parquetize","commit_stats":{"total_commits":292,"total_committers":4,"mean_commits":73.0,"dds":0.03767123287671237,"last_synced_commit":"f0e77f7e858c3a9df1d1dc8aa9b921f77c69673e"},"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"purl":"pkg:github/ddotta/parquetize","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ddotta%2Fparquetize","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ddotta%2Fparquetize/ta
gs","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ddotta%2Fparquetize/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ddotta%2Fparquetize/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ddotta","download_url":"https://codeload.github.com/ddotta/parquetize/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ddotta%2Fparquetize/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266516322,"owners_count":23941396,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-22T02:00:09.085Z","response_time":66,"last_error":null,"robots_txt_status":null,"robots_txt_updated_at":null,"robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["conversion","convert","converter","csv","parquet","r","r-package","sas","spss","sqlite","stata"],"created_at":"2024-08-13T07:05:11.532Z","updated_at":"2025-12-30T01:00:16.118Z","avatar_url":"https://github.com/ddotta.png","language":"R","readme":"\u003c!-- badges: start --\u003e\n![GitHub 
top\nlanguage](https://img.shields.io/github/languages/top/ddotta/parquetize)\n[![version](http://www.r-pkg.org/badges/version/parquetize)](https://CRAN.R-project.org/package=parquetize)\n[![cranlogs](http://cranlogs.r-pkg.org/badges/parquetize)](https://CRAN.R-project.org/package=parquetize)\n[![Downloads](https://cranlogs.r-pkg.org/badges/grand-total/parquetize?color=brightgreen)](https://cran.r-project.org/package=parquetize)\n[![R check\nstatus](https://github.com/ddotta/parquetize/workflows/R-CMD-check/badge.svg)](https://github.com/ddotta/parquetize/actions/workflows/check-release.yaml)\n[![codecov](https://codecov.io/gh/ddotta/parquetize/branch/main/graph/badge.svg?token=25MHI8O62M)](https://app.codecov.io/gh/ddotta/parquetize)\n[![CodeFactor](https://www.codefactor.io/repository/github/ddotta/parquetize/badge)](https://www.codefactor.io/repository/github/ddotta/parquetize)\n\u003c!-- badges: end --\u003e\n\n:package: Package `parquetize` \u003cimg src=\"man/figures/hex_parquetize.png\" width=110 align=\"right\"/\u003e\n======================================\n\nR package that converts databases of different formats (csv, SAS, SPSS, Stata, rds, sqlite, JSON, ndJSON) to [parquet](https://parquet.apache.org/) format with a single function.\n\n## Installation\n\nTo install `parquetize` from CRAN:  \n\n``` r\ninstall.packages(\"parquetize\")\n```\n\nAlternatively, to install the development version from GitHub:  \n\n``` r\nremotes::install_github(\"ddotta/parquetize\")\n```\n\nThen, to load it:  \n\n``` r\nlibrary(parquetize)\n```\n\n## Why this package?\n\nThis package is a simple wrapper around some very useful functions from the [haven](https://github.com/tidyverse/haven), [readr](https://github.com/tidyverse/readr/), [jsonlite](https://github.com/jeroen/jsonlite), [RSQLite](https://github.com/r-dbi/RSQLite) and [arrow](https://github.com/apache/arrow) packages.\n\nIn my daily work, I realized that I was often repeating the same operations when working with 
parquet files:\n\n- I import the file into R with {haven}, {jsonlite}, {readr}, {DBI} or {RSQLite}.\n- Then I export the file in parquet format.\n\nAs a fervent believer in the DRY principle (don't repeat yourself), I find that the exported functions of this package make my life easier: they **execute these operations within the same function**.  \n\n**Another benefit** of using the `{parquetize}` package is that its functions can create either single parquet files or partitioned files, depending on the arguments passed to them.\n\n- [csv_to_parquet()](https://ddotta.github.io/parquetize/reference/csv_to_parquet.html)\n    - **An additional benefit of this function** is that it can convert csv or txt files whether they are stored locally or available on the internet, either directly in csv/txt format or inside a zip archive.\n- [json_to_parquet()](https://ddotta.github.io/parquetize/reference/json_to_parquet.html)\n    - **An additional benefit of this function** is that it handles both JSON and ndJSON files. There is only one function to use for these 2 cases.  \n- [rds_to_parquet()](https://ddotta.github.io/parquetize/reference/rds_to_parquet.html)  \n- [fst_to_parquet()](https://ddotta.github.io/parquetize/reference/fst_to_parquet.html)  \n- [table_to_parquet()](https://ddotta.github.io/parquetize/reference/table_to_parquet.html)\n    - **An additional benefit of this function** is that it handles SAS, SPSS and Stata files. There is only one function to use for these 3 cases. To avoid overloading R's memory with huge tables, the conversion can be done in chunks. For more information, see [here](https://ddotta.github.io/parquetize/articles/aa-conversions.html)\n- [sqlite_to_parquet()](https://ddotta.github.io/parquetize/reference/sqlite_to_parquet.html)\n- [dbi_to_parquet()](https://ddotta.github.io/parquetize/reference/dbi_to_parquet.html)\n\nFor more details, see the examples associated with each function in the documentation.  
\n\n## Example\n\nWant to use the Insee file of first names by birth department? Use R and the {parquetize} package, which takes care of everything: it downloads the data (3.7 million rows) and converts it to parquet format in a few seconds!\n\n\u003cimg src=\"man/figures/Insee_example_csv.gif\" width=\"100%\" /\u003e\n\n## Contribution\n\nFeel free to contribute features that you find useful in your daily work.  \nIdeas are welcome in [the issues](https://github.com/ddotta/parquetize/issues).\n","funding_links":[],"categories":["R"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fddotta%2Fparquetize","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fddotta%2Fparquetize","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fddotta%2Fparquetize/lists"}