{"id":30524004,"url":"https://github.com/intake/intake-parquet","last_synced_at":"2025-12-11T21:03:24.252Z","repository":{"id":48870067,"uuid":"116419990","full_name":"intake/intake-parquet","owner":"intake","description":"Parquet plugin for Intake","archived":false,"fork":false,"pushed_at":"2023-12-15T15:17:27.000Z","size":135,"stargazers_count":11,"open_issues_count":6,"forks_count":15,"subscribers_count":7,"default_branch":"master","last_synced_at":"2024-06-19T00:29:24.720Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://intake-parquet.readthedocs.io/en/latest/?badge=latest","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/intake.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2018-01-05T19:34:47.000Z","updated_at":"2023-12-04T06:56:50.000Z","dependencies_parsed_at":"2023-12-18T04:44:18.931Z","dependency_job_id":null,"html_url":"https://github.com/intake/intake-parquet","commit_stats":{"total_commits":49,"total_committers":10,"mean_commits":4.9,"dds":0.2857142857142857,"last_synced_commit":"f7a83a2e561fca1de8fcbc0ff06bb8ae04937116"},"previous_names":["continuumio/intake-parquet"],"tags_count":10,"template":false,"template_full_name":null,"purl":"pkg:github/intake/intake-parquet","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/intake%2Fintake-parquet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/intake%2Fintake-parquet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/intake%2Fintake-parquet/releases","manifests_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/intake%2Fintake-parquet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/intake","download_url":"https://codeload.github.com/intake/intake-parquet/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/intake%2Fintake-parquet/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272254545,"owners_count":24901064,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-26T02:00:07.904Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-08-26T20:52:25.338Z","updated_at":"2025-12-11T21:03:23.776Z","avatar_url":"https://github.com/intake.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Intake-parquet\n\n[![Build Status](https://travis-ci.org/ContinuumIO/intake-parquet.svg?branch=master)](https://travis-ci.org/ContinuumIO/intake-parquet)\n[![Documentation Status](https://readthedocs.org/projects/intake-parquet/badge/?version=latest)](http://intake-parquet.readthedocs.io/en/latest/?badge=latest)\n\n[Intake data loader](https://github.com/ContinuumIO/intake/) interface to the parquet binary tabular data format.\n\nParquet is very popular in the big-data ecosystem, because it provides columnar\nand chunk-wise access to the data, with efficient encodings and compression. 
This makes\nthe format particularly effective for streaming through large subsections of even\nlarger data-sets, hence its common use with Hadoop and Spark.\n\nParquet data may be single files, directories of files, or nested directories, where\nthe directory names are meaningful in the partitioning of the data.\n\n### Features\n\nThe parquet plugin allows for:\n\n- efficient metadata parsing, so you know the data types and number of records without\n  loading any data\n- random access of partitions\n- column and index selection, so you load only the data you need\n- passing of value-based filters, so that you only load those partitions containing some\n  valid data (NB: does not filter the values within a partition)\n\n### Installation\n\nThe conda install instructions are:\n\n```\nconda install -c conda-forge intake-parquet\n```\n\n### Examples\n\nSee the notebook in the examples/ directory.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fintake%2Fintake-parquet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fintake%2Fintake-parquet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fintake%2Fintake-parquet/lists"}