Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with parquet

A curated list of projects in awesome lists tagged with parquet.

https://github.com/multiprocessio/dsq

Command-line tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.

cli csv excel golang json openoffice-calc parquet sql tsv

Last synced: 19 Dec 2024

https://github.com/roapi/roapi

Create full-fledged APIs for slowly moving datasets without writing a single line of code.

analytics arrow blob-storage cloud-native columnar datafusion datasets delta-lake graphql in-memory-database parquet query query-frontends rest-api rust s3 sql static-datasets

Last synced: 30 Oct 2024

https://github.com/apache/parquet-java

Apache Parquet Java

apache parquet parquet-java

Last synced: 17 Dec 2024

https://github.com/apache/arrow-rs

Official Rust implementation of Apache Arrow

arrow object-store parquet rust

Last synced: 16 Dec 2024

https://github.com/apache/drill

Apache Drill is a distributed MPP query layer for self-describing data

big-data drill hadoop hive java jdbc parquet sql

Last synced: 17 Dec 2024

https://github.com/apache/parquet-format

Apache Parquet Format

apache parquet parquet-format

Last synced: 16 Dec 2024

https://github.com/uber/petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.

deep-learning machine-learning parquet parquet-files pyarrow pyspark pytorch sysml tensorflow

Last synced: 18 Dec 2024

https://github.com/gchq/gaffer

A large-scale entity and relation database supporting aggregation of properties

accumulo aggregation big-data graph graph-database hadoop hbase parquet spark

Last synced: 17 Dec 2024

https://github.com/rilldata/rill

Rill is a tool for effortlessly transforming data sets into powerful, opinionated dashboards using SQL. BI-as-code.

bi business-analytics csv data data-analysis data-visualization dataviz duckdb gcs golang parquet parquet-tools parquet-viewer s3 sql sql-editor svelte sveltejs sveltekit

Last synced: 17 Dec 2024

https://github.com/quiltdata/quilt

Quilt is a data mesh for connecting people with actionable data

data data-engineering data-version-control data-versioning parquet python serialization

Last synced: 17 Dec 2024

https://github.com/paradigmxyz/cryo

cryo is the easiest way to extract blockchain data to parquet, csv, json, or python dataframes

crypto ethereum evm parquet rust

Last synced: 21 Dec 2024

https://github.com/bigdatagenomics/adam

ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.

avro big-data bioinformatics genomics java parquet python r scala spark

Last synced: 16 Dec 2024

https://github.com/cinchoo/choetl

ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)

avro cinchoo-etl csharp csv dotnet etl etl-framework flat json keyvalue parquet parquet-files parser reader writer xml yaml

Last synced: 20 Dec 2024

https://github.com/mukunku/parquetviewer

Simple Windows desktop application for viewing & querying Apache Parquet files

apache-parquet big-data dot-net parquet windows-desktop

Last synced: 20 Dec 2024

https://github.com/HariSekhon/DevOps-Python-tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

avro aws cloudformation devops docker dockerhub elasticsearch gcf gcp hadoop hbase hdfs json linux parquet pyspark python solr spark travis-ci

Last synced: 07 Nov 2024

https://github.com/DerwenAI/kglab

Graph Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, NetworkX, RAPIDS, RDFlib, pySHACL, PyVis, morph-kgc, pslpython, pyarrow, etc.

graph-algorithms graph-libraries graph-thinking inference json-ld knowledge-graph networkx owl pandas parquet probabilistic-soft-logic python-igraph pyvis r2rml-mapping rapids rdflib roamresearch shacl skos sparql

Last synced: 12 Nov 2024

https://github.com/randomfractals/vscode-data-preview

Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files

array arrow avro config csv data excel extension json parquet perspective viewer vscode yaml

Last synced: 21 Dec 2024

https://github.com/ranaroussi/pystore

Fast data store for Pandas time-series data

dask database dataframe datastore pandas parquet timeseries

Last synced: 02 Nov 2024

https://github.com/kylebarron/parquet-wasm

Rust-based WebAssembly bindings to read and write Apache Parquet data

apache-arrow apache-parquet arrow javascript parquet rust wasm webassembly

Last synced: 21 Dec 2024

https://github.com/julien040/anyquery

Query anything (JSON, Salesforce, GitHub, etc.) with SQL and visualize your data with any MySQL-compatible BI tool.

airtable analytics api business-intelligence csv data-visualization database github go json migration mysql notion parquet pql prql salesforce sql sqlite

Last synced: 21 Dec 2024

https://github.com/apache/parquet-cpp

Apache Parquet

big-data java parquet

Last synced: 01 Oct 2024

https://github.com/moshe/elasticsearch_loader

A tool for batch loading data files (json, parquet, csv, tsv) into ElasticSearch

csv elasticsearch elasticsearch-loader json logstash parquet python

Last synced: 17 Nov 2024

https://github.com/skale-me/skale

High performance distributed data processing engine

aws-s3 azure-storage cluster machine-learning nodejs parquet skale

Last synced: 30 Oct 2024

https://github.com/crunchydata/pg_parquet

Copy to/from Parquet in S3 from within PostgreSQL

columnar data-ingestion data-migration parquet postgresql

Last synced: 16 Dec 2024

https://github.com/jorgecarleitao/parquet2

Fastest and safest Rust implementation of parquet. `unsafe` free. Integration-tested against pyarrow

parallelism parquet rust safe

Last synced: 18 Dec 2024

https://github.com/ironsource/parquetjs

fully asynchronous, pure JavaScript implementation of the Parquet file format

javascript nodejs parquet

Last synced: 21 Dec 2024

https://github.com/spotify/ratatool

A tool for data sampling, data generation, and data diffing

avro bigquery parquet protobuf scala scalacheck

Last synced: 21 Dec 2024

https://github.com/sksamuel/centurion

Kotlin Bigdata Toolkit

bigdata java kotlin orc parquet

Last synced: 15 Dec 2024

https://github.com/manojkarthick/pqrs

Command line tool for inspecting Parquet files

arrow parquet rust

Last synced: 20 Dec 2024

https://github.com/cldellow/sqlite-parquet-vtable

A SQLite vtable extension to read Parquet files

apache-arrow apache-parquet parquet sqlite sqlite3

Last synced: 18 Dec 2024

https://github.com/awslabs/amazon-s3-find-and-forget

Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)

amazon-s3 aws big-data ccpa data data-erasure data-lake gdpr parquet privacy right-to-be-forgotten s3

Last synced: 05 Nov 2024

https://github.com/hyparam/hyparquet

Parquet file parser for JavaScript

javascript js parquet parquetjs parser snappy thrift

Last synced: 15 Dec 2024

https://github.com/pacman82/odbc2parquet

A command line tool to query an ODBC data source and write the result into a parquet file.

odbc parquet

Last synced: 20 Dec 2024

https://github.com/scikit-hep/awkward-0.x

Manipulate arrays of complex data structures as easily as NumPy.

analysis apache-arrow arrow big-data columnar columnar-storage hdf5 numpy parquet python python3 root root-cern scikit-hep

Last synced: 28 Sep 2024

https://github.com/chabane/bigdata-playground

A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL

angular apache-flink apache-spark avro big-data docker graphql hadoop hbase kafka kops machine-learning mongodb nodejs parquet python scala spark-sql spark-streaming twitter-api

Last synced: 19 Dec 2024

https://github.com/metrico/quackpipe

QuackPipe is an OLAP API built on top of DuckDB with ClickHouse compatibility bits

api clickhouse clickhouse-server csv database duckdb duckdb-api duckdb-engine gigapipe golang lambda lambda-functions olap parquet qryn rest-api s3 server sql

Last synced: 21 Dec 2024

https://github.com/g-research/parquetsharp

ParquetSharp is a .NET library for reading and writing Apache Parquet files.

apache-arrow apache-parquet big-data columnar-storage csharp dotnet parquet

Last synced: 15 Dec 2024

https://github.com/ktrueda/parquet-tools

Easy-to-install parquet-tools

cli parquet parquet-tools

Last synced: 15 Nov 2024

https://github.com/DeepRec-AI/HybridBackend

A high-performance framework for training wide-and-deep recommender systems on heterogeneous clusters

deep-learning gpu hybrid-parallelism parquet recommender-system

Last synced: 11 Nov 2024

https://github.com/sunchao/parquet-rs

Apache Parquet implementation in Rust

hadoop parquet rust

Last synced: 25 Nov 2024

https://github.com/l1xnan/duckling

A fast viewer for CSV/Parquet files and databases such as DuckDB, SQLite, PostgreSQL, MySQL, ClickHouse, etc., based on Tauri

clickhouse duckdb mysql parquet postgresql rust sqlite tauri

Last synced: 18 Nov 2024

https://github.com/juliaio/parquet.jl

Julia implementation of Parquet columnar file format reader

columnar-storage julia parquet

Last synced: 20 Dec 2024

https://github.com/indix/schemer

Schema registry for CSV, TSV, JSON, Avro, and Parquet schemas. Supports schema inference and a GraphQL API.

avro graphql-api json parquet schema-inference schema-registry spark tsv

Last synced: 11 Oct 2024

https://github.com/maxcountryman/warc-parquet

🗄️ A simple CLI for converting WARC to Parquet.

crawling duckdb parquet warc web-archiving

Last synced: 18 Dec 2024

https://github.com/parsyl/parquet

A library for reading and writing parquet files.

dremel golang parquet reader writer

Last synced: 26 Oct 2024

https://github.com/saurfang/sparksql-protobuf

Read SparkSQL parquet file as RDD[Protobuf]

parquet protobuf sparksql

Last synced: 07 Nov 2024

https://github.com/igor-suhorukov/openstreetmap_h3

High-performance data loader for OSM planet dumps. Transforms an OpenStreetMap world or region PBF dump into a lossless PostGIS pgsnapshot OSM schema representation partitioned by H3 region, and/or into ArrowIPC/Parquet dumps

apach-sedona apache-arrow apache-spark arrow citusdb column-store converter duckdb geometry-processing geospatial java openstreetmap parquet parquet-files pbf pbf-format postgis postgresql world

Last synced: 17 Dec 2024

https://github.com/spotify/gcs-tools

GCS support for avro-tools, parquet-tools and protobuf

avro gcp gcs gcs-connector google-storage parquet protobuf

Last synced: 17 Dec 2024

https://github.com/coady/graphique

GraphQL service for arrow tables and parquet data sets.

arrow graphql parquet

Last synced: 18 Nov 2024

https://github.com/ddotta/parquetize

R package for converting databases of different formats to Parquet format

conversion convert converter csv parquet r r-package sas spss sqlite stata

Last synced: 04 Dec 2024

https://github.com/xiangpenghao/parquet-viewer

View parquet files online

parquet parquet-viewer rust webassembly

Last synced: 20 Dec 2024

https://github.com/cldellow/csv2parquet

Convert a CSV to a parquet file.

apache-arrow apache-parquet csv parquet

Last synced: 20 Dec 2024

https://github.com/exyi/pg2parquet

Export PostgreSQL table or query into Parquet file

parquet postgres postgresql

Last synced: 21 Dec 2024

https://github.com/zsvoboda/dbd

dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.

bigquery csv database database-schemas elt etl excel json mysql parquet postgresql python python3 redshift snowflake sql sqlite xls xlsx

Last synced: 12 Oct 2024

https://github.com/mattf96s/QuackDB

Open-source in-browser DuckDB SQL editor

apache-arrow comlink duckdb duckdb-wasm parquet remix remix-run shadcn sql sst

Last synced: 19 Nov 2024

https://github.com/apache/parquet-testing

Apache Parquet Testing

apache parquet parquet-testing

Last synced: 15 Dec 2024

https://github.com/dask-contrib/dask-deltatable

A Delta Lake reader for Dask

dask dask-dataframes delta-lake parquet python

Last synced: 16 Dec 2024

https://github.com/hannes/miniparquet

Library to read a subset of Parquet files

cpp cpp11 dependency-free parquet parquet-cpp parquet-files

Last synced: 26 Oct 2024

https://github.com/cldellow/datasette-parquet

Add DuckDB, Parquet, CSV and JSON lines support to Datasette

datasette datasette-plugin duckdb parquet

Last synced: 01 Nov 2024

https://github.com/dbiir/paraflow

A real-time analytical system for ID-associated data

hadoop kafka orc parquet presto spark-sql

Last synced: 21 Nov 2024

https://github.com/randomfractals/chicago-crimes

Exploring Chicago crimes dataset with Jupyter notebooks, DuckDB, Malloy and new Panel/PyScript data and dashboard tools.

chicago crimes duckdb julia jupyter-notebooks large-csv malloy malloydata parquet polars pyarrow

Last synced: 28 Oct 2024

https://github.com/ak-coram/cl-duckdb

Common Lisp CFFI wrapper around the DuckDB C API

c-bindings common-lisp data-science duckdb lisp olap parquet sql

Last synced: 13 Nov 2024

https://github.com/devinrsmith/deephaven-parquet-viewer

A browser-based Parquet file viewer

bigdata docker parquet parquet-viewer

Last synced: 27 Nov 2024

https://github.com/timkpaine/perspective-parquet

Parquet file reader and editor in JupyterLab, built with `perspective` for pivoting, filtering, aggregating, etc.

data-science data-visualization datavisualization dataviz jupyter jupyterlab jupyterlab-extension jupyterlab-extensions parquet parquet-viewer perspective pivot-tables

Last synced: 27 Oct 2024

https://github.com/agile-lab-dev/wasp

WASP is a framework for building complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.

akka elasticsearch hadoop hbase hdfs jdbc kafka parquet scala solr spark spark-streaming yarn

Last synced: 17 Dec 2024

https://github.com/nevillelyh/parquet-extra

A collection of Apache Parquet add-on modules

avro magnolia parquet scala scala-macros tensorflow

Last synced: 12 Nov 2024

https://github.com/KxSystems/arrowkdb

kdb+ integration with Apache Arrow and Parquet

arrow kdb parquet q

Last synced: 12 Nov 2024

https://github.com/recordevolution/imctermite

Enables extraction of measurement data from binary files with extension 'raw' used by proprietary software imcFAMOS/imcSTUDIO and facilitates its storage in open source file formats

binary csv fileformat imccronos imcfamos imcstudio measurement parquet raw time-series

Last synced: 17 Dec 2024

https://github.com/mcaceresb/stata-parquet

Read and write parquet files from Stata

arrow parquet stata

Last synced: 28 Oct 2024

https://github.com/g-research/parquetsharp.dataframe

ParquetSharp.DataFrame is a .NET library for reading and writing Apache Parquet files into/from .NET DataFrames, using ParquetSharp

big-data csharp dataframe dotnet parquet

Last synced: 17 Nov 2024

https://github.com/mishmash-io/opentelemetry-server-embedded

An OpenTelemetry logs, metrics, traces and profiles collector that can be embedded in other systems as a data source.

apache druid java opentelemetry otlp parquet vertx

Last synced: 18 Oct 2024

https://github.com/hrbrmstr/sparrow

Temporary Shortcut For Reading Arrow/Parquet Bits Into R via 'reticulate'

arrow pandas-dataframe parquet r rstats

Last synced: 11 Oct 2024

https://github.com/exasol/parquet-io-java

Java library to read Parquet files.

exasol exasol-integration foundation-library java parquet

Last synced: 14 Nov 2024

https://github.com/amoeba/qlarrow

WIP QuickLook plugin for Apache Arrow and Parquet

apache-arrow golang macos parquet quicklook

Last synced: 09 Nov 2024

https://github.com/pbotros/river

A high-throughput, structured streaming framework built atop Redis Streams. C++, Python, and MATLAB support.

cpp iot matlab parquet python redis stream-processing streaming

Last synced: 02 Nov 2024

https://github.com/wukan1986/ddump

Data dump tool

download jqdatasdk parquet tushare wind

Last synced: 23 Nov 2024

https://github.com/civitaspo/embulk-output-s3_parquet

Embulk (https://github.com/embulk/embulk/) output plugin to dump records as Apache Parquet (https://parquet.apache.org/) files on S3.

embulk embulk-output-plugin parquet s3

Last synced: 28 Oct 2024

https://github.com/childmindresearch/bids2table

Efficiently index large-scale BIDS neuroimaging datasets and derivatives

arrow bids data-pipeline elt etl neuroimaging parquet

Last synced: 01 Nov 2024

https://github.com/openbridge/ob_datastash

Stream your CSV files to an HTTP API

aws bigquery csv csv-files logstash parquet redshift

Last synced: 14 Nov 2024

https://github.com/juliageo/geoparquet.jl

Geospatial data in Parquet files

geo gis io julia parquet

Last synced: 21 Nov 2024