Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with parquet
A curated list of projects in awesome lists tagged with parquet.
https://github.com/multiprocessio/dsq
Command-line tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.
cli csv excel golang json openoffice-calc parquet sql tsv
Last synced: 19 Dec 2024
https://github.com/roapi/roapi
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
analytics arrow blob-storage cloud-native columnar datafusion datasets delta-lake graphql in-memory-database parquet query query-frontends rest-api rust s3 sql static-datasets
Last synced: 30 Oct 2024
https://github.com/apache/arrow-rs
Official Rust implementation of Apache Arrow
arrow object-store parquet rust
Last synced: 16 Dec 2024
https://github.com/dathere/qsv
Blazing-fast Data-Wrangling toolkit
ckan cli csv data-engineering data-wrangling dcat excel geocode luau metadata opendata parquet polars postgresql snappy sql sqlite statistics timeseries
Last synced: 16 Dec 2024
https://github.com/jqnatividad/qsv
Blazing-fast Data-Wrangling toolkit
ckan cli csv data-engineering data-wrangling dcat excel geocode luau metadata opendata parquet polars postgresql snappy sql sqlite statistics timeseries
Last synced: 25 Nov 2024
https://github.com/uber/petastorm
Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.
deep-learning machine-learning parquet parquet-files pyarrow pyspark pytorch sysml tensorflow
Last synced: 18 Dec 2024
https://github.com/gchq/gaffer
A large-scale entity and relation database supporting aggregation of properties
accumulo aggregation big-data graph graph-database hadoop hbase parquet spark
Last synced: 17 Dec 2024
https://github.com/gchq/Gaffer
A large-scale entity and relation database supporting aggregation of properties
accumulo aggregation big-data graph graph-database hadoop hbase parquet spark
Last synced: 13 Nov 2024
https://github.com/rilldata/rill
Rill is a tool for effortlessly transforming data sets into powerful, opinionated dashboards using SQL. BI-as-code.
bi business-analytics csv data data-analysis data-visualization dataviz duckdb gcs golang parquet parquet-tools parquet-viewer s3 sql sql-editor svelte sveltejs sveltekit
Last synced: 17 Dec 2024
https://github.com/quiltdata/quilt
Quilt is a data mesh for connecting people with actionable data
data data-engineering data-version-control data-versioning parquet python serialization
Last synced: 17 Dec 2024
https://github.com/tonbo-io/tonbo
A portable embedded database using Arrow.
arrow big-data database embedded-database htap lsm-tree offline-first parquet rust store-engine
Last synced: 08 Dec 2024
https://github.com/cinchoo/choetl
ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
avro cinchoo-etl csharp csv dotnet etl etl-framework flat json keyvalue parquet parquet-files parser reader writer xml yaml
Last synced: 20 Dec 2024
https://github.com/mukunku/parquetviewer
Simple Windows desktop application for viewing & querying Apache Parquet files
apache-parquet big-data dot-net parquet windows-desktop
Last synced: 20 Dec 2024
https://github.com/HariSekhon/DevOps-Python-tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
avro aws cloudformation devops docker dockerhub elasticsearch gcf gcp hadoop hbase hdfs json linux parquet pyspark python solr spark travis-ci
Last synced: 07 Nov 2024
https://github.com/Cinchoo/ChoETL
ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
avro cinchoo-etl csharp csv dotnet etl etl-framework flat json keyvalue parquet parquet-files parser reader writer xml yaml
Last synced: 26 Oct 2024
https://github.com/developmentseed/lonboard
A Python library for fast, interactive geospatial vector data visualization in Jupyter.
anywidget apache-arrow apache-parquet data-visualization deck-gl geoarrow geopandas geoparquet geospatial geospatial-analysis jupyter jupyter-widget longboard map-visualization maps parquet python visualization webgl
Last synced: 17 Dec 2024
https://github.com/DerwenAI/kglab
Graph Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, NetworkX, RAPIDS, RDFlib, pySHACL, PyVis, morph-kgc, pslpython, pyarrow, etc.
graph-algorithms graph-libraries graph-thinking inference json-ld knowledge-graph networkx owl pandas parquet probabilistic-soft-logic python-igraph pyvis r2rml-mapping rapids rdflib roamresearch shacl skos sparql
Last synced: 12 Nov 2024
https://github.com/randomfractals/vscode-data-preview
Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files
array arrow avro config csv data excel extension json parquet perspective viewer vscode yaml
Last synced: 21 Dec 2024
https://github.com/ranaroussi/pystore
Fast data store for Pandas time-series data
dask database dataframe datastore pandas parquet timeseries
Last synced: 02 Nov 2024
https://github.com/RandomFractals/vscode-data-preview
Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files
array arrow avro config csv data excel extension json parquet perspective viewer vscode yaml
Last synced: 06 Nov 2024
https://github.com/kylebarron/parquet-wasm
Rust-based WebAssembly bindings to read and write Apache Parquet data
apache-arrow apache-parquet arrow javascript parquet rust wasm webassembly
Last synced: 21 Dec 2024
https://github.com/julien040/anyquery
Query anything (JSON, Salesforce, GitHub, etc.) with SQL and visualize your data with any MySQL-compatible BI tool.
airtable analytics api business-intelligence csv data-visualization database github go json migration mysql notion parquet pql prql salesforce sql sqlite
Last synced: 21 Dec 2024
https://github.com/paradedb/pg_analytics
DuckDB-powered analytics for Postgres
analytics arrow big-data columnar database datafusion datalake deltalake duckdb iceberg lakehouse lakehouse-platform object-storage olap paradedb parquet postgres postgresql realtime-analytics sql
Last synced: 21 Dec 2024
https://github.com/moshe/elasticsearch_loader
A tool for batch loading data files (json, parquet, csv, tsv) into ElasticSearch
csv elasticsearch elasticsearch-loader json logstash parquet python
Last synced: 17 Nov 2024
https://github.com/skale-me/skale
High performance distributed data processing engine
aws-s3 azure-storage cluster machine-learning nodejs parquet skale
Last synced: 30 Oct 2024
https://github.com/crunchydata/pg_parquet
Copy to/from Parquet in S3 from within PostgreSQL
columnar data-ingestion data-migration parquet postgresql
Last synced: 16 Dec 2024
https://github.com/jorgecarleitao/parquet2
Fastest and safest Rust implementation of parquet. `unsafe` free. Integration-tested against pyarrow
Last synced: 18 Dec 2024
https://github.com/ironsource/parquetjs
fully asynchronous, pure JavaScript implementation of the Parquet file format
Last synced: 21 Dec 2024
https://github.com/spotify/ratatool
A tool for data sampling, data generation, and data diffing
avro bigquery parquet protobuf scala scalacheck
Last synced: 21 Dec 2024
https://github.com/apecloud/myduckserver
MySQL & Postgres Analytics, Reimagined
analytics arrow business-analytics business-intelligence columnar-storage data-engineering data-science database duckdb htap mariadb mysql olap pandas parquet polars postgres replication sql zero-etl
Last synced: 20 Dec 2024
https://github.com/manojkarthick/pqrs
Command line tool for inspecting Parquet files
Last synced: 20 Dec 2024
https://github.com/cldellow/sqlite-parquet-vtable
A SQLite vtable extension to read Parquet files
apache-arrow apache-parquet parquet sqlite sqlite3
Last synced: 18 Dec 2024
https://github.com/awslabs/amazon-s3-find-and-forget
Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
amazon-s3 aws big-data ccpa data data-erasure data-lake gdpr parquet privacy right-to-be-forgotten s3
Last synced: 05 Nov 2024
https://github.com/hyparam/hyparquet
parquet file parser for javascript
javascript js parquet parquetjs parser snappy thrift
Last synced: 15 Dec 2024
https://github.com/CrunchyData/pg_parquet
Copy to/from Parquet in S3 from within PostgreSQL
columnar data-ingestion data-migration parquet postgresql
Last synced: 20 Oct 2024
https://github.com/pacman82/odbc2parquet
A command line tool to query an ODBC data source and write the result into a parquet file.
Last synced: 20 Dec 2024
https://github.com/scikit-hep/awkward-0.x
Manipulate arrays of complex data structures as easily as Numpy.
analysis apache-arrow arrow big-data columnar columnar-storage hdf5 numpy parquet python python3 root root-cern scikit-hep
Last synced: 28 Sep 2024
https://github.com/chabane/bigdata-playground
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
angular apache-flink apache-spark avro big-data docker graphql hadoop hbase kafka kops machine-learning mongodb nodejs parquet python scala spark-sql spark-streaming twitter-api
Last synced: 19 Dec 2024
https://github.com/Chabane/bigdata-playground
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
angular apache-flink apache-spark avro big-data docker graphql hadoop hbase kafka kops machine-learning mongodb nodejs parquet python scala spark-sql spark-streaming twitter-api
Last synced: 11 Nov 2024
https://github.com/metrico/quackpipe
QuackPipe is an OLAP API built on top of DuckDB with ClickHouse compatibility bits
api clickhouse clickhouse-server csv database duckdb duckdb-api duckdb-engine gigapipe golang lambda lambda-functions olap parquet qryn rest-api s3 server sql
Last synced: 21 Dec 2024
https://github.com/pgspider/parquet_s3_fdw
ParquetS3 Foreign Data Wrapper for PostgreSQL
fdw foreign-data-wrapper foreign-tables parquet parquets3-fdw postgresql postgresql-extension s3
Last synced: 04 Dec 2024
https://github.com/g-research/parquetsharp
ParquetSharp is a .NET library for reading and writing Apache Parquet files.
apache-arrow apache-parquet big-data columnar-storage csharp dotnet parquet
Last synced: 15 Dec 2024
https://github.com/spotify/magnolify
A collection of Magnolia add-on modules
avro bigquery bigtable cats datastore guava magnolia neo4j parquet protobuf scala scalacheck tensorflow
Last synced: 15 Dec 2024
https://github.com/DeepRec-AI/HybridBackend
A high-performance framework for training wide-and-deep recommender systems on heterogeneous clusters
deep-learning gpu hybrid-parallelism parquet recommender-system
Last synced: 11 Nov 2024
https://github.com/deeprec-ai/hybridbackend
A high-performance framework for training wide-and-deep recommender systems on heterogeneous clusters
deep-learning gpu hybrid-parallelism parquet recommender-system
Last synced: 11 Nov 2024
https://github.com/sunchao/parquet-rs
Apache Parquet implementation in Rust
Last synced: 25 Nov 2024
https://github.com/l1xnan/duckling
A fast viewer for CSV/Parquet files and databases such as DuckDB, SQLite, PostgreSQL, MySQL, Clickhouse, etc., based on Tauri
clickhouse duckdb mysql parquet postgresql rust sqlite tauri
Last synced: 18 Nov 2024
https://github.com/juliaio/parquet.jl
Julia implementation of Parquet columnar file format reader
columnar-storage julia parquet
Last synced: 20 Dec 2024
https://github.com/indix/schemer
Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.
avro graphql-api json parquet schema-inference schema-registry spark tsv
Last synced: 11 Oct 2024
https://github.com/maxcountryman/warc-parquet
🗄️ A simple CLI for converting WARC to Parquet.
crawling duckdb parquet warc web-archiving
Last synced: 18 Dec 2024
https://github.com/saurfang/sparksql-protobuf
Read SparkSQL parquet file as RDD[Protobuf]
Last synced: 07 Nov 2024
https://github.com/igor-suhorukov/openstreetmap_h3
High-performance data loader for OSM planet dumps. Transforms an OpenStreetMap World/Region PBF dump into a PostGIS pgsnapshot (lossless) OSM schema representation partitioned by H3 regions and/or into ArrowIPC/Parquet dumps
apach-sedona apache-arrow apache-spark arrow citusdb column-store converter duckdb geometry-processing geospatial java openstreetmap parquet parquet-files pbf pbf-format postgis postgresql world
Last synced: 17 Dec 2024
https://github.com/spotify/gcs-tools
GCS support for avro-tools, parquet-tools and protobuf
avro gcp gcs gcs-connector google-storage parquet protobuf
Last synced: 17 Dec 2024
https://github.com/coady/graphique
GraphQL service for arrow tables and parquet data sets.
Last synced: 18 Nov 2024
https://github.com/xiangpenghao/parquet-viewer
View parquet files online
parquet parquet-viewer rust webassembly
Last synced: 20 Dec 2024
https://github.com/cldellow/csv2parquet
Convert a CSV to a parquet file.
apache-arrow apache-parquet csv parquet
Last synced: 20 Dec 2024
https://github.com/exyi/pg2parquet
Export PostgreSQL table or query into Parquet file
Last synced: 21 Dec 2024
https://github.com/monix/monix-connect
A set of connectors for Monix. 🔛
aws connectors dynamodb elasticsearch google-cloud-storage hdfs mongodb monix parquet reactive-streams redis s3 scala sqs workflow
Last synced: 15 Dec 2024
https://github.com/zsvoboda/dbd
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
bigquery csv database database-schemas elt etl excel json mysql parquet postgresql python python3 redshift snowflake sql sqlite xls xlsx
Last synced: 12 Oct 2024
https://github.com/mattf96s/QuackDB
Open-source in-browser DuckDB SQL editor
apache-arrow comlink duckdb duckdb-wasm parquet remix remix-run shadcn sql sst
Last synced: 19 Nov 2024
https://github.com/voluntas/duckdb-wasm-parquet
duckdb-wasm parquet playwright typescript vite
Last synced: 05 Dec 2024
https://github.com/apache/parquet-testing
Apache Parquet Testing
apache parquet parquet-testing
Last synced: 15 Dec 2024
https://github.com/dask-contrib/dask-deltatable
A Delta Lake reader for Dask
dask dask-dataframes delta-lake parquet python
Last synced: 16 Dec 2024
https://github.com/hannes/miniparquet
Library to read a subset of Parquet files
cpp cpp11 dependency-free parquet parquet-cpp parquet-files
Last synced: 26 Oct 2024
https://github.com/cldellow/datasette-parquet
Add DuckDB, Parquet, CSV and JSON lines support to Datasette
datasette datasette-plugin duckdb parquet
Last synced: 01 Nov 2024
https://github.com/randomfractals/chicago-crimes
Exploring Chicago crimes dataset with Jupyter notebooks, DuckDB, Malloy and new Panel/PyScript data and dashboard tools.
chicago crimes duckdb julia jupyter-notebooks large-csv malloy malloydata parquet polars pyarrow
Last synced: 28 Oct 2024
https://github.com/ak-coram/cl-duckdb
Common Lisp CFFI wrapper around the DuckDB C API
c-bindings common-lisp data-science duckdb lisp olap parquet sql
Last synced: 13 Nov 2024
https://github.com/devinrsmith/deephaven-parquet-viewer
A browser-based Parquet file viewer
bigdata docker parquet parquet-viewer
Last synced: 27 Nov 2024
https://github.com/timkpaine/perspective-parquet
Parquet file reader and editor in Jupyterlab, built with `perspective` for pivoting, filtering, aggregating, etc
data-science data-visualization datavisualization dataviz jupyter jupyterlab jupyterlab-extension jupyterlab-extensions parquet parquet-viewer perspective pivot-tables
Last synced: 27 Oct 2024
https://github.com/agile-lab-dev/wasp
WASP is a framework for building complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.
akka elasticsearch hadoop hbase hdfs jdbc kafka parquet scala solr spark spark-streaming yarn
Last synced: 17 Dec 2024
https://github.com/nevillelyh/parquet-extra
A collection of Apache Parquet add-on modules
avro magnolia parquet scala scala-macros tensorflow
Last synced: 12 Nov 2024
https://github.com/KxSystems/arrowkdb
kdb+ integration with Apache Arrow and Parquet
Last synced: 12 Nov 2024
https://github.com/recordevolution/imctermite
Enables extraction of measurement data from binary files with extension 'raw' used by proprietary software imcFAMOS/imcSTUDIO and facilitates its storage in open source file formats
binary csv fileformat imccronos imcfamos imcstudio measurement parquet raw time-series
Last synced: 17 Dec 2024
https://github.com/kxsystems/arrowkdb
kdb+ integration with Apache Arrow and Parquet
Last synced: 07 Nov 2024
https://github.com/syedhassaanahmed/databricks-notebooks
Collection of Databricks and Jupyter Notebooks
azure-data-lake azure-databricks azure-event-hubs azure-iothub azure-sql-database azure-storage cosmos-db graphframes hive-udf jupyter-notebooks kafka matplotlib mongodb pandas-dataframe parquet power-bi pyspark spark spark-sql spark-udf
Last synced: 10 Nov 2024
https://github.com/mcaceresb/stata-parquet
Read and write parquet files from Stata
Last synced: 28 Oct 2024
https://github.com/mishmash-io/opentelemetry-server-embedded
An OpenTelemetry logs, metrics, traces and profiles collector that can be embedded in other systems as a data source.
apache druid java opentelemetry otlp parquet vertx
Last synced: 18 Oct 2024
https://github.com/hrbrmstr/sparrow
Temporary Shortcut For Reading Arrow/Parquet Bits Into R via 'reticulate'
arrow pandas-dataframe parquet r rstats
Last synced: 11 Oct 2024
https://github.com/exasol/parquet-io-java
Java library to read Parquet files.
exasol exasol-integration foundation-library java parquet
Last synced: 14 Nov 2024
https://github.com/amoeba/qlarrow
WIP QuickLook plugin for Apache Arrow and Parquet
apache-arrow golang macos parquet quicklook
Last synced: 09 Nov 2024
https://github.com/pbotros/river
A high-throughput, structured streaming framework built atop Redis Streams. C++, Python, and MATLAB support.
cpp iot matlab parquet python redis stream-processing streaming
Last synced: 02 Nov 2024
https://github.com/ekote/build-your-first-end-to-end-lakehouse-solution
Build Your First End-to-End Lakehouse Solution (aka.ms/fabconlake)
apache-spark data-engineering data-factory data-pipeline data-science dataflows delta-lake lakehouse machine-learning microsoft-azure microsoft-fabric parquet powerbi tutorial warehouse workshop
Last synced: 12 Oct 2024
https://github.com/civitaspo/embulk-output-s3_parquet
Embulk (https://github.com/embulk/embulk/) output plugin to dump records as Apache Parquet (https://parquet.apache.org/) files on S3.
embulk embulk-output-plugin parquet s3
Last synced: 28 Oct 2024
https://github.com/childmindresearch/bids2table
Efficiently index large-scale BIDS neuroimaging datasets and derivatives
arrow bids data-pipeline elt etl neuroimaging parquet
Last synced: 01 Nov 2024