https://github.com/queryverse/parquetfiles.jl
FileIO.jl integration for Parquet files
- Host: GitHub
- URL: https://github.com/queryverse/parquetfiles.jl
- Owner: queryverse
- License: other
- Created: 2017-12-18T01:00:10.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2022-07-26T19:06:57.000Z (almost 4 years ago)
- Last Synced: 2025-04-11T04:35:55.332Z (about 1 year ago)
- Topics: julia, queryverse
- Language: Julia
- Size: 147 KB
- Stars: 19
- Watchers: 2
- Forks: 10
- Open Issues: 13
Metadata Files:
- Readme: README.md
- License: LICENSE.md
# ParquetFiles
## Overview
This package provides load support for [Parquet](https://parquet.apache.org/) files under the [FileIO.jl](https://github.com/JuliaIO/FileIO.jl) package.
## Installation
In the Julia REPL, press ``]`` to enter Pkg mode, then run ``add ParquetFiles`` to install ParquetFiles and its dependencies.
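For scripts or other non-interactive sessions, the same installation can be done through the standard ``Pkg`` API. A minimal sketch:

````julia
using Pkg

# Equivalent to typing `add ParquetFiles` in Pkg mode.
Pkg.add("ParquetFiles")
````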
## Usage
### Load a Parquet file
To read a Parquet file into a ``DataFrame``, use the following Julia code:
````julia
using ParquetFiles, DataFrames
df = DataFrame(load("data.parquet"))
````
The call to ``load`` returns a ``struct`` that is an iterable table as defined by [IterableTables.jl](https://github.com/queryverse/IterableTables.jl), so it can be passed to any function that accepts iterable tables, i.e. all the sinks in [IterableTables.jl](https://github.com/queryverse/IterableTables.jl). Here are some examples of materializing a Parquet file into data structures other than a ``DataFrame``:
````julia
using ParquetFiles, IndexedTables, TimeSeries, Temporal, VegaLite
# Load into an IndexedTable
it = IndexedTable(load("data.parquet"))
# Load into a TimeArray
ta = TimeArray(load("data.parquet"))
# Load into a TS
ts = TS(load("data.parquet"))
# Plot directly with VegaLite
@vlplot(:point, data=load("data.parquet"), x=:a, y=:b)
````
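Because the loaded object is an iterable table, its rows can also be consumed directly without materializing a sink. The sketch below assumes the object implements the ``TableTraits`` and ``IteratorInterfaceExtensions`` interfaces that iterable tables are built on; ``print_rows`` is a hypothetical helper name, not part of ParquetFiles.

````julia
using ParquetFiles, TableTraits, IteratorInterfaceExtensions

# Hypothetical helper: print the first `limit` rows of a Parquet file.
function print_rows(path::AbstractString; limit::Integer = 5)
    pf = load(path)
    # Sinks use this trait to recognise an iterable table.
    TableTraits.isiterabletable(pf) === true || error("not an iterable table: $path")
    # getiterator yields one row at a time; take stops after `limit` rows.
    for row in Iterators.take(IteratorInterfaceExtensions.getiterator(pf), limit)
        println(row)
    end
end
````

Calling ``print_rows("data.parquet")`` would then print up to five rows of the file.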
### Using the pipe syntax
``load`` also supports the pipe syntax. For example, to load a Parquet file into a ``DataFrame``, one can use the following code:
````julia
using ParquetFiles, DataFrames
df = load("data.parquet") |> DataFrame
````
The pipe syntax is especially useful in combination with [Query.jl](https://github.com/queryverse/Query.jl) queries. For example, one can load a Parquet file, pipe it into a query, and then pipe the result to the ``save`` function to store it in a new file. Since this package provides only load support, the output file must be in a format whose FileIO.jl integration supports ``save``.
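A load, query, and save pipeline might look like the following sketch. It assumes that CSVFiles.jl is installed to provide ``save`` for CSV output and that the file has a numeric column ``a`` (as in the plotting example above); the helper name ``filter_to_csv`` and the filter condition are illustrative only.

````julia
using ParquetFiles, Query, CSVFiles

# Hypothetical helper: keep rows where column `a` is positive
# and write them to a CSV file.
function filter_to_csv(input::AbstractString, output::AbstractString)
    load(input) |>
        @filter(_.a > 0) |>   # Query.jl standalone filter; `_` is the current row
        save(output)          # single-argument `save` returns a function for piping
end
````

For example, ``filter_to_csv("data.parquet", "filtered.csv")`` would write the matching rows to ``filtered.csv``.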