Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/djnavarro/arrow-user2022
Larger-Than-Memory Data Workflows with Apache Arrow
- Host: GitHub
- URL: https://github.com/djnavarro/arrow-user2022
- Owner: djnavarro
- License: cc-by-4.0
- Created: 2022-06-19T05:30:30.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-06-26T23:49:54.000Z (over 1 year ago)
- Last Synced: 2024-10-18T06:36:40.458Z (3 months ago)
- Language: HTML
- Homepage: https://arrow-user2022.netlify.app/
- Size: 22.9 MB
- Stars: 45
- Watchers: 2
- Forks: 44
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE.md
Awesome Lists containing this project
- awesome_ai_agents - Arrow-User2022 - Larger-Than-Memory Data Workflows with Apache Arrow (Building / Workflows)
README
# Larger-Than-Memory Data Workflows with Apache Arrow
[![DOI](https://zenodo.org/badge/505020662.svg)](https://zenodo.org/badge/latestdoi/505020662) [![Netlify Status](https://api.netlify.com/api/v1/badges/ae31113f-79b7-4bc0-9e26-600ced0da14b/deploy-status)](https://app.netlify.com/sites/arrow-user2022/deploys)
This repository contains source code and data for the Apache Arrow workshop run as part of the useR! 2022 conference. You can fork and clone this repository from [GitHub](https://github.com/) with:
``` r
usethis::create_from_github("djnavarro/arrow-user2022", destdir="")
```

The repository is not an R package, but it does have a DESCRIPTION file. To install all the package dependencies associated with the workshop, open R in the project folder and use this:
``` r
remotes::install_deps()
```

The repository contains almost everything you need for the workshop. However, it does not include a copy of the data sets due to file size issues.
For the full NYC taxi data, see the instructions on the website. For the "tiny taxi" data, you can download directly from GitHub:
``` r
download.file(
  url = "https://github.com/djnavarro/arrow-user2022/releases/download/v0.1/nyc-taxi-tiny.zip",
  destfile = here::here("data/nyc-taxi-tiny.zip")
)
```

To extract the parquet files from the archive:
``` r
unzip(
  zipfile = here::here("data/nyc-taxi-tiny.zip"),
  exdir = here::here("data")
)
```

The workshop website files are contained within the `_site` folder, and are online at https://arrow-user2022.netlify.app/
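
After downloading and extracting the "tiny taxi" archive, it can be useful to confirm that the parquet files landed where expected. The following is a minimal sketch using base R only. The helper `find_parquet_files()` is hypothetical (it is not part of the workshop repository), and the sketch assumes R is running from the project root with the archive extracted into the `data` folder:

``` r
# Hypothetical helper (not part of the workshop repository): list every
# parquet file found under a data directory, searching subfolders too.
find_parquet_files <- function(data_dir) {
  list.files(
    path = data_dir,
    pattern = "\\.parquet$",
    recursive = TRUE,
    full.names = TRUE
  )
}

# Assumption: the archive was extracted into the project's data/ folder
# and the working directory is the project root.
parquet_files <- find_parquet_files("data")

if (length(parquet_files) == 0) {
  message("No parquet files found; check the download and unzip steps.")
} else {
  message("Found ", length(parquet_files), " parquet file(s).")
}
```

If the arrow package is installed, a likely next step is to point `arrow::open_dataset()` at the extracted folder; the exact folder name inside the archive is not stated here, so check the listing above first.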