https://github.com/outerbounds/fast-data-blog

Example code related to a blog post, Fast Data: Loading Tables From S3 At Lightning Speed
arrow data high-performance-computing pandas python tabular-data

# Read this blog post for context: [Fast Data: Loading Tables From S3 At Lightning Speed](https://outerbounds.com/blog/metaflow-fast-data/) ⚡

# Setup

## Python Environment 📦

### (Option A) Use conda
```bash
mamba env create -f env.yml
```

### (Option B) Use pip
```bash
python -m venv metaflow-structured-data-env
source metaflow-structured-data-env/bin/activate
pip install notebook==6.4.10 pyarrow==11.0.0 pandas==1.4.2 matplotlib==3.5.0 duckdb==0.6.0 scipy==1.10.1 lightgbm==3.3.5 seaborn==0.12.1
```
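
To confirm that the pip environment matches the pins above, a small check like the following may help. It is only a sketch and not part of the original repository: the `PINNED` dictionary and the `find_mismatches` helper are hypothetical names, and the version numbers are copied from the `pip install` command.

```python
from importlib import metadata

# Versions copied from the pip install command above.
PINNED = {
    "notebook": "6.4.10",
    "pyarrow": "11.0.0",
    "pandas": "1.4.2",
    "matplotlib": "3.5.0",
    "duckdb": "0.6.0",
    "scipy": "1.10.1",
    "lightgbm": "3.3.5",
    "seaborn": "0.12.1",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Hypothetical helper: map each differing package to (expected, installed).

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    # Prints nothing when every pinned package is installed at the pinned version.
    for name, (expected, installed) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

Run it inside the activated `metaflow-structured-data-env` environment; no output means every pinned package is installed at the expected version.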

## Run the `FastDataProcessing` flow
```bash
python fast_data_processing.py --environment=conda run
```

## Run the `FastDataModeling` flow
```bash
python fast_data_modeling.py run
```