Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/outerbounds/fast-data-blog
Example code related to a blog post, Fast Data: Loading Tables From S3 At Lightning Speed
- Host: GitHub
- URL: https://github.com/outerbounds/fast-data-blog
- Owner: outerbounds
- Created: 2023-05-05T05:01:23.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-05-07T04:13:07.000Z (over 1 year ago)
- Last Synced: 2024-01-25T14:44:18.581Z (10 months ago)
- Topics: arrow, data, high-performance-computing, pandas, python, tabular-data
- Language: Python
- Homepage: https://outerbounds.com/blog/metaflow-fast-data/
- Size: 26.4 KB
- Stars: 4
- Watchers: 5
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Read this blog post for context: [Fast Data: Loading Tables From S3 At Lightning Speed](https://outerbounds.com/blog/metaflow-fast-data/) ⚡
# Setup
## Python Environment 📦
### (Option A) Use conda
```
mamba env create -f env.yml
```

### (Option B) Use pip
```
python -m venv metaflow-structured-data-env
source metaflow-structured-data-env/bin/activate
pip install notebook==6.4.10 pyarrow==11.0.0 pandas==1.4.2 matplotlib==3.5.0 duckdb==0.6.0 scipy==1.10.1 lightgbm==3.3.5 seaborn==0.12.1
```

## Run the `FastDataProcessing` flow
```bash
python fast_data_processing.py --environment=conda run
```

## Run the `FastDataModeling` flow
```bash
python fast_data_modeling.py run
```
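
If you set up the environment with pip (Option B), it may be worth confirming that the installed packages match the pinned versions before running either flow. The following is a minimal sketch and is not part of the repository: the `PINNED` dictionary and the `check_pins` helper are hypothetical names introduced here, and the version numbers are copied from the `pip install` command above. It relies only on the standard-library `importlib.metadata` module.

```python
from importlib import metadata

# Pins copied from the Option B `pip install` command in this README.
PINNED = {
    "notebook": "6.4.10",
    "pyarrow": "11.0.0",
    "pandas": "1.4.2",
    "matplotlib": "3.5.0",
    "duckdb": "0.6.0",
    "scipy": "1.10.1",
    "lightgbm": "3.3.5",
    "seaborn": "0.12.1",
}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for missing or mismatched packages.

    `found` is None when the package is not installed.
    `get_version` is injectable so the check can be exercised without
    touching the real environment.
    """
    problems = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    # Print one line per package that deviates from the pins.
    for name, (expected, found) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

Run the script with the virtual environment activated. If it prints nothing, every pinned package is installed at the expected version.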