Example code related to a blog post, Fast Data: Loading Tables From S3 At Lightning Speed
- Host: GitHub
- URL: https://github.com/outerbounds/fast-data-blog
- Owner: outerbounds
- Created: 2023-05-05T05:01:23.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-05-07T04:13:07.000Z (over 2 years ago)
- Last Synced: 2024-01-25T14:44:18.581Z (almost 2 years ago)
- Topics: arrow, data, high-performance-computing, pandas, python, tabular-data
- Language: Python
- Homepage: https://outerbounds.com/blog/metaflow-fast-data/
- Size: 26.4 KB
- Stars: 4
- Watchers: 5
- Forks: 1
- Open Issues: 0
          
# Read this blog post for context: [Fast Data: Loading Tables From S3 At Lightning Speed](https://outerbounds.com/blog/metaflow-fast-data/) ⚡
# Setup
## Python Environment 📦
### (Option A) Use conda
```bash
mamba env create -f env.yml
```
### (Option B) Use pip
```bash
python -m venv metaflow-structured-data-env
source metaflow-structured-data-env/bin/activate
pip install notebook==6.4.10 pyarrow==11.0.0 pandas==1.4.2 matplotlib==3.5.0 duckdb==0.6.0 scipy==1.10.1 lightgbm==3.3.5 seaborn==0.12.1
```
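
To confirm that the active environment matches the pins above, a small check like the following may help. This is only a sketch and not part of the repository: the `PINNED` dictionary is copied by hand from the pip command, and `check_pins` is a hypothetical helper name. It relies only on the standard library's `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied by hand from the pip command above (assumption: keep them in sync manually).
PINNED = {
    "notebook": "6.4.10",
    "pyarrow": "11.0.0",
    "pandas": "1.4.2",
    "matplotlib": "3.5.0",
    "duckdb": "0.6.0",
    "scipy": "1.10.1",
    "lightgbm": "3.3.5",
    "seaborn": "0.12.1",
}


def check_pins(pins):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    problems = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            # The package is not installed in the current environment.
            installed = None
        if installed != expected:
            problems[name] = (expected, installed)
    return problems


if __name__ == "__main__":
    # Print one line per package whose installed version differs from its pin.
    for name, (expected, installed) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

Run it with the virtual environment activated. No output means every pinned package is installed at the expected version.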
## Run the `FastDataProcessing` flow
```bash
python fast_data_processing.py --environment=conda run
```
## Run the `FastDataModeling` flow
```bash
python fast_data_modeling.py run
```