https://github.com/milesgranger/flaco
(PoC) A very memory-efficient way to read data from PostgreSQL
- Host: GitHub
- URL: https://github.com/milesgranger/flaco
- Owner: milesgranger
- License: unlicense
- Created: 2021-09-28T17:42:53.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2022-10-28T04:08:27.000Z (about 2 years ago)
- Last Synced: 2024-10-12T00:35:21.344Z (3 months ago)
- Topics: arrow, postgresql, pyarrow, python, rust
- Language: Rust
- Homepage:
- Size: 146 KB
- Stars: 15
- Watchers: 3
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## flaco
[![Code Style](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/python/black)
[![CI](https://github.com/milesgranger/flaco/actions/workflows/CI.yml/badge.svg?branch=master)](https://github.com/milesgranger/flaco/actions/workflows/CI.yml)
[![PyPI](https://img.shields.io/pypi/v/flaco.svg)](https://pypi.org/project/flaco)
![PyPI - Wheel](https://img.shields.io/pypi/wheel/flaco)
[![Downloads](https://pepy.tech/badge/flaco/month)](https://pepy.tech/project/flaco)

---
#### Install:
```bash
pip install flaco
```

---
The easiest and perhaps most memory-efficient way to get PostgreSQL data (more flavors to come?)
into `pyarrow.Table`, `pandas.DataFrame`, or Arrow (IPC/Feather) and Parquet files.

Since [Arrow](https://github.com/apache/arrow) supports efficient and even larger-than-memory processing,
as with [dask](https://github.com/dask/dask), [duckdb](https://duckdb.org/), or others,
just getting data onto disk is sometimes the hardest part; this aims to make that easier.

API:

- `flaco.read_sql_to_file`: Read a SQL query into a Feather or Parquet file.
- `flaco.read_sql_to_pyarrow`: Read a SQL query into a pyarrow table.

NOTE:
This is still a WIP. I intend to generalize it more to be
useful to a wider audience. Issues and pull requests welcome!

---
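A minimal sketch of calling the two API functions listed above. It uses only the call forms that appear in the memory-profile example (`read_sql_to_file(DB_URI, stmt, path, flaco.FileFormat.Feather)` and `read_sql_to_pyarrow(DB_URI, stmt)`); the helper name `export_query` is hypothetical, and an installed flaco and pyarrow plus a reachable PostgreSQL database are assumed.

```python
def export_query(db_uri: str, stmt: str, path: str):
    """Hypothetical helper (not part of flaco): dump a query to Feather, then load it two ways.

    Imports are deferred so this module can be imported even where flaco is not installed.
    """
    import flaco
    import pyarrow as pa

    # Write the query result straight to an Arrow IPC (Feather) file on disk.
    flaco.read_sql_to_file(db_uri, stmt, path, flaco.FileFormat.Feather)

    # Memory-map the file and read it back as a pyarrow.Table.
    with pa.memory_map(path, "rb") as source:
        from_file = pa.ipc.open_file(source).read_all()

    # Alternatively, read the query result directly into a pyarrow.Table.
    in_memory = flaco.read_sql_to_pyarrow(db_uri, stmt)
    return from_file, in_memory
```

For example, `export_query(DB_URI, "select * from test_table", "result.feather")`, where `DB_URI` is a placeholder for your PostgreSQL connection string.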
### Example
```text
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   122    147.9 MiB    147.9 MiB           1   @profile
   123                                         def memory_profile():
   124    147.9 MiB      0.0 MiB           1       stmt = "select * from test_table"
   125
   126                                             # Read SQL to file
   127    150.3 MiB      2.4 MiB           1       flaco.read_sql_to_file(DB_URI, stmt, 'result.feather', flaco.FileFormat.Feather)
   128    150.3 MiB      0.0 MiB           1       with pa.memory_map('result.feather', 'rb') as source:
   129    150.3 MiB      0.0 MiB           1           table1 = pa.ipc.open_file(source).read_all()
   130    408.1 MiB    257.8 MiB           1       table1_df1 = table1.to_pandas()
   131
   132                                             # Read SQL to pyarrow.Table
   133    504.3 MiB     96.2 MiB           1       table2 = flaco.read_sql_to_pyarrow(DB_URI, stmt)
   134    644.1 MiB    139.8 MiB           1       table2_df = table2.to_pandas()
   135
   136                                             # Pandas
   137    648.8 MiB      4.7 MiB           1       engine = create_engine(DB_URI)
   138   1335.4 MiB    686.6 MiB           1       _pandas_df = pd.read_sql(stmt, engine)
```

---
### License
> _Why did you choose such lax licensing? Could you change to a copy left license, please?_
...just kidding, no one would ask that. This is dual-licensed under
[Unlicense](LICENSE) or [MIT](LICENSE-MIT), at your discretion.