https://github.com/mikata-project/df_io

Python helpers for doing IO with Pandas DataFrames
https://github.com/mikata-project/df_io

Last synced: about 1 year ago
JSON representation

Python helpers for doing IO with Pandas DataFrames

Host: GitHub
URL: https://github.com/mikata-project/df_io
Owner: Mikata-Project
License: mit
Created: 2020-06-24T08:45:21.000Z (almost 6 years ago)
Default Branch: master
Last Pushed: 2022-02-14T14:03:11.000Z (over 4 years ago)
Last Synced: 2025-05-16T00:31:32.524Z (about 1 year ago)
Language: Python
Size: 30.3 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# df_io
Python helpers for doing IO with Pandas DataFrames

# Available methods
## read_df

* bzip2/gzip/zstandard compression
* passing parameters to Pandas' readers
* reading from anything, which `smart_open` supports (local files, AWS S3 etc)
* most of the available formats, Pandas supports

## write_df

This method supports:
* streaming writes
* chunked writes
* bzip2/gzip/zstandard compression
* passing parameters to Pandas' writers
* writing to anything, which `smart_open` supports (local files, AWS S3 etc)
* most of the available formats, Pandas supports

# Documentation

[API doc](https://github.com/Mikata-Project/df_io/tree/master/docs/df_io.md)

### Examples

Write a Pandas DataFrame (df) to an S3 path in CSV format (the default):

```python
import df_io

df_io.write_df(df, 's3://bucket/dir/mydata.csv')
```

The same with gzip compression:

```python
df_io.write_df(df, 's3://bucket/dir/mydata.csv.gz')
```

With zstandard compression using pickle:

```python
df_io.write_df(df, 's3://bucket/dir/mydata.pickle.zstd', fmt='pickle')
```

Using JSON lines:

```python
df_io.write_df(df, 's3://bucket/dir/mydata.json.gz', fmt='json')
```

Passing writer parameters:

```python
df_io.write_df(df, 's3://bucket/dir/mydata.json.gz', fmt='json', writer_options={'lines': False})
```

Chunked write (splitting the df into equally sized parts and creating/writing outputs for them):

```python
df_io.write_df(df, 's3://bucket/dir/mydata.json.gz', fmt='json', chunksize=10000)
```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/mikata-project/df_io

Awesome Lists containing this project

README