Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/otsaloma/dataiter
Simple, light-weight data frames for Python
https://github.com/otsaloma/dataiter
data-frame json numba numpy python
Last synced: about 1 month ago
JSON representation
Simple, light-weight data frames for Python
- Host: GitHub
- URL: https://github.com/otsaloma/dataiter
- Owner: otsaloma
- License: mit
- Created: 2019-09-29T17:49:00.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2024-12-14T23:20:55.000Z (about 2 months ago)
- Last Synced: 2024-12-15T00:19:55.197Z (about 2 months ago)
- Topics: data-frame, json, numba, numpy, python
- Language: Python
- Homepage: https://dataiter.readthedocs.io/
- Size: 2.87 MB
- Stars: 25
- Watchers: 3
- Forks: 0
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- Changelog: NEWS.md
- License: COPYING
- Authors: AUTHORS.md
Awesome Lists containing this project
README
Simple, Light-Weight Data Frames for Python
===========================================[![PyPI](https://img.shields.io/pypi/v/dataiter.svg)](https://pypi.org/project/dataiter)
[![Downloads](https://pepy.tech/badge/dataiter/month)](https://pepy.tech/project/dataiter)Dataiter's **`DataFrame`** is a class for tabular data similar to R's
`data.frame`, implementing all common operations to manipulate data. It
is under the hood a dictionary of NumPy arrays and thus capable of fast
vectorized operations. You can consider it to be a light-weight
alternative to Pandas with a simple and consistent API. Performance-wise
Dataiter relies on NumPy and Numba and is likely to be at best
comparable to Pandas.## Installation
```bash
# Latest stable version
pip install -U dataiter# Latest development version
pip install -U git+https://github.com/otsaloma/dataiter# Numba (optional)
pip install -U numba
```Dataiter optionally uses **Numba** to speed up certain operations. If
you have Numba installed, Dataiter will use it automatically. It's
currently not a hard dependency, so you need to install it separately.## Quick Start
```python
>>> import dataiter as di
>>> data = di.read_csv("data/listings.csv")
>>> data.filter(hood="Manhattan", guests=2).sort(price=1).head()
.
id hood zipcode guests sqft price
int64 string string int64 float64 int64
──────── ───────── ─────── ────── ─────── ─────
0 42279170 Manhattan 10013 2 nan 0
1 42384530 Manhattan 10036 2 nan 0
2 18835820 Manhattan 10021 2 nan 10
3 20171179 Manhattan 10027 2 nan 10
4 14858544 Manhattan 2 nan 15
5 31397084 Manhattan 10002 2 nan 19
6 22289683 Manhattan 10031 2 nan 20
7 7760204 Manhattan 10040 2 nan 22
8 43292527 Manhattan 10033 2 nan 22
9 43268040 Manhattan 10033 2 nan 23
.
```## Documentation
https://dataiter.readthedocs.io/
If you're familiar with either dplyr (R) or Pandas (Python), the
comparison table in the documentation will give you a quick overview of
the differences and similarities in common operations.https://dataiter.readthedocs.io/en/stable/comparison.html
## Development
To install a virtualenv for development, use
make venv
or, for a specific Python version
make PYTHON=python3.X venv