https://github.com/xhochy/fletcher
Pandas ExtensionDType/Array backed by Apache Arrow
- Host: GitHub
- URL: https://github.com/xhochy/fletcher
- Owner: xhochy
- License: mit
- Archived: true
- Created: 2018-03-04T16:44:22.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2023-02-22T15:17:01.000Z (about 2 years ago)
- Last Synced: 2025-03-14T04:19:11.330Z (about 1 month ago)
- Language: Python
- Homepage: https://fletcher.readthedocs.io/
- Size: 549 KB
- Stars: 229
- Watchers: 16
- Forks: 33
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
Awesome Lists containing this project
- awesome-dataframes - fletcher - Pandas ExtensionDType/Array backed by Apache Arrow. (Libraries)
- awesome-python-machine-learning-resources - GitHub - 45% open · ⏱️ 18.02.2021): (Data containers and structures)
README
# fletcher

[Code style: black](https://github.com/ambv/black)
[Binder](https://mybinder.org/v2/gh/xhochy/fletcher/master)

A library that provides a generic set of Pandas ExtensionDType/Array
implementations backed by Apache Arrow. They support a wider range of types
than Pandas natively supports and also bring a different set of constraints and
behaviours that are beneficial in many situations.

# 🗃️ Archived successfully 🤘
This project has been archived as development has ceased around 2021.
With the support of [Apache Arrow-backed extension arrays in `pandas`](https://github.com/pandas-dev/pandas/pull/35259), the major goal of this project has been fulfilled.
As Marc Garcia outlines in his blog post ["pandas 2.0 and the Arrow revolution (part I)"](https://datapythonista.me/blog/pandas-20-and-the-arrow-revolution-part-i), Apache Arrow support in `pandas` is now generally available and here to stay.
`fletcher` has hopefully discovered some bugs along the way and given inspiration to the implementation that is now in `pandas`.

## Usage
To use `fletcher` in Pandas DataFrames, all you need to do is wrap your data
in a `FletcherChunkedArray` or `FletcherContinuousArray` object. Your data can
be a `pyarrow.Array`, a `pyarrow.ChunkedArray`, or any type that can be passed
to `pyarrow.array(…)`.

```
import fletcher as fr
import pandas as pd

df = pd.DataFrame({
    'str_chunked': fr.FletcherChunkedArray(['a', 'b', 'c']),
    'str_continuous': fr.FletcherContinuousArray(['a', 'b', 'c']),
})

df.info()
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 3 entries, 0 to 2
# Data columns (total 2 columns):
# # Column Non-Null Count Dtype
# --- ------ -------------- -----
# 0 str_chunked 3 non-null fletcher_chunked[string]
# 1 str_continuous 3 non-null fletcher_continuous[string]
# dtypes: fletcher_chunked[string](1), fletcher_continuous[string](1)
# memory usage: 166.0 bytes
```

## Development

While you can use `fletcher` in pip-based environments, we strongly recommend
using a `conda`-based development setup with packages from `conda-forge`.

```
# Create the conda environment with all necessary dependencies
conda env create

# Activate the newly created environment
conda activate fletcher

# Install fletcher into the current environment
python -m pip install -e . --no-build-isolation --no-use-pep517

# Run the unit tests (you should do this several times during development)
py.test -nauto

# Install pre-commit hooks
# These will then be automatically run on every commit and ensure that files
# are black formatted, have no flake8 issues and mypy checks the type consistency.
pre-commit install
```

Code formatting is done using black. This keeps the code in a consistent
style, and the formatting is adjusted automatically via the pre-commit hooks.

### Using pandas in development mode
To test and develop against pandas' master or your local fixes, you can install a development version of pandas using:
```
git clone https://github.com/pandas-dev/pandas
cd pandas

# Install additional pandas dependencies
conda install -y cython

# Build and install pandas
python setup.py build_ext --inplace -j 4
python -m pip install -e . --no-build-isolation --no-use-pep517
```

This links the development version of `pandas` into your `fletcher` conda environment.
If you change any Python code in pandas, it is directly reflected in your environment.
If you change any Cython code in pandas, you need to re-execute `python setup.py build_ext --inplace -j 4`.

### Using (py)arrow nightlies
To test and develop against the latest development version of Apache Arrow (`pyarrow`), you can install it from the `arrow-nightlies` conda channel:
```
conda install -c arrow-nightlies arrow-cpp pyarrow
```

### Benchmarks
In `benchmarks/` we provide a set of benchmarks to compare the performance of
`fletcher` against `pandas` and ensure that `fletcher` itself stays performant.
The benchmarks are written using
[airspeed velocity](https://asv.readthedocs.io/en/stable/). While developing
the benchmarks, you can use `asv dev` to run each of them only once (pass `-b`
with a pattern to run only a selection of them). To get real benchmark values,
use `asv run --python=same`, which runs the benchmarks multiple times and
yields meaningful average runtimes.
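
For orientation, an airspeed velocity benchmark is a plain Python class whose `time_*` methods are timed, with `setup` run beforehand outside the timed section. The sketch below is hypothetical and not taken from `benchmarks/`: the class name, parameter values, and the choice of `Series.str.len()` are assumptions, and it only times the `pandas` baseline. A `fletcher` counterpart would presumably wrap the same data in `fr.FletcherChunkedArray` inside `setup`.

```python
import pandas as pd


class StringLength:
    """Hypothetical asv benchmark: time `Series.str.len()` on a pandas column."""

    # asv runs every benchmark once per value listed in `params`.
    params = [1_000, 100_000]
    param_names = ["n_rows"]

    def setup(self, n_rows):
        # Build the input here so that construction is not part of the timing.
        self.series = pd.Series(["fletcher"] * n_rows)

    def time_str_len(self, n_rows):
        # asv times methods whose names start with `time_`.
        self.series.str.len()
```

Placed in a module under `benchmarks/`, a class like this could be selected during development by passing its name as the pattern to `asv dev -b`.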