https://github.com/vaexio/vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀

Host: GitHub
URL: https://github.com/vaexio/vaex
Owner: vaexio
License: mit
Created: 2014-09-27T09:44:42.000Z (almost 12 years ago)
Default Branch: master
Last Pushed: 2024-10-08T16:23:10.000Z (almost 2 years ago)
Last Synced: 2025-05-11T11:12:22.969Z (about 1 year ago)
Topics: bigdata, data-science, dataframe, hdf5, machine-learning, machinelearning, memory-mapped-file, pyarrow, python, tabular-data, visualization
Language: Python
Homepage: https://vaex.io
Size: 133 MB
Stars: 8,378
Watchers: 142
Forks: 599
Open Issues: 547
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.txt
- Security: SECURITY.md
- Authors: AUTHORS.txt

Awesome Lists containing this project

awesome-python-data-science - vaex - Out-of-Core DataFrames for Python, ML, visualize and explore big tabular data at a billion rows per second. (Data Manipulation / Data Frames)
awesome-systematic-trading - Vaex - Python, C++ - Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second (Basic Components / Alternative libraries)
awesome-high-performance-computing - Vaex - A Python library for lazy Out-of-Core DataFrames (similar to Pandas), to visualize and explore big tabular datasets. (Software / Trends)
awesome-list - Vaex - Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second. (Data Processing / Data Representation)
awesome-data-analysis - Vaex - High-performance Python library for lazy Out-of-Core DataFrames. (🐍 Python / Useful Python Tools for Data Analysis)
awesome-python-machine-learning-resources - GitHub - 31% open · ⏱️ 25.08.2022 (Data containers and structures)
StarryDivineSky - vaexio/vaex
awesome-machine-learning - Vaex - A high performance Python library for lazy Out-of-Core DataFrames (similar to Pandas), to visualize and explore big tabular datasets. Documentation can be found [here](https://vaex.io/docs/index.html). (Python / General-Purpose Machine Learning)
awesome-production-machine-learning - Vaex - Out-of-Core DataFrames (similar to Pandas), to visualize and explore big tabular datasets. Vaex uses memory mapping, zero memory copy policy and lazy computations for best performance (no memory wasted). (Computation and Communication Optimisation)
fucking-awesome-machine-learning - Vaex - A high performance Python library for lazy Out-of-Core DataFrames (similar to Pandas), to visualize and explore big tabular datasets. Documentation can be found 🌎 [here](vaex.io/docs/index.html). (Python / General-Purpose Machine Learning)
awesome-meteo - vaex - Out-of-Core DataFrames (similar to Pandas), to visualize and explore big tabular datasets. (Uncategorized / Uncategorized)
awesome-python-fa - vaex - Fast and memory-efficient analysis of big data. (📚 Index / Data analysis libraries)
awesome-python-learning - vaexio/vaex: Out-of-Core DataFrames for Python, ML, visualize and explore big tabular data at a billion rows per second 🚀
best-of-python - GitHub - 41% open · ⏱️ 05.02.2026 (Data Containers & Dataframes)
awesome-dataframes - Vaex - A high performance Python library for lazy Out-of-Core DataFrames (similar to Pandas), to visualize and explore big tabular datasets. (Libraries)
awesome-opensource-ai - Vaex - Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python. Visualize and explore billion-row datasets at millions of rows per second. MIT licensed. ![GitHub stars](https://img.shields.io/github/stars/vaexio/vaex?style=social) (1. Core Frameworks & Libraries)
my-awesome-starred - vaexio/vaex - Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀 (Python)

README

[![Supported Python Versions](https://img.shields.io/pypi/pyversions/vaex-core)](https://pypi.org/project/vaex-core/)

[![Documentation](https://readthedocs.org/projects/vaex/badge/?version=latest)](https://docs.vaex.io)

[![Slack](https://img.shields.io/badge/slack-chat-green.svg)](https://join.slack.com/t/vaexio/shared_invite/zt-shhxzf5i-Cf5n2LtkoYgUjOjbB3bGQQ)

# What is Vaex?

Vaex is a high performance Python library for lazy **Out-of-Core DataFrames** (similar to Pandas), to visualize and explore big tabular datasets. It calculates *statistics* such as mean, sum, count, and standard deviation on an *N-dimensional grid* for more than **a billion** (`10^9`) samples/rows **per second**. Visualization is done using **histograms**, **density plots** and **3D volume rendering**, allowing interactive exploration of big data. Vaex uses memory mapping, a zero-memory-copy policy and lazy computations for best performance (no memory wasted).
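
To make "statistics on an N-dimensional grid" concrete, here is a minimal pure-Python sketch of the idea. It is not Vaex's implementation or API: the function `binned_mean` and its parameters are hypothetical, and a real engine would do this in compiled, parallel code. It bins rows on a 2-D grid over `x` and `y` and computes the mean of `v` per cell in a single pass.

```python
# Conceptual sketch only; names are hypothetical, not part of Vaex.
def binned_mean(xs, ys, vs, limits, shape):
    """Mean of `vs` per cell of a 2-D grid over (x, y), computed in one pass."""
    (xmin, xmax), (ymin, ymax) = limits
    nx, ny = shape
    sums = [[0.0] * ny for _ in range(nx)]
    counts = [[0] * ny for _ in range(nx)]
    for x, y, v in zip(xs, ys, vs):
        # Skip rows outside the grid limits (upper bound is exclusive).
        if not (xmin <= x < xmax and ymin <= y < ymax):
            continue
        # Map each coordinate to a cell index, clamped against rounding.
        i = min(int((x - xmin) / (xmax - xmin) * nx), nx - 1)
        j = min(int((y - ymin) / (ymax - ymin) * ny), ny - 1)
        sums[i][j] += v
        counts[i][j] += 1
    # Empty cells get None rather than a misleading 0.
    means = [
        [sums[i][j] / counts[i][j] if counts[i][j] else None for j in range(ny)]
        for i in range(nx)
    ]
    return means, counts


xs = [0.1, 0.2, 0.9, 0.8]
ys = [0.1, 0.3, 0.9, 0.7]
vs = [1.0, 3.0, 10.0, 20.0]
means, counts = binned_mean(xs, ys, vs, limits=((0, 1), (0, 1)), shape=(2, 2))
```

The grid of per-cell counts is exactly what a 2-D histogram or density plot displays, which is why one pass over the data is enough to drive a visualization.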

# Installing

With pip:

```
$ pip install vaex
```

Or conda:

```
$ conda install -c conda-forge vaex
```

[For more details, see the documentation](https://docs.vaex.io/en/latest/installing.html)

# Key features

## Instant opening of huge data files (memory mapping)

[HDF5](https://en.wikipedia.org/wiki/Hierarchical_Data_Format) and [Apache Arrow](https://arrow.apache.org/) supported.

![opening1a](https://user-images.githubusercontent.com/1765949/82818563-31c1e200-9e9f-11ea-9ee0-0a8c1994cdc9.png)

![opening1b](https://user-images.githubusercontent.com/1765949/82820352-49e73080-9ea2-11ea-9153-d73aa399d329.png)

[Read the documentation on how to efficiently convert your data](https://docs.vaex.io/en/latest/example_io.html) from CSV files, Pandas DataFrames, or other sources.

Lazy streaming from S3 is supported in combination with memory mapping.

![opening1c](https://user-images.githubusercontent.com/1765949/82820516-a21e3280-9ea2-11ea-948b-07df26c4b5d3.png)
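
The following sketch shows why memory-mapped opening is fast, using only Python's standard `mmap` module rather than Vaex itself. The file name `column.f64` and its raw float64 layout are assumptions made for illustration; real HDF5 and Arrow files carry additional metadata. Mapping the file reads no data up front, and the operating system loads only the pages that are actually touched.

```python
# Conceptual sketch only: raw binary column, not an HDF5/Arrow file or the Vaex API.
import mmap
import os
import tempfile
from array import array

n = 1_000_000
path = os.path.join(tempfile.mkdtemp(), "column.f64")  # hypothetical file name

# Write one column of float64 values to disk (native byte order).
with open(path, "wb") as f:
    array("d", range(n)).tofile(f)

with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    # Viewing the mapped bytes as float64 makes no copy of the data.
    with memoryview(mm) as raw, raw.cast("d") as column:
        length = len(column)
        # Indexing touches only the pages that hold these two values.
        first, last = column[0], column[-1]
```

Because the view is backed by the file, "opening" costs about the same for a small file as for a very large one; the cost is paid later, page by page, as values are read.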

## Expression system

Don't waste memory or time on feature engineering: we (lazily) transform your data when needed.

![expression](https://user-images.githubusercontent.com/1765949/82818733-70f03300-9e9f-11ea-80b0-ab28e7950b5c.png)
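
As a rough illustration of what "lazily transform" means, the sketch below defines a derived column as a function of existing columns and only computes values chunk by chunk when an aggregation asks for them. The class `LazyColumn` is hypothetical and written in plain Python for clarity; it is not how Vaex represents expressions internally.

```python
# Conceptual sketch only; LazyColumn is a hypothetical name, not a Vaex class.
import math


class LazyColumn:
    """A column defined by a function of other columns; nothing is computed up front."""

    def __init__(self, func, *inputs):
        self.func = func
        self.inputs = inputs

    def chunks(self, chunk_size):
        """Yield computed values one chunk at a time."""
        n = len(self.inputs[0])
        for start in range(0, n, chunk_size):
            cols = [col[start:start + chunk_size] for col in self.inputs]
            yield [self.func(*row) for row in zip(*cols)]


x = [3.0, 6.0, 5.0, 8.0]
y = [4.0, 8.0, 12.0, 15.0]
r = LazyColumn(math.hypot, x, y)  # defining r stores no results

# Aggregate while streaming: only one chunk of r exists at any moment.
total = 0.0
count = 0
for chunk in r.chunks(chunk_size=2):
    total += sum(chunk)
    count += len(chunk)
mean_r = total / count
```

The derived column costs memory proportional to the chunk size, not to the number of rows, which is the property the section above describes.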

## Out-of-core DataFrame

Filtering and evaluating expressions will not waste memory by making copies; the data is kept untouched on disk, and will be streamed only when needed. Delay the time before you need a cluster.

![occ-animated](https://user-images.githubusercontent.com/1765949/82821111-c6c6da00-9ea3-11ea-9f9e-498de8133cc2.gif)

## Fast groupby / aggregations

Vaex implements parallelized, highly performant `groupby` operations, especially when using categories (more than 1 billion rows per second).

![groupby](https://user-images.githubusercontent.com/1765949/82818807-97ae6980-9e9f-11ea-8820-41dd4441057a.png)
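
One reason categorical keys suit parallel aggregation is that each category can be an integer code indexing directly into a small accumulator array, so chunks can be aggregated independently and merged afterwards. The sketch below shows that structure in plain Python; the function names are hypothetical and this is not Vaex's code. Threads are used only to show the shape of the computation, since pure-Python loops do not speed up under the GIL.

```python
# Conceptual sketch only; function names are hypothetical, not part of Vaex.
from concurrent.futures import ThreadPoolExecutor


def partial_sums(codes, values, n_categories):
    """Accumulate per-category sums and counts for one chunk."""
    sums = [0.0] * n_categories
    counts = [0] * n_categories
    for code, value in zip(codes, values):
        sums[code] += value  # the category code is the array index
        counts[code] += 1
    return sums, counts


def groupby_mean(codes, values, n_categories, chunk_size=2, workers=2):
    """Aggregate chunks independently, then merge the partial results."""
    starts = range(0, len(codes), chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(
            lambda s: partial_sums(
                codes[s:s + chunk_size], values[s:s + chunk_size], n_categories
            ),
            starts,
        ))
    sums = [sum(p[0][k] for p in partials) for k in range(n_categories)]
    counts = [sum(p[1][k] for p in partials) for k in range(n_categories)]
    return [s / c if c else None for s, c in zip(sums, counts)]


categories = ["apple", "banana", "cherry"]
codes = [0, 1, 0, 2, 1, 0]  # each row's category, stored as an integer code
values = [1.0, 10.0, 3.0, 7.0, 20.0, 5.0]
means = dict(zip(categories, groupby_mean(codes, values, len(categories))))
```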

## Fast and efficient join

Vaex doesn't copy/materialize the 'right' table when joining, saving gigabytes of memory. With subsecond joining on a billion rows, it's pretty fast!

![join](https://user-images.githubusercontent.com/1765949/82818840-a268fe80-9e9f-11ea-8ba2-6a6d52c4af88.png)
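
A join can avoid materializing the right table by storing, for each left row, only the number of the matching right row, and looking values up on access. The sketch below shows one way to get that effect in plain Python. `JoinedColumn` and `left_join_index` are hypothetical names, the right-hand keys are assumed to be unique, and this is not a description of Vaex's internals.

```python
# Conceptual sketch only; names are hypothetical, and right keys are assumed unique.
class JoinedColumn:
    """Read-only view of a right-table column aligned to the left table's rows."""

    def __init__(self, right_values, row_index):
        self.right_values = right_values  # shared with the right table, not copied
        self.row_index = row_index        # one right-row number (or None) per left row

    def __len__(self):
        return len(self.row_index)

    def __getitem__(self, i):
        j = self.row_index[i]
        return None if j is None else self.right_values[j]


def left_join_index(left_keys, right_keys):
    """For each left row, the matching right row number, or None if there is no match."""
    lookup = {key: j for j, key in enumerate(right_keys)}
    return [lookup.get(key) for key in left_keys]


left_city = ["AMS", "NYC", "AMS", "SFO"]
right_city = ["NYC", "AMS"]
right_country = ["US", "NL"]

index = left_join_index(left_city, right_city)
country = JoinedColumn(right_country, index)
joined = [country[i] for i in range(len(country))]
```

However many columns the right table has, the join stores a single index of row numbers; the column values themselves stay where they are.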

## More features

 * Remote DataFrames (documentation coming soon)

 * Integration into [Jupyter and Voila for interactive notebooks and dashboards](https://vaex.readthedocs.io/en/latest/tutorial_jupyter.html)

 * [Machine Learning without (explicit) pipelines](https://vaex.readthedocs.io/en/latest/tutorial_ml.html)

## Contributing

See [contributing](CONTRIBUTING.md) page.

## Slack

Join the discussion in our [Slack](https://join.slack.com/t/vaexio/shared_invite/zt-shhxzf5i-Cf5n2LtkoYgUjOjbB3bGQQ) channel!

# Learn more about Vaex

 * Articles

   * [Beyond Pandas: Spark, Dask, Vaex and other big data technologies battling head to head](https://towardsdatascience.com/beyond-pandas-spark-dask-vaex-and-other-big-data-technologies-battling-head-to-head-a453a1f8cc13) (includes benchmarks)

   * [7 reasons why I love Vaex for data science](https://towardsdatascience.com/7-reasons-why-i-love-vaex-for-data-science-99008bc8044b) (tips and tricks)

   * [ML impossible: Train 1 billion samples in 5 minutes on your laptop using Vaex and Scikit-Learn](https://towardsdatascience.com/ml-impossible-train-a-1-billion-sample-model-in-20-minutes-with-vaex-and-scikit-learn-on-your-9e2968e6f385)

   * [How to analyse 100 GB of data on your laptop with Python](https://towardsdatascience.com/how-to-analyse-100s-of-gbs-of-data-on-your-laptop-with-python-f83363dda94)

   * [Flying high with Vaex: analysis of over 30 years of flight data in Python](https://towardsdatascience.com/https-medium-com-jovan-veljanoski-flying-high-with-vaex-analysis-of-over-30-years-of-flight-data-in-python-b224825a6d56)

   * [Vaex: A DataFrame with super strings - Speed up your text processing up to a 1000x](https://towardsdatascience.com/vaex-a-dataframe-with-super-strings-789b92e8d861)

   * [Vaex: Out of Core Dataframes for Python and Fast Visualization - 1 billion row datasets on your laptop](https://towardsdatascience.com/vaex-out-of-core-dataframes-for-python-and-fast-visualization-12c102db044a)

 * [Follow our tutorials](https://docs.vaex.io/en/latest/tutorials.html)

 * Watch our more recent talks:

   * [PyData London 2019](https://www.youtube.com/watch?v=2Tt0i823-ec)

   * [SciPy 2019](https://www.youtube.com/watch?v=ELtjRdPT8is)

 * Contact us for data science solutions, training, or enterprise support at https://vaex.io/
