https://github.com/rapidsai/cudf
cuDF - GPU DataFrame Library
https://github.com/rapidsai/cudf
arrow cpp cuda cudf dask data-analysis data-science dataframe gpu pandas pydata python rapids
Last synced: 3 months ago
JSON representation
cuDF - GPU DataFrame Library
- Host: GitHub
- URL: https://github.com/rapidsai/cudf
- Owner: rapidsai
- License: apache-2.0
- Created: 2017-05-07T03:43:37.000Z (almost 9 years ago)
- Default Branch: main
- Last Pushed: 2026-02-04T09:36:51.000Z (3 months ago)
- Last Synced: 2026-02-04T12:47:40.547Z (3 months ago)
- Topics: arrow, cpp, cuda, cudf, dask, data-analysis, data-science, dataframe, gpu, pandas, pydata, python, rapids
- Language: C++
- Homepage: https://docs.rapids.ai/api/cudf/stable/
- Size: 179 MB
- Stars: 9,479
- Watchers: 157
- Forks: 1,005
- Open Issues: 1,193
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
Awesome Lists containing this project
- awesome-python-tools - cuDF - accelerated DataFrame library based on Pandas, part of the RAPIDS ecosystem for faster large-scale data processing. (Data Science / Data Manipulations)
- awesome-list - cuDF - GPU DataFrame Library. (Data Processing / Data Representation)
- awesome-python-machine-learning-resources - GitHub - 12% open · ⏱️ 26.08.2022): (GPU实用程序)
- awesome-systematic-trading - CuDF - commit/rapidsai/cudf/main)  | Python | - cuDF - GPU DataFrame Library. No-code-change accelerator for pandas. (Basic Components / Python Performance Booster)
- stars - rapidsai/cudf - GPU DataFrame Library (HarmonyOS / Windows Manager)
- awesome-production-machine-learning - CuDF - Built based on the Apache Arrow columnar memory format, cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data. (Computation and Communication Optimisation)
- awesome-robotic-tooling - cudf - Provides a pandas-like API that will be familiar to data engineers & data scientists, so they can use it to easily accelerate their workflows without going into the details of CUDA programming. (Operation System / Server Infrastructure and High Performance Computing)
- awesome-python-data-science - cuDF - GPU DataFrame Library. <img height="20" src="img/pandas_big.png" alt="pandas compatible"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated"> (Data Manipulation / Data Frames)
- awesome-machine-learning-datascience_resources - cudf
- awesome-python-data-science - cuDF - GPU DataFrame Library. <img height="20" src="img/pandas_big.png" alt="pandas compatible"> <img height="20" src="img/gpu_big.png" alt="GPU accelerated"> (Data Manipulation / Data Frames)
- awesome-opensource-ai - cuDF - GPU DataFrame library from RAPIDS. Accelerates pandas workflows on NVIDIA GPUs with zero code changes using cuDF.pandas accelerator mode. (📋 Contents / 🧬 1. Core Frameworks & Libraries)
- awesome-data-analysis - cuDF - A GPU DataFrame library for loading, joining, and aggregating data. (🐍 Python / Useful Python Tools for Data Analysis)
- awesome-vector-databases - NVIDIA cuDF - Open-source Python GPU DataFrame library that accelerates popular data engines like Apache Spark, pandas, and Polars on NVIDIA AI infrastructure. Built on Apache Arrow, it utilizes GPU parallelism and memory bandwidth to accelerate data processing and analytics workflows, serving as the data-processing foundation for the Sirius GPU-accelerated database project. ([Read more](/details/nvidia-cudf.md)) `GPU-accelerated` `Dataframe` `Apache Arrow` (Data Processing)
README
#
cuDF - A GPU-accelerated DataFrame library for tabular data processing
cuDF (pronounced "KOO-dee-eff") is an [Apache 2.0 licensed](LICENSE), GPU-accelerated DataFrame library
for tabular data processing. The cuDF library is one part of the [RAPIDS](https://rapids.ai/) GPU
Accelerated Data Science suite of libraries.
## About
cuDF is composed of multiple libraries including:
* [libcudf](https://docs.rapids.ai/api/cudf/stable/libcudf_docs/): A CUDA C++ library with [Apache Arrow](https://arrow.apache.org/) compliant
data structures and fundamental algorithms for tabular data.
* [pylibcudf](https://docs.rapids.ai/api/cudf/stable/pylibcudf/): A Python library providing [Cython](https://cython.org/) bindings for libcudf.
* [cudf](https://docs.rapids.ai/api/cudf/stable/user_guide/): A Python library providing
- A DataFrame library mirroring the [pandas](https://pandas.pydata.org/) API
- A zero-code change accelerator, [cudf.pandas](https://docs.rapids.ai/api/cudf/stable/cudf_pandas/), for existing pandas code.
* [cudf-polars](https://docs.rapids.ai/api/cudf/stable/cudf_polars/): A Python library providing a GPU engine for [Polars](https://pola.rs/)
* [dask-cudf](https://docs.rapids.ai/api/dask-cudf/stable/): A Python library providing a GPU backend for [Dask](https://www.dask.org/) DataFrames
Notable projects that use cuDF include:
* [Spark RAPIDS](https://github.com/NVIDIA/spark-rapids): A GPU accelerator plugin for [Apache Spark](https://spark.apache.org/)
* [Velox-cuDF](https://github.com/facebookincubator/velox/blob/main/velox/experimental/cudf/README.md): A [Velox](https://velox-lib.io/)
extension module to execute Velox plans on the GPU
* [Sirius](https://www.sirius-db.com/): A GPU-native SQL engine providing extensions for libraries like [DuckDB](https://duckdb.org/)
## Installation
### System Requirements
Operating System, GPU driver, and supported CUDA version information can be found at the [RAPIDS Installation Guide](https://docs.rapids.ai/install/#system-req)
### pip
A stable release of each cudf library is available on PyPI. You will need to match the major version number of your installed CUDA version with a `-cu##` suffix when installing from PyPI.
A development version of each library is available as a nightly release by including the `-i https://pypi.anaconda.org/rapidsai-wheels-nightly/simple` index.
```bash
# CUDA 13
pip install libcudf-cu13
pip install pylibcudf-cu13
pip install cudf-cu13
pip install cudf-polars-cu13
pip install dask-cudf-cu13
# CUDA 12
pip install libcudf-cu12
pip install pylibcudf-cu12
pip install cudf-cu12
pip install cudf-polars-cu12
pip install dask-cudf-cu12
```
### conda
A stable release of each cudf library is available to be installed with the conda package manager by specifying the `-c rapidsai` channel.
A development version of each library is available as a nightly release by specifying the `-c rapidsai-nightly` channel instead.
```bash
conda install -c rapidsai libcudf
conda install -c rapidsai pylibcudf
conda install -c rapidsai cudf
conda install -c rapidsai cudf-polars
conda install -c rapidsai dask-cudf
```
### source
To install cuDF from source, please follow [the contribution guide](CONTRIBUTING.md#setting-up-your-build-environment) detailing
how to setup the build environment.
## Examples
The following examples showcase reading a parquet file, dropping missing rows with a null value,
and performing a groupby aggregation on the data.
### cudf
`import cudf` and the APIs are largely similar to pandas.
```python
import cudf
df = cudf.read_parquet("data.parquet")
df.dropna().groupby(["A", "B"]).mean()
```
### cudf.pandas
With a Python file containing pandas code:
```python
import pandas as pd
df = pd.read_parquet("data.parquet")
df.dropna().groupby(["A", "B"]).mean()
```
Use cudf.pandas by invoking `python` with `-m cudf.pandas`
```bash
$ python -m cudf.pandas script.py
```
If running the pandas code in an interactive Jupyter environment, call `%load_ext cudf.pandas` before
importing pandas.
```python
In [1]: %load_ext cudf.pandas
In [2]: import pandas as pd
In [3]: df = pd.read_parquet("data.parquet")
In [4]: df.dropna().groupby(["A", "B"]).mean()
```
### cudf-polars
Using Polars' [lazy API](https://docs.pola.rs/user-guide/lazy/), call `collect` with `engine="gpu"` to run
the operation on the GPU
```python
import polars as pl
lf = pl.scan_parquet("data.parquet")
lf.drop_nulls().group_by(["A", "B"]).mean().collect(engine="gpu")
```
## Questions and Discussion
For bug reports or feature requests, please [file an issue](https://github.com/rapidsai/cudf/issues/new/choose) on the GitHub issue tracker.
For questions or discussion about cuDF and GPU data processing, feel free to post in the [RAPIDS Slack](https://rapids.ai/slack-invite) workspace.
## Contributing
cuDF is open to contributions from the community! Please see our [guide for contributing to cuDF](CONTRIBUTING.md) for more information.