https://github.com/xorbitsai/xorbits
Scalable Python DS & ML, in an API compatible & lightning fast way.
data-science distributed-systems lightgbm machine-learning ml numpy pandas python scalable xgboost
- Host: GitHub
- URL: https://github.com/xorbitsai/xorbits
- Owner: xorbitsai
- License: apache-2.0
- Created: 2022-07-27T09:34:08.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-10-06T10:14:47.000Z (about 1 month ago)
- Last Synced: 2024-10-06T10:49:48.775Z (about 1 month ago)
- Topics: data-science, distributed-systems, lightgbm, machine-learning, ml, numpy, pandas, python, scalable, xgboost
- Language: Python
- Homepage: https://xorbits.readthedocs.io
- Size: 6.81 MB
- Stars: 1,112
- Watchers: 19
- Forks: 67
- Open Issues: 122
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
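- StarryDivineSky - xorbitsai/xorbits - From data preprocessing to tuning, training, and model serving. Xorbits can leverage multiple cores or GPUs to accelerate computation on a single machine, or scale out to thousands of machines to support processing terabytes of data and training or serving large models. (Other: Machine Learning and Deep Learning)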
README
[![PyPI Latest Release](https://img.shields.io/pypi/v/xorbits.svg?style=for-the-badge)](https://pypi.org/project/xorbits/)
[![License](https://img.shields.io/pypi/l/xorbits.svg?style=for-the-badge)](https://github.com/xorbitsai/xorbits/blob/main/LICENSE)
[![Coverage](https://img.shields.io/codecov/c/github/xorbitsai/xorbits?style=for-the-badge)](https://codecov.io/gh/xorbitsai/xorbits)
[![Build Status](https://img.shields.io/github/actions/workflow/status/xorbitsai/xorbits/python.yaml?branch=main&style=for-the-badge&label=GITHUB%20ACTIONS&logo=github)](https://actions-badge.atrox.dev/xorbitsai/xorbits/goto?ref=main)
[![Doc](https://readthedocs.org/projects/xorbits/badge/?version=latest&style=for-the-badge)](https://xorbits.readthedocs.io/)
[![Slack](https://img.shields.io/badge/join_Slack-781FF5.svg?logo=slack&style=for-the-badge)](https://join.slack.com/t/xorbitsio/shared_invite/zt-1o3z9ucdh-RbfhbPVpx7prOVdM1CAuxg)
[![Twitter](https://img.shields.io/twitter/follow/xorbitsio?logo=twitter&style=for-the-badge)](https://twitter.com/xorbitsio)

## What is Xorbits?
Xorbits is an open-source computing framework that makes it easy to scale data science and machine learning workloads,
from data preprocessing to tuning, training, and model serving. Xorbits can leverage multiple cores or GPUs to accelerate
computation on a single machine, or scale out to thousands of machines to process terabytes of data and to train or serve large models.

Xorbits provides a suite of best-in-class [libraries](https://xorbits.readthedocs.io/en/latest/libraries/index.html) for data
scientists and machine learning practitioners. It lets you scale tasks without extensive knowledge of infrastructure.

Xorbits features a familiar Python API that supports a variety of libraries, including pandas, NumPy, PyTorch,
and XGBoost. Your pandas workflow can be scaled with Xorbits by changing a single line of code, the import:
replace `import pandas as pd` with `import xorbits.pandas as pd`.
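
As a minimal sketch of the one-line change, assuming it means importing `xorbits.pandas` in place of `pandas`: the helper `load_dataframe_module` and the sample data below are hypothetical and not part of Xorbits. The fallback to plain pandas is there only so the snippet still runs on a machine without Xorbits installed.

```python
import importlib
import importlib.util


def load_dataframe_module():
    """Return xorbits.pandas if installed, else pandas, else None.

    Hypothetical helper for illustration; not part of the Xorbits API.
    """
    for name in ("xorbits.pandas", "pandas"):
        try:
            if importlib.util.find_spec(name) is not None:
                return importlib.import_module(name)
        except ImportError:
            # Raised when the parent package (e.g. xorbits) is missing
            # or the installed package cannot be imported.
            continue
    return None


def main():
    # Equivalent to `import xorbits.pandas as pd` when Xorbits is installed.
    pd = load_dataframe_module()
    if pd is None:
        print("Neither xorbits nor pandas is installed.")
        return
    # From here on, this is ordinary pandas-style code (made-up sample data).
    df = pd.DataFrame({"city": ["A", "B", "A"], "sales": [10, 20, 5]})
    print(df.groupby("city")["sales"].sum())


if __name__ == "__main__":
    main()
```

In real code you would normally skip the helper and write `import xorbits.pandas as pd` at the top of the file, leaving the rest of the workflow unchanged.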
## Why Xorbits?
As ML and AI workloads continue to grow in complexity, their computational demands soar. Single-node development
environments such as your laptop are convenient, but they fall short of accommodating these scaling demands.

### Seamlessly scale your workflow from laptop to cluster
To use Xorbits, you do not need to specify how to distribute the data or even know how many cores your system has.
You can keep using your existing notebooks and still enjoy a significant speed boost from Xorbits, even on your laptop.

### Process large datasets that pandas can't
Xorbits can [leverage all of your computational cores](https://xorbits.readthedocs.io/en/latest/getting_started/why_xorbits/pandas.html#boosting-performance-and-scalability-with-xorbits).
It is especially beneficial for handling [larger datasets](https://xorbits.readthedocs.io/en/latest/getting_started/why_xorbits/pandas.html#overcoming-memory-limitations-in-large-datasets-with-xorbits),
where pandas may slow down or run out of memory.

### Lightning-fast speed
According to our benchmark tests, Xorbits surpasses other popular pandas API frameworks in speed and scalability.
See our [performance comparison](https://xorbits.readthedocs.io/en/latest/getting_started/why_xorbits/comparisons.html#performance-comparison),
[explanation](https://xorbits.readthedocs.io/en/latest/getting_started/why_xorbits/fast.html), and [research paper](https://arxiv.org/abs/2401.00865).

### Leverage the Python ecosystem with native integrations
Xorbits aims to take full advantage of the entire ML ecosystem, offering native integration with pandas and other libraries.
## Where to get it?
The source code is currently hosted on GitHub at: https://github.com/xorbitsai/xorbits

Binary installers for the latest released version are available at the [Python
Package Index (PyPI)](https://pypi.org/project/xorbits).

```shell
# PyPI
pip install xorbits
```

## Other resources
* [Documentation](https://xorbits.readthedocs.io)
* [Performance Benchmarks](https://xorbits.readthedocs.io/en/latest/getting_started/why_xorbits/comparisons.html#performance-comparison)
* [Development Guide](https://xorbits.readthedocs.io/en/latest/development/index.html)
* [Research Paper on Xorbits' Internals](https://arxiv.org/abs/2401.00865)

## License
[Apache 2](LICENSE)

## Roadmaps
The main goals we want to achieve in the future include the following:

* Transition from pandas-native to Arrow-native data storage,
  which will substantially reduce memory cost and is friendlier to compute engines.
* Introduce native engines that leverage technologies like vectorization and codegen
  to accelerate computations.
* Scale as many libraries and algorithms as possible!

More detailed roadmaps will be revealed soon. Stay tuned!
## Relationship with Mars
The creators of Xorbits are mainly those of Mars, and Xorbits is currently built on Mars
to reduce duplicated work. However, the vision of Xorbits suggests that it is not
appropriate to put everything into Mars. Instead, we need a new project
to support the roadmaps better. In the future, we will replace some core internal components
with new ones that we will propose. Stay tuned!

## Getting involved
| Platform | Purpose |
|-----------------------------------------------------------------------------------------------|----------------------------------------------------|
| [GitHub Issues](https://github.com/xorbitsai/xorbits/issues)                                  | Reporting bugs and filing feature requests.        |
| [StackOverflow](https://stackoverflow.com/questions/tagged/xorbits) | Asking questions about how to use Xorbits. |
| [Slack](https://join.slack.com/t/xorbitsio/shared_invite/zt-1o3z9ucdh-RbfhbPVpx7prOVdM1CAuxg) | Collaborating with other Xorbits users.            |

## Citing Xorbits
If you find Xorbits helpful, please cite our paper using the following BibTeX entry:
```
@inproceedings{lu2024Xorbits,
title = {Xorbits: Automating Operator Tiling for Distributed Data Science},
shorttitle = {Xorbits},
booktitle = {2024 {{IEEE}} 40th {{International Conference}} on {{Data Engineering}} ({{ICDE}})},
author = {Lu, Weizheng and He, Kaisheng and Qin, Xuye and Li, Chengjie and Wang, Zhong and Yuan, Tao and Liao, Xia and Zhang, Feng and Chen, Yueguo and Du, Xiaoyong},
year = {2024},
month = may,
pages = {5211--5223},
issn = {2375-026X},
doi = {10.1109/ICDE60146.2024.00392},
}
```