{"id":15028773,"url":"https://github.com/blazingdb/blazingsql","last_synced_at":"2025-05-15T13:04:27.904Z","repository":{"id":35236499,"uuid":"150149024","full_name":"BlazingDB/blazingsql","owner":"BlazingDB","description":"BlazingSQL is a lightweight, GPU accelerated, SQL engine for Python. Built on RAPIDS cuDF.","archived":false,"fork":false,"pushed_at":"2022-09-16T23:58:37.000Z","size":43360,"stargazers_count":1932,"open_issues_count":146,"forks_count":183,"subscribers_count":55,"default_branch":"branch-21.08","last_synced_at":"2024-10-29T17:51:28.872Z","etag":null,"topics":["arrow","artificial-intelligence","blazingsql","conda-environment","cudf","data-science","gpu","gpu-acceleration","gpu-dataframes","machine-learning","machine-learning-workflow","python","rapids","rapidsai","sql","sql-engine"],"latest_commit_sha":null,"homepage":"https://blazingsql.com","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BlazingDB.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-09-24T18:25:45.000Z","updated_at":"2024-10-21T11:48:03.000Z","dependencies_parsed_at":"2023-01-15T17:00:25.008Z","dependency_job_id":null,"html_url":"https://github.com/BlazingDB/blazingsql","commit_stats":null,"previous_names":[],"tags_count":47,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BlazingDB%2Fblazingsql","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BlazingDB%2Fblazingsql/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BlazingDB%2Fblazingsql/releases","manifests_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BlazingDB%2Fblazingsql/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BlazingDB","download_url":"https://codeload.github.com/BlazingDB/blazingsql/tar.gz/refs/heads/branch-21.08","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247675597,"owners_count":20977376,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arrow","artificial-intelligence","blazingsql","conda-environment","cudf","data-science","gpu","gpu-acceleration","gpu-dataframes","machine-learning","machine-learning-workflow","python","rapids","rapidsai","sql","sql-engine"],"created_at":"2024-09-24T20:09:03.828Z","updated_at":"2025-04-07T15:04:51.480Z","avatar_url":"https://github.com/BlazingDB.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003e A lightweight, GPU accelerated, SQL engine built on the [RAPIDS.ai](https://rapids.ai) ecosystem.\n\n\u003ca href='https://app.blazingsql.com/jupyter/user-redirect/lab/workspaces/auto-b/tree/Welcome_to_BlazingSQL_Notebooks/welcome.ipynb'\u003eGet Started on app.blazingsql.com\u003c/a\u003e\n\n[Getting Started](#getting-started) | [Documentation](https://docs.blazingdb.com) | [Examples](#examples) | [Contributing](#contributing) | [License](LICENSE) | [Blog](https://blog.blazingdb.com) | [Try Now](https://app.blazingsql.com/jupyter/user-redirect/lab/workspaces/auto-b/tree/Welcome_to_BlazingSQL_Notebooks/welcome.ipynb)\n\nBlazingSQL is a GPU accelerated SQL engine built on top of the RAPIDS ecosystem. 
RAPIDS is based on the [Apache Arrow](http://arrow.apache.org) columnar memory format, and [cuDF](https://github.com/rapidsai/cudf) is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.\n\nBlazingSQL is a SQL interface for cuDF, with various features to support large scale data science workflows and enterprise datasets.\n* **Query Data Stored Externally** - a single line of code can register remote storage solutions, such as Amazon S3.\n* **Simple SQL** - easy to use: run a SQL query and get the results as GPU DataFrames (GDFs).\n* **Interoperable** - GDFs are immediately accessible to any [RAPIDS](https://github.com/rapidsai) library for data science workloads.\n\nTry our 5-min [Welcome Notebook](https://app.blazingsql.com/jupyter/user-redirect/lab/workspaces/auto-b/tree/Welcome_to_BlazingSQL_Notebooks/welcome.ipynb) to start using BlazingSQL and RAPIDS AI.\n\n# Getting Started\n\nHere are two copy-and-paste reproducible BlazingSQL snippets; keep scrolling to find [example Notebooks](#examples) below.\n\nCreate and query a table from a `cudf.DataFrame` with progress bar:\n\n```python\nimport cudf\n\ndf = cudf.DataFrame()\n\ndf['key'] = ['a', 'b', 'c', 'd', 'e']\ndf['val'] = [7.6, 2.9, 7.1, 1.6, 2.2]\n\nfrom blazingsql import BlazingContext\nbc = BlazingContext(enable_progress_bar=True)\n\nbc.create_table('game_1', df)\n\nbc.sql('SELECT * FROM game_1 WHERE val \u003e 4') # the query progress will be shown\n```\n\n| | key | val |\n| - | -:| ---:|\n| 0 | a | 7.6 |\n| 1 | c | 7.1 |\n\nCreate and query a table from an AWS S3 bucket:\n\n```python\nfrom blazingsql import BlazingContext\nbc = BlazingContext()\n\nbc.s3('blazingsql-colab', bucket_name='blazingsql-colab')\n\nbc.create_table('taxi', 's3://blazingsql-colab/yellow_taxi/taxi_data.parquet')\n\nbc.sql('SELECT passenger_count, trip_distance FROM taxi LIMIT 2')\n```\n\n| | passenger_count | trip_distance |\n| - | -:| ---:|\n| 0 | 1.0 | 1.1 |\n| 1 | 1.0 | 0.7 
|\n\n## Examples\n\n| Notebook Title | Description | Try Now |\n| -------------- | ----------- | ------- |\n| Welcome Notebook | An introduction to BlazingSQL Notebooks and the GPU Data Science Ecosystem. | \u003ca href='https://app.blazingsql.com/jupyter/user-redirect/lab/workspaces/auto-b/tree/Welcome_to_BlazingSQL_Notebooks/welcome.ipynb'\u003e\u003cimg src=\"https://blazingsql.com/launch-notebooks.png\" alt=\"Launch on BlazingSQL Notebooks\" width=\"500\"/\u003e\u003c/a\u003e |\n| The DataFrame | Learn how to use BlazingSQL and cuDF to create GPU DataFrames with SQL and Pandas-like APIs. | \u003ca href='https://app.blazingsql.com/jupyter/user-redirect/lab/workspaces/auto-b/tree/Welcome_to_BlazingSQL_Notebooks/intro_notebooks/the_dataframe.ipynb'\u003e\u003cimg src=\"https://blazingsql.com/launch-notebooks.png\" alt=\"Launch on BlazingSQL Notebooks\" width=\"500\"/\u003e\u003c/a\u003e |\n| Data Visualization | Plug in your favorite Python visualization packages, or use GPU accelerated visualization tools to render millions of rows in a flash. | \u003ca href='https://app.blazingsql.com/jupyter/user-redirect/lab/workspaces/auto-b/tree/Welcome_to_BlazingSQL_Notebooks/intro_notebooks/data_visualization.ipynb'\u003e\u003cimg src=\"https://blazingsql.com/launch-notebooks.png\" alt=\"Launch on BlazingSQL Notebooks\" width=\"500\"/\u003e\u003c/a\u003e |\n| Machine Learning | Learn about cuML, which mirrors the Scikit-Learn API and offers GPU accelerated machine learning on GPU DataFrames. 
| \u003ca href='https://app.blazingsql.com/jupyter/user-redirect/lab/workspaces/auto-b/tree/Welcome_to_BlazingSQL_Notebooks/intro_notebooks/machine_learning.ipynb'\u003e\u003cimg src=\"https://blazingsql.com/launch-notebooks.png\" alt=\"Launch on BlazingSQL Notebooks\" width=\"500\"/\u003e\u003c/a\u003e |\n\n## Documentation\nYou can find our full documentation at [docs.blazingdb.com](https://docs.blazingdb.com/docs).\n\n# Prerequisites\n* [Anaconda or Miniconda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html) installed\n* OS Support\n  * Ubuntu 16.04/18.04 LTS\n  * CentOS 7\n* GPU Support\n  * Pascal or Better\n  * Compute Capability \u003e= 6.0\n* CUDA Support\n  * 11.0\n  * 11.2\n  * 11.4\n* Python Support\n  * 3.7\n  * 3.8\n\n# Install Using Conda\nBlazingSQL can be installed with conda ([miniconda](https://conda.io/miniconda.html), or the full [Anaconda distribution](https://www.anaconda.com/download)) from the [blazingsql](https://anaconda.org/blazingsql/) channel:\n\n## Stable Version\n```bash\nconda install -c blazingsql -c rapidsai -c nvidia -c conda-forge -c defaults blazingsql python=$PYTHON_VERSION cudatoolkit=$CUDA_VERSION\n```\nWhere $CUDA_VERSION is 11.0, 11.2 or 11.4 and $PYTHON_VERSION is 3.7 or 3.8.\n*For example, for CUDA 11.2 and Python 3.8:*\n```bash\nconda install -c blazingsql -c rapidsai -c nvidia -c conda-forge -c defaults blazingsql python=3.8 cudatoolkit=11.2\n```\n\n## Nightly Version\nThe nightly version supports only CUDA 11+; see https://github.com/rapidsai/cudf#cudagpu-requirements\n```bash\nconda install -c blazingsql-nightly -c rapidsai-nightly -c nvidia -c conda-forge -c defaults blazingsql python=$PYTHON_VERSION cudatoolkit=$CUDA_VERSION\n```\nWhere $CUDA_VERSION is 11.0, 11.2 or 11.4 and $PYTHON_VERSION is 3.7 or 3.8.\n*For example, for CUDA 11.2 and Python 3.8:*\n```bash\nconda install -c blazingsql-nightly -c rapidsai-nightly -c nvidia -c conda-forge -c defaults blazingsql python=3.8  
cudatoolkit=11.2\n```\n\n# Build/Install from Source (Conda Environment)\nThis is the recommended way of building all of the BlazingSQL components and dependencies from source. It ensures that all the dependencies are available to the build process.\n\n## Stable Version\n\n### Install build dependencies\n```bash\nconda create -n bsql python=$PYTHON_VERSION\nconda activate bsql\n./dependencies.sh 21.08 $CUDA_VERSION\n```\nWhere $CUDA_VERSION is 11.0, 11.2 or 11.4 and $PYTHON_VERSION is 3.7 or 3.8.\n*For example, for CUDA 11.2 and Python 3.7:*\n```bash\nconda create -n bsql python=3.7\nconda activate bsql\n./dependencies.sh 21.08 11.2\n```\n\n### Build\nThe build process will check out the BlazingSQL repository, then build and install it into the conda environment.\n\n```bash\ncd $CONDA_PREFIX\ngit clone https://github.com/BlazingDB/blazingsql.git\ncd blazingsql\ngit checkout main\nexport CUDACXX=/usr/local/cuda/bin/nvcc\n./build.sh\n```\nNOTE: Run `./build.sh -h` to see more build options.\n\n$CONDA_PREFIX now has a folder for the blazingsql repository.\n\n## Nightly Version\n\n### Install build dependencies\nThe nightly version supports only CUDA 11+; see https://github.com/rapidsai/cudf#cudagpu-requirements\n```bash\nconda create -n bsql python=$PYTHON_VERSION\nconda activate bsql\n./dependencies.sh 21.10 $CUDA_VERSION nightly\n```\nWhere $CUDA_VERSION is 11.0, 11.2 or 11.4 and $PYTHON_VERSION is 3.7 or 3.8.\n*For example, for CUDA 11.2 and Python 3.8:*\n```bash\nconda create -n bsql python=3.8\nconda activate bsql\n./dependencies.sh 21.10 11.2 nightly\n```\n\n### Build\nThe build process will check out the BlazingSQL repository, then build and install it into the conda environment.\n\n```bash\ncd $CONDA_PREFIX\ngit clone https://github.com/BlazingDB/blazingsql.git\ncd blazingsql\nexport CUDACXX=/usr/local/cuda/bin/nvcc\n./build.sh\n```\nNOTE: Run `./build.sh -h` to see more build options.\n\nNOTE: You can perform static analysis with cppcheck with 
the command `cppcheck --project=compile_commands.json` in any of the cpp project build directories.\n\n$CONDA_PREFIX now has a folder for the blazingsql repository.\n\n#### Storage plugins\nTo build without the storage plugins (AWS S3, Google Cloud Storage), use the following arguments:\n```bash\n# Disable all storage plugins\n./build.sh disable-aws-s3 disable-google-gs\n\n# Disable AWS S3 storage plugin\n./build.sh disable-aws-s3\n\n# Disable Google Cloud Storage plugin\n./build.sh disable-google-gs\n```\nNOTE: If you disable the storage plugins, you don't need to install AWS SDK C++ or Google Cloud Storage beforehand (or any of their dependencies).\n\n#### SQL providers\nTo build without the SQL providers (MySQL, PostgreSQL, SQLite), use the following arguments:\n```bash\n# Disable all SQL providers\n./build.sh disable-mysql disable-sqlite disable-postgresql\n\n# Disable MySQL provider\n./build.sh disable-mysql\n\n...\n```\nNOTES:\n- If you disable the SQL providers, you don't need to install mysql-connector-cpp=8.0.23, libpq=13 or sqlite=3 (or any of their dependencies).\n- Currently we support only MySQL, but PostgreSQL and SQLite will be ready for the next version!\n\n# Documentation\nUser guides and public API documentation can be found [here](https://docs.blazingdb.com/docs).\n\nDocumentation of our internal code architecture can be built using Sphinx.\n```bash\nconda install -c conda-forge doxygen\ncd $CONDA_PREFIX\ncd blazingsql/docsrc\npip install -r requirements.txt\nmake doxygen\nmake html\n```\nThe generated documentation can be viewed in a browser at `blazingsql/docsrc/build/html/index.html`\n\n\n# Community\n## Contributing\nHave questions or feedback? 
Post a [new GitHub issue](https://github.com/blazingdb/blazingsql/issues/new/choose).\n\nPlease see our [guide for contributing to BlazingSQL](CONTRIBUTING.md).\n\n## Contact\nFeel free to join our channel (#blazingsql) in the RAPIDS-GoAi Slack: [![join RAPIDS-GoAi workspace](https://badgen.net/badge/slack/RAPIDS-GoAi/purple?icon=slack)](https://join.slack.com/t/rapids-goai/shared_invite/enQtMjE0Njg5NDQ1MDQxLTJiN2FkNTFkYmQ2YjY1OGI4NTc5Y2NlODQ3ZDdiODEwYmRiNTFhMzNlNTU5ZWJhZjA3NTg4NDZkMThkNTkxMGQ).\n\nYou can also email us at [info@blazingsql.com](mailto:info@blazingsql.com) or find out more details on [BlazingSQL.com](https://blazingsql.com).\n\n## License\n[Apache License 2.0](LICENSE)\n\n## RAPIDS AI - Open GPU Data Science\n\nThe RAPIDS suite of open source software libraries aims to enable execution of end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.\n\n## Apache Arrow on GPU\n\nThe GPU version of [Apache Arrow](https://arrow.apache.org/) is a common API that enables efficient interchange of tabular data between processes running on the GPU. End-to-end computation on the GPU avoids unnecessary copying and converting of data off the GPU, reducing compute time and cost for high-performance analytics common in artificial intelligence workloads. As the name implies, cuDF uses the Apache Arrow columnar data format on the GPU. Currently, a subset of the features in Apache Arrow is supported. \n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblazingdb%2Fblazingsql","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fblazingdb%2Fblazingsql","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblazingdb%2Fblazingsql/lists"}