{"id":18408671,"url":"https://github.com/h2oai/h2o4gpu","last_synced_at":"2025-05-14T23:06:28.960Z","repository":{"id":51316447,"uuid":"85352239","full_name":"h2oai/h2o4gpu","owner":"h2oai","description":"H2Oai GPU Edition","archived":false,"fork":false,"pushed_at":"2024-10-24T17:54:57.000Z","size":27892,"stargazers_count":467,"open_issues_count":155,"forks_count":94,"subscribers_count":130,"default_branch":"master","last_synced_at":"2025-05-13T19:26:22.921Z","etag":null,"topics":["c-plus-plus","cpu","cuda","elastic-net","glm","gpu","lasso","machine-learning","pca","python","r","rstats","svd"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/h2oai.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-03-17T20:31:42.000Z","updated_at":"2025-05-11T23:57:27.000Z","dependencies_parsed_at":"2024-01-22T01:09:27.969Z","dependency_job_id":"edfd7002-7732-49ec-b922-04d1828b5c32","html_url":"https://github.com/h2oai/h2o4gpu","commit_stats":{"total_commits":2152,"total_committers":32,"mean_commits":67.25,"dds":0.5845724907063197,"last_synced_commit":"a22da27a267e403bf537293b011788d846ab14bc"},"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/h2oai%2Fh2o4gpu","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/h2oai%2Fh2o4gpu/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/h2oai%2Fh2o4gpu/re
leases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/h2oai%2Fh2o4gpu/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/h2oai","download_url":"https://codeload.github.com/h2oai/h2o4gpu/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254243360,"owners_count":22038046,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c-plus-plus","cpu","cuda","elastic-net","glm","gpu","lasso","machine-learning","pca","python","r","rstats","svd"],"created_at":"2024-11-06T03:20:24.796Z","updated_at":"2025-05-14T23:06:23.943Z","avatar_url":"https://github.com/h2oai.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# H2O4GPU\n\n[![Join the chat at https://gitter.im/h2oai/h2o4gpu](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/h2oai/h2o4gpu)\n\n**H2O4GPU** is a collection of GPU solvers by [H2Oai](https://www.h2o.ai/) with APIs in Python and R.  The Python API builds upon the easy-to-use [scikit-learn](http://scikit-learn.org) API and its well-tested CPU-based algorithms.  It can be used as a drop-in replacement for scikit-learn (i.e. `import h2o4gpu as sklearn`) with support for GPUs on selected (and ever-growing) algorithms.  H2O4GPU inherits all the existing scikit-learn algorithms and falls back to CPU algorithms when the GPU algorithm does not support an important existing scikit-learn class option.  
The R package is a wrapper around the H2O4GPU Python package, and the interface follows standard R conventions for modeling.\n\nThe Intel DAAL library has been added for CPU algorithms; it is currently supported only on the x86_64 architecture.\n\n## Requirements\n\n* PC running Linux with glibc 2.17+\n\n* Install CUDA with bundled display drivers (\n  [CUDA 8](https://docs.nvidia.com/cuda/archive/8.0/cuda-installation-guide-linux/index.html)\n  or\n  [CUDA 9](https://docs.nvidia.com/cuda/archive/9.0/cuda-installation-guide-linux/index.html)\n  or\n  [CUDA 9.2](https://docs.nvidia.com/cuda/archive/9.2/cuda-installation-guide-linux/index.html)\n  or\n  [CUDA 10](https://docs.nvidia.com/cuda/archive/10.0/cuda-installation-guide-linux/index.html))\n\n* Python shared libraries (e.g., on Ubuntu: `sudo apt-get install libpython3.6-dev`)\n\nWhen installing CUDA, choose to link the installation to `/usr/local/cuda`.\nReboot after installing the new NVIDIA drivers.\n\n* NVIDIA GPU with Compute Capability \u003e= 3.5 ([Capability Lookup](https://developer.nvidia.com/cuda-gpus)).\n\n* For advanced features, such as handling rows/32 \u003e 2^16 (i.e., rows \u003e 2,097,152) in K-means, Compute Capability \u003e= 5.2 is needed.\n\n* For building the R package, `libcurl4-openssl-dev`, `libssl-dev`, and `libxml2-dev` are needed.\n\n## User Installation\n\nNote: The installation steps below are for users who plan to use H2O4GPU. 
See [DEVEL.md](DEVEL.md) for developer installation.\n\nH2O4GPU can be installed using either pip or Conda.\n\n### Prerequisites\nAdd the following to `~/.bashrc` or your environment (set the appropriate paths for your OS):\n\n```\nexport CUDA_HOME=/usr/local/cuda # or /usr/local/cuda9 for CUDA 9 and /usr/local/cuda8 for CUDA 8\nexport LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_HOME/lib64/:$CUDA_HOME/lib/:$CUDA_HOME/extras/CUPTI/lib64\n```\n\n- Install the OpenBLAS development environment:\n\n```\nsudo apt-get install libopenblas-dev pbzip2\n```\n\nIf you are building the h2o4gpu R package, you also need to install the following dependencies:\n\n```\nsudo apt-get -y install libcurl4-openssl-dev libssl-dev libxml2-dev\n```\n\n### PIP install\nDownload the Python wheel file (for Python 3.6):\n\n  * Stable:\n    * [CUDA10 - linux_x86_64](https://s3.amazonaws.com/h2o-release/h2o4gpu/releases/stable/ai/h2o/h2o4gpu/0.4-cuda10/rel-0.4.0/h2o4gpu-0.4.0-cp36-cp36m-linux_x86_64.whl)\n    * [CUDA10 - linux_ppc64le](https://s3.amazonaws.com/h2o-release/h2o4gpu/releases/stable/ai/h2o/h2o4gpu/0.4-cuda10/rel-0.4.0/h2o4gpu-0.4.0-cp36-cp36m-linux_ppc64le.whl)\n  * Bleeding edge (changes with every successful master branch build):\n    * [CUDA10.0 - linux_x86_64](https://s3.amazonaws.com/h2o-release/h2o4gpu/releases/bleeding-edge/ai/h2o/h2o4gpu/0.4-cuda10/h2o4gpu-0.4.1-cp36-cp36m-linux_x86_64.whl)\n    * [CUDA10.0 - linux_ppc64le](https://s3.amazonaws.com/h2o-release/h2o4gpu/releases/bleeding-edge/ai/h2o/h2o4gpu/0.4-cuda10/h2o4gpu-0.4.1-cp36-cp36m-linux_ppc64le.whl)\n\nStart a fresh pyenv or virtualenv session.\n\nInstall the Python wheel file. 
NOTE: If you don't use a fresh environment, this will\noverwrite your py3nvml and xgboost installations with our validated\nversions.\n\n```\n# substitute the name of the wheel file you downloaded\npip install h2o4gpu-0.4.0-cp36-cp36m-linux_x86_64.whl\n```\n\n### Conda installation\n\nEnsure you meet the Requirements and have installed the Prerequisites.\n\nIf you have not already done so, [install the conda package manager](https://conda.io/projects/conda/en/latest/user-guide/install/linux.html) and [test your conda installation](https://docs.conda.io/projects/conda/en/latest/user-guide/install/test-installation.html).\n\nH2O4GPU packages for CUDA 8, CUDA 9, CUDA 9.2, and CUDA 10 are available from the [h2oai channel on Anaconda Cloud](https://anaconda.org/h2oai).\n\nCreate a new conda environment with H2O4GPU based on CUDA 10 and all its dependencies using the following command. For other CUDA versions, substitute the package name as needed. Note the requirement for the h2oai, conda-forge, and rapidsai channels.\n\n```bash\nconda create -n h2o4gpuenv -c h2oai -c conda-forge -c rapidsai h2o4gpu-cuda10\n```\n\nOnce the environment is created, activate it with `source activate h2o4gpuenv`.\n\nTo test, start an interactive Python session in the environment and follow the steps in the Test Installation section below.\n\n### h2o4gpu R package\n\nAt this point, you should have installed the H2O4GPU Python package successfully. 
You can then install the `h2o4gpu` R package as follows:\n\n```r\nif (!require(devtools)) install.packages(\"devtools\")\ndevtools::install_github(\"h2oai/h2o4gpu\", subdir = \"src/interface_r\")\n```\n\nDetailed instructions can be found [here](https://github.com/h2oai/h2o4gpu/tree/master/src/interface_r).\n\n## Test Installation\n\nTo test your installation of the Python package, run the following code:\n\n```python\nimport h2o4gpu\nimport numpy as np\n\nX = np.array([[1.,1.], [1.,4.], [1.,0.]])\nmodel = h2o4gpu.KMeans(n_clusters=2,random_state=1234).fit(X)\nmodel.cluster_centers_\n```\nThe interactive session should look like this:\n```\n\u003e\u003e\u003e import h2o4gpu\n\u003e\u003e\u003e import numpy as np\n\u003e\u003e\u003e\n\u003e\u003e\u003e X = np.array([[1.,1.], [1.,4.], [1.,0.]])\n\u003e\u003e\u003e model = h2o4gpu.KMeans(n_clusters=2,random_state=1234).fit(X)\n\u003e\u003e\u003e model.cluster_centers_\narray([[ 1.,  1.  ],\n       [ 1.,  4.  ]])\n```\n\nTo test your installation of the R package, try the following example, which builds a simple [XGBoost](https://github.com/dmlc/xgboost) random forest classifier:\n\n```r\nlibrary(h2o4gpu)\n\n# Set up the dataset\nx \u003c- iris[1:4]\ny \u003c- as.integer(iris$Species) - 1\n\n# Initialize and train the classifier\nmodel \u003c- h2o4gpu.random_forest_classifier() %\u003e% fit(x, y)\n\n# Make predictions\npredictions \u003c- model %\u003e% predict(x)\n```\n\n## Next Steps\n\nFor more examples using the Python API, please check out our [Jupyter notebook demos](https://github.com/h2oai/h2o4gpu/tree/master/examples/py/demos). 
To run the demos using a locally installed wheel, download at least `src/interface_py/requirements_runtime_demos.txt` from the GitHub repo and run:\n```\npip install -r src/interface_py/requirements_runtime_demos.txt\n```\nThen run the Jupyter notebook demos.\n\nFor more examples using the R API, please visit the [vignettes](https://github.com/h2oai/h2o4gpu/tree/master/src/interface_r/vignettes).\n\n## Running Jupyter Notebooks\n\nYou can run Jupyter notebooks with H2O4GPU in either of the two ways below.\n\n### Creating a Conda Environment\n\nEnsure you have a machine that meets the Requirements and Prerequisites mentioned above.\n\nNext, follow the Conda installation instructions above. Once you have activated the environment, you will need to downgrade tornado to version 4.5.3 (see [issue #680](https://github.com/h2oai/h2o4gpu/issues/680)). Then start Jupyter Notebook and open the URL shown in the log output in your browser.\n\n```bash\nsource activate h2o4gpuenv\nconda install tornado==4.5.3\njupyter notebook --ip='*' --no-browser\n```\nStart a Python 3 kernel and try the code in the [example notebooks](https://github.com/h2oai/h2o4gpu/tree/master/examples/py/demos).\n\n### Using a precompiled Docker image\n\nRequirements:\n\n* NVIDIA drivers compatible with the CUDA version used (e.g., 384+ for CUDA 9)\n* [docker-ce 17](https://docs.docker.com/engine/installation/linux/docker-ce/ubuntu/)\n* [nvidia-docker 1.0](https://github.com/NVIDIA/nvidia-docker/tree/1.0)\n\nDownload the Docker image (for linux_x86_64):\n\n  * Bleeding edge (changes with every successful master branch build):\n    * [CUDA10](https://s3.amazonaws.com/h2o-release/h2o4gpu/releases/bleeding-edge/ai/h2o/h2o4gpu/0.3-cuda10/h2o4gpu-0.3.2-cuda10-runtime.tar.bz2)\n\nLoad and run the Docker image (e.g., 
for a bleeding-edge CUDA 9.2 build; substitute the file and image names for the version you downloaded):\n```\njupyter notebook --generate-config\necho \"c.NotebookApp.allow_remote_access = False\" \u003e\u003e ~/.jupyter/jupyter_notebook_config.py # Choose True to allow remote access\npbzip2 -dc h2o4gpu-0.3.0.10000-cuda92-runtime.tar.bz2 | nvidia-docker load\nmkdir -p log ; nvidia-docker run --name localhost --rm -p 8888:8888 -u `id -u`:`id -g` -v `pwd`/log:/log -v /home/$USER/.jupyter:/jupyter --entrypoint=./run.sh opsh2oai/h2o4gpu-0.3.0.10000-cuda92-runtime \u0026\nfind log -name 'jupyter*' -type f -printf '%T@ %p\\n' | sort -k1 -n | awk '{print $2}' | tail -1 | xargs cat | grep token | grep http | grep -v NotebookApp\n```\nCopy and paste the http link shown into your browser. If the `find` command doesn't work, open the latest jupyter.log file in the `log` directory and look for the http link and token.\n\nIf the link shows no token, or shows ... for the token, try the token \"h2o\" (without quotes). If running on your own host, the link will look like `http://localhost:8888/?token=TOKEN`, with `TOKEN` replaced by the actual token.\n\nThe container has a `/demos` directory that contains Jupyter notebooks and some data.\n\n## Plans\n\nThe vision is to develop fast GPU algorithms to complement the CPU\nalgorithms in scikit-learn while keeping full scikit-learn API\ncompatibility and scikit-learn CPU algorithm capability. The h2o4gpu\nPython module is to be used as a drop-in replacement for scikit-learn\nthat has the full functionality of scikit-learn's CPU algorithms.\n\nFunctions and classes will be gradually overridden by GPU-enabled algorithms (unless\n`n_gpu=0` is set and we have no CPU algorithm except scikit-learn's).\nInitially, the CPU algorithms and code will be scikit-learn's, but\nthey may gradually be replaced by faster open-source code such as Intel\nDAAL.\n\nThis vision is currently accomplished by using the open-source\nscikit-learn and XGBoost libraries and overriding scikit-learn calls with our\nown GPU versions.  
When our GPU class does not yet\nsupport an important scikit-learn feature, we fall back to the\nscikit-learn class.\n\nAs noted above, there is an R API in development, which will be\nreleased as a stand-alone R package.  All algorithms supported by\nH2O4GPU will be exposed in both Python and R in the future.\n\nAnother primary goal is to support all operations on the GPU via the\n[GOAI\ninitiative](https://devblogs.nvidia.com/parallelforall/goai-open-gpu-accelerated-data-analytics/).\nThis involves ensuring the GPU algorithms can take and return GPU\npointers to data instead of going back to the host.  In scikit-learn\nAPI terms, these are called fit\\_ptr, predict\\_ptr, transform\\_ptr,\netc., where ptr stands for memory pointer.\n\n## Roadmap\n### 2019 Q2:\n* A new processing engine that can scale beyond GPU memory limits\n* k-Nearest Neighbors\n* Matrix Factorization\n* Factorization Machines\n* API Support: GOAI API support\n* Data.table support\n\nMore precise information can be found in the [milestones list](https://github.com/h2oai/h2o4gpu/milestones).\n\n## Solver Classes\n\nAmong others, the solvers can be used for the following classes of problems:\n\n  + GLM: Lasso, Ridge Regression, Logistic Regression, Elastic Net Regularization\n  + KMeans\n  + Gradient Boosting Machine (GBM) via [XGBoost](https://devblogs.nvidia.com/parallelforall/gradient-boosting-decision-trees-xgboost-cuda/)\n  + Singular Value Decomposition (SVD) + Truncated Singular Value Decomposition\n  + Principal Component Analysis (PCA)\n\n## Benchmarks\n\nOur benchmarking plan is to clearly highlight when modeling benefits\nfrom the GPU (usually complex models) or does not (e.g. one-shot\nsimple models dominated by data transfer).\n\nWe have benchmarked h2o4gpu, scikit-learn, and h2o-3 on a variety of\nsolvers.  Some benchmarks have been performed for a few selected cases\nthat highlight the GPU capabilities (i.e. 
compute or on-GPU memory\noperations dominate data transfer to GPU from host):\n\n[Benchmarks for GLM, KMeans, and XGBoost for CPU vs. GPU.](https://github.com/h2oai/h2o4gpu/blob/master/presentations/benchmarks.pdf)\n\nA suite of benchmarks is computed when running `make testperf` from a\nbuild directory. It runs all of our tests and benchmarks h2o4gpu\nagainst h2o-3. These results will soon be presented as live\ncommit-by-commit streaming plots on a website.\n\n## Contributing\n\nPlease refer to our [CONTRIBUTING.md](CONTRIBUTING.md) and\n[DEVEL.md](DEVEL.md) for instructions on how to build and test the\nproject and how to contribute.  The h2o4gpu\n[Gitter](https://gitter.im/h2oai/h2o4gpu) chatroom can be used for\ndiscussion related to open source development.\n\nGitHub [issues](https://github.com/h2oai/h2o4gpu/issues) are used for tracking and discussing bugs, features, and enhancements.\n\n## Questions\n\n* Please ask all code-related questions on [StackOverflow](https://stackoverflow.com/questions/tagged/h2o4gpu) using the \"h2o4gpu\" tag.\n\n* Questions related to the roadmap can be directed to the developers on [Gitter](https://gitter.im/h2oai/h2o4gpu).\n\n* [Troubleshooting](https://github.com/h2oai/h2o4gpu/tree/master/TROUBLESHOOTING.md)\n\n* [FAQ](https://github.com/h2oai/h2o4gpu/tree/master/FAQ.md)\n\n## References\n\n1. [Parameter Selection and Pre-Conditioning for a Graph Form Solver -- C. Fougner and S. Boyd][pogs]\n2. [Block Splitting for Distributed Optimization -- N. Parikh and S. Boyd][block_splitting]\n3. [Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers -- S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein][admm_distr_stats]\n4. [Proximal Algorithms -- N. Parikh and S. Boyd][prox_algs]\n\n\n[pogs]: http://stanford.edu/~boyd/papers/pogs.html \"Parameter Selection and Pre-Conditioning for a Graph Form Solver -- C. Fougner and S. 
Boyd\"\n\n[block_splitting]: http://www.stanford.edu/~boyd/papers/block_splitting.html \"Block Splitting for Distributed Optimization -- N. Parikh and S. Boyd\"\n\n[admm_distr_stats]: http://www.stanford.edu/~boyd/papers/admm_distr_stats.html \"Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers -- S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein\"\n\n[prox_algs]: http://www.stanford.edu/~boyd/papers/prox_algs.html \"Proximal Algorithms -- N. Parikh and S. Boyd\"\n\n## Copyright\n\n```\nCopyright (c) 2017, H2O.ai, Inc., Mountain View, CA\nApache License Version 2.0 (see LICENSE file)\n\n\nThis software is based on original work under BSD-3 license by:\n\nCopyright (c) 2015, Christopher Fougner, Stephen Boyd, Stanford University\nAll rights reserved.\n\nRedistribution and use in source and binary forms, with or without\nmodification, are permitted provided that the following conditions are met:\n    * Redistributions of source code must retain the above copyright\n      notice, this list of conditions and the following disclaimer.\n    * Redistributions in binary form must reproduce the above copyright\n      notice, this list of conditions and the following disclaimer in the\n      documentation and/or other materials provided with the distribution.\n    * Neither the name of the \u003corganization\u003e nor the\n      names of its contributors may be used to endorse or promote products\n      derived from this software without specific prior written permission.\n\nTHIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\" AND\nANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED\nWARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE\nDISCLAIMED. 
IN NO EVENT SHALL \u003cCOPYRIGHT HOLDER\u003e BE LIABLE FOR ANY\nDIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES\n(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;\nLOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND\nON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT\n(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS\nSOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fh2oai%2Fh2o4gpu","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fh2oai%2Fh2o4gpu","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fh2oai%2Fh2o4gpu/lists"}