{"id":15631626,"url":"https://github.com/ogrisel/pygbm","last_synced_at":"2025-04-09T15:08:05.994Z","repository":{"id":48184295,"uuid":"143539809","full_name":"ogrisel/pygbm","owner":"ogrisel","description":"Experimental Gradient Boosting Machines in Python with numba.","archived":false,"fork":false,"pushed_at":"2018-12-26T18:09:49.000Z","size":256,"stargazers_count":183,"open_issues_count":26,"forks_count":32,"subscribers_count":26,"default_branch":"master","last_synced_at":"2025-04-09T15:08:01.707Z","etag":null,"topics":["numba","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ogrisel.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-08-04T15:17:24.000Z","updated_at":"2025-02-23T00:23:11.000Z","dependencies_parsed_at":"2022-08-29T23:50:28.480Z","dependency_job_id":null,"html_url":"https://github.com/ogrisel/pygbm","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ogrisel%2Fpygbm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ogrisel%2Fpygbm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ogrisel%2Fpygbm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ogrisel%2Fpygbm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ogrisel","download_url":"https://codeload.github.com/ogrisel/pygbm/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":2
48055284,"owners_count":21040157,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["numba","python"],"created_at":"2024-10-03T10:41:02.554Z","updated_at":"2025-04-09T15:08:05.967Z","avatar_url":"https://github.com/ogrisel.png","language":"Python","funding_links":[],"categories":["Implementations"],"sub_categories":["Other Frameworks"],"readme":"# pygbm [![Build Status](https://travis-ci.org/ogrisel/pygbm.svg?branch=master)](https://travis-ci.org/ogrisel/pygbm) [![codecov](https://codecov.io/gh/ogrisel/pygbm/branch/master/graph/badge.svg)](https://codecov.io/gh/ogrisel/pygbm) [![python versions](https://img.shields.io/badge/python-3.6+-blue.svg)](https://github.com/ogrisel/pygbm)\n\n\n\nExperimental Gradient Boosting Machines in Python.\n\nThe goal of this project is to evaluate whether it's possible to implement an\nefficient histogram-binning version of Gradient Boosting Trees (possibly\nwith all the LightGBM optimizations) while staying in pure Python 3.6+\nusing the [numba](http://numba.pydata.org/) JIT compiler.\n\npygbm provides a set of scikit-learn compatible estimator classes that\nshould play well with the scikit-learn `Pipeline` and model selection tools\n(grid search and randomized hyperparameter search).\n\nLonger-term plans include integration with dask and dask-ml for\nout-of-core and distributed fitting on a cluster.\n\n## Installation\n\nThe project is available on PyPI and can be installed with `pip`:\n\n    pip install pygbm\n\nYou'll need at least Python 3.6.\n\n## Documentation\n\nThe API documentation is available 
at:\n\nhttps://pygbm.readthedocs.io/\n\nYou might also want to have a look at the `examples/` folder of this repo.\n\n## Status\n\nThe project is experimental. The API is subject to change without deprecation notice. Use at your own risk.\n\nWe welcome any feedback in the github issue tracker:\n\nhttps://github.com/ogrisel/pygbm/issues\n\n## Running the development version\n\nUse pip to install in \"editable\" mode:\n\n    git clone https://github.com/ogrisel/pygbm.git\n    cd pygbm\n    pip install -r requirements.txt\n    pip install --editable .\n\nRun the tests with pytest:\n\n    pip install -r requirements.txt\n    pytest\n\n## Benchmarking\n\nThe `benchmarks` folder contains some scripts to evaluate the computation\nperformance of various parts of pygbm. Keep in mind that numba's JIT\ncompilation [takes\ntime](http://numba.pydata.org/numba-doc/latest/user/5minguide.html#how-to-measure-the-performance-of-numba)!\n\n### Profiling\n\nTo profile the benchmarks, you can use\n[snakeviz](https://jiffyclub.github.io/snakeviz/) to get an interactive\nHTML report:\n\n    pip install snakeviz\n    python -m cProfile -o bench_higgs_boson.prof benchmarks/bench_higgs_boson.py\n    snakeviz bench_higgs_boson.prof\n\n### Debugging numba type inference\n\nTo introspect the results of type inference steps in the numba sections\ncalled by a given benchmarking script:\n\n    numba --annotate-html bench_higgs_boson.html benchmarks/bench_higgs_boson.py\n\nIn particular it is interesting to check that the numerical variables in\nthe hot loops highlighted by the snakeviz profiling report have the\nexpected precision level (e.g. 
`float32` for loss computation, `uint8`\nfor binned feature values, ...).\n\n### Impact of thread-based parallelism\n\nSome benchmarks can call numba functions that leverage the built-in\nthread-based parallelism with `@njit(parallel=True)` and `prange` loops.\nOn a multicore machine you can evaluate how the thread-based parallelism\nscales by explicitly setting the `NUMBA_NUM_THREADS` environment\nvariable. For instance, try:\n\n    NUMBA_NUM_THREADS=1 python benchmarks/bench_binning.py\n\nvs:\n\n    NUMBA_NUM_THREADS=4 python benchmarks/bench_binning.py\n\n\n## Acknowledgements\n\nThe work from Nicolas Hug is supported by the National Science Foundation\nunder Grant No. 1740305 and by DARPA under Grant No. DARPA-BAA-16-51.\n\nThe work from Olivier Grisel is supported by the [scikit-learn initiative\nand its partners at Inria Fondation](https://scikit-learn.fondation-inria.fr/en/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fogrisel%2Fpygbm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fogrisel%2Fpygbm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fogrisel%2Fpygbm/lists"}