{"id":26246002,"url":"https://github.com/blei-lab/treeffuser","last_synced_at":"2025-04-23T20:26:02.872Z","repository":{"id":243926097,"uuid":"754311680","full_name":"blei-lab/treeffuser","owner":"blei-lab","description":"Treeffuser is an easy-to-use package for probabilistic prediction and probabilistic regression on tabular data with tree-based diffusion models.","archived":false,"fork":false,"pushed_at":"2025-02-20T16:42:08.000Z","size":84169,"stargazers_count":42,"open_issues_count":4,"forks_count":4,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-04-16T20:43:20.643Z","etag":null,"topics":["diffusion-models","diffusions","flexible-prediction","gradient-boosting","heteroscedasticity","lightgbm","prediction","probabilistic-models","probabilistic-prediction","tabular-data","trees"],"latest_commit_sha":null,"homepage":"https://blei-lab.github.io/treeffuser/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/blei-lab.png","metadata":{"files":{"readme":"README.rst","changelog":"CHANGELOG.rst","contributing":"CONTRIBUTING.rst","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.rst","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-02-07T19:59:47.000Z","updated_at":"2025-04-09T01:05:48.000Z","dependencies_parsed_at":"2024-11-04T05:29:34.314Z","dependency_job_id":null,"html_url":"https://github.com/blei-lab/treeffuser","commit_stats":null,"previous_names":["blei-lab/treeffuser"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blei-lab%2Ftreeffuser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blei-lab%2
Ftreeffuser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blei-lab%2Ftreeffuser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blei-lab%2Ftreeffuser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/blei-lab","download_url":"https://codeload.github.com/blei-lab/treeffuser/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250507841,"owners_count":21442108,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["diffusion-models","diffusions","flexible-prediction","gradient-boosting","heteroscedasticity","lightgbm","prediction","probabilistic-models","probabilistic-prediction","tabular-data","trees"],"created_at":"2025-03-13T13:17:07.759Z","updated_at":"2025-04-23T20:26:02.852Z","avatar_url":"https://github.com/blei-lab.png","language":"Jupyter Notebook","readme":"====================\nTreeffuser\n====================\n\n.. 
raw:: html\n\n        \u003c!-- Version and License --\u003e\n        \u003ca href=\"https://badge.fury.io/py/treeffuser\" target=\"_blank\" rel=\"noopener noreferrer\"\u003e\n            \u003cimg src=\"https://badge.fury.io/py/treeffuser.svg\" alt=\"PyPI version\"\u003e\u003c/a\u003e\n\n        \u003ca href=\"https://opensource.org/licenses/MIT\" target=\"_blank\" rel=\"noopener noreferrer\"\u003e\n            \u003cimg src=\"https://img.shields.io/badge/License-MIT-green.svg\" alt=\"License\"\u003e\u003c/a\u003e\n\n        \u003c!-- Usage and Popularity --\u003e\n        \u003ca href=\"https://github.com/blei-lab/treeffuser/stargazers\" target=\"_blank\" rel=\"noopener noreferrer\"\u003e\n            \u003cimg src=\"https://img.shields.io/github/stars/blei-lab/treeffuser?style=flat\u0026logo=GitHub\" alt=\"GitHub repo stars\"\u003e\u003c/a\u003e\n\n        \u003ca href=\"https://badge.fury.io/py/treeffuser\" target=\"_blank\" rel=\"noopener noreferrer\"\u003e\n            \u003cimg src=\"https://img.shields.io/pypi/dm/treeffuser\" alt=\"PyPI - Downloads\"\u003e\u003c/a\u003e\n\n        \u003c!-- Homepage and Documentation --\u003e\n        \u003ca href=\"https://blei-lab.github.io/treeffuser/\" target=\"_blank\" rel=\"noopener noreferrer\"\u003e\n            \u003cimg src=\"https://img.shields.io/badge/website-visit-blue?label=website\" alt=\"Website\"\u003e\u003c/a\u003e\n\n        \u003ca href=\"https://blei-lab.github.io/treeffuser/docs/getting-started.html\" target=\"_blank\" rel=\"noopener noreferrer\"\u003e\n            \u003cimg src=\"https://img.shields.io/badge/docs-passing-green\" alt=\"Documentation\"\u003e\u003c/a\u003e\n\n        \u003c!-- Other Relevant Links --\u003e\n        \u003ca href=\"https://arxiv.org/abs/2406.07658\" target=\"_blank\" rel=\"noopener noreferrer\"\u003e\n            \u003cimg src=\"https://img.shields.io/badge/arXiv-2406.07658-red\" alt=\"arXiv\"\u003e\u003c/a\u003e\n       \u003cbr/\u003e\n       
\u003cbr/\u003e\n\nTreeffuser is an easy-to-use package for **probabilistic prediction on tabular data with tree-based diffusion models**.\nIt estimates distributions of the form ``p(y|x)`` where ``x`` is a feature vector and ``y`` is a target vector.\nTreeffuser can model conditional distributions ``p(y|x)`` that are arbitrarily complex (e.g., multimodal, heteroscedastic, non-Gaussian, heavy-tailed).\n\nIt is designed to adhere closely to the scikit-learn API and require minimal user tuning.\n\n.. raw:: html\n\n    \u003cdiv align=\"center\"\u003e\n        \u003cb\u003e\u003ca href=\"https://blei-lab.github.io/treeffuser/\"\u003eWebsite\u003c/a\u003e\u003c/b\u003e |\n        \u003cb\u003e\u003ca href=\"https://github.com/blei-lab/treeffuser/\"\u003eGitHub\u003c/a\u003e\u003c/b\u003e |\n        \u003cb\u003e\u003ca href=\"https://blei-lab.github.io/treeffuser/docs/getting-started.html\"\u003eDocumentation\u003c/a\u003e\u003c/b\u003e |\n      \u003cb\u003e\u003ca href=\"https://arxiv.org/abs/2406.07658\"\u003ePaper (NeurIPS 2024)\u003c/a\u003e\u003c/b\u003e\n    \u003c/div\u003e\n    \u003cbr\u003e\n\n\nInstallation\n============\n\nYou can install Treeffuser via pip from PyPI with the following command:\n\n.. code-block:: bash\n\n    pip install treeffuser\n\nYou can also install the development version with:\n\n.. code-block:: bash\n\n    pip install git+https://github.com/blei-lab/treeffuser.git@main\n\nThe GitHub repository is located at `https://github.com/blei-lab/treeffuser \u003chttps://github.com/blei-lab/treeffuser\u003e`_.\n\nUsage Example\n=============\n\nHere's a simple example demonstrating how to use Treeffuser.\n\nWe generate a heteroscedastic response with two sinusoidal components and heavy tails.\n\n.. 
code-block:: python\n\n    import matplotlib.pyplot as plt\n    import numpy as np\n    from treeffuser import Treeffuser, Samples\n\n    # Generate data\n    seed = 0\n    rng = np.random.default_rng(seed=seed)\n    n = 5000\n    x = rng.uniform(0, 2 * np.pi, size=n)\n    z = rng.integers(0, 2, size=n)\n    y = z * np.sin(x - np.pi / 2) + (1 - z) * np.cos(x) + rng.laplace(scale=x / 30, size=n)\n\nWe fit Treeffuser and generate samples. We then plot the samples against the raw data.\n\n.. code-block:: python\n\n    # Fit the model\n    model = Treeffuser(seed=seed)\n    model.fit(x, y)\n\n    # Generate and plot samples\n    y_samples = model.sample(x, n_samples=1, seed=seed, verbose=True)\n    plt.scatter(x, y, s=1, label=\"observed data\")\n    plt.scatter(x, y_samples[0, :], s=1, alpha=0.7, label=\"Treeffuser samples\")\n\n.. image:: README_example.png\n   :alt: Treeffuser on heteroscedastic data with sinusoidal response and heavy tails.\n   :align: center\n\nTreeffuser accurately learns the target conditional densities and can generate samples from them.\n\nThese samples can be used to compute any downstream estimates of interest.\n\n.. code-block:: python\n\n    y_samples = model.sample(x, n_samples=100, verbose=True) # y_samples.shape[0] is 100\n\n    # Estimate downstream quantities of interest\n    y_mean = y_samples.mean(axis=0) # conditional mean for each x\n    y_std = y_samples.std(axis=0) # conditional std for each x\n\nFor convenience, we also provide a class ``Samples`` that can estimate standard quantities.\n\n.. 
code-block:: python\n\n    y_samples = Samples(y_samples)\n    y_mean = y_samples.sample_mean() # same as before\n    y_std = y_samples.sample_std() # same as before\n    y_quantiles = y_samples.sample_quantile(q=[0.05, 0.95]) # conditional quantiles for each x\n\nPlease take a look at the documentation for more information on the available methods and parameters.\n\nCiting Treeffuser\n=================\n\nIf you use Treeffuser in your work, please cite the following paper:\n\n.. code-block:: bibtex\n\n    @article{beltranvelez2024treeffuser,\n      title={Treeffuser: Probabilistic Predictions via Conditional Diffusions with Gradient-Boosted Trees},\n      author={Nicolas Beltran-Velez and Alessandro Antonio Grande and Achille Nazaret and Alp Kucukelbir and David Blei},\n      year={2024},\n      eprint={2406.07658},\n      archivePrefix={arXiv},\n      primaryClass={cs.LG},\n      url={https://arxiv.org/abs/2406.07658},\n   }\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblei-lab%2Ftreeffuser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fblei-lab%2Ftreeffuser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblei-lab%2Ftreeffuser/lists"}