{"id":13752333,"url":"https://github.com/YyzHarry/imbalanced-regression","last_synced_at":"2025-05-09T19:32:07.286Z","repository":{"id":37702238,"uuid":"340089798","full_name":"YyzHarry/imbalanced-regression","owner":"YyzHarry","description":"[ICML 2021, Long Talk] Delving into Deep Imbalanced Regression","archived":false,"fork":false,"pushed_at":"2022-03-22T15:06:27.000Z","size":9658,"stargazers_count":798,"open_issues_count":3,"forks_count":128,"subscribers_count":19,"default_branch":"main","last_synced_at":"2024-08-03T09:03:44.511Z","etag":null,"topics":["computer-vision","healthcare","icml","icml-2021","imbalance","imbalanced-classification","imbalanced-data","imbalanced-learning","imbalanced-regression","long-tail","natural-language-processing","regression"],"latest_commit_sha":null,"homepage":"http://dir.csail.mit.edu","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/YyzHarry.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-02-18T15:17:47.000Z","updated_at":"2024-07-30T08:55:22.000Z","dependencies_parsed_at":"2022-08-09T22:40:11.871Z","dependency_job_id":null,"html_url":"https://github.com/YyzHarry/imbalanced-regression","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YyzHarry%2Fimbalanced-regression","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YyzHarry%2Fimbalanced-regression/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YyzHarry%2Fimbalanced-regression/releases","manifests_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/YyzHarry%2Fimbalanced-regression/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/YyzHarry","download_url":"https://codeload.github.com/YyzHarry/imbalanced-regression/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224880777,"owners_count":17385367,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","healthcare","icml","icml-2021","imbalance","imbalanced-classification","imbalanced-data","imbalanced-learning","imbalanced-regression","long-tail","natural-language-processing","regression"],"created_at":"2024-08-03T09:01:03.935Z","updated_at":"2024-11-16T05:30:28.041Z","avatar_url":"https://github.com/YyzHarry.png","language":"Python","funding_links":[],"categories":["其他_机器学习与深度学习"],"sub_categories":[],"readme":"# Delving into Deep Imbalanced Regression\n\nThis repository contains the implementation code for paper: \u003cbr\u003e\n__Delving into Deep Imbalanced Regression__ \u003cbr\u003e\n[Yuzhe Yang](http://www.mit.edu/~yuzhe/), [Kaiwen Zha](https://kaiwenzha.github.io/), [Ying-Cong Chen](https://yingcong.github.io/), [Hao Wang](http://www.wanghao.in/), [Dina Katabi](https://people.csail.mit.edu/dina/) \u003cbr\u003e\n_38th International Conference on Machine Learning (ICML 2021), **Long Oral**_ \u003cbr\u003e\n[[Project Page](http://dir.csail.mit.edu/)] [[Paper](https://arxiv.org/abs/2102.09554)] [[Video](https://youtu.be/grJGixofQRU)] [[Blog 
Post](https://towardsdatascience.com/strategies-and-tactics-for-regression-on-imbalanced-data-61eeb0921fca)] [![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/YyzHarry/imbalanced-regression/blob/master/tutorial/tutorial.ipynb)\n___\n\u003cp align=\"center\"\u003e\n    \u003cimg src=\"teaser/overview.gif\" width=\"500\"\u003e \u003cbr\u003e\n\u003cb\u003eDeep Imbalanced Regression (DIR)\u003c/b\u003e aims to learn from imbalanced data with continuous targets, \u003cbr\u003e tackle potential missing data for certain regions, and generalize to the entire target range.\n\u003c/p\u003e\n\n\n## Beyond Imbalanced Classification: Brief Introduction for DIR\nExisting techniques for learning from imbalanced data focus on targets with __categorical__ indices, i.e., the targets are different classes. However, many real-world tasks involve __continuous__ and even infinite target values. We systematically investigate _Deep Imbalanced Regression (DIR)_, which aims to learn continuous targets from natural imbalanced data, deal with potential missing data for certain target values, and generalize to the entire target range.\n\nWe curate and benchmark large-scale DIR datasets for common real-world tasks in _computer vision_, _natural language processing_, and _healthcare_ domains, ranging from single-value prediction such as age, text similarity score, health condition score, to dense-value prediction such as depth.\n\n\n## Usage\nWe separate the codebase for different datasets into different subfolders. 
Please go into the subfolders for more information (e.g., installation, dataset preparation, training, evaluation \u0026 models).\n\n#### __[IMDB-WIKI-DIR](https://github.com/YyzHarry/imbalanced-regression/tree/main/imdb-wiki-dir)__ \u0026nbsp;|\u0026nbsp; __[AgeDB-DIR](https://github.com/YyzHarry/imbalanced-regression/tree/main/agedb-dir)__ \u0026nbsp;|\u0026nbsp; __[NYUD2-DIR](https://github.com/YyzHarry/imbalanced-regression/tree/main/nyud2-dir)__ \u0026nbsp;|\u0026nbsp; __[STS-B-DIR](https://github.com/YyzHarry/imbalanced-regression/tree/main/sts-b-dir)__\n\n\n## Highlights\n__(1) :heavy_check_mark: New Task:__ Deep Imbalanced Regression (DIR)\n\n__(2) :heavy_check_mark: New Techniques:__\n\n| ![image](teaser/lds.gif) | ![image](teaser/fds.gif) |\n| :-: | :-: |\n| Label distribution smoothing (LDS) | Feature distribution smoothing (FDS) |\n\n__(3) :heavy_check_mark: New Benchmarks:__ \u003cbr\u003e\n- _Computer Vision:_ :bulb: IMDB-WIKI-DIR (age) / AgeDB-DIR (age) / NYUD2-DIR (depth)\n- _Natural Language Processing:_ :clipboard: STS-B-DIR (text similarity score)\n- _Healthcare:_ :hospital: SHHS-DIR (health condition score)\n\n| [IMDB-WIKI-DIR](https://github.com/YyzHarry/imbalanced-regression/tree/main/imdb-wiki-dir) | [AgeDB-DIR](https://github.com/YyzHarry/imbalanced-regression/tree/main/agedb-dir) | [NYUD2-DIR](https://github.com/YyzHarry/imbalanced-regression/tree/main/nyud2-dir) | [STS-B-DIR](https://github.com/YyzHarry/imbalanced-regression/tree/main/sts-b-dir) | SHHS-DIR |\n| :-: | :-: | :-: | :-: | :-: |\n| ![image](teaser/imdb_wiki_dir.png) | ![image](teaser/agedb_dir.png) | ![image](teaser/nyud2_dir.png) | ![image](teaser/stsb_dir.png) | ![image](teaser/shhs_dir.png) |\n\n\n## Apply LDS and FDS on Other Datasets / Models\nWe provide examples of how to apply LDS and FDS on other customized datasets and/or models.\n\n### LDS\nTo apply LDS on your customized dataset, you will first need to estimate the effective label distribution: \n```python\nfrom 
collections import Counter\nimport numpy as np\nfrom scipy.ndimage import convolve1d\nfrom utils import get_lds_kernel_window\n\n# preds, labels: [Ns,], \"Ns\" is the number of total samples\npreds, labels = ..., ...\n# assign each label to its corresponding bin (start from 0)\n# with your defined get_bin_idx(), return bin_index_per_label: [Ns,] \nbin_index_per_label = [get_bin_idx(label) for label in labels]\n\n# calculate empirical (original) label distribution: [Nb,]\n# \"Nb\" is the number of bins\nNb = max(bin_index_per_label) + 1\nnum_samples_of_bins = dict(Counter(bin_index_per_label))\nemp_label_dist = [num_samples_of_bins.get(i, 0) for i in range(Nb)]\n\n# lds_kernel_window: [ks,], here for example, we use gaussian, ks=5, sigma=2\nlds_kernel_window = get_lds_kernel_window(kernel='gaussian', ks=5, sigma=2)\n# calculate effective label distribution: [Nb,]\neff_label_dist = convolve1d(np.array(emp_label_dist), weights=lds_kernel_window, mode='constant')\n```\nWith the estimated effective label distribution, one straightforward option is to use the loss re-weighting scheme:\n```python\nfrom loss import weighted_mse_loss\n\n# Use re-weighting based on effective label distribution, sample-wise weights: [Ns,]\neff_num_per_label = [eff_label_dist[bin_idx] for bin_idx in bin_index_per_label]\nweights = [np.float32(1 / x) for x in eff_num_per_label]\n\n# calculate loss\nloss = weighted_mse_loss(preds, labels, weights=weights)\n```\n\n### FDS\nTo apply FDS on your customized data/model, you will first need to define the FDS module in your network:\n```python\nimport torch.nn as nn\nfrom fds import FDS\n\nconfig = dict(feature_dim=..., start_update=0, start_smooth=1, kernel='gaussian', ks=5, sigma=2)\n\nclass Network(nn.Module):\n    def __init__(self, **config):\n        super().__init__()\n        self.feature_extractor = ...\n        self.regressor = nn.Linear(config['feature_dim'], 1)  # FDS operates before the final regressor\n        self.FDS = FDS(**config)\n\n    def forward(self, inputs, labels, 
epoch):\n        features = self.feature_extractor(inputs)  # features: [batch_size, feature_dim]\n        # smooth the feature distributions over the target space\n        smoothed_features = features    \n        if self.training and epoch \u003e= config['start_smooth']:\n            smoothed_features = self.FDS.smooth(smoothed_features, labels, epoch)\n        preds = self.regressor(smoothed_features)\n        \n        return {'preds': preds, 'features': features}\n```\nDuring training, you will need to update the FDS statistics after each training epoch:\n```python\nmodel = Network(**config)\n\nfor epoch in range(num_epochs):\n    for (inputs, labels) in train_loader:\n        # standard training pipeline\n        ...\n\n    # update FDS statistics after each training epoch\n    if epoch \u003e= config['start_update']:\n        # collect features and labels for all training samples\n        ...\n        # training_features: [num_samples, feature_dim], training_labels: [num_samples,]\n        training_features, training_labels = ..., ...\n        model.FDS.update_last_epoch_stats(epoch)\n        model.FDS.update_running_stats(training_features, training_labels, epoch)\n```\n\n\n## Updates\n- [06/2021] We provide a [hands-on tutorial](https://github.com/YyzHarry/imbalanced-regression/tree/main/tutorial) of DIR. Check it out! [![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/YyzHarry/imbalanced-regression/blob/master/tutorial/tutorial.ipynb)\n- [05/2021] We create a [Blog post](https://towardsdatascience.com/strategies-and-tactics-for-regression-on-imbalanced-data-61eeb0921fca) for this work (version in Chinese is also available [here](https://zhuanlan.zhihu.com/p/369627086)). Check it out for more details!\n- [05/2021] Paper accepted to ICML 2021 as a __Long Talk__. We have released the code and models. 
You can find all reproduced checkpoints via [this link](https://drive.google.com/drive/folders/1UfFJNIG-LPOMecwi1tfYzEViBiAYhNU0?usp=sharing), or go into each subfolder for models for each dataset.\n- [02/2021] [arXiv version](https://arxiv.org/abs/2102.09554) posted. Please stay tuned for updates.\n\n\n## Citation\nIf you find this code or idea useful, please cite our work:\n```bib\n@inproceedings{yang2021delving,\n  title={Delving into Deep Imbalanced Regression},\n  author={Yang, Yuzhe and Zha, Kaiwen and Chen, Ying-Cong and Wang, Hao and Katabi, Dina},\n  booktitle={International Conference on Machine Learning (ICML)},\n  year={2021}\n}\n```\n\n\n## Contact\nIf you have any questions, feel free to contact us through email (yuzhe@mit.edu \u0026 kzha@mit.edu) or Github issues. Enjoy!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FYyzHarry%2Fimbalanced-regression","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FYyzHarry%2Fimbalanced-regression","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FYyzHarry%2Fimbalanced-regression/lists"}