# PyTorch GradNorm

<p align="center"><img width="90%" src="images/gradnorm.png" /></p>

This is a PyTorch implementation of [GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks](http://proceedings.mlr.press/v80/chen18a.html), a gradient normalization algorithm that automatically balances training in deep multitask models by dynamically tuning gradient magnitudes.

A toy example can be found [**here**](https://github.com/LucasBoTang/GradNorm/blob/main/Test.ipynb).


## Algorithm

<p align="center"><img width="50%" src="images/algo.png" /></p>


## Dependencies

- [PyTorch](https://pytorch.org/)
- [NumPy](https://numpy.org/)


## Usage

### Parameters

- net: a multitask network that returns the per-task losses
- layer: the shared layer of the network whose weights GradNorm uses to compute per-task gradient norms
- alpha: hyperparameter controlling the strength of the restoring force
- dataloader: training dataloader
- num_epochs: number of training epochs
- lr1: learning rate for the multitask loss
- lr2: learning rate for the task weights
- log: flag to log the results

### Sample Code

```python
from gradnorm import gradNorm
log_weights, log_loss = gradNorm(net=mtlnet, layer=net.fc4, alpha=0.12, dataloader=dataloader,
                                 num_epochs=100, lr1=1e-5, lr2=1e-4, log=False)
```

## Toy Example (from the Original Paper)

### Data

Consider $T$ regression tasks trained using the standard squared loss on the target functions

$$
f_i (\mathbf{x}) = \sigma_i \tanh \left( ( \mathbf{B} + \epsilon_i ) \mathbf{x} \right).
$$

Inputs have dimension 250 and outputs dimension 100, while $\mathbf{B}$ and $\epsilon_i$ are constant matrices whose elements are generated IID from $N(0, 10)$ and $N(0, 3.5)$, respectively. Each task therefore shares information through $\mathbf{B}$ but also carries task-specific information in $\epsilon_i$. The scalars $\sigma_i$ set the scales of the outputs.

```python
from data import toyDataset
dataset = toyDataset(num_data=10000, dim_features=250, dim_labels=100, scalars=[1,100])
```

### Model

A 4-layer fully-connected ReLU-activated network with 100 neurons per layer is used as a common trunk for the toy example. A final affine transformation layer produces the $T$ task-specific predictions.

<p align="center"><img width="50%" src="images/model.png" /></p>

```python
from model import fcNet, mtlNet
net = fcNet(dim_features=250, dim_labels=100, n_tasks=2)  # fully-connected net with multiple heads
mtlnet = mtlNet(net)                                      # multitask net that returns per-task losses
```

### Result (10 Tasks)

<p align="center"><img width="75%" src="images/weight.png" /></p>
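For reference, the target functions from the Data section can be evaluated directly in NumPy. This is only an illustration of the formula, not the repo's `toyDataset` class; `toy_targets` is a hypothetical helper, and the second parameter of $N(\cdot,\cdot)$ is assumed here to be the standard deviation.

```python
import numpy as np

def toy_targets(x, sigmas, B, epsilons):
    """Evaluate f_i(x) = sigma_i * tanh((B + eps_i) @ x) for each task i.

    x        : input vector of shape (dim_features,)
    sigmas   : per-task output scales sigma_i
    B        : shared matrix of shape (dim_labels, dim_features)
    epsilons : task-specific matrices, each the same shape as B
    """
    return [s * np.tanh((B + e) @ x) for s, e in zip(sigmas, epsilons)]

rng = np.random.default_rng(0)
dim_features, dim_labels = 250, 100
B = rng.normal(0, 10, (dim_labels, dim_features))      # shared across all tasks
eps = [rng.normal(0, 3.5, B.shape) for _ in range(2)]  # task-specific perturbations
x = rng.normal(size=dim_features)
y = toy_targets(x, sigmas=[1, 100], B=B, epsilons=eps)
```

Because `tanh` is bounded by 1, each task's outputs are bounded by its scale $\sigma_i$, which is exactly the loss-scale imbalance GradNorm is meant to correct.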
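The weight update at the heart of the algorithm figure can be sketched in plain NumPy. This is a minimal, illustrative sketch, not this repo's `gradNorm` API: `gradnorm_weight_update` is a hypothetical helper, the target gradient norms are treated as constants during the update (as in the paper), and it assumes each $G_i = \lVert \nabla_W (w_i L_i) \rVert$ scales linearly with its weight $w_i$.

```python
import numpy as np

def gradnorm_weight_update(weights, grad_norms, loss_ratios, alpha, lr):
    """One GradNorm step on the task weights w_i (illustrative sketch).

    weights     : current task weights w_i
    grad_norms  : G_i = ||grad_W (w_i * L_i)|| at the chosen shared layer W
    loss_ratios : L_i(t) / L_i(0) for each task
    alpha       : restoring-force hyperparameter
    lr          : learning rate for the weights (lr2 in this repo)
    """
    w = np.asarray(weights, dtype=float)
    G = np.asarray(grad_norms, dtype=float)
    ratios = np.asarray(loss_ratios, dtype=float)

    # relative inverse training rate: r_i = (L_i(t)/L_i(0)) / mean_j(L_j(t)/L_j(0))
    r = ratios / ratios.mean()
    # target gradient norms, treated as constants: mean(G) * r_i ** alpha
    target = G.mean() * r ** alpha
    # L_grad = sum_i |G_i - target_i|; since G_i = w_i * ||grad L_i||,
    # dG_i/dw_i = G_i / w_i, so dL_grad/dw_i = sign(G_i - target_i) * G_i / w_i
    w = w - lr * np.sign(G - target) * G / w
    # renormalize so the weights sum to the number of tasks T
    return w * len(w) / w.sum()
```

For example, with two tasks whose current gradient norms are 2.0 and 0.5 and equal loss ratios, one step lowers the first task's weight and raises the second's, pulling both gradient norms toward the common target while keeping the weights summing to $T = 2$.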