{"id":16843086,"url":"https://github.com/zanussbaum/mup-tf","last_synced_at":"2026-05-18T22:08:29.312Z","repository":{"id":61836131,"uuid":"555598334","full_name":"zanussbaum/mup-tf","owner":"zanussbaum","description":"Maximal Update Parameterization in Tensorflow","archived":false,"fork":false,"pushed_at":"2023-02-07T22:43:22.000Z","size":2297,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-02-26T06:49:36.639Z","etag":null,"topics":["deep-learning","hyperparameter-optimization","machine-learning","mup","python","tensorflow","transformers"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2203.03466","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zanussbaum.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-10-21T22:50:58.000Z","updated_at":"2024-12-30T22:27:26.000Z","dependencies_parsed_at":"2024-11-24T10:52:14.439Z","dependency_job_id":null,"html_url":"https://github.com/zanussbaum/mup-tf","commit_stats":{"total_commits":11,"total_committers":1,"mean_commits":11.0,"dds":0.0,"last_synced_commit":"ba7223a5af8af193b5198db9f4800ddd636018db"},"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zanussbaum%2Fmup-tf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zanussbaum%2Fmup-tf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zanussbaum%2Fmup-tf/releases","manifests
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zanussbaum%2Fmup-tf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zanussbaum","download_url":"https://codeload.github.com/zanussbaum/mup-tf/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244166640,"owners_count":20409177,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","hyperparameter-optimization","machine-learning","mup","python","tensorflow","transformers"],"created_at":"2024-10-13T12:49:34.617Z","updated_at":"2026-05-18T22:08:29.266Z","avatar_url":"https://github.com/zanussbaum.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# μP for TensorFlow\n\nThis is a very preliminary TensorFlow 2 port of Yang and Hu et al.'s [μP repo](https://github.com/microsoft/mup).\n\n## Installation\n\nYou can either install the package from PyPI or clone the repo and install it locally.\n\n```bash\npip install mup-tf\n```\n\n## Install from Source\n\n```bash\ngit clone https://github.com/zanussbaum/mup-tf.git\ncd mup-tf\npip install -e .\n```\n\n## Basic Usage\n\nThis example has been adapted from the original μP repo.\n\n```python\nimport tensorflow as tf\nfrom mup_tf import MuReadout, make_base_shapes, set_base_shapes, MuSGD, MuAdam\n\nclass MyModel(tf.keras.Model):\n    def __init__(self, width, ...):\n        ...\n        ### In model definition, replace output layer with MuReadout\n        # readout = tf.keras.layers.Dense(d_out)\n        readout = MuReadout(d_out)\n        
### If tying weights with an input Embedding layer, do\n        # readout = MuSharedReadout(input_layer.weight)\n        ...\n    def call(self, ...):\n        ...\n        ### If using a transformer, make sure to use\n        ###   1/d instead of 1/sqrt(d) attention scaling\n        # attention_scores = query @ key.T / d**0.5\n        attention_scores = query @ key.T * 8 / d\n        ### We use 8/d instead of 1/d here to be backward compatible\n        ###   with 1/d**0.5 when d=64, a common head dimension.\n        ...\n\n### Instantiate a base model\nbase_model = MyModel(width=1)\n### Instantiate a \"delta\" model that differs from the base model\n###   in all dimensions (\"widths\") that one wishes to scale.\n### Here it's simple, but e.g., in a Transformer, you may want to scale\n###   both nhead and dhead, so the delta model should differ in both.\ndelta_model = MyModel(width=2) \n\n### Instantiate the target model (the model you actually want to train).\n### This should be the same as the base model except \n###   the widths could be potentially different.\n### In particular, base_model and model should have the same depth.\nmodel = MyModel(width=100)\n\n### Set base shapes\n### When `model` has same parameter shapes as `base_model`,\n###   `model` behaves exactly the same as `base_model`\n###   (which is in Tensorflow's default parametrization).\n###   This provides backward compatibility at this particular model size.\n###   Otherwise, `model`'s init and LR are scaled by μP.\n### IMPORTANT: this should be called as soon as possible,\n###   before re-initialization and optimizer definition.\ninfshapes = set_base_shapes(model, base_model, delta=delta_model)\n\n### Alternatively, one can save the base model shapes in a file\n# make_base_shapes(base_model, delta_model, filename)\n### and later set base shapes directly from the filename\n# set_base_shapes(model, filename)\n### This is useful when one cannot fit both \n###   base_model and model in memory at the 
same time\n\n### Replace your custom init, if any\nfor param in model.parameters():\n    ### If initializing manually with fixed std or bounds,\n    ### then replace with the same function from mup.init\n    # torch.nn.init.uniform_(param, -0.1, 0.1)\n    mup.init.uniform_(param, -0.1, 0.1)\n    ### Likewise, if using\n    ###   `xavier_uniform_, xavier_normal_, kaiming_uniform_, kaiming_normal_`\n    ### from `torch.nn.init`, replace with the same functions from `mup.init`\n\n### Use the optimizers from `mup_tf` instead of `tf.keras.optimizers`\n# optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)\n### When using tf.distribute.MirroredStrategy, pass `infshapes` to the optimizer,\n### because tensors are reset and their `infshape` attribute is lost\nopt_kwargs = {\"infshapes\": infshapes}\noptimizer = MuSGD(learning_rate=0.1, **opt_kwargs)\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzanussbaum%2Fmup-tf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzanussbaum%2Fmup-tf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzanussbaum%2Fmup-tf/lists"}