{"id":13614859,"url":"https://github.com/andreamad8/Universal-Transformer-Pytorch","last_synced_at":"2025-04-13T20:32:24.875Z","repository":{"id":63727588,"uuid":"154266928","full_name":"andreamad8/Universal-Transformer-Pytorch","owner":"andreamad8","description":"Implementation of Universal Transformer in Pytorch","archived":false,"fork":false,"pushed_at":"2018-11-19T14:32:16.000Z","size":1527,"stargazers_count":254,"open_issues_count":10,"forks_count":31,"subscribers_count":8,"default_branch":"master","last_synced_at":"2024-08-02T20:46:34.023Z","etag":null,"topics":["pytorch","universal-transformer"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/andreamad8.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-10-23T05:20:51.000Z","updated_at":"2024-07-01T13:36:59.000Z","dependencies_parsed_at":"2022-11-24T20:04:46.472Z","dependency_job_id":null,"html_url":"https://github.com/andreamad8/Universal-Transformer-Pytorch","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andreamad8%2FUniversal-Transformer-Pytorch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andreamad8%2FUniversal-Transformer-Pytorch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andreamad8%2FUniversal-Transformer-Pytorch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andreamad8%2FUniversal-Transformer-Pytorch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners
/andreamad8","download_url":"https://codeload.github.com/andreamad8/Universal-Transformer-Pytorch/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223605454,"owners_count":17172493,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["pytorch","universal-transformer"],"created_at":"2024-08-01T20:01:06.457Z","updated_at":"2024-11-07T23:30:28.867Z","avatar_url":"https://github.com/andreamad8.png","language":"Python","readme":"# Universal-Transformer-Pytorch\nSimple and self-contained implementation of the [Universal Transformer](https://arxiv.org/abs/1807.03819) (Dehghani et al., 2018) in PyTorch. Please open an issue if you find a bug, and send a pull request if you want to contribute. \n\n![](file.gif)\nGIF taken from: [https://twitter.com/OriolVinyalsML/status/1017523208059260929](https://twitter.com/OriolVinyalsML/status/1017523208059260929)\n\n## Universal Transformer \nThe basic Transformer model has been taken from [https://github.com/kolloldas/torchnlp](https://github.com/kolloldas/torchnlp). So far, the following has been implemented:\n\n- Universal Transformer encoder-decoder, with position and time embeddings.\n- [Adaptive Computation Time](https://arxiv.org/abs/1603.08983) (Graves, 2016), as described in the Universal Transformer paper. \n- Universal Transformer for bAbI data.  
\n \n## Dependencies\n```\npython3\npytorch 0.4\ntorchtext\nargparse\n```\n## How to run\nTo run the standard Universal Transformer on bAbI:\n```\npython main.py --task 1\n```\nTo run with Adaptive Computation Time: \n```\npython main.py --task 1 --act\n```\n\n## Results\nTest error (%) on the bAbI 10k setting, taking the best of 10 runs for each task. The Original column reports the results from the Universal Transformer paper.\n\nTasks 16, 17, 18 and 19 are very hard to converge, even on the training set. \nThe problem seems to be the learning-rate scheduling. Moreover, the results on the 1k setting\nare still very poor; some hyper-parameters probably need tuning. \n\n|Task  | Uni-Trs| + ACT  | Original |\n|  --- |---     |---     |---       |\n|  1   | 0.0    |  0.0   | 0.0      |\n|  2   | 0.0    |  0.2   | 0.0      |\n|  3   | 0.8    |  2.4   | 0.4      |\n|  4   | 0.0    |  0.0   | 0.0      |\n|  5   | 0.4    |  0.1   | 0.0      |\n|  6   | 0.0    |  0.0   | 0.0      |\n|  7   | 0.4    |  0.0   | 0.0      |\n|  8   | 0.2    |  0.1   |  0.0     |\n|  9   | 0.0    |  0.0   |  0.0     |\n| 10   | 0.0    |  0.0   |  0.0     |\n| 11   | 0.0    |  0.0   |  0.0     |\n| 12   | 0.0    |  0.0   |  0.0     |\n| 13   | 0.0    |  0.0   |  0.0     |\n| 14   | 0.0    |  0.0   |  0.0     |\n| 15   | 0.0    |  0.0   |  0.0     |\n| 16   | 50.5   |  50.6  |  0.4     |\n| 17   | 13.7   |  14.1  |  0.6     |\n| 18   | 4.0    |  6.9   |  0.0     |\n| 19   | 79.2   |  65.2  |  2.8     |\n| 20   | 0.0    |  0.0   |  0.0     |\n|---   | ---    | ---    |  ---     |\n| avg  | 7.46   | 6.98   |  0.21    |\n| fail | 3      | 3      |  0       |\n\n## TODO\n- Visualize ACT on different tasks \n\n\u003c!-- Noam True ACT False Task: 1 Max:  Mean: 1.0 Std: 0.0\nNoam True ACT False Task: 2 Max:  Mean: 0.9858 Std: 0.028480870773204943\nNoam True ACT False Task: 3 Max:  Mean: 0.9186 Std: 0.13648604324252353\nNoam True ACT False Task: 4 Max:  Mean: 1.0 Std: 0.0\nNoam True ACT False Task: 5 Max:  Mean: 0.9423 Std: 0.07518384134905584\nNoam True ACT False Task: 6 Max:  Mean: 0.9991 Std: 0.0009433981132056612\nNoam True ACT 
False Task: 7 Max:  Mean: 0.9613999999999999 Std: 0.03378816360798555\nNoam True ACT False Task: 8 Max:  Mean: 0.9959 Std: 0.0022113344387495997\nNoam True ACT False Task: 9 Max:  Mean: 0.998 Std: 0.0022360679774997916\nNoam True ACT False Task: 10 Max:  Mean: 0.9972 Std: 0.002600000000000002\nNoam True ACT False Task: 11 Max:  Mean: 0.9994 Std: 0.001200000000000001\nNoam True ACT False Task: 12 Max:  Mean: 0.9998000000000001 Std: 0.0006000000000000005\nNoam True ACT False Task: 13 Max:  Mean: 0.982 Std: 0.025791471458604318\nNoam True ACT False Task: 14 Max:  Mean: 0.9983000000000001 Std: 0.0019519221295943153\nNoam True ACT False Task: 15 Max:  Mean: 0.999 Std: 0.0024083189157584613\nNoam True ACT False Task: 16 Max:  Mean: 0.47669999999999996 Std: 0.014262187770464941\nNoam True ACT False Task: 17 Max:  Mean: 0.6883999999999999 Std: 0.10602754359127634\nNoam True ACT False Task: 18 Max:  Mean: 0.9126 Std: 0.01696584804835878\nNoam True ACT False Task: 19 Max:  Mean: 0.1639 Std: 0.03415098827266936\nNoam True ACT False Task: 20 Max:  Mean: 1.0 Std: 0.0 --\u003e\n\u003c!-- Noam True ACT True Task: 1 Max:  Mean: 0.9996 Std: 0.0009165151389911689\nNoam True ACT True Task: 2 Max:  Mean: 0.9572999999999998 Std: 0.050034088379823614\nNoam True ACT True Task: 3 Max:  Mean: 0.8862999999999998 Std: 0.13403883765536015\nNoam True ACT True Task: 4 Max:  Mean: 0.9999 Std: 0.0003000000000000003\nNoam True ACT True Task: 5 Max:  Mean: 0.9743999999999999 Std: 0.051252707245569\nNoam True ACT True Task: 6 Max:  Mean: 0.9921 Std: 0.02072414051293803\nNoam True ACT True Task: 7 Max:  Mean: 0.9515 Std: 0.032696330069290645\nNoam True ACT True Task: 8 Max:  Mean: 0.9957 Std: 0.0018466185312619402\nNoam True ACT True Task: 9 Max:  Mean: 0.9991 Std: 0.0013747727084867532\nNoam True ACT True Task: 10 Max:  Mean: 0.9986 Std: 0.002653299832284322\nNoam True ACT True Task: 11 Max:  Mean: 0.9987 Std: 0.0019519221295943153\nNoam True ACT True Task: 12 Max:  Mean: 0.9999 Std: 
0.00030000000000000024\nNoam True ACT True Task: 13 Max:  Mean: 0.9991 Std: 0.0015132745950421568\nNoam True ACT True Task: 14 Max:  Mean: 0.9926 Std: 0.01517366139071254\nNoam True ACT True Task: 15 Max:  Mean: 1.0 Std: 0.0\nNoam True ACT True Task: 16 Max:  Mean: 0.487 Std: 0.005440588203494182\nNoam True ACT True Task: 17 Max:  Mean: 0.7247 Std: 0.10200691153054287\nNoam True ACT True Task: 18 Max:  Mean: 0.9086000000000001 Std: 0.01060377291344926\nNoam True ACT True Task: 19 Max:  Mean: 0.2424 Std: 0.04844625888549083\nNoam True ACT True Task: 20 Max:  Mean: 0.9996 Std: 0.000489897948556636 --\u003e\n","funding_links":[],"categories":["Transformer"],"sub_categories":["Repositories"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandreamad8%2FUniversal-Transformer-Pytorch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fandreamad8%2FUniversal-Transformer-Pytorch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandreamad8%2FUniversal-Transformer-Pytorch/lists"}