{"id":23418809,"url":"https://github.com/cmpark0126/pytorch-lars","last_synced_at":"2025-10-15T02:12:13.023Z","repository":{"id":99920150,"uuid":"193444528","full_name":"cmpark0126/pytorch-LARS","owner":"cmpark0126","description":null,"archived":false,"fork":false,"pushed_at":"2019-07-14T16:55:51.000Z","size":1959,"stargazers_count":9,"open_issues_count":0,"forks_count":3,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-12T11:59:11.272Z","etag":null,"topics":["deep-learning","large-scale-learning","pytorch","pytorch-examples"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cmpark0126.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-06-24T06:09:40.000Z","updated_at":"2022-04-14T07:43:08.000Z","dependencies_parsed_at":"2023-05-11T02:00:14.581Z","dependency_job_id":null,"html_url":"https://github.com/cmpark0126/pytorch-LARS","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/cmpark0126/pytorch-LARS","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmpark0126%2Fpytorch-LARS","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmpark0126%2Fpytorch-LARS/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmpark0126%2Fpytorch-LARS/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmpark0126%2Fpytorch-LARS/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/owners/cmpark0126","download_url":"https://codeload.github.com/cmpark0126/pytorch-LARS/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmpark0126%2Fpytorch-LARS/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279034095,"owners_count":26089420,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-15T02:00:07.814Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","large-scale-learning","pytorch","pytorch-examples"],"created_at":"2024-12-23T00:39:10.174Z","updated_at":"2025-10-15T02:12:13.018Z","avatar_url":"https://github.com/cmpark0126.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Pytorch-LARS\n\n## Objective\n\n-   link: [\"Large Batch Training of Convolutional Networks (LARS)\"](https://arxiv.org/abs/1708.03888)\n-   Implementation of LARS, as introduced in the paper above, using PyTorch and CUDA\n-   Data: CIFAR10\n\n## Requirements\n\n- python == 3.6.8\n- pytorch \u003e= 1.1.0\n- cuda \u003e= 10\n- matplotlib \u003e= 3.1.0 (optional)\n- etc.\n\n## Usage\n\n-   Train\n\n```bash\n$ git clone https://github.com/cmpark0126/pytorch-LARS.git\n$ cd pytorch-LARS/\n$ vi hyperparams.py # edit the Basic and Hyperparams classes to configure training\n$ python train.py # start training on CIFAR10\n```\n\n-   Evaluate\n\n```bash\n$ vi hyperparams.py # edit the Hyperparams_for_val class to inspect results; a specific checkpoint can be selected\n$ python val.py # check the results, including the history of test accuracy updates recorded during training\n```\n\n## Hyperparams (hyperparams.py)\n\n-   Base (class)\n\n    -   batch_size: base batch size. Every batch size used in the experiments is a multiple of this size.\n\n    -   lr: base learning rate, generally used as the reference value for linear scaling.\n\n    -   multiples: factor used in the exponent when computing k (described below).\n\n-   Hyperparams (class)\n\n    -   batch_size: batch size actually used for training\n\n    -   lr: initial learning rate actually used for training\n\n    -   momentum\n\n    -   weight_decay\n\n    -   trust_coef: trust coefficient, i.e. how much to trust the local LR computed internally when LARS is used\n\n    -   warmup_multiplier\n\n    -   warmup_epoch\n\n    -   max_decay_epoch: number of epochs over which polynomial decay is applied\n\n    -   end_learning_rate: value the learning rate converges to once decay is complete\n\n    -   num_of_epoch: total number of training epochs\n\n    -   with_lars\n\n-   Hyperparams_for_val (class)\n\n    -   checkpoint_folder_name: the folder containing hyperparams.py must also contain checkpoint folders holding saved parameters; set this to the name of one of them (e.g. 
checkpoint_folder_name = 'checkpoint-attempt1')\n\n    -   with_lars: select a checkpoint trained with or without LARS\n\n    -   batch_size: select a checkpoint by the batch size it was trained with\n\n    -   device: CUDA device on which to run the model for evaluation\n\n## Demonstration\n\n-   Terminology\n    -   k\n        -   we increase the batch size B by a factor of k\n        -   the starting batch size is 128\n        -   e.g. if we use a batch size of 256, k is 2\n        -   **k = (2 \\*\\* (multiples - 1))**\n    -   (base line)\n        -   the target accuracy we want to reach when training the model with a large batch size and LARS\n\n* * *\n\n### Attempt 1\n\n-   Configuration\n\n    -   Hyperparams\n\n        -   momentum = 0.9\n\n        -   weight_decay\n\n            -   noLars -\u003e 5e-04\n            -   withLARS -\u003e 5e-03\n\n        -   warm-up for 5 epochs\n\n            -   warmup_multiplier = k\n            -   target lr follows the linear scaling rule\n\n        -   polynomial decay (power=2) LR policy (after warm-up)\n\n            -   for 200 epochs\n            -   minimum lr = 1.5e-05 \\* k\n\n        -   number of epochs = 200\n\n-   Without LARS\n\n| Batch | Base LR |    top-1 Accuracy, %   | Time to train |\n| :---: | :-----: | :--------------------: | :-----------: |\n|  128  |   0.15  | 89.15 %\u003cbr\u003e(base line) |  2113.52 sec  |\n|  256  |   0.15  |         88.43 %        |  1433.38 sec  |\n|  512  |   0.15  |         88.72 %        |  1820.35 sec  |\n|  1024 |   0.15  |         87.96 %        |  1303.54 sec  |\n|  2048 |   0.15  |         87.05 %        |  1827.90 sec  |\n|  4096 |   0.15  |         78.03 %        |  2083.24 sec  |\n|  8192 |   0.15  |         14.59 %        |  1459.81 sec  |\n\n-   With LARS (run closest to the base line, for comparing time to train)\n\n| Batch | Base LR | top-1 Accuracy, % | Time to train |\n| :---: | :-----: | :---------------: | :-----------: |\n|  128  |   0.15  |      89.16 %      |  3203.54 sec  |\n|  256  |   0.15  |      89.19 %      | 
 2147.74 sec  |\n|  512  |   0.15  |      89.29 %      |  1677.25 sec  |\n|  1024 |   0.15  |      89.17 %      |  1604.91 sec  |\n|  2048 |   0.15  |      88.70 %      |  1413.10 sec  |\n|  4096 |   0.15  |      86.78 %      |  1609.08 sec  |\n|  8192 |   0.15  |      80.85 %      |  1629.48 sec  |\n\n-   With LARS (best accuracy)\n\n| Batch | Base LR | top-1 Accuracy, % | Time to train |\n| :---: | :-----: | :---------------: | :-----------: |\n|  128  |   0.15  |      89.62 %      |  3606.08 sec  |\n|  256  |   0.15  |      89.78 %      |  2675.04 sec  |\n|  512  |   0.15  |      89.38 %      |  1712.90 sec  |\n|  1024 |   0.15  |      89.22 %      |  1967.92 sec  |\n|  2048 |   0.15  |      88.70 %      |  1413.10 sec  |\n|  4096 |   0.15  |      86.78 %      |  1609.08 sec  |\n|  8192 |   0.15  |      80.85 %      |  1629.48 sec  |\n\n* * *\n\n### Attempt 2\n\n-   Configuration\n\n    -   Hyperparams\n\n        -   momentum = 0.9\n\n        -   weight_decay\n\n            -   noLars -\u003e 5e-04\n            -   withLARS -\u003e 5e-03\n\n        -   trust coefficient = 0.1\n\n        -   warm-up for 5 epochs\n\n            -   warmup_multiplier = 2 \\* k\n            -   target lr follows the linear scaling rule\n\n        -   polynomial decay (power=2) LR policy (after warm-up)\n\n            -   for 200 epochs\n            -   minimum lr = 1e-05\n\n        -   number of epochs = 200\n\n-   Without LARS\n\n| Batch | Base LR |    top-1 Accuracy, %   | Time to train |\n| :---: | :-----: | :--------------------: | :-----------: |\n|  128  |   0.05  | 90.40 %\u003cbr\u003e(base line) |  4232.56 sec  |\n|  256  |   0.05  |         90.00 %        |  2968.43 sec  |\n|  512  |   0.05  |         89.50 %        |  2707.79 sec  |\n|  1024 |   0.05  |         89.27 %        |  2627.22 sec  |\n|  2048 |   0.05  |         89.21 %        |  2500.02 sec  |\n|  4096 |   0.05  |         84.73 %        |  2872.25 sec  |\n|  8192 |   0.05  |         20.85 %        |  2923.95 sec  
|\n\n-   With LARS (run closest to the base line, for comparing time to train)\n\n| Batch | Base LR | top-1 Accuracy, % | Time to train |\n| :---: | :-----: | :---------------: | :-----------: |\n|  128  |   0.05  |      90.21 %      |  6792.61 sec  |\n|  256  |   0.05  |      90.28 %      |  4871.68 sec  |\n|  512  |   0.05  |      90.41 %      |  3581.32 sec  |\n|  1024 |   0.05  |      90.27 %      |  3030.45 sec  |\n|  2048 |   0.05  |      90.19 %      |  2773.21 sec  |\n|  4096 |   0.05  |      88.49 %      |  2866.02 sec  |\n|  8192 |   0.05  |      62.20 %      |  1312.98 sec  |\n\n-   With LARS (best accuracy)\n\n| Batch | Base LR | top-1 Accuracy, % | Time to train |\n| :---: | :-----: | :---------------: | :-----------: |\n|  128  |   0.05  |      90.21 %      |  6792.61 sec  |\n|  256  |   0.05  |      90.28 %      |  4871.68 sec  |\n|  512  |   0.05  |      90.41 %      |  3581.32 sec  |\n|  1024 |   0.05  |      90.27 %      |  3030.45 sec  |\n|  2048 |   0.05  |      90.19 %      |  2773.21 sec  |\n|  4096 |   0.05  |      88.49 %      |  2866.02 sec  |\n|  8192 |   0.05  |      62.20 %      |  1312.98 sec  |\n\n* * *\n\n### Attempt 3\n\n-   Configuration\n\n    -   Hyperparams\n\n        -   momentum = 0.9\n\n        -   weight_decay\n\n            -   noLars -\u003e 5e-04\n            -   withLARS -\u003e 5e-03\n\n        -   trust coefficient = 0.1\n\n        -   warm-up for 5 epochs\n\n            -   warmup_multiplier = 2\n\n        -   polynomial decay (power=2) LR policy (after warm-up)\n\n            -   for 200 epochs\n            -   minimum lr = 1e-05 \\* k\n\n        -   number of epochs = 200\n\n    -   Additional Jobs\n\n        -   Use He initialization\n\n        -   base lr is scaled according to the linear scaling rule\n\n-   Without LARS\n\n| Batch | Base LR |    top-1 Accuracy, %   | Time to train |\n| :---: | :-----: | :--------------------: | :-----------: |\n|  128  |   0.05  |         89.76 %        |  3983.89 sec  |\n|  256  |   0.1   | 90.08 
%\u003cbr\u003e(base line) |  3095.91 sec  |\n|  512  |   0.2   |         89.34 %        |  2674.38 sec  |\n|  1024 |   0.4   |         88.82 %        |  2581.19 sec  |\n|  2048 |   0.8   |         89.29 %        |  2660.56 sec  |\n|  4096 |   1.6   |         85.02 %        |  2871.04 sec  |\n|  8192 |   3.2   |         77.72 %        |  3195.90 sec  |\n\n-   With LARS (run closest to the base line, for comparing time to train)\n\n| Batch | Base LR | top-1 Accuracy, % | Time to train |\n| :---: | :-----: | :---------------: | :-----------: |\n|  128  |   0.05  |      90.11 %      |  6880.76 sec  |\n|  256  |   0.1   |      90.12 %      |  4262.83 sec  |\n|  512  |   0.2   |      90.11 %      |  3548.07 sec  |\n|  1024 |   0.4   |      90.02 %      |  2760.31 sec  |\n|  2048 |   0.8   |      90.09 %      |  2877.81 sec  |\n|  4096 |   1.6   |      88.38 %      |  2946.53 sec  |\n|  8192 |   3.2   |      86.40 %      |  3260.45 sec  |\n\n-   With LARS (best accuracy)\n\n| Batch | Base LR | top-1 Accuracy, % | Time to train |\n| :---: | :-----: | :---------------: | :-----------: |\n|  128  |   0.05  |      90.37 %      |  7338.71 sec  |\n|  256  |   0.1   |      90.32 %      |  4590.58 sec  |\n|  512  |   0.2   |      90.11 %      |  3548.07 sec  |\n|  1024 |   0.4   |      90.50 %      |  2897.45 sec  |\n|  2048 |   0.8   |      90.09 %      |  2877.81 sec  |\n|  4096 |   1.6   |      88.38 %      |  2946.53 sec  |\n|  8192 |   3.2   |      86.40 %      |  3260.45 sec  |\n\n* * *\n\n### Attempt 4\n\n-   Configuration\n\n    -   Hyperparams\n\n        -   momentum = 0.9\n\n        -   weight_decay\n\n            -   noLars -\u003e 5e-04\n            -   withLARS -\u003e 5e-03\n\n        -   trust coefficient = 0.1\n\n        -   warm-up for 5 epochs\n\n            -   warmup_multiplier = 5\n\n        -   polynomial decay (power=2) LR policy (after warm-up)\n\n            -   for 200 epochs\n            -   minimum lr = 1e-05 \\* k\n\n        -   number of epochs = 200\n\n   
 -   Additional Jobs\n\n        -   Use He initialization\n\n        -   base lr is scaled according to the linear scaling rule\n\n-   Without LARS\n\n| Batch | Base LR |    top-1 Accuracy, %   | Time to train |\n| :---: | :-----: | :--------------------: | :-----------: |\n|  128  |   0.02  |         89.84 %        |  4146.52 sec  |\n|  256  |   0.04  | 90.22 %\u003cbr\u003e(base line) |  3023.48 sec  |\n|  512  |   0.08  |         89.42 %        |  2588.01 sec  |\n|  1024 |   0.16  |         89.41 %        |  2494.35 sec  |\n|  2048 |   0.32  |         88.97 %        |  2616.32 sec  |\n|  4096 |   0.64  |         85.13 %        |  2872.76 sec  |\n|  8192 |   1.28  |         75.99 %        |  3226.53 sec  |\n\n-   With LARS (run closest to the base line, for comparing time to train)\n\n| Batch | Base LR | top-1 Accuracy, % | Time to train |\n| :---: | :-----: | :---------------: | :-----------: |\n|  128  |   0.02  |      90.20 %      |  6740.03 sec  |\n|  256  |   0.04  |      90.25 %      |  4662.09 sec  |\n|  512  |   0.08  |      90.24 %      |  3381.99 sec  |\n|  1024 |   0.16  |      90.07 %      |  2929.32 sec  |\n|  2048 |   0.32  |      89.82 %      |  2908.37 sec  |\n|  4096 |   0.64  |      88.09 %      |  2980.63 sec  |\n|  8192 |   1.28  |      86.56 %      |  3314.60 sec  |\n\n-   With LARS (best accuracy)\n\n| Batch | Base LR | top-1 Accuracy, % | Time to train |\n| :---: | :-----: | :---------------: | :-----------: |\n|  128  |   0.02  |      90.69 %      |  7003.00 sec  |\n|  256  |   0.04  |      90.32 %      |  4808.80 sec  |\n|  512  |   0.08  |      90.40 %      |  3615.13 sec  |\n|  1024 |   0.16  |      90.07 %      |  2929.32 sec  |\n|  2048 |   0.32  |      89.82 %      |  2908.37 sec  |\n|  4096 |   0.64  |      88.09 %      |  2980.63 sec  |\n|  8192 |   1.28  |      86.56 %      |  3314.60 sec  |\n\n* * *\n\n### Attempt 5\n\n-   Configuration\n\n    -   Hyperparams\n\n        -   momentum = 0.9\n\n        -   weight_decay\n\n            -   noLars -\u003e 
5e-04\n            -   withLARS -\u003e 5e-03\n\n        -   trust coefficient = 0.1\n\n        -   warm-up for 5 epochs\n\n            -   warmup_multiplier = 2\n\n        -   polynomial decay (power=2) LR policy (after warm-up)\n\n            -   **for 175 epochs**\n            -   minimum lr = 1e-05 \\* k\n\n        -   **number of epochs = 175**\n\n    -   Additional Jobs\n\n        -   Use He initialization\n\n        -   base lr is scaled according to the linear scaling rule\n\n-   Without LARS\n\n| Batch | Base LR |    top-1 Accuracy, %   | Time to train |\n| :---: | :-----: | :--------------------: | :-----------: |\n|  128  |   0.05  | 89.50 %\u003cbr\u003e(base line) |  3682.72 sec  |\n|  256  |   0.1   |         89.22 %        |  2678.24 sec  |\n|  512  |   0.2   |         89.12 %        |  2337.15 sec  |\n|  1024 |   0.4   |         88.70 %        |  2282.48 sec  |\n|  2048 |   0.8   |         88.89 %        |  2316.96 sec  |\n|  4096 |   1.6   |         86.87 %        |  2515.56 sec  |\n|  8192 |   3.2   |         15.50 %        |  2783.00 sec  |\n\n-   With LARS (run closest to the base line, for comparing time to train)\n\n| Batch | Base LR | top-1 Accuracy, % | Time to train |\n| :---: | :-----: | :---------------: | :-----------: |\n|  128  |   0.05  |      89.56 %      |  5445.55 sec  |\n|  256  |   0.1   |      89.52 %      |  3461.59 sec  |\n|  512  |   0.2   |      89.60 %      |  2738.91 sec  |\n|  1024 |   0.4   |      89.50 %      |  2410.23 sec  |\n|  2048 |   0.8   |      89.42 %      |  2474.93 sec  |\n|  4096 |   1.6   |      88.43 %      |  2618.97 sec  |\n|  8192 |   3.2   |      74.96 %      |  1835.32 sec  |\n\n-   With LARS (best accuracy)\n\n| Batch | Base LR | top-1 Accuracy, % | Time to train |\n| :---: | :-----: | :---------------: | :-----------: |\n|  128  |   0.05  |      90.36 %      |  6377.71 sec  |\n|  256  |   0.1   |      90.18 %      |  4219.26 sec  |\n|  512  |   0.2   |      90.08 %      |  3130.41 sec  |\n|  1024 |   0.4   |      89.94 %    
  |  2578.00 sec  |\n|  2048 |   0.8   |      89.42 %      |  2474.93 sec  |\n|  4096 |   1.6   |      88.43 %      |  2618.97 sec  |\n|  8192 |   3.2   |      74.96 %      |  1835.32 sec  |\n\n* * *\n\n## Visualization\n\n \u003cimg src=\"fig/result_fig-attempt4/result_fig-noLARS/noLars-8192.jpg\"\u003e\n\n \u0026lt;Fig1. Attempt4, Without LARS, Batch size = 8192\u003e\n\n \u003cimg src=\"fig/result_fig-attempt4/result_fig-withLARS/withLars-8192.jpg\"\u003e\n\n \u0026lt;Fig2. Attempt4, With LARS, Batch size = 8192\u003e\n\n-   Comparing \\\u003cFig1\u003e and \\\u003cFig2\u003e shows that with LARS, training starts more stably and accuracy increases more smoothly.\n\n-   The accuracy curves produced during Attempts 3, 4, and 5 are available at the links below.\n    -   [Attempt3](https://github.com/cmpark0126/pytorch-LARS/tree/master/fig/result_fig-attempt3)\n    -   [Attempt4](https://github.com/cmpark0126/pytorch-LARS/tree/master/fig/result_fig-attempt4)\n    -   [Attempt5](https://github.com/cmpark0126/pytorch-LARS/tree/master/fig/result_fig-attempt5)\n\n## Analysis of ResNet50 Training With Large Batch (CIFAR10)\n\n-   With LARS, the model could be trained to base line performance with batch sizes of up to 1024\n\n-   Combining several techniques, including He initialization, proved more important than using LARS alone\n\n-   With LARS, the model can not only match the base line but sometimes exceed it\n    - This appears to be a side effect of the local learning rate mitigating vanishing and exploding gradients, as noted in the paper\n\n## Open Issue\n\n-   Training with LARS takes roughly twice as long. Investigate ways to reduce training time\n\n## Reference\n\n-   Base code: \u003chttps://github.com/kuangliu/pytorch-cifar\u003e\n\n-   warm-up LR scheduler: \u003chttps://github.com/ildoonet/pytorch-gradual-warmup-lr/tree/master/warmup_scheduler\u003e\n    -   The PolynomialLRDecay class was also implemented based on this code\n        -   polynomial LR decay scheduler\n    -   See: scheduler.py\n\n-   Pytorch Doc / Optimizer: \u003chttps://pytorch.org/docs/stable/_modules/torch/optim/optimizer.html\u003e\n    -   Optimizer class\n    -   SGD class\n\n## Appendix\n\n### Output of val.py\n\n\u003cimg src=\"fig/appendix/run_val.PNG\"\u003e\n\n- Shows the history of best accuracy updates.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcmpark0126%2Fpytorch-lars","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcmpark0126%2Fpytorch-lars","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcmpark0126%2Fpytorch-lars/lists"}