{"id":17793013,"url":"https://github.com/evcu/exp.bootstrp","last_synced_at":"2025-04-02T02:18:13.058Z","repository":{"id":79244269,"uuid":"125545799","full_name":"evcu/exp.bootstrp","owner":"evcu","description":"This repo is a bootstrap for experiments and includes helper functions scripts for pytorch training and slurm job scheduler.","archived":false,"fork":false,"pushed_at":"2018-05-11T16:39:22.000Z","size":1553,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-07T17:17:18.353Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/evcu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-03-16T17:00:19.000Z","updated_at":"2019-02-26T07:46:24.000Z","dependencies_parsed_at":"2023-04-18T11:57:15.085Z","dependency_job_id":null,"html_url":"https://github.com/evcu/exp.bootstrp","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evcu%2Fexp.bootstrp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evcu%2Fexp.bootstrp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evcu%2Fexp.bootstrp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evcu%2Fexp.bootstrp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/evcu","download_url":"https://codeload.git
hub.com/evcu/exp.bootstrp/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246741117,"owners_count":20826067,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-27T11:03:44.004Z","updated_at":"2025-04-02T02:18:13.029Z","avatar_url":"https://github.com/evcu.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# exp.bootstrp\nThis repo is a bootstrap for experiments and includes helper functions and scripts for PyTorch training and the Slurm job scheduler.\n\n**Basic idea**:\nCreate a Python script (the experiment) that accepts command-line arguments.\nProvide argument lists and generate Slurm jobs using the cross product of the given lists.\n\n## Quick Start\nFirst, set up your [ssh workflow](https://evcu.github.io/notes/ssh-setup-notes/). Then let's kick-start our experiment.\n\n```bash\nssh prince\ngit clone git@github.com:evcu/exp.bootstrp.git\nmv exp.bootstrp my_exp\ncd my_exp\n```\n\nFirst, debug the experiment in an interactive session.\n`prince_slurm_bootstrap.sh` loads the required modules; update it as needed.\nPersonally, I use python3 with `pip --user` packages. The first time, call the script with the `install` argument.\n\n```bash\nsrun -t2:30:00 --mem=5000 --gres=gpu:1 --pty /bin/bash\n\n. ./prince_slurm_bootstrap.sh install\ncd experiments/cifar10/\npython main.py --epoch 1\n```\n\nAfter we are sure that our main script works, we can start creating automated experiments with the\n`create_experiment_jobs.py` script. 
The first thing to do is to update some of the SLURM fields under [experiments/default_conf.yaml](https://github.com/evcu/exp.bootstrp/blob/master/experiments/default_conf.yaml).\nReplace `NET_ID` with your net_id, for example if you are a fellow NYU student using prince. You may need to change this file completely if you are working on another system or have different requirements.\n\n![log](img/args.png)\n\nNote that each element of the `experiment` key in the yaml file is itself a dictionary of argument lists for `\u003cexp_name\u003e/main.py`. The values in each argument list are combined by cross product with the other lists in the dictionary to generate all possible combinations.\n\nNow we can generate experiment scripts.\n\n```bash\ncd ../\npython create_experiment_jobs.py --debug\n```\n\nIf they all look good, you can create the experiment folder and submit the jobs:\n\n```bash\npython create_experiment_jobs.py\nbash /scratch/ue225/my_project/exps/cifar10/cifarLR_03.26/submit_all.sh\n```\n\nwhich would output something like this:\n\n![log](img/con.png)\n\nLet's say you want to define a new experiment. You would do so by creating a new folder `experiments/new_folder/` and an `experiments/new_folder/main.py` script that is intended to be run. The main.py script should accept\n`--log_folder` and `--conf_file` flags at minimum. Then you can change `exp_name` in `experiments/default_conf.yaml` to `new_folder` and create new experiments.\n\n## Features\n- [read_yaml_args](https://github.com/evcu/exp.bootstrp/blob/master/experiments/exp_utils.py#L224) reads conf.yaml and creates a type-checked argParser out of the definition. Write the conf, read it, and overwrite values with cli args.\n- [Customizable eval-prefixes](https://github.com/evcu/exp.bootstrp/blob/master/experiments/exp_utils.py#L175) inside the yaml file, which enable defining programmatic eval-able arguments.\n  i.e. 
the string '+range(5)' would be evaluated and read as a list.\n- Configuration [copy to the experiment folder](https://github.com/evcu/exp.bootstrp/blob/master/experiments/create_experiment_jobs.py#L34) so that you can always change an experiment's default_args after submission.\n- [ClassificationTrainer](https://github.com/evcu/exp.bootstrp/blob/master/experiments/exp_utils.py#L83)/[ClassificationTester](https://github.com/evcu/exp.bootstrp/blob/master/experiments/exp_utils.py#L9), which wrap the main training/testing\nfunctionality and provide hooks for loggers.\n- tensorboardX [logging utils](https://github.com/evcu/exp.bootstrp/blob/master/experiments/exp_loggers.py) and examples.\n- [convNetgeneric](https://github.com/evcu/exp.bootstrp/blob/master/experiments/exp_models.py#L67) implementation.\n- [Multiple experiment definitions](https://github.com/evcu/exp.bootstrp/blob/master/experiments/default_conf.yaml#L6) through yaml lists.\n\n## Visualizing Tensorboard Events\nThere are several options:\n- You can scp the logs, like\n```bash\nscp prince:/scratch/ue225/my_project/exps/cifar10/cifarLR\n.26/tb_logs ./\n```\n- You can open a tunnel to prince, run tensorboard there, and connect to it through port forwarding. See my [remote Jupyter and port forwarding](https://evcu.github.io/notes/port-forwarding/) notes.\n- You can use sshfs to sync the logs into your local file system. Details [here](https://evcu.github.io/notes/ssh-setup-notes).\n\nThen read your results:\n![log](img/tb1.png)\n![log](img/tb2.png)\n![log](img/tb3.png)\n\n## Contribution\nI am excited to collaborate and learn from you if you have figured out better ways of experimenting or want to add text/code to this repo. 
Please create an issue or reach out to me.\n\n## TODO\n- Change create_experiments so that the defaults are perhaps included in the experiment.yaml and dumped.\n- Source code needs to be copied!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fevcu%2Fexp.bootstrp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fevcu%2Fexp.bootstrp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fevcu%2Fexp.bootstrp/lists"}