{"id":17030318,"url":"https://github.com/vsoch/watchme-mnist","last_synced_at":"2026-04-11T08:32:15.771Z","repository":{"id":141668075,"uuid":"187382585","full_name":"vsoch/watchme-mnist","owner":"vsoch","description":"An example using the watchme terminal monitor to record resources used during mnist training","archived":false,"fork":false,"pushed_at":"2019-05-19T22:52:08.000Z","size":617,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-01-28T00:22:06.001Z","etag":null,"topics":["docker","mnist","psutils","singularity","sklearn","watchme"],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vsoch.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-05-18T16:47:56.000Z","updated_at":"2019-05-19T22:52:09.000Z","dependencies_parsed_at":null,"dependency_job_id":"ec940682-ed1f-4e6b-9d3b-feec28ce24a2","html_url":"https://github.com/vsoch/watchme-mnist","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vsoch%2Fwatchme-mnist","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vsoch%2Fwatchme-mnist/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vsoch%2Fwatchme-mnist/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vsoch%2Fwatchme-mnist/manifests","owner_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/owners/vsoch","download_url":"https://codeload.github.com/vsoch/watchme-mnist/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245016009,"owners_count":20547516,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker","mnist","psutils","singularity","sklearn","watchme"],"created_at":"2024-10-14T08:06:07.918Z","updated_at":"2026-04-11T08:32:10.744Z","avatar_url":"https://github.com/vsoch.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Watchme Mnist\n\nThis is a [watchme](https://www.github.com/vsoch/watchme) repository that shows\nhow easy it is to monitor a task at some frequency using the [watchme monitor pid task](https://vsoch.github.io/watchme/watchers/psutils)\nprovided by the psutils set of tasks. Specifically, we are going to:\n\n 1. Start with this [sklearn mnist example](https://scikit-learn.org/stable/auto_examples/neural_networks/plot_mnist_filters.html#sphx-glr-auto-examples-neural-networks-plot-mnist-filters-py)\n 2. Build it into a container, the [Dockerfile](Dockerfile) here served at [vanessa/watchme-mnist](https://hub.docker.com/r/vanessa/watchme-mnist)\n 3. Run the container on an HPC cluster with varying amounts of memory, for a training task that takes approximately 20 minutes.\n\nAnd compare results!\n\n## Included\n\nThis is a fairly simple analysis in that I could install [watchme](https://www.github.com/vsoch/watchme)\nand then write a few quick scripts, run, and be done! 
\n\n - [run_job.sh](run_job.sh) will submit job.sh to the cluster, specifying input parameters and outputs\n - [job.sh](job.sh) is submitted to different nodes with varying memory, 5 times each\n - [data](data) is where output data is written, including JSON result files and images from the training.\n\n\n## Usage\n\n### 1. Setup\n\nTo install watchme:\n\n```bash\n$ pip install watchme[all]\n```\n\nYou can also clone and install from the master branch directly:\n\n```bash\n$ git clone https://www.github.com/vsoch/watchme\n$ cd watchme\n$ pip install .[all] --user\n```\n\nAnd then I created a watcher folder (this repo).\n\n```bash\n$ watchme create watchme-mnist\n```\n\nWe aren't going to be using .git as a temporal database, but it's still handy\nto use watchme to create the repo for us :)\n\n### 2. Mnist on the Sherlock Cluster\n\nThe script [job.sh](job.sh) is submitted via [run_job.sh](run_job.sh). In it, we\nfirst export some variables to the environment to be added to our data:\n\n```bash\n# Add variables for host, cpu, etc.\nexport WATCHMEENV_HOSTNAME=$(hostname)\nexport WATCHMEENV_NPROC=$(nproc)\nexport WATCHMEENV_MAXMEMORY=${mem}\n```\n\nThe command to use watchme looks like this. We are going to run the model and record every 20 seconds.\nThe output will be redirected into a json file, and the script is given the name of a png file (in the\nsame directory) to save a plot to. 
This should take 20-30 minutes.\n\n```bash\nwatchme monitor --name $name-$iter --seconds 20 singularity run docker://vanessa/watchme-mnist ${output}.png \u003e ${output}.json\n```\n\nThe above command is submitted in a simple loop in [run_job.sh](run_job.sh). Notice how\nwe define iter and mem based on the loops:\n\n```bash\nfor iter in 1 2 3 4 5; do\n    for mem in 4 6 8 12 16 18 24 32 64 128; do\n        output=\"${outdir}/${name}-iter${iter}-${mem}gb\"\n        echo \"sbatch --mem=${mem}GB job.sh ${mem} ${iter} ${name} ${output}\"\n        sbatch --mem=${mem}GB job.sh \"${mem}\" \"${iter}\" \"${name}\" \"${output}\"\n    done\ndone\n```\n\nEach result was written directly to a file in [data](data) (not using\ngit as a temporal database).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvsoch%2Fwatchme-mnist","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvsoch%2Fwatchme-mnist","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvsoch%2Fwatchme-mnist/lists"}