{"id":26445567,"url":"https://github.com/cisco-open/multi-task-learning-library","last_synced_at":"2025-03-18T11:19:27.450Z","repository":{"id":248146848,"uuid":"816422603","full_name":"cisco-open/multi-task-learning-library","owner":"cisco-open","description":"This repository contains a Python library to simplify multi-task learning for computer vision models with shared backbones.","archived":false,"fork":false,"pushed_at":"2024-07-12T13:50:17.000Z","size":80,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2024-07-12T18:11:47.385Z","etag":null,"topics":["computer-vision","multi-task-learning","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cisco-open.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-17T18:07:44.000Z","updated_at":"2024-07-12T18:11:54.530Z","dependencies_parsed_at":"2024-07-12T18:24:49.166Z","dependency_job_id":null,"html_url":"https://github.com/cisco-open/multi-task-learning-library","commit_stats":null,"previous_names":["cisco-open/multi-task-learning-library"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cisco-open%2Fmulti-task-learning-library","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cisco-open%2Fmulti-task-learning-library/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/cisco-open%2Fmulti-task-learning-library/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cisco-open%2Fmulti-task-learning-library/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cisco-open","download_url":"https://codeload.github.com/cisco-open/multi-task-learning-library/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244208595,"owners_count":20416110,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","multi-task-learning","python"],"created_at":"2025-03-18T11:19:26.825Z","updated_at":"2025-03-18T11:19:27.442Z","avatar_url":"https://github.com/cisco-open.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!-- # Easy Multi-Task Learning --\u003e\n\u003cp align=\"center\" width=\"100%\"\u003e\n    \u003cimg width=\"150px\" src=\"https://i.postimg.cc/HWzrbmZX/Screenshot-20230301-054816.png\"\u003e\n\u003c/p\u003e\n\u003ch1 style=\"text-align: center;\"\u003eEasy Multi-Task Learning Library\u003c/h1\u003e\n\nEasy Multi-Task Learning Library (EMTL) is a framework to simplify the prototyping and training processes for multi-task computer vision ML models. EMTL provides a set of interfaces and tools to modularize CV tasks and datasets into consumable objects and standardize models. 
These modularized objects are then assembled and passed to a parameterized trainer, which produces a complete trained model.\n\n## Setup\nThe minimal installation requirements for this library are `PyTorch` and `tqdm`. Optionally, one\nmay install MLflow, which is fully integrated into this library, to track experiments.\n```bash\n# PyTorch CPU\npip3 install torch tqdm virtualenv\n\n# PyTorch GPU (CUDA 11.7 wheels)\npip3 install torch --extra-index-url https://download.pytorch.org/whl/cu117 tqdm virtualenv\n\n# Optional MLflow\npip3 install mlflow\n```\n\n### Run on a remote Jupyter Server\nHere we provide a basic tutorial on how to spin up a Jupyter Server instance on a remote machine to\noffload intensive computation when using the EMTL library. First, one needs a remote machine,\npossibly equipped with GPUs or TPUs, where Anaconda is installed (see the installation instructions at\nhttps://docs.anaconda.com/anaconda/install/linux/).\nThen, to connect it to a Jupyter notebook in your local VS Code:\n```bash\n# 1. SSH into the remote machine and forward port 8888 (for Jupyter)\nssh -L 8888:localhost:8888 user@address\n\n# 2. Create \u0026 activate a new virtual environment, then install the dependencies\nvirtualenv -p $(which python3) emtl_environment\nsource emtl_environment/bin/activate\npip3 install torch tqdm mlflow jupyterlab\n\n# 3. Spin up a Jupyter Server instance\njupyter server --no-browser --port=8888\n```\n4. Connect the Jupyter Server to your local VS Code\n    1. The last shell command prints URLs containing an access token. Copy one of them in full.\n    2. Open a notebook in your local VS Code and, at the bottom of the screen, select *Jupyter Server: local*.\n    3. In the popup menu, select *Existing* and paste the copied URL. Press *Enter* twice.\n    4. You should now be connected to the remote server.\n5. Switch to the correct kernel\n    1. In the top-right corner of the screen, click the *kernel* selector.\n    2. 
In the popup menu, look for the option that mentions *server* or *remote* and click it.\n    3. You should now be connected to the remote kernel and can run notebooks.\n\n### Experiment Tracking with MLflow\nHere we show a basic use case of EMTL supported by MLflow. We create an SQLite database to keep\ntrack of experiments and serve the MLflow UI through port 5000 of the remote host (which we forward locally).\nAssuming the `emtl_environment` has already been set up on the remote machine as described above, do\nthe following:\n```bash\n# connect to the remote server and forward ports (5000 for MLflow, 8888 for Jupyter)\nssh -L 8888:localhost:8888 -L 5000:localhost:5000 user@address\nsource emtl_environment/bin/activate\n\n# spin up the MLflow UI, backed by a local SQLite database\nmlflow ui --backend-store-uri sqlite:///mlflow.db\n```\nThen open http://localhost:5000/ in a local browser to see the MLflow UI with the logged experiments. To\nconnect to MLflow from code and start a new experiment, run the following in Python. Since `sqlite:///mlflow.db` is a\nrelative path, run it from the same working directory in which `mlflow ui` was launched:\n```python\nimport mlflow\n\n# log to the same SQLite store that `mlflow ui` is serving\nmlflow.set_tracking_uri('sqlite:///mlflow.db')\n# create the experiment if it does not exist and make it the active one\nmlflow.set_experiment('EMTL Example')\n```\n\nTo stop the MLflow UI, kill the process listening on port 5000:\n```bash\nfuser -k 5000/tcp\n```\nYou can find more use cases in the examples and demos provided in this repository.\n\n\n## Design\nEMTL requires a Machine Learning Practitioner (MLP) to write their code in a *modular fashion*: models, datasets, learners, and algorithms are independent of one another, but can interact through the interfaces and decorators provided by EMTL.\n\nEMTL provides a pipeline that handles the creation and training of a multi-task model through the following steps:\n1. Define and validate the backbone model\n2. For each task to learn:\n    - Create and validate the dataset\n    - Define and validate the criterion\n    - Define and validate the specialized head\n    - Specify optional metadata\n3. Choose a learning algorithm\n    - EMTL generates the learners\n4. Launch the training\n    - 
Log intermediate metrics (to the console, a file, or the MLflow database)\n5. Save the produced artifacts (trained backbone and heads)\n\nEMTL validates models, datasets, and tasks through the *decorator* design pattern, ensuring that all modules are written according to its specification. When a module does not conform, EMTL reports helpful information about the mismatch.\n\n### Code Architecture\nThe project is built around an \"umbrella\" trainer class that requires a (backbone) model, a non-empty set of tasks, and a training algorithm. Each of these is a separate module (the tasks are a list of modules) that exists independently of the others: no class or function in one module imports from another. Global configuration parameters (e.g., PyTorch settings or whether to use MLflow) are specified in a `config.ini` file.\n\n## Workflows\nThis section describes the project's development workflows, such as running tests.\n\n### Tests\nThis section is a work in progress.\nTests are collected in the `tests` folder. To run them, execute the following command from the root of\nthe project:\n```bash\npython -m unittest -v\n```\n## Contributing \u0026 Future Directions\n\nIf you wish to contribute or suggest additional functionality, please check out our [Contributing Guidelines](/CONTRIBUTING.md).\n\n## License\n\n[Apache License 2.0](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcisco-open%2Fmulti-task-learning-library","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcisco-open%2Fmulti-task-learning-library","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcisco-open%2Fmulti-task-learning-library/lists"}