# Computron

## Abstract

Many of the most performant deep learning models today, in fields like language and image understanding, are fine-tuned models that contain billions of parameters. In anticipation of workloads that involve serving many such large models to handle different tasks, we develop Computron, a system that uses memory swapping to serve multiple distributed models on a shared GPU cluster. Computron implements a model parallel swapping design that takes advantage of the aggregate CPU-GPU link bandwidth of a cluster to speed up model parameter transfers. This design makes swapping large models feasible and can improve resource utilization. We demonstrate that Computron successfully parallelizes model swapping on multiple GPUs, and we test it on randomized workloads to show how it tolerates real-world variability such as burstiness and skewed request rates.

## Installation for Development

Clone this repository and its submodules, then enter the repository root:

```shell
git clone --recurse-submodules git@github.com:dlzou/computron.git
cd computron
```

Create a conda environment and use pip to install PyTorch 1.13, torchvision, Colossal-AI, and Transformers from PyPI. Then install Energon-AI and AlpaServe from the included submodules. Finally, install Computron from source.

```shell
conda create -n computron python=3.10
conda activate computron
pip install torch==1.13 torchvision colossalai transformers
pip install -e energonai/
pip install -e alpa_serve/
pip install -e .
```
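Once the installation steps finish, a quick way to confirm the environment is to check that each package can be located. The sketch below is not part of the repository. It uses only the standard library's `importlib.util.find_spec`, and the module names (in particular `computron`, `energonai`, and `alpa_serve`) are assumptions inferred from the directory names in the install commands. Note that `find_spec` only locates a package without importing it, so errors that occur at import time (for example a CUDA or version mismatch) will not show up here.

```python
"""Post-install sanity check: report which expected packages can be found."""
import importlib.util

# Assumed top-level module names, inferred from the pip install steps.
# "computron" in particular is an assumption about this repo's package name.
EXPECTED_MODULES = [
    "torch",
    "torchvision",
    "colossalai",
    "transformers",
    "energonai",
    "alpa_serve",
    "computron",
]


def check_imports(modules):
    """Map each top-level module name to whether Python can locate it.

    find_spec() returns None for a missing top-level module; it does not
    execute the module, so this checks installation, not importability.
    """
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_imports(EXPECTED_MODULES).items():
        print(f"{name:<14} {'ok' if found else 'MISSING'}")
```

Run it with the `computron` environment active (for example, saved as a hypothetical `check_install.py` and invoked with `python check_install.py`); any line marked `MISSING` points to the install step that needs to be repeated.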