{"id":13605036,"url":"https://github.com/MachineLearningSystem/igniter","last_synced_at":"2025-04-12T02:32:48.774Z","repository":{"id":185461813,"uuid":"583246320","full_name":"MachineLearningSystem/igniter","owner":"MachineLearningSystem","description":"iGniter, an interference-aware GPU resource provisioning framework for achieving predictable performance of DNN inference in the cloud. ","archived":false,"fork":true,"pushed_at":"2022-12-23T08:47:43.000Z","size":88352,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2024-08-02T19:36:52.900Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"icloud-ecnu/igniter","license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MachineLearningSystem.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2022-12-29T07:44:47.000Z","updated_at":"2022-12-28T00:07:32.000Z","dependencies_parsed_at":null,"dependency_job_id":"989b47cc-980b-4ba2-9225-cb152c3a66c1","html_url":"https://github.com/MachineLearningSystem/igniter","commit_stats":null,"previous_names":["machinelearningsystem/igniter"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MachineLearningSystem%2Figniter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MachineLearningSystem%2Figniter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MachineLearningSystem%2Figniter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MachineLearningSystem%2Figniter/manifest
s","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MachineLearningSystem","download_url":"https://codeload.github.com/MachineLearningSystem/igniter/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223489720,"owners_count":17153812,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T19:00:53.956Z","updated_at":"2024-11-07T09:31:27.719Z","avatar_url":"https://github.com/MachineLearningSystem.png","language":null,"readme":"# iGniter\niGniter, an interference-aware GPU resource provisioning framework for achieving predictable performance of DNN inference in the cloud. \n\n## Prototype of iGniter\n\nOur iGniter framework comprises three pieces of modules: an inference workload placer and a GPU resource allocator as well as an inference performance predictor. With the profiled model coefficients, the inference performance predictor first estimates the inference latency using our performance model. It then guides our GPU resource allocator and inference workload placer to identify an appropriate GPU device with the least performance interference and the guaranteed SLOs from candidate GPUs for each inference workload. 
Finally, according to the cost-efficient GPU resource provisioning plan generated by our algorithm, the GPU device launcher builds a GPU cluster and launches a Triton inference serving process for each DNN inference workload on the provisioned GPU devices.

![](images/prototype.png)

## Model the Inference Performance

The execution of DNN inference on the GPU can be divided into three sequential steps: data loading, GPU execution, and result feedback. Accordingly, the DNN inference latency can be calculated by summing up the data loading latency, the GPU execution latency, and the result feedback latency, which is formulated as

<div align=center><img width="330" height="37" src="images/inference_latency.png"/></div>

To improve GPU resource utilization, the data loading phase overlaps with the GPU execution and result feedback phases in mainstream DNN inference servers (e.g., Triton). Accordingly, we estimate the DNN inference throughput as

<div align=center><img width="200" height="72" src="images/throughput.png"/></div>

We calculate the data loading latency and the result feedback latency as

<div align=center><img width="400" height="63" src="images/transfer_latency.png"/></div>

The GPU execution phase consists of the GPU scheduling delay and the kernels running on the allocated SMs. Furthermore, performance interference can be caused by the reduction of GPU frequency due to inference workload co-location, which inevitably prolongs the GPU execution phase.
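The latency sum and the overlap-based throughput estimate can be illustrated numerically. This is a simplified sketch of the formulas above (which are given as images), under the assumption that the data loading of the next batch fully overlaps with GPU execution and result feedback of the current one; it is not the paper's exact model.

```python
# Simplified sketch of the inference performance model (assumed reading,
# not the exact formulas from the paper's images).

def inference_latency_ms(t_load, t_exec, t_feedback):
    # Latency sums the three sequential steps: data loading,
    # GPU execution, and result feedback.
    return t_load + t_exec + t_feedback

def throughput_rps(batch, t_load, t_exec, t_feedback):
    # With data loading overlapped against the other two phases (as in
    # Triton), steady-state throughput is bounded by the slower pipeline
    # stage rather than by the full latency sum.
    bottleneck_ms = max(t_load, t_exec + t_feedback)
    return batch * 1000.0 / bottleneck_ms
```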
Accordingly, we formulate the GPU execution latency as

<div align=center><img width="170" height="66" src="images/GPU_executing_latency.png"/></div>

The GPU scheduling delay is roughly linear in the number of kernels of a DNN inference workload, and performance interference on the GPU resource scheduler adds further scheduling delay; it can be estimated as

<div align=center><img width="300" height="53" src="images/scheduling_delay.png"/></div>

Given a fixed supply of L2 cache space on a GPU device, a higher GPU L2 cache utilization (i.e., demand) indicates more severe contention for the GPU L2 cache space, and thereby a longer GPU active time. Accordingly, we estimate the GPU active time as

<div align=center><img width="350" height="65" src="images/GPU_active_time.png"/></div>

## Getting Started

### Requirements

```
cd i-Gniter
python3 -m pip install --upgrade pip
pip install -r requirements.txt
```

### Profiling

#### Profiling environment

* Driver Version: 465.19.01
* CUDA Version: 11.3
* TensorRT Version: 8.0.1.6
* cuDNN Version: 8.2.0

#### Generating models:

~~~shell
cd i-Gniter/Profile
python3 model_onnx.py
./onnxTOtrt.sh
~~~

#### Initializing:

~~~shell
source start.sh
~~~

#### Computing bandwidth:

~~~shell
cd tools
python3 computeBandwidth.py
~~~

#### Profiling hardware parameters:

~~~shell
./power_t_freq 1530 # 1530 is the highest frequency of the V100 GPU
# 1590 is the highest frequency of the T4 GPU
./coninference
~~~

#### Getting the input and output size of different models:

~~~shell
python3 getDataSize.py
~~~

#### Computing the kernel metrics of different models:

~~~shell
./l2cache alexnet
./l2cache resnet50
./l2cache ssd
./l2cache vgg19
~~~

#### Getting the parameters for each model:

Profile `idletime_1`, `activetime_1`, and `transferdata`:
\n\n~~~shell\n./soloinference alexnet\n./soloinference resnet50\n./soloinference ssd\n./soloinference vgg19\n~~~\n\n`activetime_2`,`power_2`,`frequency_2`\n\n~~~shell\n./multiinference alexnet\n./multiinference resnet50\n./multiinference ssd\n./multiinference vgg19\n~~~\n`GPU latency`, `inference latency`, `power`, `frequency`\n~~~shell\n./recordpower.sh alexnet\n./recordpower.sh resnet50\n./recordpower.sh ssd\n./recordpower.sh vgg19\n~~~\n`l2caches`\n~~~\n./model_l2caches.sh alexnet\n./model_l2caches.sh resnet50\n./model_l2caches.sh ssd\n./model_l2caches.sh vgg19 \n~~~\n#### Copying config to Algorithm directory\n~~~\ncp config ../../Algorithm/config\n~~~\n\nThe configured file is shown in `i-Gniter/Algorithm/config`, which is the result of running on the V100 GPU.\n\n### Obtaining the GPU resources provisioning plan\n\n```\ncd i-Gniter/Algorithm\npython3 start.py -f 1590 -p 300 -s 80 #(1590,300,80) is the config of V100 GPU. \npython3 igniter-algorithm.py\n```\nAfter you run the script, you will get the GPU resources provisioning plan, which is a JSON config file. The configuration will specify models, inference arrival rates, SLOs, GPU resources and batches. The file will be used in Performance Measurement part to measuring performance.\n```\n{\n  \"models\": [\"alexnet_dynamic\", \"resnet50_dynamic\", \"vgg19_dynamic\"], \n  \"rates\": [500, 400, 200], \n  \"slos\": [7.5, 20.0, 30.0], \n  \"resources\": [10.0, 30.0, 37.5], \n  \"batches\": [4, 8, 6]\n}\n```\n### Downloading Model Files\nRunning the script to download the model files.\n```\ncd i-Gniter/Launch/model/\n./fetch_models.sh\n```\n\n### Downloading Docker Image From NGC\nWe use the Triton as our inference server. Before you can use the Triton Docker image you must install Docker. 
In order to use a GPU for inference, you must also install the [NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-docker).

```
docker pull nvcr.io/nvidia/tritonserver:21.07-py3
docker pull nvcr.io/nvidia/tritonserver:21.07-py3-sdk
```

### Real Input Data

You can provide the data to be used with every inference request in a JSON file; the program sends the provided data in round-robin order. Skip this section if you want to use random data for inference; otherwise, run the following commands to generate the JSON files from a set of real pictures. You need to prepare your own pictures, and the name of each JSON file must match the corresponding model name.

```
cd i-Gniter/Launch
python3 data_transfer.py -c 1000 -d /your/pictures/directory -f resnet50_dynamic.json -k actual_input_resnet50 -s 3:224:224
python3 data_transfer.py -c 1000 -d /your/pictures/directory -f vgg19_dynamic.json    -k actual_input_vgg19    -s 3:224:224
python3 data_transfer.py -c 1000 -d /your/pictures/directory -f alexnet_dynamic.json  -k actual_input_alexnet  -s 3:224:224
python3 data_transfer.py -c 558  -d /your/pictures/directory -f ssd_dynamic.json      -k actual_input_ssd      -s 3:300:300
```

### Performance Measurement

If you want to use random data:

```
python3 evaluation.py -t 10 -c ../Algorithm/config_gpu1.json
```

If you want to use the real data:

```
python3 evaluation.py -i ./input_data -t 10 -c ../Algorithm/config_gpu1.json
```

### Understanding the Results

After the program runs, the information and running results of each model are printed to the screen.
The `slo_vio` value is expressed as a percentage.

```
alexnet_dynamic:10.0 4
[throughout_per_second, gpu_latency_ms]: (500.0, 6.612)
slo_vio: 0.05 %
resnet50_dynamic:30.0 8
[throughout_per_second, gpu_latency_ms]: (400.0, 18.458)
slo_vio: 0.01 %
vgg19_dynamic:37.5 6
[throughout_per_second, gpu_latency_ms]: (199.2, 27.702)
slo_vio: 0.0 %
```

## Publication

Fei Xu, Jianian Xu, Jiabin Chen, Li Chen, Ruitao Shang, Zhi Zhou, Fangming Liu, "[iGniter: Interference-Aware GPU Resource Provisioning for Predictable DNN Inference in the Cloud](https://github.com/icloud-ecnu/igniter/raw/main/pdf/igniter.pdf)," to appear in IEEE Transactions on Parallel and Distributed Systems, 2022.

We have also uploaded the paper to [arXiv](https://arxiv.org/abs/2211.01713), and we encourage anybody interested in our work to cite it. Once the paper appears in an upcoming issue of IEEE TPDS, we will update the bibliography below.

```
@misc{xu2022igniter,
    title={iGniter: Interference-Aware GPU Resource Provisioning for Predictable DNN Inference in the Cloud},
    author={Fei Xu and Jianian Xu and Jiabin Chen and Li Chen and Ruitao Shang and Zhi Zhou and Fangming Liu},
    year={2022},
    eprint={2211.01713},
    archivePrefix={arXiv},
    primaryClass={cs.DC}
}
```