{"id":18329930,"url":"https://github.com/noewangjy/cloud-computing","last_synced_at":"2025-07-06T01:34:21.956Z","repository":{"id":193557983,"uuid":"493961300","full_name":"noewangjy/cloud-computing","owner":"noewangjy","description":"Source code and report for course ICE6405P at SJTU","archived":false,"fork":false,"pushed_at":"2022-05-24T10:22:25.000Z","size":13146,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2023-09-08T18:31:54.686Z","etag":null,"topics":["federated-learning","machine-learning","open-whisk","qemu-kvm","serverless","virtualization"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/noewangjy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2022-05-19T07:13:30.000Z","updated_at":"2023-09-08T18:31:57.702Z","dependencies_parsed_at":null,"dependency_job_id":"996c5166-fdf6-4284-91ce-67e07cbc1407","html_url":"https://github.com/noewangjy/cloud-computing","commit_stats":null,"previous_names":["noewangjy/cloud-computing"],"tags_count":null,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/noewangjy%2Fcloud-computing","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/noewangjy%2Fcloud-computing/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/noewangjy%2Fcloud-computing/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/noewangjy%2Fcloud-computing/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/owners/noewangjy","download_url":"https://codeload.github.com/noewangjy/cloud-computing/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223230407,"owners_count":17110080,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["federated-learning","machine-learning","open-whisk","qemu-kvm","serverless","virtualization"],"created_at":"2024-11-05T19:19:08.541Z","updated_at":"2024-11-05T19:19:10.885Z","avatar_url":"https://github.com/noewangjy.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Cloud Computing\n\nThis repository contains the project code and reports for ICE6405P Cloud Computing (Fall 2021) at Shanghai Jiao Tong University.\n\nFor the course website, see [here](https://tsingz0.github.io/ICE6405P-260-M01/).\n\nAuthor: [Steve Wang](mailto:steve_wang@sjtu.edu.cn)\n\n## Contents\n\nThe projects in this repository are:\n\n1. [KVM-based network virtualization and virtual machine live migration](Virtualization)\n2. [Distributed training and machine learning applications on a serverless computing framework](Serverless_compute)\n   - Task 1: A machine learning application on a serverless computing framework\n   - Task 2: Distributed training on a serverless computing framework\n3. [Deploying and studying federated learning on a cloud computing platform](FedAvg)\n\n## Project Overview\n\n### 1. KVM-based network virtualization and virtual machine live migration\n\nIn this task, we need to meet the following requirements:\n\n1. Create a virtual machine that uses `virtio-net` on QEMU/KVM:\n    - The virtual machine runs the CentOS operating system;\n    - The virtual machine can reach external networks through the `virtio-net` device;\n\n2. High-performance virtual networking is one of the cores of cloud computing:\n    - Deploy the `vhost-user` solution based on DPDK and OVS;\n    - Benchmark the `vhost-user` solution against the `virtio-net` solution;\n\n3. 
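Optimize toward high performance and richer functionality:\n    - Configure `vhost-user` and the QEMU launch parameters to study the multi-queue feature of the network device;\n    - Perform virtual machine live migration on top of `vhost-user`;\n\nThe computing resources for this experiment are:\n\n- Host platform:\n    - `OS`: macOS Catalina 10.15.7\n    - `Memory`: 32GB\n    - `Processor`: Intel Core i9-9880H @ 2.3GHz * 16\n    - `Platform`: VMware Fusion Pro 11.5.0\n\n- Virtual machine environment:\n    - `OS`: Ubuntu 20.04 LTS\n    - `Memory`: 16GB\n    - `Disk Capacity`: 110GB\n    - `Processor`: Intel Core i9-9880H @ 2.3GHz * 8\n\n---\n\n### 2. Distributed training and machine learning applications on a serverless computing framework\n\n#### Task 1: A machine learning application on a serverless computing framework\n\nFirst, we need to choose a suitable serverless computing framework. In this project, we have the following three options:\n\n1. AWS Lambda. AWS Lambda is the most widely used serverless computing framework. AWS offers students a free usage quota, which lets us deploy our functions on AWS Lambda.\n2. OpenWhisk. OpenWhisk is an open-source serverless computing framework. We would need to deploy OpenWhisk locally and submit a record of the deployment procedure.\n3. Any other **open-source** serverless computing framework.\n\nIn this project, we chose the open-source serverless computing framework OpenWhisk. OpenWhisk is an open-source FaaS platform under the Apache Software Foundation; IBM announced it in 2016 and contributed it to the open-source community. IBM Cloud itself also offers a fully managed OpenWhisk FaaS service, IBM Cloud Functions. In terms of its programming model, OpenWhisk, like AWS Lambda, gives users an event-driven, stateless compute model and directly supports multiple programming languages.\n\nThe ML prediction task used in this section is CNN-based MNIST handwritten digit recognition. On `GitHub` we found a CNN model trained on the MNIST dataset with `PyTorch`, together with the related deployment code. Because of a constraint of the OpenWhisk framework, all function inputs and outputs must be JSON, which makes passing an image in directly somewhat difficult. The code wraps the ML model as a WebApp so that the OpenWhisk function, running in a Docker container, can fetch the image using only a URL.\n\nIn this experiment, inside an open-source Docker container, we download an open-source pretrained model checkpoint, load it into the model via an HTTP server, and use the OpenWhisk framework to run the prediction task. Our workflow is as follows:\n\n1. Build the Docker image from the open-source project;\n2. Run the Flask-based WebApp in Docker;\n3. Create the OpenWhisk Action;\n4. Start the HTTP server;\n5. Invoke the OpenWhisk Action just defined, passing in the URL of the image to predict;\n6. 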
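Obtain the prediction result.\n\nThe computing resources for this experiment are:\n\n- Host platform:\n    - `OS`: macOS Catalina 10.15.7\n    - `Memory`: 32GB\n    - `Processor`: Intel Core i9-9880H @ 2.3GHz * 16\n    - `Platform`: VMware Fusion Pro 11.5.0\n\n- Virtual machine environment:\n    - `OS`: Ubuntu 20.04 LTS\n    - `Memory`: 16GB\n    - `Disk Capacity`: 110GB\n    - `Processor`: Intel Core i9-9880H @ 2.3GHz * 8\n\n\n#### Task 2: Distributed training on a serverless computing framework\n\nIn this experiment, we need to define a custom distributed machine learning training task, deploy it on the serverless computing framework chosen in Task 1, and carry out distributed training. In Task 1, we chose the open-source serverless computing framework OpenWhisk and a PyTorch-based MNIST handwritten digit recognition task. In this task, we again use MNIST handwritten digit recognition and train the LeNet5 model from Task 1 in a distributed fashion on the OpenWhisk framework.\n\nThe computing resources for this experiment are:\n\n- Host platform:\n    - `OS`: macOS Catalina 10.15.7\n    - `Memory`: 32GB\n    - `Processor`: Intel Core i9-9880H @ 2.3GHz * 16\n    - `Platform`: VMware Fusion Pro 11.5.0\n\n- Virtual machine environment:\n    - `OS`: Ubuntu 20.04 LTS\n    - `Memory`: 16GB\n    - `Disk Capacity`: 110GB\n    - `Processor`: Intel Core i9-9880H @ 2.3GHz * 8\n\n---\n\n### 3. Deploying and studying federated learning on a cloud computing platform\n\nIn this project, we follow [this paper](https://arxiv.org/pdf/1602.05629.pdf) and implement the `FedAvg` algorithm according to its `Algorithm 1`. We program in Python with [PyTorch](https://pytorch.org/) in a [Conda](https://www.anaconda.com/products/individual) environment and deploy the model for training on the [Shanghai Jiao Tong University cloud computing platform](https://home.jcloud.sjtu.edu.cn/). We choose a PyTorch-based MNIST handwritten digit recognition task and use the `FedAvg` algorithm to perform federated learning across multiple clients.\n\nIn this project, we manually construct a non-IID (not independent and identically distributed) MNIST dataset and explore how the number of clients affects the accuracy of the algorithm. Our model is deployed on the Shanghai Jiao Tong University cloud computing platform, and all model training and parameter updates run on the CPU. In the initial experiments, we train the clients one after another in a loop; in later experiments, we use Python's multiprocessing module to run the clients in parallel.\n\nThe computing resources for this experiment are:\n\n- Shanghai Jiao Tong University cloud computing platform\n\n    - Server A * 1:\n        - `OS`: Ubuntu 20.04 LTS\n        - `Memory`: 128GB\n        - `Processor`: Intel Xeon Processor @ 2.4GHz * 64\n        - `Disk Capacity`: 500GB\n        - `GPU`: None\n\n    - Server B * 4:\n        - `OS`: Ubuntu 20.04 LTS\n        - `Memory`: 64GB\n        - `Processor`: Intel Xeon Processor @ 2.4GHz * 32\n        - `Disk Capacity`: 500GB\n        - `GPU`: None\n\n- Software environment:\n    - `Anaconda3-2021.11-Linux-x86_64`\n    - `Python 3.8.12`\n    - `PyTorch 1.10.1`\n\n## Acknowledgements\n\n- Thanks for the tutorials provided for this repository, and for the guidance and help during the experiments, go to 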
[davidliyutong](https://github.com/davidliyutong/)\n\n## References\n\n- [GitHub - davidliyutong/ICE6405P-260-M01](https://github.com/davidliyutong/ICE6405P-260-M01/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnoewangjy%2Fcloud-computing","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnoewangjy%2Fcloud-computing","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnoewangjy%2Fcloud-computing/lists"}