{"id":18096111,"url":"https://github.com/newplan/dml-benchmark","last_synced_at":"2025-04-06T03:23:22.236Z","repository":{"id":129987995,"uuid":"278859863","full_name":"NEWPLAN/DML-benchmark","owner":"NEWPLAN","description":"NASP DML benchmark for automaticlly topology searching","archived":false,"fork":false,"pushed_at":"2020-07-21T06:06:11.000Z","size":418,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-12T09:35:03.567Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NEWPLAN.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-07-11T12:45:10.000Z","updated_at":"2020-12-28T07:19:31.000Z","dependencies_parsed_at":null,"dependency_job_id":"c8bb9439-6f1c-4d7b-a139-499eb6206396","html_url":"https://github.com/NEWPLAN/DML-benchmark","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NEWPLAN%2FDML-benchmark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NEWPLAN%2FDML-benchmark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NEWPLAN%2FDML-benchmark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NEWPLAN%2FDML-benchmark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NEWPLAN","download_url":"https://codeload.github.com/NEWPLAN/DML-benchmark/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247428096,"owners_count":20937406,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-31T19:12:39.313Z","updated_at":"2025-04-06T03:23:22.215Z","avatar_url":"https://github.com/NEWPLAN.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DML-benchmark\nNASP DML benchmark for automatic topology searching\n\n\n## Experiment roadmap\n\n### Part 1: Benchmark individual components first\n\t##### A: Network overhead\n\t\t1: Measure and analyze startup overhead\n\t\t\tMethod: send the same total amount of data over the network, split into different numbers of transfers (1M*256 vs. 256M*1, i.e., 256 blocks of 1M vs. one block of 256M), and observe the overall transfer completion time.\n\t\t\tStatus: done.\n\t\t\tPlan: possibly repeat the measurement, varying the block size and the number of rounds, to make the results more conclusive.\n\t\t2: n-vs-n communication: the incast problem and why bandwidth drops\n\t\t\tMethod: continuously send equal-sized messages (default: 16K*128; try other values?) and observe how the receiver's throughput changes at 1G/10G/40G/100G.\n\t\t\tChallenge: at 100 Gbps, a 1-vs-1 bidirectional transfer does not reach full bandwidth (unresolved).\n\t\t\tStatus: phase one is done (congestion control based on DCQCN).\n\t\t\tPlan: disable ECN, enable PFC for flow control, and measure again.\n\t\t3: Blocking-wait overhead when one node receives from n nodes (1-vs-n)\n\t\t\tMethod: send equal-sized messages (1K, 10K, ..., 100M) and measure the time until the data from all child nodes has arrived.\n\t\t\tDifficulty: there is no way to guarantee that all senders start at the same time; clock synchronization at microsecond granularity is a problem.\n\t\t\tStatus: the code needs changes (the poll-cq logic).\n\t\t\tPlan: // switch to epoll for non-blocking polling of the completion queue, and send a specified amount of data (deferred).\n\t\t4: Staged synchronization and coupling overhead (end-to-end, network overhead only)\n\t\t\tMethod:\n\t\t\t\ta: vary the depth from 1 to 16 (ring); transfer equal-sized messages between pairs of nodes and measure the time;\n\t\t\t\tb: vary the width from 1 to 16 (PS); transfer among multiple nodes and measure the time;\n\t\t\t\tRequirement: the block size of each transfer (1M/10M/100M) is held fixed so that the comparison is fair.\n\t\t\tStatus: not yet measured.\n\t\t\tPlan: TBA\n\t##### B: Merge overhead\n\t\tAnalyze how efficiently different numbers of data blocks are merged (mostly done for 2-16 blocks)\n\t\t\t1.1: With the block size fixed, analyze the efficiency of merging different numbers of blocks in a single pass (driven by SIMD/cache effects)\n\t\t\t\tMethod: vary the number of blocks and observe the time to complete the merge.\n\t\t\t\tStatus: preliminary results are available, but they show the time per unit addition (kept).\n\t\t\t\tPlan: plot the total time of a complete merge for 16 nodes (next).\n\t\t\t\tGoal: evaluate the benefit of SIMD/cache.\n\t\t\t1.2: Vary the block size and test merge efficiency (using typical block counts: 2/4/8/16)\n\t\t\t\tMethod: vary the block size and observe the time to complete the merge.\n\t\t\t\tStatus: basic tests done for single-block sizes from 1K to 100M.\n\t\t\t\tPlan: plot the total time of a complete merge for 16 nodes (next).\n### Part 2: End-to-end AllReduce benchmarks across different topologies\n\tMethod: compare AllReduce time (network utilization) across different topologies with 16 nodes.\n\tStatus: the butterfly/HD models are missing, and the broadcast phase is not yet implemented.\n\tNext steps: complete the items above.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnewplan%2Fdml-benchmark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnewplan%2Fdml-benchmark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnewplan%2Fdml-benchmark/lists"}