{"id":30577673,"url":"https://github.com/infinitensor/learning-cuda","last_synced_at":"2025-08-29T02:41:58.642Z","repository":{"id":307497945,"uuid":"1029513222","full_name":"InfiniTensor/Learning-CUDA","owner":"InfiniTensor","description":"2025夏季训练营CUDA方向项目","archived":false,"fork":false,"pushed_at":"2025-08-18T08:40:53.000Z","size":111,"stargazers_count":8,"open_issues_count":3,"forks_count":43,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-08-18T10:13:00.294Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Makefile","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/InfiniTensor.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-31T06:51:58.000Z","updated_at":"2025-08-18T08:40:56.000Z","dependencies_parsed_at":"2025-07-31T17:30:22.123Z","dependency_job_id":null,"html_url":"https://github.com/InfiniTensor/Learning-CUDA","commit_stats":null,"previous_names":["infinitensor/learning-cuda"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/InfiniTensor/Learning-CUDA","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InfiniTensor%2FLearning-CUDA","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InfiniTensor%2FLearning-CUDA/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InfiniTensor%2FLearning-CUDA/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InfiniTensor%2FLearning-CUDA/manifests","owner_url":"https://repos.ecosyste.ms/api
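/v1/hosts/GitHub/owners/InfiniTensor","download_url":"https://codeload.github.com/InfiniTensor/Learning-CUDA/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InfiniTensor%2FLearning-CUDA/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272612273,"owners_count":24964388,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-29T02:00:10.610Z","response_time":87,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-08-29T02:41:57.503Z","updated_at":"2025-08-29T02:41:58.604Z","avatar_url":"https://github.com/InfiniTensor.png","language":"Makefile","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Learning-CUDA\n\nThis project is the assignment system for the professional stage of the CUDA track in the 2025 Summer InfiniTensor Large Model and AI Systems Training Camp.\n\n## 📁 Project Structure\n\n```text\nlearning-CUDA/\n├── Makefile\n├── README\n├── src\n│   └── kernels.cu\n└── tester\n    ├── tester.o\n    └── utils.h\n```  \n\n## Environment Setup\n\nIf you are using the server provided by the training camp, you can skip this step.\n\nMake sure the following tools are installed on your system:\n\n1. **CUDA Toolkit** (version 11.0 or later):\n    - Verify the installation: run `nvcc --version`.\n    - Install: get it from the [NVIDIA CUDA Toolkit download page](https://developer.nvidia.com/cuda-downloads).\n2. **GNU Make**:\n    - Verify the installation: run `make --version` (preinstalled on most Linux/macOS systems).\n\n## 🧠 Assignment\n\nThe assignment consists of two problems. You need to implement the **2 CUDA functions** given in `src/kernels.cu`.\n\n1. **kthLargest**\n\nImplement the CUDA kthLargest function. Given a contiguous input array and a non-negative number k, return the k-th largest element of the array. The function must support both int and float inputs. See the comments in the file for boundary handling and other conditions.\n  \n2. **flashAttention**\n\nImplement the Flash Attention operator. It must support causal masking and GQA. Its behavior must match [torch.nn.functional.scaled_dot_product_attention](https://docs.pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html). Features corresponding to parameters that the interface does not expose do not need to be supported or implemented. See the comments in the file for the specific parameter requirements. The function must support float.\n\n### Notes\n\n1. **Plagiarism and cheating are prohibited**, including copying code from other participants or from open-source implementations. You may discuss and refer to ideas, but directly reading or copying code is prohibited. If discovered, your score will be voided and you will lose eligibility for the project stage and for subsequent internships and recommendations;\n2. For both problems, **using any library function** to directly implement the key functionality **is prohibited**;\n3. The main computation must be implemented on the GPU; informational or preparatory work (such as metadata computation/conversion and resource preparation) may be done on the CPU/host;\n4. Any code style is acceptable, but it must be consistent;\n5. Add **appropriate** code comments that explain the important parts;\n\n### Submission\nSubmit your GitHub link on the [InfiniTensor Open Source Community](https://beta.infinitensor.com/camp/summer2025) website; the latest commit is the one that counts.\n\n## 🛠️ Build and Run\n\n  The code can be compiled and run easily with the provided `Makefile`.\n\n### Build and Run Commands\n\nThe `Makefile` simplifies the build process. Run the following commands from the **project root directory** (the directory containing the `Makefile`):\n\n#### Default: build and run tests (non-verbose mode)\n\nRun `make` on the command line to compile the code and run the tests with concise output.\n\n#### Build and run tests (verbose mode)\n\nRun `make VERBOSE=true` on the command line to compile the code and run the tests; the output includes execution time along with the results.\n\n## 📊 Grading Rules\n\nThe assignment is graded as follows:\n\n1. **Correctness first**  \n   - Every submission is judged on correctness first and must produce correct results on the provided test cases;\n   - Correctness provides the base score: each test case passed earns its corresponding base points;\n   - Test cases that fail are not included in the performance ranking;\n   - Submissions that violate the requirements in **Notes** receive no score.\n\n2. **Performance bonus**  \n   - Among correct implementations, performance is ranked;\n   - The better the performance, the more bonus points are awarded;\n   - Performance is judged on the provided server, so please evaluate performance there.\n\n3. **Final score**  \n   - The overall score is determined jointly by the number of test cases passed and the performance-ranking bonus.  \n   - The scores of the individual test cases are summed to give the final score.  \n\n## 📬 Questions?\n\nYou can ask the teaching assistants directly in the group chat!\n\nGood luck and happy coding! 🚀\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finfinitensor%2Flearning-cuda","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Finfinitensor%2Flearning-cuda","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finfinitensor%2Flearning-cuda/lists"}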