{"id":13568634,"url":"https://github.com/tpoisonooo/chgemm","last_synced_at":"2025-08-08T14:06:50.884Z","repository":{"id":48174250,"uuid":"196927741","full_name":"tpoisonooo/chgemm","owner":"tpoisonooo","description":"symmetric int8 gemm","archived":false,"fork":false,"pushed_at":"2020-06-07T07:09:04.000Z","size":436,"stargazers_count":66,"open_issues_count":1,"forks_count":12,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-12-16T00:51:49.761Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Assembly","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tpoisonooo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-07-15T05:07:58.000Z","updated_at":"2024-07-30T09:24:22.000Z","dependencies_parsed_at":"2022-09-14T12:31:17.957Z","dependency_job_id":null,"html_url":"https://github.com/tpoisonooo/chgemm","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tpoisonooo%2Fchgemm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tpoisonooo%2Fchgemm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tpoisonooo%2Fchgemm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tpoisonooo%2Fchgemm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tpoisonooo","download_url":"https://codeload.github.com/tpoisonooo/chgemm/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":231742828,"owners_count":18419857,"icon
_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T14:00:29.535Z","updated_at":"2024-12-29T17:43:15.282Z","avatar_url":"https://github.com/tpoisonooo.png","language":"Assembly","funding_links":[],"categories":["Example Implementations 💡"],"sub_categories":["Blogs 🖋️"],"readme":"# chgemm\n---\n![License](https://img.shields.io/badge/license-BSD--3--Clause-blue.svg) ![build](https://travis-ci.org/tpoisonooo/chgemm.svg?branch=master)\n\nchgemm is a symmetric int8 gemm project, which differs slightly from BLAS sgemm:\n1. the input is an int8_t matrix with values in [-127, +127] and the output is an int32_t matrix. Note: watch out for overflow;\n2. with deep learning use cases in mind, the packAB interface is exposed and can be adjusted;\n3. the common design is `C=alpha*A*B+beta*C`, but chgemm computes `C=A*B`, because alpha and beta have no practical use in deep learning inference;\n4. row major, dropping the column-major layout of the old Fortran era;\n5. the symmetric int8 gemm speed is no lower than that of other projects.\n\n---\n### test result\nCompiled with `-O3` and run on a single RK3399 core. The current peak is 18.6 gflops, and the orange line is the RK3399 single-core fp32 limit (14.3 gflops). A single-core test on an AWS A72 reaches about 23 gflops, which is the limit of this implementation approach (100% of the achievable performance).\n\n![Matrix size and gflops results](0.png)\n\n---\n### Usage\n1. 
Edit `OLD` and `NEW` in the `makefile` to choose between implementations. On the first run, `OLD` and `NEW` must be set to the same value\n2. `make run` prints the speed results\n3. test parameters can be changed in `parameters.h`\n\n### Integration\nRefer to MMult_4x8_21.c for how to call the matrix multiplication, and embed the code in your own project. Adjust it as needed to fit your inference library's implementation.\n\n### application with chgemm inside\nchgemm is used in [ncnn](https://github.com/tencent/ncnn); see [gemm_symm_int8.h](https://github.com/Tencent/ncnn/blob/master/src/layer/arm/gemm_symm_int8.h).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftpoisonooo%2Fchgemm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftpoisonooo%2Fchgemm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftpoisonooo%2Fchgemm/lists"}