{"id":21389463,"url":"https://github.com/feifeibear/swgemm","last_synced_at":"2025-10-18T13:02:30.931Z","repository":{"id":85017025,"uuid":"163481480","full_name":"feifeibear/swGEMM","owner":"feifeibear","description":"A highly efficient library for GEMM operations on Sunway TaihuLight","archived":false,"fork":false,"pushed_at":"2020-09-07T08:36:57.000Z","size":579,"stargazers_count":17,"open_issues_count":1,"forks_count":6,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-07T14:07:06.790Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/feifeibear.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-12-29T06:06:32.000Z","updated_at":"2024-12-13T02:18:54.000Z","dependencies_parsed_at":"2023-04-09T04:16:40.255Z","dependency_job_id":null,"html_url":"https://github.com/feifeibear/swGEMM","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feifeibear%2FswGEMM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feifeibear%2FswGEMM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feifeibear%2FswGEMM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feifeibear%2FswGEMM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/feifeibear","download_url":"https://codeload.github.com/feifeibear/swGEMM/tar.gz/refs/heads/ma
ster","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feifeibear%2FswGEMM/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259198000,"owners_count":22820153,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-22T12:26:37.722Z","updated_at":"2025-10-18T13:02:30.847Z","avatar_url":"https://github.com/feifeibear.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# swGEMM: a customized GEMM library for swDNN\n\n## Usage:\n### 1. Build a test/debug case with the Makefile\nmake - generate a test case using test.c\nmake ar - generate swblas.a\n\n### 2. Build the release library with cmake\nmkdir build\ncd build \u0026\u0026 cmake ..\n\n### 3. Use swGEMM in another program\nLink ./build/libswBLASlib.a and include cblas.h and swblas.h\n\n### 4. Debug or unit test\nA test case is in ./test/\nsh run.sh $M $K $N\n\n### MACRO:\n-DUSE_RTC count time inside the CPE\n-DUSE_COMP without it, you will get the DMA time\n-DCHECK_RES check the answer against xMath\n\n## API\nvoid sw_sgemm_trans(float* input, float* weight, float* output, int M, int N, int K, int blkM, int blkN, int blkK);\ninput(K, M) * weight(K, N) -\u003e output (N, K)\ninput, weight, and output are 2D matrices given as (high dim, low dim).\nblkM/N/K are the block sizes along the corresponding dimensions.\nRequirements: M and blkM should be multiples of 128, K and blkK multiples of 8, and N and blkN multiples of 32.\n\n## Profile\nsh ./auto_test.sh\npython ./show_raw_data.py\n\n## BUGs Report\n1. Use -O1 rather than -O2 for sw_slave_XXX files, otherwise you will get stuck\n2. 
Function names in ./asm should not be too long. For example, dgemmasmnoinit will not compile\n3. If you need to use SIMD inside the CPE, you should allocate LDM space with pointers of type floatv4*/doublev4\n4. When ./build/libswBLASlib.a is used in other code, accessing the MBW map causes unpredictable bugs! Allocating a large array in stack space may not be well supported.\n\n## Warning\nThe time reported by rpcc differs from the timer for the elapsed time between athread spawn and join.\nIf you use rpcc to measure time, you will get the wrong time on the MPE.\nThe athread time may be large for small cases.\n\n## Author\nJiarui Fang [THU and NSCCWX] \u003cbr\u003e\nfang_jiarui@163.com\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffeifeibear%2Fswgemm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffeifeibear%2Fswgemm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffeifeibear%2Fswgemm/lists"}