{"id":15014045,"url":"https://github.com/explosion/thinc-apple-ops","last_synced_at":"2025-04-08T03:15:48.796Z","repository":{"id":37813800,"uuid":"318726532","full_name":"explosion/thinc-apple-ops","owner":"explosion","description":"🍏 Make Thinc faster on macOS by calling into Apple's native Accelerate library","archived":false,"fork":false,"pushed_at":"2024-10-01T09:43:56.000Z","size":55,"stargazers_count":93,"open_issues_count":0,"forks_count":8,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-03-28T06:08:06.211Z","etag":null,"topics":["apple","spacy","thinc"],"latest_commit_sha":null,"homepage":"https://thinc.ai","language":"Cython","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/explosion.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-12-05T07:13:25.000Z","updated_at":"2025-03-24T01:45:54.000Z","dependencies_parsed_at":"2024-06-19T00:10:39.579Z","dependency_job_id":"3c988804-5e80-40d6-97d4-b64b0c7a95a8","html_url":"https://github.com/explosion/thinc-apple-ops","commit_stats":{"total_commits":52,"total_committers":5,"mean_commits":10.4,"dds":0.5769230769230769,"last_synced_commit":"848f124a6d2002e4af30ee5d1041659323fe0118"},"previous_names":[],"tags_count":15,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/explosion%2Fthinc-apple-ops","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/explosion%2Fthinc-apple-ops/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/explosion%2Fthinc-
apple-ops/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/explosion%2Fthinc-apple-ops/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/explosion","download_url":"https://codeload.github.com/explosion/thinc-apple-ops/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247767236,"owners_count":20992548,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apple","spacy","thinc"],"created_at":"2024-09-24T19:45:06.818Z","updated_at":"2025-04-08T03:15:48.763Z","avatar_url":"https://github.com/explosion.png","language":"Cython","readme":"\u003ca href=\"https://explosion.ai\"\u003e\u003cimg src=\"https://explosion.ai/assets/img/logo.svg\" width=\"125\" height=\"125\" align=\"right\" /\u003e\u003c/a\u003e\n\n# thinc-apple-ops\n\nMake [spaCy](https://spacy.io) and [Thinc](https://thinc.ai) **up to 8 \u0026times; faster**\non macOS by calling into Apple's native libraries.\n\n## ⏳ Install\n\nMake sure you have [Xcode](https://developer.apple.com/xcode/) installed and\nthen install with `pip`:\n\n```bash\npip install thinc-apple-ops\n```\n\n## 🏫 Motivation\n\nMatrix multiplication is one of the primary operations in machine learning.\nSince matrix multiplication is computationally expensive, using a fast matrix\nmultiplication implementation can speed up training and prediction\nsignificantly.\n\nMost linear algebra libraries provide matrix multiplication in the form of the\nstandardized\n[BLAS](https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms) 
`gemm`\nfunctions. The work behind the scenes is done by a set of matrix multiplication\nkernels that are meticulously tuned for specific architectures. Matrix\nmultiplication kernels use architecture-specific\n[SIMD](https://en.wikipedia.org/wiki/SIMD) instructions for data-level parallelism\nand can take factors such as cache sizes and instruction latency into account.\n[Thinc](https://github.com/explosion/thinc) uses the\n[BLIS](https://github.com/flame/blis) linear algebra library, which provides\noptimized matrix multiplication kernels for most x86_64 and some ARM CPUs.\n\nRecent [Apple Silicon](https://en.wikipedia.org/wiki/Apple_silicon) CPUs, such\nas the [M-series](https://en.wikipedia.org/wiki/Apple_silicon#M_series) used in\nMacs, differ from traditional x86_64 and ARM CPUs in that they have one or more\nseparate matrix co-processors, called AMX. Since AMX is not well-documented, it is\nunclear how many AMX units Apple M CPUs have. It is certain that the (single)\nperformance cluster of the M1 has an AMX unit, and there is [empirical\nevidence](https://twitter.com/danieldekok/status/1454383754512945155?s=20) that\nboth performance clusters of the M1 Pro/Max have an AMX unit.\n\nEven though AMX units use a set of [undocumented\ninstructions](https://gist.github.com/dougallj/7a75a3be1ec69ca550e7c36dc75e0d6f),\nthe units can be used through Apple's\n[Accelerate](https://developer.apple.com/documentation/accelerate) linear\nalgebra library. Since Accelerate implements the BLAS interface, it can be used\nas a replacement for the BLIS library used by Thinc. This is where the\n`thinc-apple-ops` package comes in. `thinc-apple-ops` extends the default Thinc\nops so that the `gemm` matrix multiplication from Accelerate is used in place of\nthe BLIS implementation of `gemm`. 
As a result, matrix multiplication in Thinc\nis performed on the fast AMX unit(s).\n\n## ⏱ Benchmarks\n\nUsing `thinc-apple-ops` leads to large speedups in prediction and training on\nApple Silicon Macs, as shown by the benchmarks below.\n\n### Prediction\n\nThe first benchmark compares the prediction speed of the `de_core_news_lg` spaCy\nmodel on the M1 with and without `thinc-apple-ops`. Results for an Intel\nMac Mini and an AMD Ryzen 5900X are also provided for comparison. Results are in\nwords per second. In this prediction benchmark, using `thinc-apple-ops` improves\nperformance by **4.3** times.\n\n| *CPU*                      | *BLIS* | *thinc-apple-ops* | *Package power (Watt)* |\n| -------------------------- | -----: | ----------------: | ---------------------: |\n| Mac Mini (M1)              |   6492 |             27676 |                      5 |\n| MacBook Air Core i5 2020   |   9790 |             10983 |                      9 |\n| Mac Mini Core i7 Late 2018 |  16364 |             14858 |                     31 |\n| AMD Ryzen 5900X            |  22568 |               N/A |                     52 |\n\n### Training\n\nIn the second benchmark, we compare the training speed of the `de_core_news_lg`\nspaCy model (without NER). 
The results are in training iterations per second.\nUsing `thinc-apple-ops` improves training speed by **3.0** times.\n\n| *CPU*                      | *BLIS* | *thinc-apple-ops* | *Package power (Watt)* |\n| -------------------------- | -----: | ----------------: | ---------------------: |\n| Mac Mini M1 2020           |   3.34 |             10.07 |                      5 |\n| MacBook Air Core i5 2020   |   3.10 |              3.27 |                     10 |\n| Mac Mini Core i7 Late 2018 |   4.71 |              4.93 |                     32 |\n| AMD Ryzen 5900X            |   6.53 |               N/A |                     53 |\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fexplosion%2Fthinc-apple-ops","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fexplosion%2Fthinc-apple-ops","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fexplosion%2Fthinc-apple-ops/lists"}