{"id":16315380,"url":"https://github.com/kimwalisch/libpopcnt","last_synced_at":"2025-04-05T22:08:10.666Z","repository":{"id":48382608,"uuid":"74994204","full_name":"kimwalisch/libpopcnt","owner":"kimwalisch","description":"🚀 Fast C/C++ bit population count library","archived":false,"fork":false,"pushed_at":"2024-06-29T09:50:09.000Z","size":203,"stargazers_count":320,"open_issues_count":0,"forks_count":38,"subscribers_count":23,"default_branch":"master","last_synced_at":"2024-10-11T21:57:21.732Z","etag":null,"topics":["avx2","avx512","c","cpp","neon","popcnt","popcount","simd","sve"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kimwalisch.png","metadata":{"files":{"readme":"README.md","changelog":"ChangeLog","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"kimwalisch"}},"created_at":"2016-11-28T16:55:14.000Z","updated_at":"2024-09-15T17:41:45.000Z","dependencies_parsed_at":"2024-06-25T17:44:16.265Z","dependency_job_id":"05aa121c-fd10-4261-8cb2-6b17231eed35","html_url":"https://github.com/kimwalisch/libpopcnt","commit_stats":null,"previous_names":[],"tags_count":19,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kimwalisch%2Flibpopcnt","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kimwalisch%2Flibpopcnt/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kimwalisch%2Flibpopcnt/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kimwalisch%2Flibpopcnt/m
anifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kimwalisch","download_url":"https://codeload.github.com/kimwalisch/libpopcnt/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247406090,"owners_count":20933803,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["avx2","avx512","c","cpp","neon","popcnt","popcount","simd","sve"],"created_at":"2024-10-10T21:57:18.790Z","updated_at":"2025-04-05T22:08:10.643Z","avatar_url":"https://github.com/kimwalisch.png","language":"C","funding_links":["https://github.com/sponsors/kimwalisch"],"categories":["Miscellaneous","Recently Updated","C","C++"],"sub_categories":["[Dec 26, 2024](/content/2024/12/26/README.md)"],"readme":"# libpopcnt\n\n[![Build status](https://github.com/kimwalisch/libpopcnt/actions/workflows/ci.yml/badge.svg)](https://github.com/kimwalisch/libpopcnt/actions/workflows/ci.yml)\n[![Github Releases](https://img.shields.io/github/release/kimwalisch/libpopcnt.svg)](https://github.com/kimwalisch/libpopcnt/releases)\n\n```libpopcnt.h``` is a header-only C/C++ library for counting the\nnumber of 1 bits (bit population count) in an array as quickly as\npossible using specialized CPU instructions 
i.e.\n[POPCNT](https://en.wikipedia.org/wiki/SSE4#POPCNT_and_LZCNT),\n[AVX2](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions),\n[AVX512](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions),\n[NEON](https://en.wikipedia.org/wiki/ARM_architecture_family#Advanced_SIMD_(Neon)),\n[SVE](https://en.wikipedia.org/wiki/AArch64#Scalable_Vector_Extension_(SVE)).\n```libpopcnt.h``` has been tested successfully using the GCC,\nClang and MSVC compilers.\n\n## C/C++ API\n\n```C\n#include \"libpopcnt.h\"\n\n/*\n * Count the number of 1 bits in the data array\n * @data: An array\n * @size: Size of data in bytes\n */\nuint64_t popcnt(const void* data, uint64_t size);\n```\n\n## How to compile\n\n```libpopcnt.h``` does not require any special compiler flags like ```-mavx2```!\nTo get the best performance, we only recommend compiling with\noptimizations enabled, e.g. ```-O3``` or ```-O2```.\n\n```bash\ncc  -O3 program.c\nc++ -O3 program.cpp\n```\n\n## CPU architectures\n\n```libpopcnt.h``` has hardware-accelerated popcount algorithms for\nthe following CPU architectures:\n\n\u003ctable\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e\u003cb\u003ex86\u003c/b\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ccode\u003ePOPCNT\u003c/code\u003e, \u003ccode\u003eAVX2\u003c/code\u003e, \u003ccode\u003eAVX512\u003c/code\u003e\u003c/td\u003e \n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e\u003cb\u003ex86-64\u003c/b\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ccode\u003ePOPCNT\u003c/code\u003e, \u003ccode\u003eAVX2\u003c/code\u003e, \u003ccode\u003eAVX512\u003c/code\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e\u003cb\u003eARM\u003c/b\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ccode\u003eNEON\u003c/code\u003e, \u003ccode\u003eSVE\u003c/code\u003e\u003c/td\u003e \n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e\u003cb\u003ePPC64\u003c/b\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ccode\u003ePOPCNTD\u003c/code\u003e\u003c/td\u003e\n  
\u003c/tr\u003e\n\u003c/table\u003e\n\nFor other CPU architectures a fast integer popcount algorithm is used.\n\n## How it works\n\nOn x86 CPUs, ```libpopcnt.h``` first queries your CPU's supported\ninstruction sets using the ```CPUID``` instruction (this is done only once).\nThen ```libpopcnt.h``` chooses the fastest bit population count algorithm\nsupported by your CPU:\n\n* If the CPU supports ```AVX512``` the ```AVX512 VPOPCNT``` algorithm is used.\n* Else if the CPU supports ```AVX2``` the ```AVX2 Harley Seal``` algorithm is used.\n* Else if the CPU supports ```POPCNT``` the ```POPCNT``` algorithm is used.\n* For CPUs without the ```POPCNT``` instruction a portable integer algorithm is used.\n\nNote that ```libpopcnt.h``` works on all CPUs (x86, ARM, PPC, WebAssembly, ...).\nIt is portable by default and hardware acceleration is only enabled if the CPU\nsupports it. ```libpopcnt.h``` is also thread-safe.\n\nWe take performance seriously: if you compile using e.g. ```-march=native```\non an x86 CPU with AVX512 support, then all runtime ```CPUID``` checks are removed!\n\n## ARM SVE (Scalable Vector Extension)\n\nARM SVE is a new vector instruction set for ARM CPUs that was first released in\n2020. ARM SVE supports a variable vector length from 128 to 2048 bits. Hence\nARM SVE algorithms can be much faster than ARM NEON algorithms, which are limited\nto a 128-bit vector length.\n\nlibpopcnt's new ARM SVE popcount algorithm is up to 3x faster than its ARM NEON\npopcount algorithm (on AWS Graviton3 CPUs). 
Unfortunately, runtime dispatching to\nARM SVE is not yet well supported by the GCC and Clang compilers and C libraries.\nTherefore, by default only the (portable) ARM NEON popcount algorithm is enabled\nwhen using libpopcnt on ARM CPUs.\n\nTo enable libpopcnt's ARM SVE popcount algorithm you need to compile your program\nusing your compiler's ARM SVE option, e.g.:\n\n```bash\ngcc -O3 -march=armv8-a+sve program.c\ng++ -O3 -march=armv8-a+sve program.cpp\n```\n\n## Development\n\n```bash\ncmake .\nmake -j\nmake test\n```\n\nThe above commands also build the ```benchmark``` program, which is\nuseful for benchmarking ```libpopcnt.h```. Below is a\nusage example run on an AMD EPYC 9R14 CPU from 2023:\n\n```bash\n# Usage: ./benchmark [array bytes] [iters]\n./benchmark\nIters: 10000000\nArray size: 16.00 KB\nAlgorithm: AVX512\nStatus: 100%\nSeconds: 1.23\n133.5 GB/s\n```\n\n## Acknowledgments\n\nSome of the algorithms used in ```libpopcnt.h``` are described in the paper\n[Faster Population Counts using AVX2 Instructions](https://arxiv.org/abs/1611.07612)\nby Daniel Lemire, Nathan Kurz and Wojciech Muła (23 Nov 2016). The AVX2 Harley Seal\npopcount algorithm used in ```libpopcnt.h``` has been copied from Wojciech Muła's\n[sse-popcount](https://github.com/WojciechMula/sse-popcount) GitHub repo.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkimwalisch%2Flibpopcnt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkimwalisch%2Flibpopcnt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkimwalisch%2Flibpopcnt/lists"}