{"id":22911989,"url":"https://github.com/opencoff/portable-lib","last_synced_at":"2025-05-01T09:16:52.431Z","repository":{"id":52060502,"uuid":"91110178","full_name":"opencoff/portable-lib","owner":"opencoff","description":"Portable C, C++ code for hash tables, bloom filters, string-search, string utilities, hash functions, arc4random","archived":false,"fork":false,"pushed_at":"2024-07-30T02:21:18.000Z","size":2859,"stargazers_count":53,"open_issues_count":3,"forks_count":16,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-05-01T09:16:47.230Z","etag":null,"topics":["arc4random","bloom-filter","c","c-plus-plus","c-template","c-templates","cdb","hash-functions","hash-tables","mmap","mpsc","mpsc-queue","portable","queues","spsc","spsc-queue","string-manipulation","templates-in-c"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/opencoff.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-05-12T16:28:38.000Z","updated_at":"2024-09-09T08:53:07.000Z","dependencies_parsed_at":"2024-02-12T22:47:10.105Z","dependency_job_id":"41197c89-0778-4cc1-9367-d246e2e0d478","html_url":"https://github.com/opencoff/portable-lib","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opencoff%2Fportable-lib","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opencoff%2Fportable-lib/tags","releases_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/opencoff%2Fportable-lib/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opencoff%2Fportable-lib/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/opencoff","download_url":"https://codeload.github.com/opencoff/portable-lib/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251850182,"owners_count":21653978,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arc4random","bloom-filter","c","c-plus-plus","c-template","c-templates","cdb","hash-functions","hash-tables","mmap","mpsc","mpsc-queue","portable","queues","spsc","spsc-queue","string-manipulation","templates-in-c"],"created_at":"2024-12-14T04:19:29.256Z","updated_at":"2025-05-01T09:16:52.408Z","avatar_url":"https://github.com/opencoff.png","language":"C","readme":"====================================\nPortable Library of Useful C/++ code\n====================================\n\nThis directory contains code for many common use cases:\n\n- Bloom filters: Standard, counting and scalable\n- Hash tables: Policy based, super-fast (cache friendly)\n- Variety of good, fast Hash functions\n- Fixed-size memory allocator ('mempools')\n- Typesafe templates in \"C\": Linked lists, vectors, queues\n- Single-Producer, Single-Consumer lock-free bounded queue\n- Multi-Producer, Multi-Consumer lock-free bounded queue\n- Blocking, bounded, producer-consumer queue\n- Thread pool (job-handlers) with CPU affinity using pthreads and a\n  shared queue across the threads.\n- Round-robin work 
distribution across N threads using pthreads;\n  each thread has its own queue enabling work to be queued to\n  specific threads.\n- Growable, resizable string buffer\n- Collection of random number generators (ARC4Random-chacha20,\n  XORshift, Mersenne-Twister)\n\nAlmost all code is written in Portable C (and some C++).  It is\ntested to work on at least Linux 3.x/4.x, Darwin (Sierra, macOS),\nOpenBSD 5.9/6.0/6.1. Some of the code has been in production use\nfor over a decade.\n\nWhat is available in this code base?\n====================================\n\n- Collection of Bloom filters (Simple, Counting, Scalable). The\n  Bloom filters can be serialized to disk and read back in mmap\n  mode. The serialized form carries a strong checksum (SHA256) to\n  verify the integrity of the data when read back.\n\n  The filters share a common interface for add, query and destructor.\n  A filter-specific constructor returns an opaque pointer.\n\n  All the tests were done with a false-positive rate of 0.005.\n\n- Multiple implementations of hash tables:\n\n    * Scalable hash table with policy based memory management and\n      locking. It resizes dynamically based on load-factor. It has\n      several iterators to safely traverse the hash-table. This uses\n      a doubly linked list for collision resolution.\n\n    * A very fast, cache-friendly hash table that uses \"linked list of arrays\"\n      for collision resolution. Each such array has 7 elements. The idea\n      is to exploit cache-locality when searching for nodes in the\n      same bucket. If the collision chain is more than 7 elements, a\n      new array of 7 elements is allocated. 
The hash table uses short\n      \"fingerprints\" of the hash-key to quickly select the array slot.\n\n    * Open addressed hash table that uses a power-of-2 sized bucket\n      list and a smaller power-of-2 sized bucket list for overflow.\n\n- A collection of hash functions for use in hash-tables and other\n  places:\n\n    * FNV\n    * Jenkins\n    * Murmur3\n    * Siphash24\n    * Metrohash\n    * xxHash\n    * Superfast hash\n    * Hsieh hash\n    * Cityhash\n\n  These are benchmarked in the test code *test/t_hashbench.c*.\n\n  If you are going to pick a hash function for use in a hash-table,\n  pick one that uses a seed as initializer. This ensures that your\n  hash table doesn't suffer DoS attacks. All the code I write uses\n  Zilong Tan's superfast hash (*fasthash.c*).\n\n- A portable, thread-safe, user-space implementation of OpenBSD's\n  arc4random(3). This uses per-thread random state to ensure that\n  there are no locks when reading random data.\n\n- Implementation of Xoroshiro, Xorshift+ PRNG (XS64-Star, XS128+,\n  XS1024-Star)\n\n- Wrappers for process and thread affinity -- provides\n  implementations for Linux, OpenBSD and Darwin.\n\n- gstring.h: Growable C strings library\n\n- zbuf.h: Buffered I/O interface to zlib.h; this enables callers to\n  safely call compress/uncompress using user output functions.\n\n- C++ Code:\n\n    * strmatch.h: Templatized implementations of Rabin-Karp,\n      Knuth-Morris-Pratt, Boyer-Moore string match algorithms.\n\n    * mmap.h: Memory mapped file reader and writer; implementations\n      for POSIX and Win32 platforms exist.\n\n- Specialized memory management:\n\n    * arena.h: Object lifetime based memory allocator. 
Allocate\n      frequently in different sizes, free the entire allocator once.\n\n    * mempool.h: Very fast, fixed size memory allocator\n\n- OSX Darwin specific code:\n\n    * POSIX un-named semaphores (`sem_init(3)`, `sem_wait(3)`, `sem_post(3)`)\n    * Replacement for \u003ctime.h\u003e to include POSIX clock_gettime().\n      This is implemented using Mach APIs (May not be needed post MacOS\n      Sierra).\n\n- Portable routines to read password (POSIX and Win32)\n\n- POSIX compatible wrappers for Win32: mmap(2), pthreads(7),\n  opendir(3), inet_pton(3) and inet_ntop(3), sys/time.h\n\n- Portable implementation of getopt_long(3).\n\nSingle Header Utilities\n-----------------------\n- Templates in \"C\" -- these leverage the pre-processor to create type-safe\n  containers for several common data structures:\n\n    * list.h: Single and Doubly linked list (BSD inspired)\n    * vect.h: Dynamically growable type-safe \"vector\" (array)\n    * queue.h: Fast, bounded FIFO that uses separate read/write\n      pointers\n    * syncq.h: Type-safe, bounded producer/consumer queue. Uses\n      POSIX semaphores and mutexes.\n    * spsc_bounded_queue.h: A single-producer, single-consumer,\n      lock-free queue. Requires C11 (stdatomic.h).\n    * mpmc_bounded_queue.h: Templatized version of Dmitry Vyukov's\n      excellent lock-free algorithm for bounded multiple-producer,\n      multiple-consumer queue. 
Requires C11 (stdatomic.h).\n      Performance on late 2013 13\" MBP (Core i7, 2.8GHz) with 4\n      Producers and 4 Consumers: 236 cyc/producer, 727 cyc/consumer.\n\n- Portable, inline little-endian/big-endian encode and decode functions\n  for fixed-width ordinal types (u16, u32, u64).\n\n- Arbitrary sized bitset (uses largest available wordsize on the\n  platform).\n\n\nPerformance Measurements\n========================\nSPSC Lock-free Bounded Queue\n----------------------------\nPerformance on a late 2018 15\" MBP (6-Core i9, 2.9GHz):\n    * Q size 1048576: ~120 cyc/producer, ~80 cyc/consumer\n    * Q size 128: ~30 cyc/producer, ~29 cyc/consumer\n\nMPMC Lock-free Bounded Queue\n----------------------------\nPerformance on a late 2018 15\" MBP (6-Core i9, 2.9GHz):\n    * 6 producers and 6 consumers: ~2300 cyc/producer, ~2400 cyc/consumer\n    * 2 producers and 2 consumers: ~515 cyc/producer, ~550 cyc/consumer\n\nBloom Filters\n-------------\nPerformance on a late 2018 15\" MBP (6-Core i9, 2.9GHz):\n\n    * Standard Bloom filter: 155 cycles/add, 148 cycles/search\n    * Counting Bloom filter: 157 cycles/add, 150 cycles/search\n    * Scalable Bloom filter: 716 cycles/add, 770 cycles/search\n\n\nFast Hash Table\n---------------\nPerformance on a 2022 Core i7 on ChromeOS Linux env:\n\n    * insert if not present: 1093 cycles (490 ns/insert)\n    * find existing element:  80 cycles (362.90 ns/find)\n    * find non-existing element: 235 cycles (252 ns/find)\n    * delete existing element: 112 cycles (367.39 ns/del)\n    * delete non-existent element: 163 cycles (216 ns/del)\n\nMemory Allocators\n-----------------\nPerformance on a late 2018 15\" MBP (6-Core i9, 2.8GHz):\n    * Arena: ~5700 cycles/alloc\n    * Mempool: 20 cycles/alloc (33M alloc/sec), 19 cycles/free (27M free/sec)\n\nHow is portability achieved?\n============================\nThe code above tries to be portable without use of ``#ifdef`` or\nother pre-processor constructs. 
In cases where a particular platform\ndoes not provide a required symbol or function, a compatibility\nheader is provided in ``inc/$PLATFORM/``. For example, Darwin doesn't have\na working POSIX un-named semaphore implementation (``sem_init(3)``);\nthe file ``inc/Darwin/semaphore.h`` provides a working\nimplementation of the API. Thus, any program using un-named\nsemaphores can function without any wrappers or ugly ``ifdef``.\n\nWhile the compatibility functions and symbols are provided via the\nmechanism above, the next question is: \"how does one tailor the\nbuild environment to accommodate these peculiarities?\" This is\nwhere we leverage features of ``make`` to have a conditional build\nenvironment.\n\nGNUmakefile Tricks and Tips\n---------------------------\nThis library comes with a set of ``GNUmake`` fragments and an\nexample top-level ``GNUmakefile`` to make building programs easy.\n\nThese makefiles are written to be cross-platform and incorporate\nmany idioms to make building for multiple platforms possible\n**without** needing the bloated ``configure`` infrastructure.\n\nFor each platform that is supported, ``portablelib.mk`` defines a\nset of macros for that platform like so::\n\n    darwin_incdirs += /opt/local/include /usr/local/include\n    darwin_ldlibs  += /opt/local/lib/libsodium.a\n    darwin_objs    += darwin_cpu.o darwin_sem.o darwin_clock.o\n\n    linux_defs   += -D_GNU_SOURCE=1\n    linux_ldlibs += -lpthread\n    linux_objs   += linux_cpu.o arc4random.o\n\n    openbsd_ldlibs += -L/usr/local/lib -lsodium -lpthread\n    openbsd_objs   += openbsd_cpu.o\n\n\nThen, these flags are used to set ``CFLAGS`` and ``objs`` via\n\"double variable expansion\" like so::\n\n    os := $(shell uname -s | tr '[A-Z]' '[a-z]')\n\n    INCDIRS = $($(os)_incdirs) $(TOPDIR)/inc/$(os) $(TOPDIR)/inc\n\n    INCS = $(addprefix -I, $(INCDIRS))\n    DEFS = -D__$(os)__=1 $($(os)_defs)\n\n    CFLAGS = -g -O2 $(INCS) $(DEFS)\n    LDFLAGS = $($(os)_ldlibs)\n\n\nIn similar 
fashion, the list of object files to be built is expanded\nto include platform specific object files.\nThis Makefile feature allows us to separate platform specific\npeculiarities without the mess of ``autoconf`` and ``automake``.\n\nWhat is in the *tools/* subdirectory?\n=====================================\nThe *tools* subdirectory has several utility scripts that are useful\nfor the productive programmer.\n\nmkgetopt.py\n-----------\nThis script generates command line parsing routines from a human readable\nspecification file. For more details, see *tools/mkgetopt-manual.rst*.\nA fully usable example specification is in *tools/example.in*.\n\ndepweed.py\n----------\nParse ``gcc -MM -MD`` output and validate each of the dependents. If\nany dependent file doesn't exist, then the owning ``.d`` file is\ndeleted. This script is most-useful in a GNUmakefile: instead of\n``include $(depfiles)``, one can now do::\n\n    include $(shell depweed.py $(depfiles))\n\nThis makes sure that invalid dependencies never make it into the\nMakefile.\n\nThe sample ``Sample-GNUMakefile`` in the top-dir is a good reference for\nincorporating these ideas and library into a larger program.\n\n.. vim: ft=rst:sw=4:ts=4:expandtab:tw=78:\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopencoff%2Fportable-lib","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopencoff%2Fportable-lib","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopencoff%2Fportable-lib/lists"}