{"id":46520,"url":"https://github.com/olest/awesome-performance","name":"awesome-performance","description":"reading list on software performance","projects_count":186,"last_synced_at":"2026-06-10T08:00:29.559Z","repository":{"id":168036154,"uuid":"643656469","full_name":"olest/awesome-performance","owner":"olest","description":"reading list on software performance","archived":false,"fork":false,"pushed_at":"2026-04-18T22:24:23.000Z","size":49,"stargazers_count":14,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-05-24T17:04:16.401Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/olest.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-05-21T21:00:59.000Z","updated_at":"2026-04-26T19:03:55.000Z","dependencies_parsed_at":"2023-11-21T00:24:17.982Z","dependency_job_id":"f031f82f-d0d1-4a86-9337-91a251e9de3f","html_url":"https://github.com/olest/awesome-performance","commit_stats":null,"previous_names":["olest/awesome-performance"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/olest/awesome-performance","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/olest%2Fawesome-performance","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/olest%2Fawesome-performance/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/repositories/olest%2Fawesome-performance/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/olest%2Fawesome-performance/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/olest","download_url":"https://codeload.github.com/olest/awesome-performance/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/olest%2Fawesome-performance/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34142643,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-10T02:00:07.152Z","response_time":89,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"created_at":"2024-01-14T11:14:58.617Z","updated_at":"2026-06-10T08:00:29.559Z","primary_language":null,"list_of_lists":false,"displayable":true,"categories":["Static code analysis","Compilers","Memory","Blogs","Operating systems","Books","Algorithms and data structures","Journal articles or technical reports","Profiling","Machine learning","Concurrency","Benchmarks","Lectures or conference talks","Programming languages","Libraries","SIMD","Tools \u0026 Observability","GPU acceleration","I/O","CPU"],"sub_categories":[],"readme":"# Awesome performance and low-level programming links\n\n## Blogs\n* [Abseil.io - Performance Tips of the Week](https://abseil.io/fast/)\n* [Abseil.io - Performance 
Hints](https://abseil.io/fast/hints.html)\n* [Agner Fog - Software optimization resources](https://www.agner.org/optimize/)\n* [Ahmad Yasin : perf-tools](https://sites.google.com/site/analysismethods/yasin-pubs)\n* [Brendan Gregg's Blog](https://www.brendangregg.com/blog/index.html)\n* [Chris Feilbach's Blog](https://chrisfeilbach.com/2025/07/05/understand-cpu-branch-instructions-better/)\n* [Computer, Enhance!](https://www.computerenhance.com/)\n* [Confessions of a Code Addict](https://blog.codingconfessions.com/p/simultaneous-multithreading)\n* [Daniel Lemire's blog](https://lemire.me/blog/)\n* [EasyPerf - Denis Bakhvalov](https://easyperf.net/notes/)\n* [Israel Ogbole : Profile-Guided Optimization: A Hands-On Guide to Reducing Computational Wastage](https://israelo.io/blog/pgo/)\n* [JabPerf](https://www.jabperf.com/blog/)\n* [John Farrier - For Software Engineers](https://johnfarrier.com/c-performance-checklist-for-low-latency-systems/?utm_source=rss\u0026utm_medium=rss\u0026utm_campaign=c-performance-checklist-for-low-latency-systems)\n* [Johnny's Software Lab](https://johnnysswlab.com/)\n* [justinblank.com](https://justinblank.com/notebooks/benchmarking.html)\n* [marek.ai](https://marek.ai/matrix-multiplication-on-cpu.html)\n* [Martin Ayvazyan - Advanced C++ Optimization Techniques](https://medium.com/@martin00001313/advanced-c-optimization-techniques-for-high-performance-applications-part-3-4602df9284d8)\n* [MattPD's C++ links: performance tools](https://github.com/MattPD/cpplinks/blob/master/performance.tools.md)\n* [Modern Hardware Numbers for System Design Interviews](https://hellointerview.substack.com/p/modern-hardware-numbers-for-system)\n* [n0derunner - platform performance](https://www.n0derunner.com/)\n* [Performance Engineering For Parallel Applications](https://pramodkumbhar.com/)\n* [Performance Engineers Digest](https://substack.com/home/post/p-170692740)\n* [Redpanda 
blog](https://www.redpanda.com/blog/always-on-production-memory-profiling-seastar)\n* [Software Bits Newsletter](https://softwarebits.substack.com)\n* [Stephan's blog - perf tool examples](https://dollberg.xyz/programming/2016/07/02/perf-tool/)\n* [strlcpy and how CPUs can defy common sense](https://nrk.neocities.org/articles/cpu-vs-common-sense)\n* [The Every Computer Performance Blog](https://rwwescott.wordpress.com/)\n* [uops.info](https://uops.info/)\n* [Wojciech Muła - blog](http://0x80.pl/notesen.html)\n* [purplesyringa - Why performance optimization is hard work](https://purplesyringa.moe/blog/why-performance-optimization-is-hard-work/)\n* [mcyoung - designing a simd algorithm from scratch](https://mcyoung.xyz/2023/11/27/simd-base64)\n* [daniellockyer/awesome-performance](https://github.com/daniellockyer/awesome-performance)\n* [A Short History of Performance Engineering](https://calendar.perfplanet.com/2025/a-short-history-of-performance-engineering/)\n\n## Libraries\n* [fast base64 conversion](https://github.com/simdutf/simdutf)\n* [fast_float](https://github.com/fastfloat/fast_float)\n* [qiti - brings profiling and instrumentation directly into your unit tests](https://github.com/ComposerShield/qiti)\n\n## Books\n* [Algorithms for Modern Hardware](https://en.algorithmica.org/hpc/)\n* [Computer Systems: A Programmer's Perspective](https://csapp.cs.cmu.edu/3e/home.html)\n* [High Performance Browser Networking](https://hpbn.co/)\n* [How to make things faster](https://method-r.com/books/faster/)\n* [Is Parallel Programming Hard, and, if so, what can you do about it?](https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.2022.09.25a.pdf)\n* [Performance Analysis and Tuning on Modern CPUs](https://book.easyperf.net/perf_book)\n\n## SIMD\n* [SIMD a practical guide](https://vectrx.substack.com/p/simd-a-practical-guide)\n* [SIMD for C++ developers](http://const.me/articles/simd/simd.pdf)\n* [SIMD 
visualizer](https://github.com/piotte13/SIMD-Visualiser)\n\n## Profiling\n* [0x.tools - X-Ray vision for Linux systems](https://0x.tools/)\n* [Advanced usage of last branch records](https://lwn.net/Articles/680996/)\n* [aperf - A CLI tool to gather performance data and visualize using HTML graphs](https://github.com/aws/aperf)\n* [Bloaty: a size profiler for binaries](https://github.com/google/bloaty)\n* [Coz: Finding Code that Counts with Causal Profiling](https://github.com/plasma-umass/coz)\n* [cpplinks - performance](https://github.com/MattPD/cpplinks/blob/master/performance.tools.md)\n* [firefox profiler](https://profiler.firefox.com/)\n* [Flameshow (Terminal Flamegraph viewer)](https://github.com/laixintao/flameshow)\n* [Google performance tools](https://github.com/gperftools/gperftools/)\n* [GWPSan: Sampling-Based Sanitizer Framework](https://github.com/google/gwpsan)\n* [health-check](https://github.com/ColinIanKing/health-check)\n* [hotpath - find and profile bottlenecks in Rust](https://github.com/pawurb/hotpath)\n* [How to enable performance counters in google benchmark](https://github.com/google/benchmark/blob/main/docs/perf_counters.md)\n* [Intel Performance Counter Monitor](https://www.intel.com/content/www/us/en/developer/articles/technical/performance-counter-monitor.html)\n* [Laurence Tratt - four kinds of optimization](https://tratt.net/laurie/blog/2023/four_kinds_of_optimisation.html)\n* [magic-trace](https://github.com/janestreet/magic-trace)\n* [Memray : memory profiler for Python](https://github.com/bloomberg/memray)\n* [MTuner:  C/C++ memory profiler and memory leak finder for Windows, PlayStation 4 and 3, Android and other platforms](https://github.com/milostosic/MTuner)\n* [Performance Myths and Continuous Profiling](https://richardstartin.github.io/posts/perf-myths-and-continuous-profiling)\n* [Performance tuning tutorial](https://github.com/NAThompson/performance_tuning_tutorial)\n* [pmu tools : tools and libraries for profile collection 
and performance analysis on Intel CPUs](https://github.com/andikleen/pmu-tools)\n* [Profile-Guided Optimization: A Hands-On Guide](https://israelo.io/blog/pgo/)\n* [strace cheatsheet](https://blog.packagecloud.io/strace-cheat-sheet/)\n* [Terminal flame graph](https://github.com/4rtzel/tfg)\n* [Tracy - a hybrid frame and sampling profiler for games and other applications](https://github.com/wolfpld/tracy)\n* [uftrace : function call graph tracer for C, C++, Rust and Python](https://github.com/namhyung/uftrace)\n* [Useful web sites about the Linux perf tools](https://perfwiki.github.io/main/useful-links/)\n* [llvm-mca - LLVM Machine Code Analyzer](https://llvm.org/docs/CommandGuide/llvm-mca.html)\n\n## Operating systems\n* [4KB page size is obsolete](https://ieeexplore.ieee.org/abstract/document/5211562)\n* [Controlling the page cache](https://alg-eng.blogspot.com/?m=1)\n* [How to troubleshoot high I/O wait time in Linux](https://www.site24x7.com/learn/linux/troubleshoot-high-io-wait.html)\n* [Interactive map of the Linux kernel](https://makelinux.github.io/kernel/map/)\n* [io_uring explained (unzip.dev)](https://unzip.dev/0x013-io_uring/)\n* [Learning low-level programming and systems programming](https://github.com/mohitmishra786/amILearningEnough)\n* [linux-insides](https://github.com/0xAX/linux-insides/blob/master/SUMMARY.md)\n* [Modern Microprocessors - A 90-Minute Guide!](https://www.lighterra.com/papers/modernmicroprocessors/)\n* [On the cost of syscalls](https://gms.tf/on-the-costs-of-syscalls.html)\n* [Operating Systems: Three Easy Pieces](https://pages.cs.wisc.edu/~remzi/OSTEP/)\n* [Phoronix - Linux Hardware Reviews](https://www.phoronix.com/)\n* [Unwinding the stack the hard way](https://lesenechal.fr/en/linux/unwinding-the-stack-the-hard-way)\n* [vock - lightweight, wrapper-based kernel coverage viewer](https://github.com/kzall0c/vock)\n* [CMU: Introduction to Computer Systems](https://www.cs.cmu.edu/~213/schedule.html)\n\n## Concurrency\n* 
[core-to-core-latency: A Nice Little Tool!](https://pramodkumbhar.com/2023/09/core-to-core-latency-a-nice-little-tool/)\n* [Measuring CPU core-to-core latency](https://github.com/nviennot/core-to-core-latency)\n* [Why core to core latency matters (JVM)](https://foojay.io/today/why-core-to-core-latency-matters/)\n\n## Compilers\n* [Don't use the likely or unlikely attributes](https://blog.aaronballman.com/2020/08/dont-use-the-likely-or-unlikely-attributes/)\n* [Horrible code - clean performance](https://johnnysswlab.com/horrible-code-clean-performance/)\n* [Intel Implicit SPMD Program Compiler](https://ispc.github.io/)\n* [mold: A Modern Linker](https://github.com/rui314/mold)\n\n## Memory\n* [Are you sure you want to use MMAP in your DBMS?](https://db.cs.cmu.edu/mmap-cidr2022/)\n* [Determining whether an application has poor cache performance](https://developers.redhat.com/blog/2014/03/10/determining-whether-an-application-has-poor-cache-performance-2#)\n* [Dmalloc - Debug Malloc Library](https://dmalloc.com/)\n* [Garbage Collection for Systems Programmers](https://bitbashing.io/gc-for-systems-programmers.html)\n* [Heaptrack](https://github.com/KDE/heaptrack)\n* [How Does the Memory Management Unit (MMU) Work with the Unix/Linux Kernel?](https://chessman7.substack.com/p/how-does-the-memory-management-unit)\n* [Huge pages are a good idea](https://www.evanjones.ca/hugepages-are-a-good-idea.html)\n* [Intel Cache Allocation Technology](https://www.intel.com/content/www/us/en/developer/articles/technical/introduction-to-cache-allocation-technology.html)\n* [JVM field guide memory](https://serce.me/posts/01-02-2023-jvm-field-guide-memory)\n* [Latency numbers every programmer should know](https://gist.github.com/jboner/2841832)\n* [Linux weekly news on Transparent Huge pages](https://lwn.net/Articles/374424/)\n* [malloc_count - Tools for Runtime Memory Usage Analysis and Profiling](https://panthema.net/2013/malloc_count/)\n* [Memory Allocation Strategies - Part 
1](https://www.gingerbill.org/article/2019/02/01/memory-allocation-strategies-001/)\n* [Memory allocation](https://samwho.dev/memory-allocation/)\n* [Memory management reading list](https://gist.github.com/simonrenger/d1da2a10d11f8a971fc6f1b574ab3e99)\n* [Pagemon - browse the memory map of an active running process](https://github.com/ColinIanKing/pagemon)\n* [Poul-Henning Kamp - Malloc(3) in modern Virtual Memory environments](https://docs-archive.freebsd.org/44doc/papers/malloc.pdf)\n* [Red Hat: Huge pages and transparent huge pages](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/performance_tuning_guide/s-memory-transhuge)\n* [Sneaky (transparent) huge pages](https://tbenthompson.com/post/sneaky-transparent-huge-pages/)\n* [snmalloc high-performance allocator](https://github.com/microsoft/snmalloc)\n* [SRE deep dive into page cache](https://biriukov.dev/docs/page-cache/0-linux-page-cache-for-sre/)\n* [TCMalloc and RocksDB](https://blog.cloudflare.com/the-effect-of-switching-to-tcmalloc-on-rocksdb-memory-use/)\n* [Testing Memory Allocators](http://ithare.com/testing-memory-allocators-ptmalloc2-tcmalloc-hoard-jemalloc-while-trying-to-simulate-real-world-loads/)\n* [Transparent huge pages](https://www.digitalocean.com/blog/transparent-huge-pages-and-alternative-memory-allocators)\n* [Using Huge Pages on Linux](https://rigtorp.se/hugepages/)\n* [What Every Programmer Should Know About Memory](https://people.freebsd.org/~lstewart/articles/cpumemory.pdf)\n* [Dan Luu - Malloc tutorial](https://danluu.com/malloc-tutorial/)\n* [Cache simulator](https://courses.cs.washington.edu/courses/cse351/cachesim/)\n\n## Benchmarks\n* [A cross-platform C library to retrieve CPU features](https://github.com/google/cpu_features)\n* [All Measurements are Wrong - Guerrilla Aphorisms](http://www.perfdynamics.com/Manifesto/gcaprules.html#tth_sEc2.25)\n* [An Extensive Benchmark of C and C++ 
Hash Tables](https://jacksonallan.github.io/c_cpp_hash_tables_benchmark/)\n* [Performance and Benchmarking - Beyond the Bottleneck: From Classic Systems to Modern AI and HPC](https://github.com/djiangtw/performance-and-benchmarking-public)\n* [AnandTech 2021 SSD Benchmark Suite](https://www.anandtech.com/show/16458/2021-ssd-benchmark-suite)\n* [bonnie++](https://www.coker.com.au/bonnie++/)\n* [Celero](https://github.com/DigitalInBlue/Celero)\n* [Cinebench 2024: Reviewing the Benchmark](https://chipsandcheese.com/2023/10/22/cinebench-2024-reviewing-the-benchmark/)\n* [comprehensive set of IO benchmarks for Linux and OS X](https://github.com/adityaramesh/io_benchmark/)\n* [Flexible I/O Tester](https://github.com/axboe/fio)\n* [Folly - benchmarks](https://github.com/facebook/folly/blob/main/folly/docs/Benchmark.md)\n* [Godbolt's Law](https://xania.org/200504/godbolt's-law)\n* [hyperfine - a command-line benchmarking tool in Rust](https://github.com/sharkdp/hyperfine)\n* [Mastering C++ with Google Benchmark](https://ashvardanian.com/posts/google-benchmark/)\n* [nanobench](https://github.com/martinus/nanobench)\n* [Open benchmarking](https://openbenchmarking.org/)\n* [Open Catalog on best practices for performance](https://github.com/codee-com/open-catalog)\n* [sysbench - scriptable database and system performance benchmark](https://github.com/akopytov/sysbench)\n\n## Algorithms and data structures\n* [Algorithms by Jeff Erickson](https://jeffe.cs.illinois.edu/teaching/algorithms/)\n* [Bitwise binary search](https://orlp.net/blog/bitwise-binary-search/)\n* [Colony - An unordered bucket-like data container providing fast iteration/insertion/erasure](https://plflib.org/colony.htm)\n* [Novel base64 implementation using lookup tables](https://github.com/npodonnell/fast-base64)\n* [Open Data Structures - an open content textbook](https://opendatastructures.org/)\n* [Sort benchmark](https://sortbenchmark.org/)\n* [Data Structures in Practice - A Hardware-Aware Approach for 
System Software Engineers](https://github.com/djiangtw/data-structures-in-practice-public)\n* [One Billion Row Challenge - C++ Implementation](https://github.com/graphicsMan/1brc)\n\n## Lectures or conference talks\n* [Brendan Gregg: Kernel Recipes 2023 - Fast by Friday: Why Kernel Superpowers are Essential](https://www.youtube.com/watch?v=XudHNF4k_x0)\n* [Casey Muratori: Simple Code, High Performance](https://www.youtube.com/watch?v=Ge3aKEmZcqY)\n* [CppCon 2014: Chandler Carruth on 'Efficiency with Algorithms, Performance with Data Structures'](https://youtu.be/fHNmRkzxHWs)\n* [CppCon 2014: Mike Acton on 'Data-Oriented Design'](https://youtu.be/rX0ItVEVjHc)\n* [Kris Jusiak: Performance is not a number](https://kris-jusiak.github.io/talks/cppcon-2025/#/)\n* [Performance Engineering of Software Systems - MIT OpenCourseWare](https://ocw.mit.edu/courses/6-172-performance-engineering-of-software-systems-fall-2018/)\n* [Software Optimizations Become Simple with Top-Down Analysis on Intel Skylake - Ahmad Yasin](https://www.youtube.com/watch?v=kjufVhyuV_A)\n* [Visualizing Performance - The Developers’ Guide to Flame Graphs • Brendan Gregg • YOW! 
2022](https://www.youtube.com/watch?v=VMpTU15rIZY)\n\n## Journal articles or technical reports\n* [John Ousterhout: Always measure one level deeper](https://cacm.acm.org/research/always-measure-one-level-deeper/)\n* [NanoLog: A Nanosecond Scale Logging System](https://www.usenix.org/system/files/conference/atc18/atc18-yang.pdf)\n* [Raasveldt et al: Fair Benchmarking Considered Difficult](https://mytherin.github.io/papers/2018-dbtest.pdf)\n* [Li et al: Eliminate Branches by Melding IR Instructions](https://arxiv.org/abs/2512.22390)\n\n## Static code analysis\n* [Cobra](https://github.com/nimble-code/Cobra/)\n* [Infer](https://github.com/facebook/infer)\n\n## Programming languages\n* [Open Catalog on best practices for performance](https://github.com/codee-com/open-catalog)\n* [Python Speed Center](https://speed.python.org/about/)\n* [Modern C++ Features for high performance low latency systems](https://github.com/leannejdong/lowlat)\n\n## Machine learning\n* [Making Deep Learning Go Brrrr From First Principles](https://horace.io/brrr_intro.html)\n\n## I/O\n* [Userland Disk I/O](https://transactional.blog/how-to-learn/disk-io)\n* [O_DIRECT - The Problem That Grew Up With Multi-Threading](https://zazolabs.com/odirect-the-problem-that-grew-up/)\n\n## GPU acceleration\n* [Advanced NVIDIA CUDA Kernel Optimization Techniques](https://developer.nvidia.com/blog/advanced-nvidia-cuda-kernel-optimization-techniques-handwritten-ptx/)\n* [Basic facts about GPUs](https://damek.github.io/random/basic-facts-about-gpus/)\n* [Huggingface - The Ultra-Scale Playbook](https://huggingface.co/spaces/nanotron/ultrascale-playbook)\n* [leetgpu](https://leetgpu.com/)\n* [Outperforming cuBLAS on H100](https://cudaforfun.substack.com/p/outperforming-cublas-on-h100-a-worklog)\n* [Pingpong GEMM from scratch](https://github.com/bertmaher/simplegemm)\n* [Performance Engineering for AI Infra](https://github.com/wafer-ai/gpu-perf-engineering-resources?tab=readme-ov-file)\n* [AMD GPUs go 
brr](https://hazyresearch.stanford.edu/blog/2025-11-09-amd-brr)\n* [How to scale your model](https://jax-ml.github.io/scaling-book/)\n* [GPU concepts: Visual Learning](https://brrrviz.com/)\n\n## Tools \u0026 Observability\n* [facebookincubator/below : interactive tool to view and record historical system data](https://github.com/facebookincubator/below)\n* [likwid performance tools](https://github.com/RRZE-HPC/likwid?tab=readme-ov-file)\n* [Measuring workloads with toplev](https://github.com/andikleen/pmu-tools/wiki/toplev-manual)\n* [perf: C++23 Performance library](https://github.com/qlibs/perf)\n* [Julia Evans - Profiling and Tracing with perf](https://jvns.ca/perf-zine.pdf)\n* [xCapture v3 :  Linux Performance Analysis with Modern eBPF and DuckDB](https://tanelpoder.com/posts/xcapture-v3-alpha-ebpf-performance-analysis-with-duckdb/)\n\n## CPU\n* [leetcpu](https://www.leetcpu.com/)\n","projects_url":"https://awesome.ecosyste.ms/api/v1/lists/olest%2Fawesome-performance/projects"}