https://github.com/olest/awesome-performance
reading list on software performance
List: awesome-performance
- Host: GitHub
- URL: https://github.com/olest/awesome-performance
- Owner: olest
- Created: 2023-05-21T21:00:59.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-11-11T21:38:26.000Z (over 1 year ago)
- Last Synced: 2024-11-25T17:02:22.470Z (about 1 year ago)
- Size: 26.4 KB
- Stars: 5
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- ultimate-awesome - awesome-performance - Reading list on software performance. (Other Lists / TeX Lists)
README
# Awesome performance and low-level programming links
## Blogs
* [Abseil.io - Performance Tip of the Week](https://abseil.io/fast/)
* [Abseil.io - Performance Hints](https://abseil.io/fast/hints.html)
* [Agner Fog - Software optimization resources](https://www.agner.org/optimize/)
* [Ahmad Yasin : perf-tools](https://sites.google.com/site/analysismethods/yasin-pubs)
* [Brendan Gregg's Blog](https://www.brendangregg.com/blog/index.html)
* [Chris Feilbach's Blog](https://chrisfeilbach.com/2025/07/05/understand-cpu-branch-instructions-better/)
* [Computer, Enhance!](https://www.computerenhance.com/)
* [Confessions of a Code Addict](https://blog.codingconfessions.com/p/simultaneous-multithreading)
* [Daniel Lemire's blog](https://lemire.me/blog/)
* [EasyPerf - Denis Bakhvalov](https://easyperf.net/notes/)
* [Israel Ogbole : Profile-Guided Optimization: A Hands-On Guide to Reducing Computational Wastage](https://israelo.io/blog/pgo/)
* [JabPerf](https://www.jabperf.com/blog/)
* [John Farrier - For Software Engineers](https://johnfarrier.com/c-performance-checklist-for-low-latency-systems/?utm_source=rss&utm_medium=rss&utm_campaign=c-performance-checklist-for-low-latency-systems)
* [Johnny's Software Lab](https://johnnysswlab.com/)
* [justinblank.com](https://justinblank.com/notebooks/benchmarking.html)
* [marek.ai](https://marek.ai/matrix-multiplication-on-cpu.html)
* [Martin Ayvazyan - Advanced C++ Optimization Techniques](https://medium.com/@martin00001313/advanced-c-optimization-techniques-for-high-performance-applications-part-3-4602df9284d8)
* [MattPD's C++ links: performance tools](https://github.com/MattPD/cpplinks/blob/master/performance.tools.md)
* [Modern Hardware Numbers for System Design Interviews](https://hellointerview.substack.com/p/modern-hardware-numbers-for-system)
* [n0derunner - platform performance](https://www.n0derunner.com/)
* [Performance Engineering For Parallel Applications](https://pramodkumbhar.com/)
* [Performance Engineers Digest](https://substack.com/home/post/p-170692740)
* [Redpanda blog](https://www.redpanda.com/blog/always-on-production-memory-profiling-seastar)
* [Software Bits Newsletter](https://softwarebits.substack.com)
* [Stephan's blog - perf tool examples](https://dollberg.xyz/programming/2016/07/02/perf-tool/)
* [strlcpy and how CPUs can defy common sense](https://nrk.neocities.org/articles/cpu-vs-common-sense)
* [The Every Computer Performance Blog](https://rwwescott.wordpress.com/)
* [uops.info](https://uops.info/)
* [Wojciech Muła - blog](http://0x80.pl/notesen.html)
* [purplesyringa - Why performance optimization is hard work](https://purplesyringa.moe/blog/why-performance-optimization-is-hard-work/)
* [mcyoung - Designing a SIMD algorithm from scratch](https://mcyoung.xyz/2023/11/27/simd-base64)
* [daniellockyer/awesome-performance](https://github.com/daniellockyer/awesome-performance)
* [A Short History of Performance Engineering](https://calendar.perfplanet.com/2025/a-short-history-of-performance-engineering/)
## Libraries
* [fast base64 conversion](https://github.com/simdutf/simdutf)
* [fast_float](https://github.com/fastfloat/fast_float)
* [qiti - brings profiling and instrumentation directly into your unit tests](https://github.com/ComposerShield/qiti)
## Books
* [Algorithms for Modern Hardware](https://en.algorithmica.org/hpc/)
* [Computer Systems: A Programmer's Perspective](https://csapp.cs.cmu.edu/3e/home.html)
* [High Performance Browser Networking](https://hpbn.co/)
* [How to make things faster](https://method-r.com/books/faster/)
* [Is Parallel Programming Hard, and, if so, what can you do about it?](https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.2022.09.25a.pdf)
* [Performance Analysis and Tuning on Modern CPUs](https://book.easyperf.net/perf_book)
## SIMD
* [SIMD a practical guide](https://vectrx.substack.com/p/simd-a-practical-guide)
* [SIMD for C++ developers](http://const.me/articles/simd/simd.pdf)
* [SIMD visualizer](https://github.com/piotte13/SIMD-Visualiser)
## Profiling
* [0x.tools - X-Ray vision for Linux systems](https://0x.tools/)
* [Advanced usage of last branch records](https://lwn.net/Articles/680996/)
* [aperf - A CLI tool to gather performance data and visualize using HTML graphs](https://github.com/aws/aperf)
* [Bloaty: a size profiler for binaries](https://github.com/google/bloaty)
* [Coz: Finding Code that Counts with Causal Profiling](https://github.com/plasma-umass/coz)
* [cpplinks - performance](https://github.com/MattPD/cpplinks/blob/master/performance.tools.md)
* [Firefox Profiler](https://profiler.firefox.com/)
* [Flameshow (Terminal Flamegraph viewer)](https://github.com/laixintao/flameshow)
* [Google performance tools](https://github.com/gperftools/gperftools/)
* [GWPSan: Sampling-Based Sanitizer Framework](https://github.com/google/gwpsan)
* [health-check](https://github.com/ColinIanKing/health-check)
* [hotpath - find and profile bottlenecks in Rust](https://github.com/pawurb/hotpath)
* [How to enable performance counters in google benchmark](https://github.com/google/benchmark/blob/main/docs/perf_counters.md)
* [Intel Performance Counter Monitor](https://www.intel.com/content/www/us/en/developer/articles/technical/performance-counter-monitor.html)
* [Laurence Tratt - four kinds of optimization](https://tratt.net/laurie/blog/2023/four_kinds_of_optimisation.html)
* [magic-trace](https://github.com/janestreet/magic-trace)
* [Memray : memory profiler for Python](https://github.com/bloomberg/memray)
* [MTuner: C/C++ memory profiler and memory leak finder for Windows, PlayStation 4 and 3, Android and other platforms](https://github.com/milostosic/MTuner)
* [Performance Myths and Continuous Profiling](https://richardstartin.github.io/posts/perf-myths-and-continuous-profiling)
* [Performance tuning tutorial](https://github.com/NAThompson/performance_tuning_tutorial)
* [pmu tools : tools and libraries for profile collection and performance analysis on Intel CPUs](https://github.com/andikleen/pmu-tools)
* [Profile-Guided Optimization: A Hands-On Guide](https://israelo.io/blog/pgo/)
* [strace cheatsheet](https://blog.packagecloud.io/strace-cheat-sheet/)
* [Terminal flame graph](https://github.com/4rtzel/tfg)
* [Tracy - a hybrid frame and sampling profiler for games and other applications](https://github.com/wolfpld/tracy)
* [uftrace : function call graph tracer for C, C++, Rust and Python](https://github.com/namhyung/uftrace)
* [Useful web sites about the Linux perf tools](https://perfwiki.github.io/main/useful-links/)
* [llvm-mca - LLVM Machine Code Analyzer](https://llvm.org/docs/CommandGuide/llvm-mca.html)
## Operating systems
* [4KB page size is obsolete](https://ieeexplore.ieee.org/abstract/document/5211562)
* [Controlling the page cache](https://alg-eng.blogspot.com/?m=1)
* [How to troubleshoot high I/O wait time in Linux](https://www.site24x7.com/learn/linux/troubleshoot-high-io-wait.html)
* [Interactive map of the Linux kernel](https://makelinux.github.io/kernel/map/)
* [io_uring explained (unzip.dev)](https://unzip.dev/0x013-io_uring/)
* [Learning low-level programming and systems programming](https://github.com/mohitmishra786/amILearningEnough)
* [linux-insides](https://github.com/0xAX/linux-insides/blob/master/SUMMARY.md)
* [Modern Microprocessors: A 90-Minute Guide!](https://www.lighterra.com/papers/modernmicroprocessors/)
* [On the cost of syscalls](https://gms.tf/on-the-costs-of-syscalls.html)
* [Operating Systems: Three Easy Pieces](https://pages.cs.wisc.edu/~remzi/OSTEP/)
* [Phoronix - Linux Hardware Reviews](https://www.phoronix.com/)
* [Unwinding the stack the hard way](https://lesenechal.fr/en/linux/unwinding-the-stack-the-hard-way)
* [vock - lightweight, wrapper-based kernel coverage viewer](https://github.com/kzall0c/vock)
## Concurrency
* [core-to-core-latency: A Nice Little Tool!](https://pramodkumbhar.com/2023/09/core-to-core-latency-a-nice-little-tool/)
* [Measuring CPU core-to-core latency](https://github.com/nviennot/core-to-core-latency)
* [Why core to core latency matters (JVM)](https://foojay.io/today/why-core-to-core-latency-matters/)
## Compilers
* [Don't use the likely or unlikely attributes](https://blog.aaronballman.com/2020/08/dont-use-the-likely-or-unlikely-attributes/)
* [Horrible code - clean performance](https://johnnysswlab.com/horrible-code-clean-performance/)
* [Intel Implicit SPMD Program Compiler](https://ispc.github.io/)
* [mold: A Modern Linker](https://github.com/rui314/mold)
## Memory
* [Are you sure you want to use MMAP in your DBMS?](https://db.cs.cmu.edu/mmap-cidr2022/)
* [Determining whether an application has poor cache performance](https://developers.redhat.com/blog/2014/03/10/determining-whether-an-application-has-poor-cache-performance-2#)
* [Dmalloc - Debug Malloc Library](https://dmalloc.com/)
* [Garbage Collection for Systems Programmers](https://bitbashing.io/gc-for-systems-programmers.html)
* [Heaptrack](https://github.com/KDE/heaptrack)
* [How Does the Memory Management Unit (MMU) Work with the Unix/Linux Kernel?](https://chessman7.substack.com/p/how-does-the-memory-management-unit)
* [Huge pages are a good idea](https://www.evanjones.ca/hugepages-are-a-good-idea.html)
* [Intel Cache Allocation Technology](https://www.intel.com/content/www/us/en/developer/articles/technical/introduction-to-cache-allocation-technology.html)
* [JVM field guide memory](https://serce.me/posts/01-02-2023-jvm-field-guide-memory)
* [Latency numbers every programmer should know](https://gist.github.com/jboner/2841832)
* [Linux Weekly News on transparent huge pages](https://lwn.net/Articles/374424/)
* [malloc_count - Tools for Runtime Memory Usage Analysis and Profiling](https://panthema.net/2013/malloc_count/)
* [Memory Allocation Strategies - Part 1](https://www.gingerbill.org/article/2019/02/01/memory-allocation-strategies-001/)
* [Memory allocation](https://samwho.dev/memory-allocation/)
* [Memory management reading list](https://gist.github.com/simonrenger/d1da2a10d11f8a971fc6f1b574ab3e99)
* [Pagemon - browse the memory map of an active running process](https://github.com/ColinIanKing/pagemon)
* [Poul-Henning Kamp - Malloc(3) in modern Virtual Memory environments](https://docs-archive.freebsd.org/44doc/papers/malloc.pdf)
* [Red Hat: Huge pages and transparent huge pages](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/performance_tuning_guide/s-memory-transhuge)
* [Sneaky (transparent) huge pages](https://tbenthompson.com/post/sneaky-transparent-huge-pages/)
* [snmalloc high-performance allocator](https://github.com/microsoft/snmalloc)
* [SRE deep dive into page cache](https://biriukov.dev/docs/page-cache/0-linux-page-cache-for-sre/)
* [TCMalloc and RocksDB](https://blog.cloudflare.com/the-effect-of-switching-to-tcmalloc-on-rocksdb-memory-use/)
* [Testing Memory Allocators](http://ithare.com/testing-memory-allocators-ptmalloc2-tcmalloc-hoard-jemalloc-while-trying-to-simulate-real-world-loads/)
* [Transparent huge pages](https://www.digitalocean.com/blog/transparent-huge-pages-and-alternative-memory-allocators)
* [Using Huge Pages on Linux](https://rigtorp.se/hugepages/)
* [What Every Programmer Should Know About Memory](https://people.freebsd.org/~lstewart/articles/cpumemory.pdf)
* [Dan Luu - Malloc tutorial](https://danluu.com/malloc-tutorial/)
## Benchmarks
* [A cross-platform C library to retrieve CPU features](https://github.com/google/cpu_features)
* [All Measurements are Wrong - Guerilla Aphorisms](http://www.perfdynamics.com/Manifesto/gcaprules.html#tth_sEc2.25)
* [An Extensive Benchmark of C and C++ Hash Tables](https://jacksonallan.github.io/c_cpp_hash_tables_benchmark/)
* [AnandTech 2021 SSD Benchmark Suite](https://www.anandtech.com/show/16458/2021-ssd-benchmark-suite)
* [bonnie++](https://www.coker.com.au/bonnie++/)
* [Celero](https://github.com/DigitalInBlue/Celero)
* [Cinebench 2024: Reviewing the Benchmark](https://chipsandcheese.com/2023/10/22/cinebench-2024-reviewing-the-benchmark/)
* [comprehensive set of IO benchmarks for Linux and OS X](https://github.com/adityaramesh/io_benchmark/)
* [Flexible I/O Tester](https://github.com/axboe/fio)
* [Folly - benchmarks](https://github.com/facebook/folly/blob/main/folly/docs/Benchmark.md)
* [Godbolt's Law](https://xania.org/200504/godbolt's-law)
* [hyperfine - a command-line benchmarking tool in Rust](https://github.com/sharkdp/hyperfine)
* [Mastering C++ with Google Benchmark](https://ashvardanian.com/posts/google-benchmark/)
* [nanobench](https://github.com/martinus/nanobench)
* [Open benchmarking](https://openbenchmarking.org/)
* [Open Catalog on best practices for performance](https://github.com/codee-com/open-catalog)
* [sysbench - scriptable database and system performance benchmark](https://github.com/akopytov/sysbench)
## Algorithms and data structures
* [Algorithms by Jeff Erickson](https://jeffe.cs.illinois.edu/teaching/algorithms/)
* [Bitwise binary search](https://orlp.net/blog/bitwise-binary-search/)
* [Colony - An unordered bucket-like data container providing fast iteration/insertion/erasure](https://plflib.org/colony.htm)
* [Novel base64 implementation using lookup tables](https://github.com/npodonnell/fast-base64)
* [Open Data Structures - an open content textbook](https://opendatastructures.org/)
* [Sort benchmark](https://sortbenchmark.org/)
* [Data Structures in Practice - A Hardware-Aware Approach for System Software Engineers](https://github.com/djiangtw/data-structures-in-practice-public)
## Lectures or conference talks
* [Brendan Gregg: Kernel Recipes 2023 - Fast by Friday: Why Kernel Superpowers are Essential](https://www.youtube.com/watch?v=XudHNF4k_x0)
* [Casey Muratori: Simple Code, High Performance](https://www.youtube.com/watch?v=Ge3aKEmZcqY)
* [CppCon 2014: Chandler Carruth on 'Efficiency with Algorithms, Performance with Data Structures'](https://youtu.be/fHNmRkzxHWs)
* [CppCon 2014: Mike Acton on 'Data-Oriented Design'](https://youtu.be/rX0ItVEVjHc)
* [Kris Jusiak : Performance is not a number](https://kris-jusiak.github.io/talks/cppcon-2025/#/)
* [Performance Engineering of Software Systems - MIT OpenCourseWare](https://ocw.mit.edu/courses/6-172-performance-engineering-of-software-systems-fall-2018/)
* [Software Optimizations Become Simple with Top-Down Analysis on Intel Skylake - Ahmad Yasin](https://www.youtube.com/watch?v=kjufVhyuV_A)
## Journal articles or technical reports
* [John Ousterhout: Always measure one level deeper](https://cacm.acm.org/research/always-measure-one-level-deeper/)
* [NanoLog: A Nanosecond Scale Logging System](https://www.usenix.org/system/files/conference/atc18/atc18-yang.pdf)
* [Raasveldt et al: Fair Benchmarking Considered Difficult](https://mytherin.github.io/papers/2018-dbtest.pdf)
* [Li et al: Eliminate Branches by Melding IR Instructions](https://arxiv.org/abs/2512.22390)
## Static code analysis
* [Cobra](https://github.com/nimble-code/Cobra/)
* [Infer](https://github.com/facebook/infer)
## Programming languages
* [Open Catalog on best practices for performance](https://github.com/codee-com/open-catalog)
* [Python Speed Center](https://speed.python.org/about/)
## Machine learning
* [Making Deep Learning Go Brrrr From First Principles](https://horace.io/brrr_intro.html)
## I/O
* [Userland Disk I/O](https://transactional.blog/how-to-learn/disk-io)
## GPU acceleration
* [Advanced NVIDIA CUDA Kernel Optimization Techniques](https://developer.nvidia.com/blog/advanced-nvidia-cuda-kernel-optimization-techniques-handwritten-ptx/)
* [Basic facts about GPUs](https://damek.github.io/random/basic-facts-about-gpus/)
* [Hugging Face - The Ultra-Scale Playbook](https://huggingface.co/spaces/nanotron/ultrascale-playbook)
* [leetgpu](https://leetgpu.com/)
* [Outperforming cuBLAS on H100](https://cudaforfun.substack.com/p/outperforming-cublas-on-h100-a-worklog)
* [Pingpong GEMM from scratch](https://github.com/bertmaher/simplegemm)
## Tools & Observability
* [facebookincubator/below : interactive tool to view and record historical system data](https://github.com/facebookincubator/below)
* [likwid performance tools](https://github.com/RRZE-HPC/likwid?tab=readme-ov-file)
* [Measuring workloads with toplev](https://github.com/andikleen/pmu-tools/wiki/toplev-manual)
* [perf: C++23 Performance library](https://github.com/qlibs/perf)
* [xCapture v3 : Linux Performance Analysis with Modern eBPF and DuckDB](https://tanelpoder.com/posts/xcapture-v3-alpha-ebpf-performance-analysis-with-duckdb/)