# Getting Friendly With CPU Caches

Reading [Getting Friendly With CPU Caches](https://www.ardanlabs.com/blog/2023/07/getting-friendly-with-cpu-caches.html), by Miki Tebeka and William Kennedy, inspired me to look at some Rust equivalents.

I've used Criterion for benchmarks, and the later versions use the `itertools` crate.

## Techniques

1. [original_slow_go.rs](./src/original_slow_go.rs) - a line-by-line port of the original (sluggish) Go code.
2. [original_fast_go.rs](./src/original_fast_go.rs) - a line-by-line port of the improved (fast) Go code. `Image` has been turned into a `Box`: a safe pointer (no null pointer issues here) to a heap-allocated `Image` struct.
3. [idiomatic_rust.rs](./src/idiomatic_rust.rs) takes the code from (2) and replaces the `for` loops with an iterator-based approach. It keeps the `HashMap`, and countries are still strings, but using an iterator allows the compiler to elide some bounds checks.
4. [no_map.rs](./src/no_map.rs) removes the `HashMap` completely, because hashing is slow. Instead, it returns a vector of `(count, country string)` tuples.
5. [no_map_country.rs](./src/no_map_country.rs) is the same as (4), but replaces the country string with a pointer to the static countries list.
6. [no_map_country_idx.rs](./src/no_map_country_idx.rs) replaces the country altogether with an index into the countries list. The country name can be stored separately and re-attached as needed (for example, when returning the user via the API). It'll make your API faster if your client obtains and keeps a country list, too!

All benchmarks were performed under Windows 11, on a 12th-generation Intel Core i7 with 32 GB of RAM.

## Results

Test | Mean time (lower is better)
--- | ---
`original_slow_go` | 419.24 µs
`original_fast_go` | 329.51 µs
`idiomatic_rust` | 330.13 µs
`no_map` | 77.627 µs
`no_map_country` | 77.256 µs
`no_map_country_idx` | 21.911 µs

![Benchmark results](./graph.png)

## Explanation

The original article explains the difference between the "slow" and "fast" Go: the `User` structure shrinks massively by storing a pointer to the image data instead of the image itself, allowing for much better cache utilization. Translating the `for` loops into Rust iterators makes a negligible difference (330.13 µs versus 329.51 µs); the two compile into very similar code.

`no_map` is based on the reasoning that the `HashMap`, and in particular hashing values, was taking up a lot of time. Sorting is *very* fast, and `itertools` provides a great `dedup_with_count` function, which collapses runs of consecutive equal items into `(count, item)` pairs. Combining the two (sort, then count the runs) gives you a `HashMap`-free solution. The speed increase is huge: from 330.13 µs to 77.627 µs, more than four times faster.

I then reasoned that chasing pointers for strings was a problem. `no_map_country` stores the countries once and points into that list instead of keeping a separate string per user. This reduces memory usage, but the performance difference was negligible (77.256 µs versus 77.627 µs).

Using an *index* into the country table is massively faster: 21.911 µs, about 3.5 times faster than `no_map_country`. The `User` structure is still the same size, since a `usize` and a pointer are the same size. But storing just the index removes an entire "pointer chase": the program doesn't have to follow the pointer into the countries table to read the value. It just reads the index. This is a huge win.
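The explanation above attributes the first big win to shrinking `User` by moving the image behind a pointer. The following is a minimal sketch of that layout change, using `std::mem::size_of` to compare three hypothetical `User` shapes. The field names, the `String` fields, and the `[u8; 128 * 128]` icon are assumptions for illustration, not the repository's actual definitions.

```rust
use std::mem::size_of;

// Hypothetical icon type: 128 * 128 bytes of pixel data (an assumption,
// not necessarily the size used in the repository).
type Image = [u8; 128 * 128];

// "Slow" shape: every user carries its icon inline.
#[allow(dead_code)]
struct UserInline {
    login: String,
    active: bool,
    icon: Image,
    country: String,
}

// "Fast" shape: the icon lives on the heap; the struct only holds a Box.
#[allow(dead_code)]
struct UserBoxed {
    login: String,
    active: bool,
    icon: Box<Image>,
    country: String,
}

// Index shape: the country is a position in a shared country table.
#[allow(dead_code)]
struct UserIdx {
    login: String,
    active: bool,
    icon: Box<Image>,
    country: usize,
}

fn main() {
    // Exact sizes depend on the target. The inline version is always over
    // 16 KiB because of the icon; the boxed versions stay under 100 bytes.
    println!("inline icon:   {} bytes", size_of::<UserInline>());
    println!("boxed icon:    {} bytes", size_of::<UserBoxed>());
    println!("country index: {} bytes", size_of::<UserIdx>());
    // A Box to a sized type is a single thin pointer, the same size as usize.
    println!(
        "Box<Image>: {} bytes, usize: {} bytes",
        size_of::<Box<Image>>(),
        size_of::<usize>()
    );
}
```

When counting users per country only `active` and `country` are read, so with the inline layout each user drags its whole icon through the cache for nothing; the boxed layout fits far more users into the same amount of cache.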
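Technique 3 swaps index-based `for` loops for iterators. Here is a minimal sketch of the two loop shapes, using a hypothetical one-field `User` (an assumption, not the repository's struct). The benchmark shows the change is negligible in practice; the sketch only illustrates why an iterator has no index to bounds-check.

```rust
// Hypothetical, minimal user record: only the field the loops read.
struct User {
    active: bool,
}

// Index-based loop: each `users[i]` is a bounds-checked access unless the
// optimizer can prove that `i < users.len()`.
fn count_active_indexed(users: &[User]) -> usize {
    let mut n = 0;
    for i in 0..users.len() {
        if users[i].active {
            n += 1;
        }
    }
    n
}

// Iterator-based loop: `iter()` hands out references directly, so there is
// no index to check.
fn count_active_iter(users: &[User]) -> usize {
    users.iter().filter(|u| u.active).count()
}

fn main() {
    let users = vec![
        User { active: true },
        User { active: false },
        User { active: true },
    ];
    // Both loops compute the same answer.
    println!("{} {}", count_active_indexed(&users), count_active_iter(&users)); // prints "2 2"
}
```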
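The `no_map` and `no_map_country_idx` ideas can be combined in a small sketch: count active users per country once with a `HashMap`, and once by sorting the country indices and counting runs of equal values. The `User` struct and `COUNTRIES` table are hypothetical. The run counting is written by hand so the example needs only `std`; it does the same job as `itertools`' dedup-with-count step on a sorted sequence, producing `(count, item)` pairs.

```rust
use std::collections::HashMap;

// Hypothetical country table; the repository's real list is not shown here.
const COUNTRIES: [&str; 3] = ["Canada", "Germany", "Japan"];

// Hypothetical user record: the country is an index into COUNTRIES.
struct User {
    active: bool,
    country: usize,
}

// HashMap approach: hash each active user's country index.
fn count_with_map(users: &[User]) -> HashMap<usize, usize> {
    let mut counts = HashMap::new();
    for u in users.iter().filter(|u| u.active) {
        *counts.entry(u.country).or_insert(0) += 1;
    }
    counts
}

// Map-free approach: sort the indices, then count runs of equal values.
// Returns (count, country index) pairs in ascending index order.
fn count_sorted(users: &[User]) -> Vec<(usize, usize)> {
    let mut idx: Vec<usize> = users
        .iter()
        .filter(|u| u.active)
        .map(|u| u.country)
        .collect();
    idx.sort_unstable();

    let mut out: Vec<(usize, usize)> = Vec::new();
    for c in idx {
        // Extend the current run if this index matches the previous one.
        if let Some(last) = out.last_mut() {
            if last.1 == c {
                last.0 += 1;
                continue;
            }
        }
        // Otherwise start a new run.
        out.push((1, c));
    }
    out
}

fn main() {
    let users = vec![
        User { active: true, country: 2 },
        User { active: true, country: 0 },
        User { active: false, country: 1 },
        User { active: true, country: 2 },
    ];
    let map = count_with_map(&users);
    for &(count, idx) in &count_sorted(&users) {
        // Both approaches agree on every count.
        assert_eq!(map[&idx], count);
        println!("{}: {}", COUNTRIES[idx], count);
    }
    // prints "Canada: 1" then "Japan: 2"
}
```

With plain integer indices, sorting compares machine words instead of following pointers to string data, which is the same pointer-chasing argument the explanation makes for `no_map_country_idx`.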