{"id":21922938,"url":"https://github.com/tanguilp/http_cache_store_disk","last_synced_at":"2025-06-10T09:04:14.703Z","repository":{"id":190693843,"uuid":"567298718","full_name":"tanguilp/http_cache_store_disk","owner":"tanguilp","description":"Disk store for http_cache","archived":false,"fork":false,"pushed_at":"2024-03-29T20:32:22.000Z","size":75,"stargazers_count":5,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-19T18:21:58.461Z","etag":null,"topics":["elixir","erlang","http-caching"],"latest_commit_sha":null,"homepage":"https://hexdocs.pm/http_cache_store_disk","language":"Erlang","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tanguilp.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-11-17T13:53:49.000Z","updated_at":"2024-06-12T10:06:37.000Z","dependencies_parsed_at":"2024-03-03T15:24:04.248Z","dependency_job_id":"5e7ba60a-e49b-4b62-8f95-7678b7144d62","html_url":"https://github.com/tanguilp/http_cache_store_disk","commit_stats":null,"previous_names":["tanguilp/http_cache_store_disk"],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tanguilp%2Fhttp_cache_store_disk","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tanguilp%2Fhttp_cache_store_disk/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tanguilp%2Fhttp_cache_store_disk/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/tanguilp%2Fhttp_cache_store_disk/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tanguilp","download_url":"https://codeload.github.com/tanguilp/http_cache_store_disk/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tanguilp%2Fhttp_cache_store_disk/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259043761,"owners_count":22797159,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["elixir","erlang","http-caching"],"created_at":"2024-11-28T21:08:04.063Z","updated_at":"2025-06-10T09:04:14.681Z","avatar_url":"https://github.com/tanguilp.png","language":"Erlang","funding_links":[],"categories":[],"sub_categories":[],"readme":"http_cache_store_disk\n=====\n\n`http_cache_store_disk` is a disk LRU cache that can be used as a backend for `http_cache`.\nIt implements the `http_cache_store` behaviour.\n\nIt supports:\n- on-disk caching, with limit in % of disk usage\n- clustering, using BEAM distribution. 
The following events are broadcast:\n  - newly cached HTTP responses (in an efficient manner)\n  - invalidation requests\n  - warmup: already present nodes send their most recently used cached HTTP responses to joining nodes\n- telemetry events (see [Telemetry](#telemetry))\n- backpressure mechanisms to avoid overloading the whole system with caching operations\n- the optional `http_cache_store:invalidate_by_alternate_key/1` callback\n\nUnder the hood, it simply saves the HTTP response body on disk as a file (other HTTP caches use other methods)\nand related libraries (such as `plug_http_cache`) use the `sendfile` system call when available.\nThis enables sending files extremely fast because:\n- this avoids going back and forth between userland and kernel. The whole sending operation is done in the\nkernel, from the file to the socket directly\n- read files are cached in-memory by the kernel. That is, popular content stays in memory and\nis sent to the socket directly, without being reread from the disk. In reality, this implementation is\na **memory + disk** backend for `http_cache`, the memory part being handled directly by the kernel\n\nStored responses are nuked as soon as a configurable disk space occupation threshold is reached.\nIt does not support configuration of a fixed amount of bytes for disk usage, mainly because:\n- the number of bytes does not reflect the disk usage, since files occupy more space than their\nnumber of bytes (see the `--apparent-size` option of the `du` program for instance)\n- the author couldn't actually get it working, probably for the reason mentioned above\n\nFor in-memory caching, see: [`http_cache_store_memory`](https://github.com/tanguilp/http_cache_store_memory).\n\n## Support\n\nOTP26+\n\n## Usage\n\nThis is an OTP application and starts automatically.\n\n### Setting the right thresholds\n\nMetadata about HTTP responses written on disk is stored in memory. 
The overhead is about 1 kB per\nstored response.\n\nTherefore, 1 million stored responses will occupy around 1 GB of memory.\n\nWhen using this store as a backend for a library that uses the `sendfile` system call,\nsuch as [`plug_http_cache`](https://github.com/tanguilp/plug_http_cache), you should take into\nconsideration that the kernel caches responses in memory. In other words, if caching metadata\ntakes a huge amount of memory (say 99%), then you will not benefit from the automatic caching\nfrom the kernel and files will be read from the disk every time they are sent, resulting in\nslower sending operations.\n\nThe memory limit defaults to `0.7` for this purpose. If you use a very fast disk (SSD), you might want to\nreconsider this default.\n\n### Configuration parameters\n\n- `cache_dir` [**Mandatory**]: the directory where to store cache data. If it does not exist, it is\ncreated. No default. **Beware**: this directory is irreversibly swept on startup. Don't set `/` or\neven `/tmp`!\n- `disk_limit`: maximum disk usage as a float. Above this limit, oldest objects start being nuked.\nDefaults to `0.92`. Note that some file systems start performing poorly when approaching the 100%\nmark\n- `memory_limit`: how much memory is allocated for metadata.\nIf this is an integer, then it's the number of bytes allocated to store the cached\nresponses. If it is a float, it's the system memory threshold that triggers nuking older entries.\nDefaults to `0.7`, that is, as soon as 70% of the system memory is used, objects are deleted until\nsystem memory use no longer exceeds this threshold\n- `delay_before_delete`: when a cached object is deleted, it's kept on disk for some time to allow\nreading the file content before deletion (for example from your code) and avoid race conditions.\nWhen using this backend's API directly, you should always consider the case when the file is deleted\nbefore you can actually read it. 
Note that the `sendfile` syscall doesn't care if the file is deleted\nwhile being sent (the kernel keeps the file somewhere until it is sent in full). Defaults to\n`1000` ms\n- `max_worker_queue_len`: max number of objects in the workers' mailbox before they start\ndiscarding them. Defaults to `50`\n- `cluster_enabled`: exchange of information between nodes of the Erlang cluster is enabled.\nDefaults to `false`\n- `nb_workers`: how many workers are to be working at the same time for adding new cache\nentries (including from remote nodes). Defaults to the number of active schedulers\n- `pull_table_stats_interval`: how often memory stats are retrieved and the associated telemetry event\nis emitted, in milliseconds. Defaults to `1000`\n- `warmup_nb_objects`: how many objects are sent to joining nodes when they request warm-up.\nDefaults to `5000`\n- `warmup_timeout`: how long the warmup process is active, that is it tries to get objects from\njoining nodes, in milliseconds. Defaults to `20000`\n- `disk_limit_check_interval`: how often to check for disk limit, and trigger LRU nuking when\nexceeded, in milliseconds. Defaults to `60000`. Keep in mind that the Unix `df` program\nis called, so you should not call it too often (\u003c 1 second).\n- `mem_limit_check_interval`: how often to check for memory limit, and trigger LRU nuking when\nexceeded, in milliseconds. 
Defaults to `1000`.\n- `expired_resp_sweep_interval`: how often expired responses are purged, in milliseconds.\nDefaults to `3000`\n- `outdated_lru_sweep_interval`: how often outdated LRU entries are purged, in milliseconds.\nDefaults to `2000`\n\nThe following options can be modified at runtime:\n- `disk_limit`\n- `memory_limit`\n- `delay_before_delete`\n- `max_worker_queue_len`\n- `pull_table_stats_interval`\n- `disk_limit_check_interval`\n- `mem_limit_check_interval`\n- `expired_resp_sweep_interval`\n- `outdated_lru_sweep_interval`\n\n## Installation\n\nErlang (rebar3):\n\n```erlang\n{deps, [{http_cache_store_disk, \"~\u003e 0.3.0\"}]}.\n```\n\nElixir:\n\n```elixir\n{:http_cache_store_disk, \"~\u003e 0.3.0\"}\n```\n\n## Telemetry\n\n- `[http_cache_store_disk, object_deleted]` is emitted whenever an object is deleted\n  - Measurements: none\n  - Metadata:\n    - `reason`: one of `lru_nuked_memory`, `lru_nuked_disk`, `expired`, `url_invalidation`, `alternate_key_invalidation`\n- `[http_cache_store_disk, memory]` is emitted regularly by the stats service\n  - Measurements:\n    - `total_mem`: total memory used by `http_cache_store_disk` subsystems\n    - `objects_mem`: memory used by `http_cache_store_disk` to store HTTP responses\n    - `lru_mem`: memory used by `http_cache_store_disk` to store LRU data\n    - `objects_count`: number of HTTP responses cached\n  - Metadata: none\n- `[http_cache_store_disk, lru_nuker]`: events triggered by the LRU nuker process\n(uses `telemetry:span/3`)\n- `[http_cache_store_disk, expired_lru_entry_sweeper]`: events triggered by the LRU sweeper process\n(uses `telemetry:span/3`)\n- `[http_cache_store_disk, expired_resp_sweeper]`: events triggered by the outdated response\nsweeper process (uses 
`telemetry:span/3`)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftanguilp%2Fhttp_cache_store_disk","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftanguilp%2Fhttp_cache_store_disk","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftanguilp%2Fhttp_cache_store_disk/lists"}