{"id":28183233,"url":"https://github.com/xiangpenghao/liquid-cache","last_synced_at":"2026-02-21T06:15:42.528Z","repository":{"id":280193273,"uuid":"904510168","full_name":"XiangpengHao/liquid-cache","owner":"XiangpengHao","description":"10x lower latency for cloud-native DataFusion","archived":false,"fork":false,"pushed_at":"2025-05-10T19:26:43.000Z","size":4704,"stargazers_count":146,"open_issues_count":49,"forks_count":12,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-05-14T02:58:34.012Z","etag":null,"topics":["arrow","cache","data-analytics","datafusion","object-store","parquet","query-engine"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/XiangpengHao.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-12-17T03:14:56.000Z","updated_at":"2025-05-11T19:55:18.000Z","dependencies_parsed_at":"2025-03-16T05:18:50.256Z","dependency_job_id":"e084b9e4-0172-41e9-9d50-46f10e9fa8b2","html_url":"https://github.com/XiangpengHao/liquid-cache","commit_stats":null,"previous_names":["xiangpenghao/liquid-cache"],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/XiangpengHao%2Fliquid-cache","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/XiangpengHao%2Fliquid-cache/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/XiangpengHao%2Fliquid-cache/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/repositories/XiangpengHao%2Fliquid-cache/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/XiangpengHao","download_url":"https://codeload.github.com/XiangpengHao/liquid-cache/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254464873,"owners_count":22075572,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arrow","cache","data-analytics","datafusion","object-store","parquet","query-engine"],"created_at":"2025-05-16T04:15:32.735Z","updated_at":"2026-02-21T06:15:42.473Z","avatar_url":"https://github.com/XiangpengHao.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e \u003cimg src=\"https://raw.githubusercontent.com/XiangpengHao/liquid-cache/main/dev/doc/logo.png\" alt=\"liquid_cache_logo\" width=\"450\"/\u003e \u003c/p\u003e\n\n\u003cdiv align=\"center\"\u003e\n\n[![Crates.io Version](https://img.shields.io/crates/v/liquid-cache-client?label=liquid-cache-client)](https://crates.io/crates/liquid-cache-client)\n[![Crates.io Version](https://img.shields.io/crates/v/liquid-cache-server?label=liquid-cache-server)](https://crates.io/crates/liquid-cache-server)\n[![docs.rs](https://img.shields.io/docsrs/liquid-cache-client?style=flat\u0026label=client-doc)](https://docs.rs/liquid-cache-client/latest/liquid_cache_client/)\n[![docs.rs](https://img.shields.io/docsrs/liquid-cache-server?style=flat\u0026label=server-doc)](https://docs.rs/liquid-cache-server/latest/liquid_cache_server/)\n\n\u003c/div\u003e\n\u003cdiv 
align=\"center\"\u003e\n\n[![Rust CI](https://github.com/XiangpengHao/liquid-cache/actions/workflows/ci.yml/badge.svg)](https://github.com/XiangpengHao/liquid-cache/actions/workflows/ci.yml)\n[![codecov](https://codecov.io/gh/XiangpengHao/liquid-cache/graph/badge.svg?token=yTeQR2lVnd)](https://codecov.io/gh/XiangpengHao/liquid-cache)\n[![Codacy Badge](https://app.codacy.com/project/badge/Grade/1a23a108cd2b4d2b9ffd2c2258599dfa)](https://app.codacy.com/gh/XiangpengHao/liquid-cache/dashboard?utm_source=gh\u0026utm_medium=referral\u0026utm_content=\u0026utm_campaign=Badge_grade)\n\n\u003c/div\u003e\n\nLiquidCache is a pushdown cache for S3 --\nprojections, filters, and aggregations are evaluated at the cache server before returning data to query engines (e.g., [DataFusion](https://github.com/apache/datafusion)).\n\n## Features\nLiquidCache is a radical redesign of caching: it **caches logical data** rather than its physical representations.\n\nThis means that:\n- LiquidCache transcodes S3 data (e.g., JSON, CSV, Parquet) into an in-house format -- more compressed, more NVMe-friendly, more efficient for DataFusion operations.\n- LiquidCache returns filtered/aggregated data to DataFusion, significantly reducing network IO.\n\nCons:\n- LiquidCache is not a transparent cache (consider [Foyer](https://github.com/foyer-rs/foyer) instead); it leverages query semantics to optimize caching.\n\n## Architecture\n\nBoth LiquidCache and DataFusion run on cloud servers within the same region, but are configured differently:\n\n- LiquidCache often has a memory/CPU ratio of 16:1 (e.g., 64GB memory and 4 cores)\n- DataFusion often has a memory/CPU ratio of 2:1 (e.g., 32GB memory and 16 cores)\n\nMultiple DataFusion nodes share the same LiquidCache instance through network connections. \nEach component can be scaled independently as the workload grows. 
\n\n\u003cimg src=\"https://raw.githubusercontent.com/XiangpengHao/liquid-cache/main/dev/doc/arch.png\" alt=\"architecture\" width=\"400\"/\u003e\n\n\n## Integrate LiquidCache in 5 Minutes\nCheck out the [examples](https://github.com/XiangpengHao/liquid-cache/tree/main/examples) folder for more details. \n\n\n\n#### 1. Start a Cache Server:\n```rust\n#[tokio::main]\nasync fn main() -\u003e Result\u003c(), Box\u003cdyn std::error::Error\u003e\u003e {\n    let liquid_cache = LiquidCacheService::new(\n        SessionContext::new(),\n        Some(1024 * 1024 * 1024),               // max memory cache size 1GB\n        Some(tempfile::tempdir()?.into_path()), // disk cache dir\n    );\n\n    let flight = FlightServiceServer::new(liquid_cache);\n\n    Server::builder()\n        .add_service(flight)\n        .serve(\"0.0.0.0:50051\".parse()?)\n        .await?;\n\n    Ok(())\n}\n```\n\nOr use our pre-built docker image:\n```bash\ndocker run -p 50051:50051 -v ~/liquid_cache:/cache \\\n  ghcr.io/xiangpenghao/liquid-cache/liquid-cache-server:latest \\\n  /app/bench_server \\\n  --address 0.0.0.0:50051 \\\n  --disk-cache-dir /cache\n```\n\n#### 2. 
Connect to the cache server:\nAdd the following dependency to your existing DataFusion project:\n```toml\n[dependencies]\nliquid-cache-client = \"0.1.0\"\n```\n\nThen, create a new DataFusion context with LiquidCache:\n```rust\n#[tokio::main]\npub async fn main() -\u003e Result\u003c()\u003e {\n/*==========================LiquidCache============================*/\n    let ctx = LiquidCacheBuilder::new(cache_server)\n        .with_object_store(ObjectStoreUrl::parse(object_store_url.as_str())?, None)\n        .with_cache_mode(CacheMode::Liquid)\n        .build(SessionConfig::from_env()?)?;\n/*=================================================================*/\n\n    let ctx: Arc\u003cSessionContext\u003e = Arc::new(ctx);\n    ctx.register_table(table_name, ...)\n        .await?;\n    ctx.sql(\u0026sql).await?.show().await?;\n    Ok(())\n}\n```\n\n## Community server\n\nWe run a community server for LiquidCache at \u003chttps://hex.tail0766e4.ts.net:50051\u003e (hosted on Xiangpeng's NAS, use at your own risk).\n\nYou can try it out by running:\n```bash\ncargo run --bin example_client --release -- \\\n    --cache-server https://hex.tail0766e4.ts.net:50051 \\\n    --file \"https://huggingface.co/datasets/HuggingFaceFW/fineweb/resolve/main/data/CC-MAIN-2024-51/000_00042.parquet\" \\\n    --query \"SELECT COUNT(*) FROM \\\"000_00042\\\" WHERE \\\"token_count\\\" \u003c 100\"\n```\n\nExpected output (within a second):\n```\n+----------+\n| count(*) |\n+----------+\n| 44805    |\n+----------+\n```\n\n\n## Run ClickBench\n\n#### 1. Set Up the Repository\n```bash\ngit clone https://github.com/XiangpengHao/liquid-cache.git\ncd liquid-cache\n```\n\n#### 2. Run a LiquidCache Server\n```bash\ncargo run --bin bench_server --release\n```\n\n#### 3. 
Run a ClickBench Client\nIn a different terminal, run the ClickBench client:\n```bash\ncargo run --bin clickbench_client --release -- --query-path benchmark/clickbench/queries.sql --file examples/nano_hits.parquet\n```\n(Note: replace `nano_hits.parquet` with the [real ClickBench dataset](https://github.com/ClickHouse/ClickBench) for full benchmarking)\n\n\n## Development\n\nSee [dev/README.md](./dev/README.md)\n\n## Benchmark\n\nSee [benchmark/README.md](./benchmark/README.md)\n\n## FAQ\n\n#### Can I use LiquidCache in production today?\n\nNot yet. While production readiness is our goal, we are still implementing features and polishing the system.\nLiquidCache began as a research project exploring new approaches to build cost-effective caching systems. Like most research projects, it takes time to mature, and we welcome your help!\n\n#### Does LiquidCache cache data or results?\n\nLiquidCache is a data cache. It caches logically equivalent but physically different data from object storage.\n\nLiquidCache does not cache query results - it only caches data, allowing the same cache to be used for different queries.\n\n#### Nightly Rust, seriously?\n\nWe will transition to stable Rust once we believe the project is ready for production.\n\n#### How does LiquidCache work?\n\nCheck out our [paper](/dev/doc/liquid-cache-vldb.pdf) (under submission to VLDB) for more details. Meanwhile, we are working on a technical blog to introduce LiquidCache in a more accessible way.\n\n#### How can I get involved?\n\nWe are always looking for contributors! Any feedback or improvements are welcome. Feel free to explore the issue list and contribute to the project.\nIf you want to get involved in the research process, feel free to [reach out](https://xiangpeng.systems/work-with-me/).\n\n#### Who is behind LiquidCache?\n\nLiquidCache is a research project funded by:\n- [InfluxData](https://www.influxdata.com/)\n- Taxpayers of the state of Wisconsin and the federal government. 
\n\nAs such, LiquidCache is and will always be open source and free to use.\n\nYour support for science is greatly appreciated!\n\n## License\n\n[Apache License 2.0](./LICENSE)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxiangpenghao%2Fliquid-cache","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxiangpenghao%2Fliquid-cache","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxiangpenghao%2Fliquid-cache/lists"}