{"id":13671094,"url":"https://github.com/linkedin/venice","last_synced_at":"2025-08-17T01:35:02.423Z","repository":{"id":60144556,"uuid":"349172057","full_name":"linkedin/venice","owner":"linkedin","description":"Venice, Derived Data Platform for Planet-Scale Workloads.","archived":false,"fork":false,"pushed_at":"2025-08-11T20:44:07.000Z","size":60697,"stargazers_count":559,"open_issues_count":71,"forks_count":100,"subscribers_count":29,"default_branch":"main","last_synced_at":"2025-08-11T21:09:01.775Z","etag":null,"topics":["ai","database","hadoop","kafka","ml"],"latest_commit_sha":null,"homepage":"https://venicedb.org","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/linkedin.png","metadata":{"files":{"readme":"docs/README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-03-18T18:04:30.000Z","updated_at":"2025-08-11T20:42:08.000Z","dependencies_parsed_at":"2022-09-25T21:27:36.132Z","dependency_job_id":"a648f9db-af32-46cb-aea5-16c3be4420b0","html_url":"https://github.com/linkedin/venice","commit_stats":null,"previous_names":[],"tags_count":1347,"template":false,"template_full_name":null,"purl":"pkg:github/linkedin/venice","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linkedin%2Fvenice","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linkedin%2Fvenice/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linkedin%2Fvenice/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linkedin%2Fvenice/manifests","owner_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/linkedin","download_url":"https://codeload.github.com/linkedin/venice/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linkedin%2Fvenice/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270796217,"owners_count":24647319,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-16T02:00:11.002Z","response_time":91,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","database","hadoop","kafka","ml"],"created_at":"2024-08-02T09:00:58.843Z","updated_at":"2025-08-17T01:35:02.405Z","avatar_url":"https://github.com/linkedin.png","language":"Java","funding_links":[],"categories":["Java","大数据"],"sub_categories":[],"readme":"\u003chtml\u003e\n    \u003c!-- We cannot use CSS anywhere in this page, because the GitHub main repo doesn't render it. CSS is fine within the other docs pages though. 
--\u003e\n    \u003cdiv align=\"center\"\u003e\n        \u003cimg src=\"assets/style/venice_full_lion_logo.svg\" width=\"50%\" alt=\"Venice\"\u003e\n        \u003ch3\u003e\n          Derived Data Platform for Planet-Scale Workloads\u003cbr/\u003e\n        \u003c/h3\u003e\n        \u003cdiv\u003e\n            \u003c!-- N.B.: We've got to leave no spaces within the \u003ca href\u003e tag otherwise we get blue link underlines inbetween the icons on the GitHub repo's main page (though not in the Just The Docs website). --\u003e\n            \u003ca href=\"https://blog.venicedb.org/stable-releases\"\u003e\u003cimg src=\"https://img.shields.io/docker/v/venicedb/venice-router?label=stable\u0026color=green\u0026logo=docker\" alt=\"Stable Release\"\u003e\u003c/a\u003e\n            \u003ca href=\"https://github.com/linkedin/venice/actions?query=branch%3Amain\"\u003e\u003cimg src=\"https://img.shields.io/github/actions/workflow/status/linkedin/venice/VeniceCI-StaticAnalysisAndUnitTests.yml\" alt=\"CI\"\u003e\u003c/a\u003e\n            \u003ca href=\"https://venicedb.org/\"\u003e\u003cimg src=\"https://img.shields.io/badge/docs-grey\" alt=\"Docs\"\u003e\u003c/a\u003e\n        \u003c/div\u003e\n        \u003cdiv\u003e\n            \u003ca href=\"https://github.com/linkedin/venice\"\u003e\u003cimg src=\"https://img.shields.io/badge/github-%23121011.svg?logo=github\u0026logoColor=white\" alt=\"GitHub\"\u003e\u003c/a\u003e\n            \u003ca href=\"https://www.linkedin.com/company/venicedb/\"\u003e\u003cimg src=\"https://img.shields.io/badge/linkedin-%230077B5.svg?logo=linkedin\u0026logoColor=white\" alt=\"LinkedIn\"\u003e\u003c/a\u003e\n            \u003ca href=\"https://twitter.com/VeniceDataBase\"\u003e\u003cimg src=\"https://img.shields.io/badge/Twitter-%231DA1F2.svg?logo=Twitter\u0026logoColor=white\" alt=\"Twitter\"\u003e\u003c/a\u003e\n            \u003ca href=\"http://slack.venicedb.org\"\u003e\u003cimg 
src=\"https://img.shields.io/badge/Slack-4A154B?logo=slack\u0026logoColor=white\" alt=\"Slack\"\u003e\u003c/a\u003e\n        \u003c/div\u003e\n    \u003c/div\u003e\n\u003c/html\u003e\n\nVenice is a derived data storage platform, providing the following characteristics:\n\n1. High throughput asynchronous ingestion from batch and streaming sources (e.g. [Hadoop](https://github.com/apache/hadoop) and [Samza](https://github.com/apache/samza)).\n2. Low latency online reads via remote queries or in-process caching.\n3. Active-active replication between regions with CRDT-based conflict resolution.\n4. Multi-cluster support within each region with operator-driven cluster assignment.\n5. Multi-tenancy, horizontal scalability and elasticity within each cluster.\n\nThe above makes Venice particularly suitable as the stateful component backing a Feature Store, such as [Feathr](https://github.com/feathr-ai/feathr). \nAI applications feed the output of their ML training jobs into Venice and then query the data for use during online \ninference workloads.\n\n# Overview\nVenice is a system which straddles the offline, nearline and online worlds, as illustrated below.\n\n![High Level Architecture Diagram](assets/images/high_level_architecture.drawio.svg)\n\n## Dependency\n\nYou can add a dependency on Venice to any Java project as specified below. Note that, currently, Venice dependencies are\nnot published on Maven Central and therefore require adding an extra repository definition. All published jars can be\nseen [here](https://linkedin.jfrog.io/ui/native/venice/com/linkedin/venice/). 
Usually, the project is released a few \ntimes per week.\n\n### Gradle\n\nAdd the following to your `build.gradle`:\n\n```groovy\nrepositories {\n  mavenCentral()\n  maven {\n    name 'VeniceJFrog'\n    url 'https://linkedin.jfrog.io/artifactory/venice'\n  }\n}\n\ndependencies {\n  implementation 'com.linkedin.venice:venice-client:0.4.455'\n}\n```\n\n### Maven\n\nAdd the following to your `pom.xml`:\n\n```xml\n\u003cproject\u003e\n...\n  \u003crepositories\u003e\n    ...\n    \u003crepository\u003e\n      \u003cid\u003evenice-jfrog\u003c/id\u003e\n      \u003cname\u003eVeniceJFrog\u003c/name\u003e\n      \u003curl\u003ehttps://linkedin.jfrog.io/artifactory/venice\u003c/url\u003e\n    \u003c/repository\u003e\n  \u003c/repositories\u003e\n...\n  \u003cdependencies\u003e\n    ...\n    \u003cdependency\u003e\n      \u003cgroupId\u003ecom.linkedin.venice\u003c/groupId\u003e\n      \u003cartifactId\u003evenice-client\u003c/artifactId\u003e\n      \u003cversion\u003e0.4.455\u003c/version\u003e\n      \u003cscope\u003ecompile\u003c/scope\u003e\n    \u003c/dependency\u003e\n  \u003c/dependencies\u003e\n\u003c/project\u003e\n\n```\n\n## APIs\nFrom the user's perspective, Venice provides a variety of read and write APIs. These are fully decoupled from one \nanother, in the sense that no matter which write APIs are used, any of the read APIs are available.\n\nFurthermore, Venice provides a rich spectrum of options in terms of simplicity on one end, and sophistication on the \nother. It is easy to get started with the simpler APIs, and later on decide to enhance the use case via more advanced \nAPIs, either in addition to or instead of the simpler ones. 
In this way, Venice can accompany users as their \nrequirements evolve, in terms of scale, latency and functionality.\n\nThe following diagram presents these APIs and summarizes the components coming into play to make them work.\n\n![API Overview](assets/images/api_overview.drawio.svg)\n\n### Write Path\n\nThe Venice write path can be broken down into three granularities: full dataset swap, insertion of many rows into an \nexisting dataset, and updates of some columns of some rows. All three granularities are supported by Hadoop and Samza.\nIn addition, any service can asynchronously produce single row inserts and updates as well, using the \n[Online Producer](./user_guide/write_api/online_producer.md) library. The table below summarizes the write operations \nsupported by each platform:\n\n|                                                  | [Hadoop](./user_guide/write_api/push_job.md) | [Samza](./user_guide/write_api/stream_processor.md) | [Any Service](./user_guide/write_api/online_producer.md) |\n|-------------------------------------------------:|:--------------------------------------------:|:---------------------------------------------------:|:--------------------------------------------------------:|\n|                                Full dataset swap |                      ✅                       |                          ✅                          |                                                          |\n|  Insertion of some rows into an existing dataset |                      ✅                       |                          ✅                          |                            ✅                             |\n|             Updates to some columns of some rows |                      ✅                       |                          ✅                          |                            ✅                             |\n\n#### Hybrid Stores\nMoreover, the three granularities of write operations can all be mixed within a single dataset. 
A dataset which gets \nfull dataset swaps in addition to row insertion or row updates is called _hybrid_.\n\nAs part of configuring a store to be _hybrid_, an important concept is the _rewind time_, which defines how far back \nshould recent real-time writes be rewound and applied on top of the new generation of the dataset getting swapped in.\n\nLeveraging this mechanism, it is possible to overlay the output of a stream processing job on top of that of a batch \njob. If using partial updates, then it is possible to have some of the columns be updated in real-time and some in \nbatch, and these two sets of columns can either overlap or be disjoint, as desired.\n\n#### Write Compute\nWrite Compute includes two kinds of operations, which can be performed on the value associated with a given key:\n\n- **Partial update**: set the content of a field within the value.\n- **Collection merging**: add or remove entries in a set or map.  \n\nN.B.: Currently, write compute is only supported in conjunction with active-passive replication. Support for \nactive-active replication is under development. \n\n### Read Path\n\nVenice supports the following read APIs:\n\n- **Single get**: get the value associated with a single key\n- **Batch get**: get the values associated with a set of keys\n- **Read compute**: project some fields and/or compute some function on the fields of values associated with a set of \n  keys. 
When using the read compute DSL, the following functions are currently supported:\n  - **Dot product**: perform a dot product on the float vector stored in a given field, against another float vector \n    provided as query param, and return the resulting scalar.\n  - **Cosine similarity**: perform a cosine similarity on the float vector stored in a given field, against another \n    float vector provided as query param, and return the resulting scalar.\n  - **Hadamard product**: perform a Hadamard product on the float vector stored in a given field, against another float \n    vector provided as query param, and return the resulting vector.\n  - **Collection count**: return the number of items in the collection stored in a given field.\n\n#### Client Modes\n\nThere are two main modes for accessing Venice data:\n\n- **Classical Venice** (stateless): You can perform remote queries against Venice's distributed backend service. If \n  using read compute operations in this mode, the queries are pushed down to the backend and only the computation\n  results are returned to the client. There are two clients capable of such remote queries:\n  - **Thin Client**: This is the simplest client, which sends requests to the router tier, which itself sends requests\n    to the server tier.\n  - **Fast Client**: This client is partitioning-aware, and can therefore send requests directly to the correct server\n    instance, skipping the router tier. Note that this client is still under development, so it may be less stable than\n    the Thin Client and may not yet be at functional parity with it.\n- **Da Vinci** (stateful): Alternatively, you can eagerly load some or all partitions of the dataset and perform queries \n  against the resulting local cache. 
Future updates to the data continue to be streamed in and applied to the local \n  cache.\n\nThe table below summarizes the clients' characteristics:\n\n|                                |  Network Hops  |  Typical Latency (p99)  |          State Footprint          |\n|-------------------------------:|:--------------:|:-----------------------:|:---------------------------------:|\n|                    Thin Client |       2        |    \u003c 10 milliseconds    |             Stateless             |\n|                    Fast Client |       1        |    \u003c 2 milliseconds     |  Minimal (routing metadata only)  |\n|    Da Vinci Client (RAM + SSD) |       0        |     \u003c 1 millisecond     | Bounded RAM, full dataset on SSD  |\n|   Da Vinci Client (all-in-RAM) |       0        |    \u003c 10 microseconds    |        Full dataset in RAM        |\n\nAll of these clients share the same read APIs described above. This enables users to make changes to their \ncost/performance tradeoff without needing to rewrite their applications.\n\n# Resources\n\nThe _Open Sourcing Venice_ [blog](https://engineering.linkedin.com/blog/2022/open-sourcing-venice--linkedin-s-derived-data-platform)\nand [conference talk](https://www.youtube.com/watch?v=pJeg4V3JgYo) are good starting points to get an overview of the\nuse cases and scale that Venice can support. For more Venice posts, talks and podcasts, see our [Learn More](./user_guide/learn_more.md)\npage.\n\n## Getting Started\nRefer to the [Venice quickstart](./quickstart/quickstart.md) to create your own Venice cluster and play around with some \nfeatures like creating a data store, batch push, incremental push, and single get. 
We recommend sticking to our latest \n[stable release](https://blog.venicedb.org/stable-releases).\n\n## Community\nFeel free to engage with the community using our:\n\u003c!-- N.B.: The links are duplicated here between the icon and text, otherwise the blue link underline extends into the space, which does not look good. --\u003e\n- [\u003cimg src=\"assets/icons/slack-icon.svg\" width=\"15\" /\u003e](http://slack.venicedb.org) [Slack workspace](http://slack.venicedb.org)\n  - Archived and publicly searchable on [Linen](http://linen.venicedb.org)\n- [\u003cimg src=\"assets/icons/linkedin-icon.svg\" width=\"15\" /\u003e](https://www.linkedin.com/groups/14129519/) [LinkedIn group](https://www.linkedin.com/groups/14129519/)\n- [\u003cimg src=\"assets/icons/github-icon.svg\" width=\"15\" /\u003e](https://github.com/linkedin/venice/issues) [GitHub issues](https://github.com/linkedin/venice/issues)\n- [\u003cimg src=\"assets/icons/github-icon.svg\" width=\"15\" /\u003e](./dev_guide/how_to/how_to.md) [Contributor's guide](./dev_guide/how_to/how_to.md)\n\nFollow us to hear more about the progress of the Venice project and community:\n- [\u003cimg src=\"assets/icons/hashnode-icon.svg\" width=\"15\" /\u003e](https://blog.venicedb.org) [Blog](https://blog.venicedb.org)\n- [\u003cimg src=\"assets/icons/bluesky-icon.svg\" width=\"15\" /\u003e](https://bsky.app/profile/venicedb.org) [Bluesky handle](https://bsky.app/profile/venicedb.org)\n- [\u003cimg src=\"assets/icons/linkedin-icon.svg\" width=\"15\" /\u003e](https://www.linkedin.com/company/venicedb) [LinkedIn page](https://www.linkedin.com/company/venicedb)\n- [\u003cimg src=\"assets/icons/x-icon.svg\" width=\"15\" /\u003e](https://x.com/VeniceDataBase) [X handle](https://x.com/VeniceDataBase)\n- [\u003cimg src=\"assets/icons/youtube-icon.svg\" width=\"15\" /\u003e](https://youtube.com/@venicedb) [YouTube 
channel](https://youtube.com/@venicedb)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinkedin%2Fvenice","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flinkedin%2Fvenice","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinkedin%2Fvenice/lists"}