{"id":13549438,"url":"https://github.com/linkedin/ambry","last_synced_at":"2025-08-17T01:35:07.910Z","repository":{"id":11419679,"uuid":"13870893","full_name":"linkedin/ambry","owner":"linkedin","description":"Distributed object store","archived":false,"fork":false,"pushed_at":"2025-08-13T00:08:05.000Z","size":37842,"stargazers_count":1770,"open_issues_count":145,"forks_count":276,"subscribers_count":124,"default_branch":"master","last_synced_at":"2025-08-13T00:27:39.198Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://github.com/linkedin/ambry/wiki","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/linkedin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2013-10-25T20:04:54.000Z","updated_at":"2025-08-12T23:34:07.000Z","dependencies_parsed_at":"2023-10-17T04:18:37.146Z","dependency_job_id":"faac568b-95bf-4f47-8e7f-18b93fb25800","html_url":"https://github.com/linkedin/ambry","commit_stats":null,"previous_names":[],"tags_count":1451,"template":false,"template_full_name":null,"purl":"pkg:github/linkedin/ambry","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linkedin%2Fambry","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linkedin%2Fambry/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linkedin%2Fambry/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linkedin%2Fambry/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/linkedin","do
wnload_url":"https://codeload.github.com/linkedin/ambry/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linkedin%2Fambry/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270796217,"owners_count":24647319,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-16T02:00:11.002Z","response_time":91,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T12:01:21.879Z","updated_at":"2025-08-17T01:35:07.899Z","avatar_url":"https://github.com/linkedin.png","language":"Java","funding_links":[],"categories":["Java","Distributed Filesystem","`Distributed Filesystem `"],"sub_categories":[],"readme":"# Ambry\n\n[![Github Actions CI](https://github.com/linkedin/ambry/actions/workflows/github-actions.yml/badge.svg)](https://github.com/linkedin/ambry/actions/workflows/github-actions.yml)\n[![codecov.io](https://codecov.io/github/linkedin/ambry/branch/master/graph/badge.svg)](https://codecov.io/github/linkedin/ambry)\n[![license](https://img.shields.io/github/license/linkedin/ambry.svg)](LICENSE)\n\nAmbry is a distributed object store that supports storage of trillions of small immutable objects (50K -100K) as well as billions of large objects. It was specifically designed to store and serve media objects in web companies. 
However, it can be used as a general-purpose storage system to store DB backups, search indexes or business reports. The system has the following characteristics:\n\n1. Highly available and horizontally scalable\n2. Low latency and high throughput\n3. Optimized for both small and large objects\n4. Cost effective\n5. Easy to use\n\nRequires JDK version 1.11 - 1.14.\n\n## Documentation\nDetailed documentation is available at https://github.com/linkedin/ambry/wiki\n\n## Research\nPaper introducing Ambry at [SIGMOD 2016](http://sigmod2016.org/) -\u003e http://dprg.cs.uiuc.edu/data/files/2016/ambry.pdf\n\nReach out to us at ambrydev@googlegroups.com if you would like us to list a paper that is based off of research on Ambry.\n\n## Getting Started\n##### Step 1: Download the code, build it and prepare for deployment.\nTo get the latest code and build it, do\n\n    $ git clone https://github.com/linkedin/ambry.git \n    $ cd ambry\n    $ ./gradlew allJar\n    $ cd target\n    $ mkdir logs\nAmbry uses files that provide information about the cluster to route requests from the frontend to servers and for replication between servers. We will use a simple clustermap that contains a single server with one partition. The partition will use `/tmp` as the mount point.\n##### Step 2: Deploy a server.\n    $ nohup java -Dlog4j2.configurationFile=file:../config/log4j2.xml -jar ambry.jar --serverPropsFilePath ../config/server.properties --hardwareLayoutFilePath ../config/HardwareLayout.json --partitionLayoutFilePath ../config/PartitionLayout.json \u003e logs/server.log \u0026\n\nThrough this command, we configure the log4j properties, provide the server with configuration options and cluster definitions and redirect output to a log. Note down the process ID returned (`serverProcessID`) because it will be needed for shutdown.  \nThe log will be available at `logs/server.log`. 
Alternatively, you can change the log4j properties to write the log messages to a file instead of standard output.\n##### Step 3: Deploy a frontend.\n    $ nohup java -Dlog4j2.configurationFile=file:../config/log4j2.xml -cp \"*\" com.github.ambry.frontend.AmbryFrontendMain --serverPropsFilePath ../config/frontend.properties --hardwareLayoutFilePath ../config/HardwareLayout.json --partitionLayoutFilePath ../config/PartitionLayout.json \u003e logs/frontend.log \u0026\n\nNote down the process ID returned (`frontendProcessID`) because it will be needed for shutdown. Make sure that the frontend is ready to receive requests.\n\n    $ curl http://localhost:1174/healthCheck\n    GOOD\nThe log will be available at `logs/frontend.log`. Alternatively, you can change the log4j properties to write the log messages to a file instead of standard output.\n##### Step 4: Interact with Ambry!\nWe are now ready to store and retrieve data from Ambry. Let us start by storing a simple image. For demonstration purposes, we will use an image `demo.gif` that has been copied into the `target` folder.\n###### POST\n    $ curl -i -H \"x-ambry-service-id:CUrlUpload\"  -H \"x-ambry-owner-id:`whoami`\" -H \"x-ambry-content-type:image/gif\" -H \"x-ambry-um-description:Demonstration Image\" http://localhost:1174/ --data-binary @demo.gif\n    HTTP/1.1 201 Created\n    Location: AmbryID\n    Content-Length: 0\nThe CUrl command creates a `POST` request that contains the binary data in demo.gif. Along with the file data, we provide headers that act as blob properties. These include the size of the blob, the service ID, the owner ID and the content type.  \nIn addition to these properties, Ambry also has a provision for arbitrary user defined metadata. We provide `x-ambry-um-description` as user metadata. 
Ambry does not interpret this data and it is purely for user annotation.\nThe `Location` header in the response is the blob ID of the blob we just uploaded.\n###### GET - Blob Info\nNow that we stored a blob, let us verify some properties of the blob we uploaded.\n\n    $ curl -i http://localhost:1174/AmbryID/BlobInfo\n    HTTP/1.1 200 OK\n    x-ambry-blob-size: {Blob size}\n    x-ambry-service-id: CUrlUpload\n    x-ambry-creation-time: {Creation time}\n    x-ambry-private: false\n    x-ambry-content-type: image/gif\n    x-ambry-owner-id: {username}\n    x-ambry-um-description: Demonstration Image\n    Content-Length: 0\n###### GET - Blob\nNow that we have verified that Ambry returns properties correctly, let us obtain the actual blob.\n\n    $ curl http://localhost:1174/AmbryID \u003e demo-downloaded.gif\n    $ diff demo.gif demo-downloaded.gif \n    $\nThis confirms that the data that was sent in the `POST` request matches what we received in the `GET`. If you would like to see the image, simply point your browser to `http://localhost:1174/AmbryID` and you should see the image that was uploaded!\n###### DELETE\nAmbry is an immutable store and blobs cannot be updated but they can be deleted in order to make them irretrievable. 
Let us go ahead and delete the blob we just created.\n\n    $ curl -i -X DELETE http://localhost:1174/AmbryID\n    HTTP/1.1 202 Accepted\n    Content-Length: 0\nYou will no longer be able to retrieve the blob properties or data.\n\n    $ curl -i http://localhost:1174/AmbryID/BlobInfo\n    HTTP/1.1 410 Gone\n    Content-Type: text/plain; charset=UTF-8\n    Content-Length: 17\n    Connection: close\n\n    Failure: 410 Gone\n##### Step 5: Stop the frontend and server.\n    $ kill -15 frontendProcessID\n    $ kill -15 serverProcessID\nYou can confirm that the services have been shut down by looking at the logs.\n##### Additional information:\nIn addition to the simple APIs demonstrated above, Ambry provides support for `GET` of only user metadata and `HEAD`. In addition to the `POST` of binary data that was demonstrated, Ambry also supports `POST` of `multipart/form-data` via CUrl or web forms.\nOther features of interest include:\n* **Time To Live (TTL)**: During `POST`, a TTL in seconds can be provided through the addition of a header named `x-ambry-ttl`. This means that Ambry will stop serving the blob after the TTL has expired. On `GET`, expired blobs behave the same way as deleted blobs.\n* **Private**: During `POST`, providing a header named `x-ambry-private` with the value `true` will mark the blob as private. API behavior can be configured based on whether a blob is public or private.\n\n## Testing\n\u003e**WARNING**: Tests currently can take upwards of 40 minutes to run\n\nAmbry requires [azurite](https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azurite?tabs=npm%2Cblob-storage)\nand MySQL for testing. 
To install on macOS:\n\nazurite:\n```\n$ npm install -g azurite\n$ azurite\n```\n\nmysql:\n```\n$ brew install mysql\n$ brew services start mysql\n$ mysql -uroot\nmysql\u003e CREATE USER 'travis'@'localhost';\nmysql\u003e GRANT ALL PRIVILEGES ON *.* to 'travis'@'localhost';\nmysql\u003e FLUSH PRIVILEGES;\nmysql\u003e CREATE DATABASE AmbryRepairRequests;\nmysql\u003e USE AmbryRepairRequests; SOURCE ./ambry-mysql/src/main/resources/AmbryRepairRequests.ddl;\n```\n\nThen run `./gradlew build` to build and run all unit tests.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinkedin%2Fambry","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flinkedin%2Fambry","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinkedin%2Fambry/lists"}