{"id":19029577,"url":"https://github.com/guardian/pluto-storagetier","last_synced_at":"2026-03-01T14:34:45.738Z","repository":{"id":37775059,"uuid":"469835178","full_name":"guardian/pluto-storagetier","owner":"guardian","description":"Multi-tiered storage management for pluto","archived":false,"fork":false,"pushed_at":"2025-09-17T15:10:57.000Z","size":2922,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":4,"default_branch":"main","last_synced_at":"2026-01-30T01:22:22.585Z","etag":null,"topics":["asset-management","media-asset-management","multimedia-tech","object-matrix","vidispine"],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/guardian.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-03-14T17:21:30.000Z","updated_at":"2025-07-25T12:17:31.000Z","dependencies_parsed_at":"2023-02-09T13:31:31.548Z","dependency_job_id":"8bd8dca2-d208-4702-a98f-5aae494a2099","html_url":"https://github.com/guardian/pluto-storagetier","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/guardian/pluto-storagetier","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guardian%2Fpluto-storagetier","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guardian%2Fpluto-storagetier/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guardian%2Fpluto-storagetier/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/repositories/guardian%2Fpluto-storagetier/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/guardian","download_url":"https://codeload.github.com/guardian/pluto-storagetier/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guardian%2Fpluto-storagetier/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29970997,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-01T14:11:48.712Z","status":"ssl_error","status_checked_at":"2026-03-01T14:11:48.352Z","response_time":124,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asset-management","media-asset-management","multimedia-tech","object-matrix","vidispine"],"created_at":"2024-11-08T21:14:35.371Z","updated_at":"2026-03-01T14:34:45.715Z","avatar_url":"https://github.com/guardian.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"# pluto-storagetier\n\n## What is it?\n\nThis is a monorepo, which contains a number of subprojects.  They are all pure-backend Scala projects, intended\nto run against JVM 1.8.\n\nTaken together, they are the components which allow pluto to move media from one storage tier to another.\n\nEach specific transfer has its own component, i.e. 
online -\u003e archive is one component, online -\u003e nearline, etc.\nA \"component\" in this case is a standalone app, built out as a Docker image which consists of the relevant\nJVM and a collection of JARs with all the dependencies etc.\n\n## How is it deployed?\n\nThe project is currently hosted on GitLab and uses GitLab's CI process to automatically build a deployable image\non every merge request and update to the main branch.\n\nAll tests are run as part of this process and dependencies are monitored via Snyk (https://snyk.io).  If either the tests\nfail or a security vulnerability is introduced through dependencies, then the build will fail and no image will be\noutput at the end of it.\n\nImages are numbered based on the iteration, and the final images are then pushed across to our private Docker image\nrepo hosted in AWS.  This is configured for \"immutable images\", i.e. once a given iteration number has been pushed it\ncan't be over-written and a new one must be provided.\n\nThey are then deployed via manifests to our Kubernetes system which uses rotated credentials to access them.  As with\nother \"prexit\" components, `pluto-versions-manager` can be used to interrogate the builds and quickly deploy updates.\n\n## How do I build it locally?\n\nThe applications deploy to Kubernetes via Docker images. 
In order to build the project, you'll need:\n- a JDK implementation at version 1.8 (later will not work with the MatrixStore libraries)\n- sbt, the scala build tool (or an alternative like intellij)\n- Docker on your workstation to build the images\n\nThen it's as simple as:\n```\n$ sbt\nsbt:pluto-storagetier\u003e projects\n[info] In file:/Users/andy_gallagher/workdev/pluto-storagetier/\n[info] \t   common\n[info] \t   mxscopy\n[info] \t   nearline_archive\n[info] \t   online_archive\n[info] \t * pluto-storagetier\nsbt:pluto-storagetier\u003e project nearline_archive\n[info] set current project to nearline_archive (in build file:/Users/andy_gallagher/workdev/pluto-storagetier/)\nsbt:nearline_archive\u003e test\n.\n.\n.\n.\nsbt:nearline_archive\u003e docker:publishLocal\n```\n\nto run tests then compile and build a local version.  Dependencies are resolved and downloaded prior to compilation, so\ninternet access is required; and using a 3G connection definitely not recommended!\n\nThis should ultimately output a Docker image onto your local\nDocker daemon as `guardianmultimedia/storagetier-online-nearline:DEV` which can then be run via\n`docker run --rm  guardianmultimedia/storagetier-online-nearline:DEV`.\n\nIf you do so, however, it will most likely fail complaining that it needs configuration for rabbitmq, pluto-core, vidispine\nor somesuch.\n\nIf you want to run locally it's best to first install prexit-local (https://gitlab.com/codmill/customer-projects/guardian/prexit-local)\nwhich builds into a minikube environment and provides all of these elements.\n\nWith that installed, you can point your Docker configuration at the minikube instance and build on there:\n```\n$ $(minikube docker-env)  #point the local docker client to the minikube docker daemon\n$ sbt\nsbt:pluto-storagetier\u003e projects\n[info] In file:/Users/andy_gallagher/workdev/pluto-storagetier/\n[info] \t   common\n[info] \t   mxscopy\n[info] \t   nearline_archive\n[info] \t   
online_archive\n[info] \t * pluto-storagetier\nsbt:pluto-storagetier\u003e project nearline_archive\n[info] set current project to nearline_archive (in build file:/Users/andy_gallagher/workdev/pluto-storagetier/)\nsbt:nearline_archive\u003e test\n.\n.\n.\n.\nsbt:nearline_archive\u003e docker:publishLocal\n```\n\nThen you can use the manifests in https://gitlab.com/codmill/customer-projects/guardian/prexit-local/-/tree/master/kube/pluto-storagetier to\ndeploy from that DEV image into your local minikube.\n\nFor more information on prexit-local and minikube, you're best to start with https://gitlab.com/codmill/customer-projects/guardian/prexit-local/-/blob/master/README.md\nand continue reading about Kubernetes from there.\n\nEach pod contains a long-running process that starts up, subscribes to relevant events on the message bus (see later for details)\nand then waits for messages to come and be processed. It's set up to terminate cleanly on a SIGINT (i.e. CTRL-C from the Terminal)\nand a SIGTERM (i.e. termination from the OS or Kubernetes). It's perfectly safe (and encouraged) to run many instances\nin parallel; because of the way that they use the message queue and data store there is no risk of conflicts.\n\n## What are the components?\n\n- `common` is not a component. It includes functionality which is shared across all of the components, such as data models and the\nmessage processing logic\n- `mxs-copy-components` is also not a component in its own right. 
It contains functionality to make it easier to use\nthe Java libraries for directly interfacing MatrixStore.\n- `online_nearline` is a component which is responsible for performing data copies from the online storage to the nearline storage\n- `online_archive` is a component which is responsible for performing data copies from the online storage to the archive.\n\n## How do the components communicate?\nThe components communicate with each other, themselves and the rest of the Pluto system via the RabbitMQ\nmessage bus.\n\nFor a good grounding in RabbitMQ terminology, have a look here: https://www.rabbitmq.com/getstarted.html.\n\nThe protocol used throughout the wider Pluto system is that when a component _does_ something that another\ncomponent may be interested in, it pushes a JSON message to its own _Exchange_.  Another component can then\nreceive this notification onto its own _Queue_ by _subscribing_ to the producer's _Exchange_.\n\nIn other words, a _producer_ owns the Exchange and a _consumer_ owns the queue.  \n\n### Subscription\nTo take a concrete example, consider the fact that many different apps may want to be notified about\nevents from Vidispine e.g. the Deliverables component, different storage-tier components, etc.\n\nVidispine does not directly interface to RabbitMQ but sends HTTP messages to given endpoints, so we use\na tool called \"pluto-vsrelay\" (see https://gitlab.com/codmill/customer-projects/guardian/pluto-vs-relay) which\nreceives these HTTP notifications and pushes them to an Exchange.\n\nThe key point is that `pluto-vsrelay` does not need to know or care about what other processes may want to consume\nthese messages.  Furthermore, it is responsible for \"declaring\" (in RabbitMQ parlance) the exchange i.e.\nensuring that it is created and configured properly.\n\nSay that we now have a new app that needs to know about some kind of specific event from Vidispine, e.g. 
an\nitem metadata change.\n\nOur app \"declares\" its own _queue_ in RabbitMQ and asks the broker (RabbitMQ server) to _subscribe_ this queue onto the Exchange\nthat pluto-vsrelay created.  In this way events that are pushed to the Exchange will be copied to the queue.\nAny number of queues can be subscribed to an exchange, and they will _all_ receive the same messages from the exchange.\n\nAs the name suggests, a _queue_ will _hold_ a message until it is consumed.  An exchange, on the other hand, will\npass a message on to subscribers and then forget about it.\n\n![Generic rabbitMQ exchange/queue usage](doc/rmq-generic.png)\n\n### Competing Consumers\nMultiple instances of our app share the _same_ queue (the \"competing consumers\" pattern - https://www.enterpriseintegrationpatterns.com/patterns/messaging/CompetingConsumers.html).\nWhen our app receives a message, it remains on the broker but is \"hidden\" from other consumers.  Once our app\ninstance has finished processing, it can either `ack` the message to the broker - which indicates that it was\nprocessed successfully and can be deleted - or `nack` the message, indicating that it was not processed successfully.\nA `nack` contains an instruction to either re-queue the message or discard it (optionally to a dead-letter exchange).\nIf the app's connection to the broker terminates before either is received, then the message is automatically\nre-queued so another instance can pick it up.\n\nIn this way it does not matter if our app crashes or is restarted for some reason outside of our control, because\nif it was in the middle of something then that operation will be re-tried.\n\nIt is important, though, to ensure that operations each app performs will not fail\nif they are being retried over a partially-completed attempt.\n\n### Routing Keys\nYou can imagine, though, that in this example there are a lot of other events coming from Vidispine that\nour app is _not_ interested in (we only want item metadata 
updates).  It would be nice if we could only\nreceive onto our queue these specific events rather than everything.\n\nThis is where the concept of a \"routing key\" comes in (see https://www.rabbitmq.com/tutorials/tutorial-four-python.html).\n\nAll the exchanges used in Pluto are \"topic\" exchanges, meaning that they _require_  a routing key to be present\non a message when it is sent.\n\nA routing key is a set of strings separated by the period `.` character - e.g. `vidispine.item.metadata.modify`.\nRabbitMQ itself does not care about the specific content or meaning of the routing key, but we stick to\na least specific -\u003e most specific logic (in this case, literally `modify` the `metadata` of an `item` \nin `vidispine`).\n\nWhen you make a subscription to a Topic exchange, you need to pass in a specification for the routing key(s) that\nyou want to receive.  The wildcard characters `*` and `#` are useful here - `*` means \"match anything in this part of\nthe routing key\" and `#` means \"match anything from here on in\". 
\nFor example: \n - `vidispine.item.#` would match `vidispine.item.metadata.modify` and `vidispine.item.shape.create` etc.\n - `vidispine.item.*.create` would match `vidispine.item.shape.create` but not `vidispine.item.metadata.modify`.\n\nYou can see these subscriptions in action in the respective Main classes of the components:\n```scala\n      ProcessorConfiguration(\n        \"assetsweeper\",\n        \"assetsweeper.asset_folder_importer.file.#\",\n        \"storagetier.onlinearchive.newfile\",\n        new AssetSweeperMessageProcessor(plutoConfig)\n      )\n```\nThis is the code that sets up an instance of the AssetSweeperMessageProcessor class to receive all `file` messages\nfrom `asset_folder_importer` via the `assetsweeper` exchange and to send success messages with a routing key of\n`storagetier.onlinearchive.newfile.success` to the component's designated output exchange.\n\n`ProcessorConfiguration` is defined in [ProcessorConfiguration.scala](common/src/scala/com/gu/multimedia/storagetier/framework/ProcessorConfiguration.scala)\n\n## How do the 'components' work?\n\n### Application architecture\n\nAll of the executable components follow the same pattern. At heart they are console-based Scala apps and start with a\nMain function inside a static class called Main.  This reads its settings from environment variables and starts up instances\nof the `MessageProcessingFramework` and a number of `MessageProcessor` instances to do the actual work.\n\nEach of these `MessageProcessor` instances usually lives in a class which is at the root level of the component and many\nthen have other dependencies which they can call out to, e.g. MatrixStore routines, Vidispine request building/parsing etc.\n\nEvery `MessageProcessor` instance starts with the `handleMessage` method (defined below), which then calls other functions in the\nclass, which call more dependencies, etc. etc. 
etc.\n\nMany operations are performed with the help of Akka Streams, especially sending material to and from the MatrixStore appliances\nand HTTP integration with services like pluto-core and Vidispine.\n\n### Common logic\nEach component is based around the same fundamental logic, which is encapsulated in the `com.gu.multimedia.storagetier.framework` module.\nThey are designed to respond to messages that occur elsewhere in the system which are notified via the message queue,\nand then ensure that a copy of the given media is present in the required storage tier.  Once this has been done, \nanother message is output indicating that the operation took place.\n\nFailures are split into two kinds: \"retryable\" and \"permanent\" failures.  \n- If a permanent failure (i.e. one for which there is no point retrying) occurs during the processing of a message, \nthen the original message is sent to a dead-letter queue via a dead-letter exchange, with a number of fields set to \nindicate what went wrong.  \n- If a retryable failure occurs, then the original message is sent to a \"retry\" queue via a \nretry exchange.  The \"retry\" queue is not directly subscribed, but all messages have a TTL (time-to-live) value set on \nthem.  Once this TTL expires they are re-routed back to the \"retry-input\" exchange, which is picked up by an app instance \nand replayed. In this way, retries are kept outside the scope of any running instance so it is safe for instances to \ncrash or be restarted at any point.\n\nSchematically, the logic looks like this:\n\n![RabbitMQ subscription](doc/rmq-subscription.png)\n\nIn practice, this is represented by the [MessageProcessingFramework](common/src/scala/com/gu/multimedia/storagetier/framework/MessageProcessingFramework.scala)\nclass.\nIn order to be initialised, this requires a list of [ProcessorConfiguration](common/src/scala/com/gu/multimedia/storagetier/framework/ProcessorConfiguration.scala)\ninstances, each of which associates a \"processor\" (i.e. 
an implementation of the [MessageProcessor](common/src/scala/com/gu/multimedia/storagetier/framework/MessageProcessor.scala)\ninterface) with an exchange to subscribe to, a routing key spec to narrow down the subscription and an\n\"outgoing\" routing key to use for success messages.\n\nYou can see this being set up in [Main.scala](online_archive/src/main/Main.scala).\n\nOnce initialised, it will connect to RabbitMQ and declare a queue with a specific name. This name is shared\nbetween all instances, so we have a single queue for the app that receives messages from all the different exchanges\n(including our own, and our own retries).\n\nThe logic in `MessageProcessingFramework` takes care of routing an incoming message to the correct `MessageProcessor` instance,\nbased on the incoming exchange name and the routing key, both of which are carried in the message's metadata.\n\nThe `handleMessage` method of the `MessageProcessor` instance is then called with the parsed message JSON and the\nrouting key that it came from.\n\nThe Framework then waits for the Future returned from the `MessageProcessor` to complete. The action it takes\ndepends on the value of the Future:\n- a Right signifies success. The Right should contain a JSON object to send out, which is serialised to a string\nand sent onto the app's output exchange with the routing key given in the ProcessorConfiguration. The original\nmessage is `ack`'d on the broker, removing it from the queue\n- a Left signifies a retryable failure. The Left should contain a string describing the failure, which is logged.\n  - The original message is `nack`'d _without_ retry on the original queue and a copy is sent to the \"retry exchange\".\n  - The \"Retry exchange\" is subscribed by the \"retry queue\", which has a Time-To-Live attribute set on it. 
\n  - The message copy _also_ has a Time-To-Live attribute set, and whichever is the lower of these two settings is used.\n  - Once the time-to-live is expired, the message broker forwards it on to the designated \"dead-letter exchange\" of the\nretry queue, which then immediately forwards it back to the input queue.\n  - In this way we have a retry loop that can act without blocking application instances and allowing other content to be\nprocessed at the same time\n- a Failure (i.e. failed future) signifies a permanent failure. The exception message should contain a string describing\nthe failure, which is logged.\n  - The original message is `nack`'d without retry on the original queue. \n  - A copy is sent to the \"dead-letter exchange\" which is subscribed by the dead-letter queue (\"DLQ\").  This copy has extra\nmetadata fields set on it indicating the failure and where it originally came from.\n\n### Implementing a processor\nIn order to actually _do_ something with a message, you must create a subclass (well an implementation really) of the\n[MessageProcessor](common/src/scala/com/gu/multimedia/storagetier/framework/MessageProcessor.scala) trait.\n\nThis is as simple as providing an implementation of the following method in your own class:\n\n```scala\ndef handleMessage(routingKey:String, msg:Json):Future[Either[String,MessageProcessorReturnValue]]\n```\n\nand following the protocol above to represent success, retryable failure or permanent failure.\n\n`MessageProcessorReturnValue` is defined in [MessageProcessorReturnValue.scala](common/src/scala/com/gu/multimedia/storagetier/framework/MessageProcessorReturnValue.scala)\nand exists to allow success messages to be sent to multiple locations.\n\nIn practise though, you can normally just return a circe Json object and rely on an implicit converter, like this:\n```scala\nimport com.gu.multimedia.storagetier.framework.MessageProcessorConverters._\nimport io.circe.generic.auto._\nimport io.circe.syntax._\n\nclass 
MyProcessor extends MessageProcessor {\n  def handleMessage(routingKey:String, msg:Json):Future[Either[String,MessageProcessorReturnValue]] = {\n    // do stuff here.......\n    Future(Right(myDomainObject.asJson))\n  }\n}\n```\n\n\n\n## Logging\n\nWhen the system is actually running in practice, it quickly becomes very difficult to understand the logs.\nThis is because you potentially have a lot of messages, some retrying, some new, being processed across a lot of instances.\nNot every failure is a problem; some messages are expected to loop through a few retries before an external system has\n\"caught up\" (e.g. validating that a file exists in the archive storage).\n\nFor this reason, we use ELK (Elasticsearch-Logstash-Kibana) to parse and warehouse the logs.  Each log line has a format\nwhich is defined in a `resources/logback.xml` configuration file and looks like this:\n\n```\n%d{yyyy-MM-dd'T'HH:mm:ss.SSSZZ} [%thread] [%X{msgId};%X{retryAttempt};%X{routingKey};%X{exchange};%X{isRedeliver}] %-5level %logger{36} - %msg%n\n```\n\ni.e.:\n- timestamp\n- thread ID\n- message ID that is being processed. This is arbitrary and set by the sender; we normally use UUIDs.  Allows you to cross-reference the events or filter\nacross multiple retries\n- retry attempt counter. Starts at 0 and is incremented every time a message is retried. Note that a message will not necessarily\nbe retried on the same instance that received it before\n- routing key of the message that is being processed\n- exchange that sent the message that is being processed\n- a boolean flag indicating whether the message has been redelivered by the broker (i.e. 
because something failed)\n- log level\n- class/logger name\n- message\n\nThese fields are parsed out in logstash and can be used for filtering, so you can quickly zoom in on why a specific file 'X' seems\nto be failing.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fguardian%2Fpluto-storagetier","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fguardian%2Fpluto-storagetier","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fguardian%2Fpluto-storagetier/lists"}