{"id":13556219,"url":"https://github.com/TritonDataCenter/manta","last_synced_at":"2025-04-03T09:31:01.080Z","repository":{"id":20322814,"uuid":"23597054","full_name":"TritonDataCenter/manta","owner":"TritonDataCenter","description":"Manta is a scalable HTTP-based object store","archived":false,"fork":false,"pushed_at":"2024-04-05T05:09:34.000Z","size":1579,"stargazers_count":604,"open_issues_count":7,"forks_count":64,"subscribers_count":85,"default_branch":"master","last_synced_at":"2024-05-19T07:25:58.149Z","etag":null,"topics":["distributed","high-availability","storage"],"latest_commit_sha":null,"homepage":"","language":"Makefile","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TritonDataCenter.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2014-09-02T21:56:01.000Z","updated_at":"2024-07-13T14:04:38.980Z","dependencies_parsed_at":"2024-07-13T14:04:28.486Z","dependency_job_id":null,"html_url":"https://github.com/TritonDataCenter/manta","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TritonDataCenter%2Fmanta","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TritonDataCenter%2Fmanta/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TritonDataCenter%2Fmanta/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TritonDataCenter%2Fmanta/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/ow
ners/TritonDataCenter","download_url":"https://codeload.github.com/TritonDataCenter/manta/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246450461,"owners_count":20779408,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["distributed","high-availability","storage"],"created_at":"2024-08-01T12:03:42.487Z","updated_at":"2025-04-03T09:30:59.304Z","avatar_url":"https://github.com/TritonDataCenter.png","language":"Makefile","funding_links":[],"categories":["Makefile","others"],"sub_categories":[],"readme":"\u003c!--\n    This Source Code Form is subject to the terms of the Mozilla Public\n    License, v. 2.0. If a copy of the MPL was not distributed with this\n    file, You can obtain one at http://mozilla.org/MPL/2.0/.\n--\u003e\n\n\u003c!--\n    Copyright 2020 Joyent, Inc.\n    Copyright 2022 MNX Cloud, Inc.\n--\u003e\n\n# Manta: a scalable, distributed object store\n\nManta is an open-source, scalable, HTTP-based object store. All the pieces\nrequired to deploy and operate your own Manta are open source. 
This repo\nprovides documentation for the overall Manta project and pointers to the other\nrepositories that make up a complete Manta deployment.\n\n## Getting started\n\nThe fastest way to get started with Manta depends on what exactly one\nwishes to do.\n\n* To use Manta, see the [Getting Started](./docs/user-guide/#getting-started)\n  section of the User Guide.\n\n* To learn about installing and operating your own Manta deployment, see the\n  [Manta Operator Guide](./docs/operator-guide/).\n\n* To understand Manta's architecture, see [Bringing Arbitrary Compute to\n  Authoritative Data](http://queue.acm.org/detail.cfm?id=2645649), the [ACM\n  Queue](http://queue.acm.org/) article on its design and implementation.\n\n* To understand the [CAP tradeoffs](http://en.wikipedia.org/wiki/CAP_theorem) in Manta,\n  see [Fault Tolerance in Manta](http://dtrace.org/blogs/dap/2013/07/03/fault-tolerance-in-manta/) --\n  which received [some notable praise](https://twitter.com/eric_brewer/status/352804538769604609).\n\n* For help with working on Manta and building and testing your changes,\n  see the [developer guide](docs/developer-guide).\n\n## Community\n\nCommunity discussion about Manta happens in two main places:\n\n* The *manta-discuss*\n  [mailing list](https://mantastorage.topicbox.com/groups/manta-discuss).\n  If you wish to send mail to the list you'll need to join, but you can view\n  and search the archives online without being a member.\n\n* In the *#manta* IRC channel on the\n  [Libera.chat IRC network](https://libera.chat/).\n\n## Dependencies\n\nManta is composed of a number of services that deploy on top of Joyent's\n[Triton DataCenter](https://github.com/TritonDataCenter/triton) platform (just \"Triton\"\nfor short), which is also open-source. Triton provides services for operating\nphysical servers (compute nodes), deploying services in containers, monitoring\nservices, transmitting and visualizing real-time performance data, and a bunch\nmore. 
Manta primarily uses Triton for initial deployment, service upgrade, and\nservice monitoring.\n\nTriton itself depends on [SmartOS](http://smartos.org).  Manta also directly\ndepends on several SmartOS features, notably ZFS.\n\n## Building and Deploying Manta\n\nManta service images are built and packaged using the same mechanisms as\nbuilding the services that are part of Triton. Once you have Triton set up,\nfollow the instructions in the [Manta Operator Guide](./docs/operator-guide/)\nto deploy Manta.  The easiest way to play around with your own Manta\ninstallation is to first set up a Triton cloud-on-a-laptop (COAL) installation\nin VMware and then follow those instructions to deploy Manta on it.\n\nIf you want to deploy your own builds of Manta components, see \"Deploying your\nown Manta Builds\" below.\n\n## Repositories\n\nThis repository is just a wrapper containing documentation about Manta.  Manta\nis made up of several components from many repositories. This section highlights\nsome of the more important ones.\n\nA full list of repositories relevant to Manta is maintained in a [repo manifest\nfile](./tools/jr-manifest.json) in this repo. 
To more conveniently list those\nrepos, you can use the [`jr` tool](https://github.com/TritonDataCenter/joyent-repos#jr).\n\nThe front door services respond to requests from the internet at large:\n\n* [muppet](https://github.com/TritonDataCenter/muppet): the haproxy-based \"loadbalancer\"\n  service\n* [muskie](https://github.com/TritonDataCenter/manta-muskie): the node.js-based \"webapi\"\n  service, this is Manta's \"Directory API\"\n* [buckets-api](https://github.com/TritonDataCenter/manta-buckets-api): Node.js-based\n  \"buckets-api\" service, this is Manta's \"Buckets API\"\n\nThe metadata tiers for the Directory and Buckets APIs store the entire object\nnamespace (not object data) as well as backend storage system capacity:\n\n* [manatee](https://github.com/TritonDataCenter/manatee): the \"postgres\" service, a\n  high-availability postgres cluster using synchronous replication and automatic\n  fail-over\n* [moray](https://github.com/TritonDataCenter/moray): Node-based key-value store built on\n  top of manatee.  
Also responsible for monitoring manatee replication topology\n  (i.e., which postgres instance is the master).\n* [electric-moray](https://github.com/TritonDataCenter/electric-moray): Node-based service\n  that provides the same interface as Moray, but which directs requests to one\n  or more Moray+Manatee *shards* based on hashing the Moray key.\n* [buckets-mdapi](https://github.com/TritonDataCenter/manta-buckets-mdapi): a Rust-based\n  API for managing all metadata for the Buckets API\n* [buckets-mdplacement](https://github.com/TritonDataCenter/manta-buckets-mdplacement): a\n  Rust-based API for handling routing of Buckets API objects to appropriate\n  nodes in the storage tier.\n\nThe storage tier is responsible for actually storing bits on disk:\n\n* [mako](https://github.com/TritonDataCenter/manta-mako): the \"storage\" service, a\n  nginx-based server that receives PUT/GET requests from the front door services\n  to store object data on disk\n* [minnow](https://github.com/TritonDataCenter/manta-minnow): a Node-based agent that\n  runs inside storage instances to periodically report storage capacity to the\n  metadata tier\n\nThere are a number of services not part of the data path that are critical for\nManta's operation. For example:\n\n* [binder](https://github.com/TritonDataCenter/binder): hosts both ZooKeeper (used for\n  manatee leader election and for group membership) and a Node-based DNS server\n  that keeps track of which instances of each service are online at any given\n  time\n* [mahi](https://github.com/TritonDataCenter/mahi): The \"authcache\" service for handling authn/authz.\n\nMost of the above components are *services*, of which there may be multiple\n*instances* in a single Manta deployment. 
Except for the last category of\nnon-data-path services, these can all be deployed redundantly for availability\nand additional instances can be deployed to increase capacity.\n\nFor more details on the architecture, including how these pieces actually fit\ntogether, see the [Architecture](./docs/operator-guide/architecture.md) section\nof the Operator Guide.\n\n## Deploying your own Manta Builds\n\nAs described above, as part of the normal Manta deployment process, you start\nwith the \"manta-deployment\" zone that's built into Triton.  Inside that zone, you\nrun \"manta-init\" to fetch the latest Joyent build of each Manta component.  Then\nyou run Manta deployment tools to actually deploy zones based on these builds.\n\nThe easiest way to use your own custom build is to first deploy Manta using the\ndefault Joyent build and *then* replace whatever components you want with your\nown builds.  This will also ensure that you're starting from a known-working set\nof builds so that if something goes wrong, you know where to start looking.  To\ndo this:\n\n1. Complete the Manta deployment procedure from the operator guide.\n2. Build a zone image for whatever zone you want to replace.  See the\n   instructions for building [Triton](https://github.com/TritonDataCenter/triton)\n   zone images.  Manta zones work the same way.  The output of this process\n   will be a zone **image**, identified by uuid.  The image is comprised of\n   two files: an image manifest (a JSON file) and the image file itself\n   (a binary blob).\n3. Import the image into the Triton DataCenter that you're using to deploy Manta.\n   (If you've got a multi-datacenter Manta deployment, you'll need to import the\n   image into each datacenter separately using this same procedure.)\n    1. Copy the image and manifest files to the Triton headnode where the Manta\n       deployment zone is deployed.  
For simplicity, assume that the\n       manifest file is \"/var/tmp/my_manifest.json\" and the image file is\n       \"/var/tmp/my_image\".  You may want to use the image uuid in the filenames\n       instead.\n    2. Import the image using:\n\n           sdc-imgadm import -m /var/tmp/my_manifest.json -f /var/tmp/my_image\n\n4. Now you can use the normal Manta zone update procedure (from the operator\n   guide). This involves saving the current configuration to a JSON\n   file using \"manta-adm show -sj \u003e config.json\", updating the configuration\n   file, and then applying the changes with \"manta-adm update \u003c config.json\".\n   When you modify the configuration file, you can use your image's uuid in\n   place of whatever service you're trying to replace.\n\nIf for some reason you want to avoid deploying the Joyent builds at all, you'll\nhave to follow a more manual procedure.  One approach is to update the SAPI\nconfiguration for whatever service you want (using sdc-sapi -- see\n[SAPI](https://github.com/TritonDataCenter/sdc-sapi)) *immediately after* running\nmanta-init but before deploying anything.  Note that each subsequent\n\"manta-init\" will clobber this change, though the SAPI configuration is normally\nonly used for the initial deployment anyway.  The other option is to apply the\nfully-manual install procedure from the Operator Guide (i.e., instead of\nusing manta-deploy-coal or manta-deploy-lab) and use a custom \"manta-adm\"\nconfiguration file in the first place.  If this is an important use case, file\nan issue and we can improve this procedure.\n\nThe above procedure works to update Manta *zones*, which are most of the\ncomponents above.  The other two kinds of components are the *platform* and\n*agents*.  
Both of these procedures are documented in the Operator Guide,\nand they work to deploy custom builds as well as the official Joyent builds.\n\n## Contributing to Manta\n\nTo report bugs or request features, you can submit issues to the Manta project\non GitHub.  If you're asking for help with Joyent's production Manta service,\nyou should contact Joyent support instead.\n\nSee the [Contribution Guidelines](./CONTRIBUTING.md) for information about\ncontributing changes to the project.\n\n## Design principles\n\nManta assumes several constraints on the data storage problem:\n\n1. There should be one *canonical* copy of data.  You shouldn't need to copy\n   data in order to analyze it, transform it, or serve it publicly over the\n   internet.\n2. The system must scale horizontally in every dimension.  It should be possible\n   to add new servers and deploy software instances to increase the system's\n   capacity in terms of number of objects, total data stored, or compute\n   capacity.\n3. The system should be general-purpose.\n4. The system should be strongly consistent and highly available.  In terms of\n   [CAP](http://en.wikipedia.org/wiki/CAP_theorem), Manta sacrifices\n   availability in the face of network partitions.  (The reasoning here is that\n   an AP cache can be built atop a CP system like Manta, but if Manta were AP,\n   then it would be impossible for anyone to get CP semantics.)\n5. The system should be transparent about errors and performance.  The public\n   API only supports atomic operations, which makes error reporting and\n   performance easy to reason about.  (It's hard to say anything about the\n   performance of compound operations, and it's hard to report failures in\n   compound operations.)  
Relatedly, a single Manta deployment may span multiple\n   datacenters within a region for higher availability, but Manta does not\n   attempt to provide a global namespace across regions, since that would imply\n   uniformity in performance or fault characteristics.\n\nFrom these constraints, we define some design principles:\n\n1. Manta presents an HTTP interface (with REST-based PUT/GET/DELETE operations)\n   as the primary way of reading and writing data.  Because there's only one\n   copy of data, and some data needs to be available publicly (e.g., on the\n   internet over standard protocols), HTTP is a good choice.\n2. Manta is an *object store*, meaning that it only provides PUT/GET/DELETE for\n   *entire objects*.  You cannot write to the middle of an object or append to\n   the end of one.  This constraint makes it possible to guarantee strong\n   consistency and high availability, since only the metadata tier (i.e., the\n   namespace) needs to be strongly consistent, and objects themselves can be\n   easily replicated for availability.\n\nIt's easy to underestimate the problem of just reliably storing bits on disk.\nIt's commonly assumed that the only components that fail are disks, that they\nfail independently, and that they fail cleanly (e.g., by reporting errors).  In\nreality, there are a lot worse failure modes than disks failing cleanly,\nincluding:\n\n* disks or HBAs dropping writes\n* disks or HBAs redirecting both read and write requests to the wrong physical\n  blocks\n* disks or HBAs retrying writes internally, resulting in orders-of-magnitude\n  latency bubbles\n* disks, HBAs, or buses corrupting data at any point in the data path\n\nManta delegates to ZFS to solve the single-system data storage problem.  To\nhandle these cases,\n\n* ZFS stores block checksums *separately* from the blocks themselves.\n* Filesystem metadata is stored redundantly (on separate disks).  
Data is\n  typically stored redundantly as well, but that's up to user configuration.\n* ZFS is aware of how the filesystem data is stored across several disks.  As a\n  result, when reads from one disk return data that doesn't match the expected\n  checksum, it's able to read another copy and fix the original one.\n\n## Further reading\n\nFor background on the overall design approach, see [\"There's Just No Getting\nAround It: You're Building a Distributed\nSystem\"](http://queue.acm.org/detail.cfm?id=2482856).\n\nFor information about how Manta is designed to survive component failures and\nmaintain strong consistency, see [Fault tolerance in\nManta](http://dtrace.org/blogs/dap/2013/07/03/fault-tolerance-in-manta/).\n\nFor information on the latest recommended production hardware, see [Joyent\nManufacturing Matrix](http://eng.joyent.com/manufacturing/matrix.html) and\n[Joyent Manufacturing Bill of\nMaterials](http://eng.joyent.com/manufacturing/bom.html).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTritonDataCenter%2Fmanta","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FTritonDataCenter%2Fmanta","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTritonDataCenter%2Fmanta/lists"}