{"id":18384870,"url":"https://github.com/allegro/akubra","last_synced_at":"2025-07-12T02:35:06.323Z","repository":{"id":45037053,"uuid":"71782347","full_name":"allegro/akubra","owner":"allegro","description":"Simple solution to keep a independent S3 storages in sync","archived":false,"fork":false,"pushed_at":"2024-04-19T09:39:32.000Z","size":9752,"stargazers_count":87,"open_issues_count":14,"forks_count":9,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-04-06T23:34:44.668Z","etag":null,"topics":["amazon-s3","amazon-s3-storage","ceph","object-storage","s3","storage","sync"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/allegro.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-10-24T11:32:11.000Z","updated_at":"2025-03-07T03:54:58.000Z","dependencies_parsed_at":"2024-01-18T12:36:09.863Z","dependency_job_id":"17c0a28d-471b-4b48-ad17-cbf63cd4e300","html_url":"https://github.com/allegro/akubra","commit_stats":null,"previous_names":[],"tags_count":46,"template":false,"template_full_name":null,"purl":"pkg:github/allegro/akubra","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/allegro%2Fakubra","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/allegro%2Fakubra/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/allegro%2Fakubra/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/allegro%2Fakubra/manifests","owner_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/owners/allegro","download_url":"https://codeload.github.com/allegro/akubra/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/allegro%2Fakubra/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264925961,"owners_count":23684245,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["amazon-s3","amazon-s3-storage","ceph","object-storage","s3","storage","sync"],"created_at":"2024-11-06T01:15:45.275Z","updated_at":"2025-07-12T02:35:06.273Z","avatar_url":"https://github.com/allegro.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Akubra\n\n[![Version Widget]][Version] [![Build Status Widget]][Build Status] [![GoDoc Widget]][GoDoc]\n\n[Version]: https://github.com/allegro/akubra/releases/latest\n[Version Widget]: https://img.shields.io/github/tag/allegro/akubra.svg\n[Build Status]: https://travis-ci.org/allegro/akubra\n[Build Status Widget]: https://travis-ci.org/allegro/akubra.svg?branch=master\n[GoDoc]: https://godoc.org/github.com/allegro/akubra\n[GoDoc Widget]: https://godoc.org/github.com/allegro/akubra?status.svg\n\n## Goals\n\n### Redundancy\n\nAkubra is a simple solution to keep independent S3 storages in sync - almost\nrealtime, eventually consistent.\n\nStorage clusters that handle a great volume of new objects\nare most efficiently kept in sync by feeding them all incoming data\nat once. 
That's what Akubra does, with a minimal memory and CPU footprint.\n\nSynchronizing S3 storages offline is almost impossible with a high volume of traffic.\nIt would require keeping track of new objects (or periodic bucket listing),\ndownloading and uploading them to the other storage. It's slow, expensive and hard\nto implement robustly.\n\nThe Akubra way is to put files in all storages at once by copying requests to multiple\nbackends. In case one of the clusters rejects a request, Akubra logs that event and synchronizes\nthe troublesome object with an independent process.\n\n### Seamless storage space extension with new storage clusters\n\nAkubra has sharding capabilities. You can easily configure new backends with\nweights and append them to a region's cluster pool.\n\nBased on cluster weights, Akubra splits all operations between the clusters in a pool.\nIt also backtracks to an older cluster when the requested object does not exist on the\ntarget cluster. Such events are logged, so it's possible to rebalance\nclusters in the background.\n\n### Multi cloud cost optimization\n\nWhile all objects have to be stored in each storage within a shard, not all storages\nhave to be read. With load balancing and storage prioritization, Akubra will pick the\ncheapest one.\n\n## Build\n\n### Prerequisites\n\nYou need a Go \u003e= 1.8 compiler ([installation instructions](https://golang.org/doc/install)).\n\n### Build\n\nIn the main directory of this repository, run:\n\n```\nmake build\n```\n\n### Test\n\n```\nmake test\n```\n\n## Usage of Akubra:\n\n```\nusage: akubra [\u003cflags\u003e]\n\nFlags:\n      --help       Show context-sensitive help (also try --help-long and --help-man).\n  -c, --conf=CONF  Configuration file e.g.: \"conf/dev.yaml\"\n```\n\n### Example:\n\n```\nakubra -c devel.yaml\n```\n\n## How it works?\n\nOnce a request comes to our proxy, we copy all its headers and create pipes for\nbody streaming to each endpoint. If any endpoint returns a positive response it's\nimmediately returned to the client. 
If all endpoints return an error, then the\nfirst response is passed to the client.\n\nIf some nodes respond incorrectly, we log which cluster has a problem, whether it occurred while\nstoring or reading, and where the erroneous file may be found. In that case\nwe also return a positive response as stated above.\n\nWe also handle the slow endpoint scenario. If there are more connections than the safe\nlimit defined in the configuration, the backend with the most connections is taken out of\nthe pool and an error is logged.\n\n## Configuration\n\nConfiguration is read from a YAML configuration file with the following fields:\n\n```yaml\nService:\n  Server:\n    BodyMaxSize: 100MB\n    MaxConcurrentRequests: 200\n    # Listen interface and port e.g. \"0:8000\", \"localhost:9090\", \":80\"\n    Listen: \":7082\"\n    # Technical endpoint interface\n    TechnicalEndpointListen: \":7005\"\n    # Health check endpoint (for load balancers)\n    HealthCheckEndpoint: \"/status/ping\"\n  Client:\n    # Additional non-AWS-S3-specific headers the proxy will add to the response\n    AdditionalResponseHeaders:\n      \"Access-Control-Allow-Origin\": \"*\"\n      \"Access-Control-Allow-Credentials\": \"true\"\n      \"Access-Control-Allow-Methods\": \"GET, POST, OPTIONS\"\n      \"Access-Control-Allow-Headers\": \"DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,X-CSRFToken\"\n      \"Cache-Control\": \"public, s-maxage=600, max-age=600\"\n    # Additional headers added to the backend request\n    AdditionalRequestHeaders:\n      \"Cache-Control\": \"public, s-maxage=600, max-age=600\"\n    # Backends in maintenance mode\n    # MaintainedBackends:\n    #  - \"http://s3.dc2.internal\"\n    # List request methods to be logged in synclog in case of backend failure\n    SyncLogMethods:\n      - GET\n      - PUT\n      - DELETE\n    # Transport rules with dedicated timeouts\n    Transports:\n      - Name: TransportDef-Method:GET|POST\n        Rules:\n          Method: 
GET|POST\n          Path: .*\n        Properties:\n          MaxIdleConns: 200\n          MaxIdleConnsPerHost: 1000\n          IdleConnTimeout: 2s\n          ResponseHeaderTimeout: 5s\n      - Name: TransportDef-Method:GET|POST|PUT\n        Rules:\n          Method: GET|POST|PUT\n          QueryParam: acl\n        Properties:\n          MaxIdleConns: 200\n          MaxIdleConnsPerHost: 500\n          IdleConnTimeout: 5s\n          ResponseHeaderTimeout: 5s\n      - Name: OtherTransportDefinition\n        Rules:\n        Properties:\n          MaxIdleConns: 300\n          MaxIdleConnsPerHost: 600\n          IdleConnTimeout: 2s\n          ResponseHeaderTimeout: 2s\n\n# List request methods to be logged in synclog in case of backend failure\nSyncLogMethods:\n  - PUT\n  - DELETE\n# Configure sharding\nClusters:\n  cluster1:\n    Backends:\n      - http://127.0.0.1:9001\n  cluster2:\n    Backends:\n      - http://127.0.0.1:9002\nRegions:\n  myregion:\n    Clusters:\n      - Cluster: cluster1\n        Weight: 0\n      - Cluster: cluster2\n        Weight: 1\n    Domains:\n      - myregion.internal\n\nLogging:\n  Synclog:\n    stderr: true\n  #  stdout: false  # default: false\n  #  file: \"/var/log/akubra/sync.log\"  # default: \"\"\n  #  syslog: LOG_LOCAL1  # default: LOG_LOCAL1\n  #  database:\n  #    user: dbUser\n  #    password: \"\"\n  #    dbname: dbName\n  #    host: localhost\n  #    inserttmpl: |\n  #      INSERT INTO tablename(path, successhost, failedhost, ts,\n  #       method, useragent, error)\n  #      VALUES ('new','{{.path}}','{{.successhost}}','{{.failedhost}}',\n  #      '{{.ts}}'::timestamp, '{{.method}}','{{.useragent}}','{{.error}}');\n\n  Mainlog:\n    stderr: true\n  #  stdout: false  # default: false\n  #  file: \"/var/log/akubra/akubra.log\"  # default: \"\"\n  #  syslog: LOG_LOCAL2  # default: LOG_LOCAL2\n  #  level: Error   # default: Debug\n\n  Accesslog:\n    stderr: true # default: false\n  #  stdout: false  # default: false\n  #  file: 
\"/var/log/akubra/access.log\"  # default: \"\"\n  #  syslog: LOG_LOCAL3  # default: LOG_LOCAL3\n\n# Enable metrics collection\nMetrics:\n  # Possible targets: \"prometheus\", \"graphite\", \"expvar\", \"stdout\"\n  Target: graphite\n  # Expvar or Prometheus handler listener address\n  ExpAddr: \":8080\"\n  # How often metrics should be released, applicable for \"graphite\", \"prometheus\" and \"stdout\"\n  Interval: 30s\n  # Graphite metrics prefix path\n  Prefix: my.metrics\n  # Shall prefix be suffixed with \"\u003chostname\u003e.\u003cprocess\u003e\"\n  AppendDefaults: true\n  # Graphite collector address\n  Addr: graphite.addr.internal:2003\n  # Debug includes runtime.MemStats metrics\n  Debug: false\n```\n\n## Configuration validation for CI\n\nAkubra has a technical HTTP endpoint for configuration validation purposes.\nIt's configured with the `TechnicalEndpointListen` property.\n\n### Example usage\n\n    curl -vv -X POST -H \"Content-Type: application/yaml\" --data-binary @akubra.cfg.yaml http://127.0.0.1:8071/configuration/validate\n\nPossible responses:\n\n    * HTTP 200\n    Configuration checked - OK.\n\nor:\n\n    * HTTP 400, 405, 413, 415 and info in body with validation error message\n\n## Health check endpoint\n\nThis feature is required by load balancers, DNS servers and related systems for health checking.\nThe configuration YAML has a `HealthCheckEndpoint` parameter - it's a URI path for the health check HTTP endpoint.\n\n### Example usage\n\n    curl -vv -X GET http://127.0.0.1:8080/status/ping\n\nResponse:\n\n    \u003c HTTP/1.1 200 OK\n    \u003c Cache-Control: no-cache, no-store\n    \u003c Content-Type: text/html\n    \u003c Content-Length: 2\n    OK\n\n## Transports and Rules with dedicated timeouts\n\nThis feature helps maintain high availability and improves transmission.\n\nFor example, when one specific HTTP method lags, we can set timeouts with a special 'Rule'.\nAnother example: when a user adds big chunks by multi upload,\nthe default timeout needs to be 
changed with a dedicated 'Transport' with a 'Rule' for this case.\n\nThe following rules apply to 'Transports' definitions:\n\n- at least one item is required in the 'Transports' section\n- the 'Rules' section must either be empty or contain one property (Method, Path, QueryParam)\n- if the 'Rules' section is empty, the transport will match any request\n- when no transport can be matched, an HTTP 500 error code will be sent to the client.\n\n## Limitations\n\n- User credentials have to be identical on every backend\n- We do not support S3 partial uploads\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fallegro%2Fakubra","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fallegro%2Fakubra","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fallegro%2Fakubra/lists"}