{"id":13551229,"url":"https://github.com/skx/sos","last_synced_at":"2025-09-02T22:33:48.691Z","repository":{"id":65613829,"uuid":"59128723","full_name":"skx/sos","owner":"skx","description":"Simple Object Storage (I wish I could call it Steve's Simple Storage, or S3 ;)","archived":false,"fork":false,"pushed_at":"2020-01-20T10:45:53.000Z","size":173,"stargazers_count":156,"open_issues_count":0,"forks_count":14,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-06-21T09:06:55.806Z","etag":null,"topics":["blob-servers","daemon","golang","replication","storage"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/skx.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-05-18T15:33:37.000Z","updated_at":"2025-05-31T04:17:52.000Z","dependencies_parsed_at":"2023-01-31T19:35:11.666Z","dependency_job_id":null,"html_url":"https://github.com/skx/sos","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/skx/sos","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skx%2Fsos","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skx%2Fsos/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skx%2Fsos/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skx%2Fsos/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/skx","download_url":"https://codeload.github.com/skx/sos/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skx%2Fsos/
sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261095310,"owners_count":23108784,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["blob-servers","daemon","golang","replication","storage"],"created_at":"2024-08-01T12:01:44.480Z","updated_at":"2025-06-21T09:07:04.337Z","avatar_url":"https://github.com/skx.png","language":"Go","funding_links":[],"categories":["Go","golang"],"sub_categories":[],"readme":"[![Go Report Card](https://goreportcard.com/badge/github.com/skx/sos)](https://goreportcard.com/report/github.com/skx/sos)\n[![license](https://img.shields.io/github/license/skx/sos.svg)](https://github.com/skx/sos/blob/master/LICENSE)\n[![Release](https://img.shields.io/github/release/skx/sos.svg)](https://github.com/skx/sos/releases/latest)\n\n\n# Simple Object Storage\n\nThe Simple Object Storage (SOS) project is an HTTP-based object-storage system which allows files to be uploaded, and later retrieved, via HTTP.\n\nFiles can be replicated across a number of hosts to ensure redundancy, and increased availability in the event of hardware failure.\n\n* [The design of the system](DESIGN.md).\n* [Scaling to large numbers of objects](SCALING.md).\n* [How replication works](REPLICATION.md).\n* [The APIs we present, both public and internal](API.md).\n\n\n\n## Installation\n\nThere are two ways to install this project from source, which depend on the version of [go](https://golang.org/) you're using.\n\nIf you just need the binaries you can find them on the [project release 
page](https://github.com/skx/sos/releases).\n\n\n### Source installation go \u003c= 1.11\n\nIf you're using `go` 1.11 or earlier then the following command should fetch/update the project and install it upon your system:\n\n     $ go get -u github.com/skx/sos\n\n### Source installation go \u003e= 1.12\n\nIf you're using a more recent version of `go` (which is _highly_ recommended), you need to clone to a directory which is not present upon your `GOPATH`:\n\n    git clone https://github.com/skx/sos\n    cd sos\n    go install\n\n\n\n## Overview\n\nYou can read the [design overview](DESIGN.md) for more details, but the\ncore idea behind the implementation relies upon the notion of a\n\"blob server\" - which is a very simple service which provides only the\nfollowing simple primitives:\n\n* Store a particular chunk of binary data with a specific name.\n* Given a name, retrieve the chunk of binary data associated with it.\n* Return a list of all known names.\n\nThe public API is built on top of those primitives, and both are\nlaunched via the same command `sos`, by specifying the sub-command\nto use:\n\n     $ ./sos blob-server ...\n     $ ./sos api-server ...\n\nHere the first command launches a blob-server, which is the back-end for\nstorage, and the second command launches the public API server - which is\nwhat your code/users should operate against.\n\nIf you launch `sos` with no arguments you'll see brief details of the\navailable subcommands.\n\n\n\n## Quick Start\n\nIn an ideal deployment at least two hosts would be used:\n\n* One host would run the public-server.\n   * This allows uploads to be made, and later retrieved.\n* Each of the two hosts would also run a blob-server.\n   * The blob-servers provide the actual storage of the uploaded-objects.\n   * The contents of these are replicated out of band.\n\nWe can simulate a deployment upon a single host for the purposes of testing.  
You'll just need to make sure you have four terminals open to run the appropriate daemons.\n\nFirst of all you'll want to launch a pair of blob-servers:\n\n    $ sos blob-server -store data1 -port 4001\n    $ sos blob-server -store data2 -port 4002\n\n\u003e **NOTE**: The storage-paths (`./data1` and `./data2` in the example above) are where the uploaded-content will be stored.  These directories will be created if missing.\n\nIn production usage you'd generally record the names of the blob-servers in a configuration file, either `/etc/sos.conf` or `~/.sos.conf`; however they may also be specified upon the command line.\n\nWe'll then start the public/API-server, ensuring that it knows about the blob-servers to store content in:\n\n    $ sos api-server -blob-server http://localhost:4001,http://localhost:4002\n    Launching API-server\n    ..\n\n\nNow you, or your code, can connect to the server and start uploading/downloading objects.  By default the following ports will be used by `sos api-server`:\n\n| service          | port |\n|----------------- | ---- |\n| upload service   | 9991 |\n| download service | 9992 |\n\nProvided you've started all three daemons you can now perform a test upload with `curl`:\n\n    $ curl -X POST --data-binary @/etc/passwd http://localhost:9991/upload\n    {\"id\":\"cd5bd649c4dc46b0bbdf8c94ee53c1198780e430\",\"size\":2306,\"status\":\"OK\"}\n\nIf all goes well you'll receive a JSON-response as shown, and you can use the ID which is returned to retrieve your object:\n\n    $ curl http://localhost:9992/fetch/cd5bd649c4dc46b0bbdf8c94ee53c1198780e430\n    ..\n    $\n\n\u003e **NOTE**: The download service runs on a different port.  This is so that you can make policy decisions about uploads/downloads via your local firewall.\n\nAt the point you run the upload, the contents will only be present on one of the blob-servers, chosen at random.  
To ensure your data is replicated you need to (regularly) launch the replication utility:\n\n    $ sos replicate -blob-server http://localhost:4001,http://localhost:4002 --verbose\n\tgroup - server\n\t   default - http://localhost:4001\n\t   default - http://localhost:4002\n    Syncing group: default\n       Group member: http://localhost:4001\n       Group member: http://localhost:4002\n       Object cd5bd649c4dc46b0bbdf8c94ee53c1198780e430 is missing on http://localhost:4001\n         Mirroring cd5bd649c4dc46b0bbdf8c94ee53c1198780e430 from http://localhost:4002 to http://localhost:4001\n            Fetching :http://localhost:4002/blob/cd5bd649c4dc46b0bbdf8c94ee53c1198780e430\n            Uploading :http://localhost:4001/blob/cd5bd649c4dc46b0bbdf8c94ee53c1198780e430\n\n\n## Meta-Data\n\nWhen uploading objects it is often useful to store meta-data, such as the original name of the uploaded object, the owner, or some similar data.  For that reason any header you add to your upload with an `X-` prefix will be stored and returned on download.\n\nAs a special case the header `X-Mime-Type` can be used to set the returned `Content-Type` header too.\n\nFor example, uploading an image might look like this:\n\n    $ curl -X POST -H \"X-Orig-Filename: steve.jpg\" \\\n                   -H \"X-MIME-Type: image/jpeg\" \\\n                   --data-binary @/home/skx/Images/tmp/steve.jpg \\\n            http://localhost:9991/upload\n    {\"id\":\"20b30df22469e6d7617c7da6a457d4e384945a06\",\"status\":\"OK\",\"size\":17599}\n\nDownloading will result in the headers being set:\n\n    $ curl -v http://localhost:9992/fetch/20b30df22469e6d7617c7da6a457d4e384945a06 \u003e/dev/null\n    ..\n    \u003c HTTP/1.1 200 OK\n    \u003c X-Orig-Filename: steve.jpg\n    \u003c Date: Fri, 27 May 2016 06:17:39 GMT\n    \u003c Content-Type: image/jpeg\n    \u003c Transfer-Encoding: chunked\n    \u003c\n    { [data not shown]\n\n\n\n\n## Production Usage\n\n* The API service must be visible to clients, 
to allow downloads to be made.\n    * Because the download service runs on port `9992` it is assumed that corporate firewalls would deny access.\n    * We assume you'll configure an Apache/nginx/similar reverse-proxy to access the files via a host like `http://objects.example.com/`.\n\n* It is assumed you might wish to restrict uploads to particular clients, rather than allow the world to make uploads.  The simplest way of doing this is to use your firewall to filter access to port `9991`.\n\n* The blob-servers must be reachable by the host(s) running the API-service, but they should not be publicly visible.\n    * If your blob-servers are exposed to the internet, remote users could [use the API](API.md) to spider and download all your content.\n\n* None of the servers needs to be launched as root, because they don't bind to privileged ports, or require special access.\n    * **NOTE**: [issue #6](https://github.com/skx/sos/issues/6) improved the security of the `blob-server` by invoking `chroot()`.  However `chroot()` will fail if the server is not launched as root, which is harmless.\n\n* You can also read about scaling when your data is too large to fit upon a single `blob-server`:\n   * [Read about scaling SoS](SCALING.md)\n\n\n## Future Changes?\n\nIt would be possible to switch to using _chunked_ storage, for example breaking up each file that is uploaded into 128MB sections and treating them as distinct.  That is not done at the moment because it relies upon state:\n\n* The public server needs to be able to know that the file with a given ID is composed of the following chunks of data:\n    * `a5d606958533634fed7e6d5a79d6a5617252021f`\n    * `038deb6940db2d0e7b9ee9bba70f3501a0667989`\n    * `a7914eb6ff984f97c5f6f365d3d93961be2e8617`\n    * `...`\n* That data must always be kept up to date and accessible.\n\nAt the moment the API-server is stateless, so tracking that data is not possible.  
It is possible to imagine using [redis](http://redis.io/), or some other external database, to record the data, but that increases the complexity of deployment.\n\n\n## GitHub Setup\n\nThis repository is configured to run tests upon every commit, and when\npull-requests are created/updated.  The testing is carried out via\n[.github/run-tests.sh](.github/run-tests.sh) which is used by the\n[github-action-tester](https://github.com/skx/github-action-tester) action.\n\nReleases are automated in a similar fashion via [.github/build](.github/build),\nand the [github-action-publish-binaries](https://github.com/skx/github-action-publish-binaries) action.\n\n\n\n## Questions?\n\nQuestions/Changes are most welcome; just report an issue.\n\n\n\n\nSteve\n --\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fskx%2Fsos","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fskx%2Fsos","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fskx%2Fsos/lists"}