{"id":22354038,"url":"https://github.com/movableink/vessel","last_synced_at":"2025-10-28T21:11:09.080Z","repository":{"id":11142115,"uuid":"13508461","full_name":"movableink/vessel","owner":"movableink","description":"Deploys an EC2 instance to ship and process CSV files on S3","archived":false,"fork":false,"pushed_at":"2013-10-11T20:09:24.000Z","size":1636,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-01-31T13:43:41.770Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Clojure","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"epl-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/movableink.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2013-10-11T20:03:04.000Z","updated_at":"2023-07-25T13:49:37.000Z","dependencies_parsed_at":"2022-07-25T21:18:30.347Z","dependency_job_id":null,"html_url":"https://github.com/movableink/vessel","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/movableink%2Fvessel","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/movableink%2Fvessel/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/movableink%2Fvessel/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/movableink%2Fvessel/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/movableink","download_url":"https://codeload.github.com/movableink/vessel/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositori
es_count":245652989,"owners_count":20650611,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-04T13:10:57.456Z","updated_at":"2025-10-28T21:11:09.023Z","avatar_url":"https://github.com/movableink.png","language":"Clojure","funding_links":[],"categories":[],"sub_categories":[],"readme":"Vessel\n======\n\n\u003e “Thou art too damned jolly. Sail on.”\n\n\u003e Herman Melville, Moby-Dick\n\n\nBackground\n----------\n\nDeploys an EC2 instance (a \"Vessel\") to ship and process CSV files from one S3\nkeyspace to another S3 keyspace. Vessel routes CSV input rows to configurable\ndirectory structures (keyspaces).\n\n\nConfiguration\n-------------\n\n[Pallet](http://palletops.com/) is used to deploy the EC2 Vessel.\n\n\n```bash\nlein pallet add-service my-aws aws-ec2 \"your-aws-key\" \"your-aws-secret-key\"\n```\n\nThis will create a configuration file in `~/.pallet/services/my-aws.clj` with\nyour AWS credentials.\n\n\nMain entry point\n----------------\n\nWith [Leiningen](https://github.com/technomancy/leiningen)\n\n```bash\nlein run -m vessel.deploy path/to/manifest.edn 201310\n```\n\nOr as a standalone JAR:\n\n```bash\njava -cp vessel-0.4.0-SNAPSHOT-standalone.jar clojure.main -m vessel.deploy path/to/manifest.edn 201310\n```\n\nThis deploys an EC2 instance that:\n\n  1. Downloads compressed reports from `s3://bucket/201310/part-r-*.gz`\n  2. Processes every file, line by line, routing input lines to output files\n     based on rules defined in `manifest.edn`\n  3. Compresses and ships results to S3\n  4. 
Decommissions itself\n\nThere are no S3 transfer or bandwidth fees because the EC2 Vessel is deployed in\nthe same AWS region as the S3 buckets.\n\n\nProcess steps\n-------------\n\nRunning the **Main entry point** command is equivalent to running the following\nfive steps in order.\n\n\n### Step 1 of 5: Provision \u0026 launch EC2 Vessel\n\n\n```bash\nlein run -m vessel.launch path/to/manifest.edn 201310\n```\n\nor\n\n```bash\njava -cp vessel-0.4.0-SNAPSHOT-standalone.jar clojure.main -m vessel.launch path/to/manifest.edn 201310\n```\n\n\n### Step 2 of 5: Fetch input from S3 directories\n\n\n```bash\nlein run -m vessel.input path/to/manifest.edn 201310\n```\n\nor\n\n```bash\njava -cp vessel-0.4.0-SNAPSHOT-standalone.jar clojure.main -m vessel.input path/to/manifest.edn 201310\n```\n\nThis will fetch all files from\n\n`s3://my-s3-bucket/path/to/files/201310/part-*.gz`\n\nand place them locally in\n\n`data/input/path/to/files/201310/part-*.gz`\n\n\n### Step 3 of 5: Process input\n\n\n```bash\nlein run -m vessel.process path/to/manifest.edn 201310\n```\n\nor\n\n```bash\njava -cp vessel-0.4.0-SNAPSHOT-standalone.jar clojure.main -m vessel.process path/to/manifest.edn 201310\n```\n\nThis will process \u0026 transform all lines from files in\n\n`data/input/path/to/files/201310/part-*.gz`\n\nand place them into files of the form\n\n`data/output/paths/:column-01/:column-02/:yyyymmdd/:yyyymmdd_:yyyymmdd-000NN.csv.gz`\n\n\n### Step 4 of 5: Put output to other S3 directories\n\n\n```bash\nlein run -m vessel.output path/to/manifest.edn 201310\n```\n\nor\n\n```bash\njava -cp vessel-0.4.0-SNAPSHOT-standalone.jar clojure.main -m vessel.output path/to/manifest.edn 201310\n```\n\nThis will put all files from\n\n`data/output/paths/:column-01/:column-02/:yyyymmdd/:yyyymmdd_:yyyymmdd-000NN.csv.gz`\n\nto S3 keys of the form\n\n`s3://my-s3-bucket/paths/:column-01/:column-02/:yyyymmdd/:yyyymmdd_:yyyymmdd-000NN.csv.gz`\n\n\n### Step 5 of 5: Decommission EC2 Vessel\n\n\n```bash\nlein run -m vessel.decommission 
path/to/manifest.edn 201310\n```\n\nor\n\n```bash\njava -cp vessel-0.4.0-SNAPSHOT-standalone.jar clojure.main -m vessel.decommission path/to/manifest.edn 201310\n```\n\nLicense\n-------\n\nCopyright © 2013 Movable, Inc.\n\nDistributed under the [Eclipse Public License](https://raw.github.com/movableink/vessel/master/LICENSE), the same as Clojure.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmovableink%2Fvessel","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmovableink%2Fvessel","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmovableink%2Fvessel/lists"}