{"id":19730515,"url":"https://github.com/begriffs/aws_pipes","last_synced_at":"2025-10-14T20:11:54.676Z","repository":{"id":6038503,"uuid":"7262870","full_name":"begriffs/aws_pipes","owner":"begriffs","description":"AWS queues à la Unix","archived":false,"fork":false,"pushed_at":"2013-01-03T16:25:00.000Z","size":159,"stargazers_count":5,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-10-14T20:08:12.239Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/begriffs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2012-12-20T19:13:43.000Z","updated_at":"2015-04-20T21:42:57.000Z","dependencies_parsed_at":"2022-09-10T03:51:28.399Z","dependency_job_id":null,"html_url":"https://github.com/begriffs/aws_pipes","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/begriffs/aws_pipes","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/begriffs%2Faws_pipes","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/begriffs%2Faws_pipes/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/begriffs%2Faws_pipes/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/begriffs%2Faws_pipes/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/begriffs","download_url":"https://codeload.github.com/begriffs/aws_pipes/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/begriffs%2Faw
s_pipes/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279021004,"owners_count":26086946,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-14T02:00:06.444Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-12T00:16:37.580Z","updated_at":"2025-10-14T20:11:54.657Z","avatar_url":"https://github.com/begriffs.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Overview\n\n### Communication\n\nSend messages between Amazon EC2 instances through Unix pipes.\n\nCommunication in aws_pipes is built on top of the Amazon [Simple Queue\nService](http://aws.amazon.com/sqs/) (SQS) which lets you\n\n- Move data between distributed components of your application without\n  losing messages or requiring each component to be always available.\n- Get started with no extra installed software or special firewall\n  configurations.\n- Connect machines on different networks, developed with different\n  technologies, and running at different times.\n- Save messages in the queue for up to 14 days.\n\nText is the universal interface, and any application that can read and\nwrite text can use this gem \u0026ndash; no knowledge of the Amazon API is\nrequired.\n\n### Logging\n\nConsolidate logs between EC2 instances. 
Logging in aws_pipes is built on\ntop of Amazon [SimpleDB](http://aws.amazon.com/simpledb/).\n\n- Get logs off individual servers to save disk space.\n- Pool the log messages from related workers.\n- Monitor and query logs from one place.\n- Save as much log history as you want; the storage is virtually\n  unlimited.\n\n### Saving Datasets\n\nSave data across EC2 instances with scalable throughput.\nData archival in aws_pipes is built on top of Amazon\n[DynamoDB](http://aws.amazon.com/dynamodb/).\n\n- Store data centrally.\n- Automatically scale throughput and space.\n- Query results (albeit not relationally).\n- Monitor data acquisition through the web control panel.\n- Export to S3.\n\n## Usage\n\n### aws_queue\n\n    # write data to an SQS queue named \"foo\"\n    your_program | aws_queue write foo\n\n    # read data from an SQS queue named \"foo\"\n    aws_queue read foo | your_program\n\nTo use this program you will need to [create a\nqueue](https://console.aws.amazon.com/sqs/) in the Amazon Web Console.\n\n### aws_log\n\n    # write stderr to log named \"bar\"\n    your_program 2\u003e \u003e(aws_log record bar)\n\n    # delete all messages in log named \"bar\"\n    aws_log delete bar\n\n    # view log entries for \"bar\" within a date range\n    aws_log show bar --after \"1970-01-01\" --before \"2020-02-02 13:42:12.123\"\n\nEach line sent to the log gets marked with a timestamp and the external\nIP address of the machine which added it.\n\nYou can combine queuing and logging in\na single command using Bash [process substitution](\nhttp://www.gnu.org/software/bash/manual/bashref.html#Process-Substitution):\n\n    # write stdout to an SQS queue named \"foo\"\n    # while logging stderr to a log named \"bar\"\n    your_program 1\u003e \u003e(aws_queue write foo) 2\u003e \u003e(aws_log record bar)\n\n### aws_db\n\n    # save each tab-delimited line of input as a row in DynamoDB table foo\n    # filling in columns a, b, and c\n    your_program | aws_db foo a b 
c\n\nDynamoDB tables have adjustable read- and write-throughput settings to\nscale as needed. The `aws_db` command will automatically re-provision\nwrite throughput if writing starts getting throttled. This makes\n`aws_db` (when run in parallel) a way to save virtually unlimited\namounts of data as quickly as necessary.\n\n## Installation\n\n1. Sign up for an [AWS account](http://aws.amazon.com/).\n1. Find your secret key and key ID in *My Account* \u003e *Security Credentials*.\n1. (Optional) Set the environment variables AWS_ACCESS_KEY_ID and\n   AWS_ACCESS_KEY accordingly.\n1. Run `gem install aws_pipes` from the command line.\n\nThis will install the `aws_queue` and `aws_log` commands to your path.\nIf you haven't stored your Amazon credentials in environment variables,\nyou can pass them in as command line options. For more info, run\n\n    aws_queue --help\n\n## Examples\n\n### Downloading a massive list of URLs in parallel\n\nOne computer can feed a list of URLs to workers which download them.\nSuppose the URLs are stored in `urls.txt`. Just redirect the file into a\nqueue:\n\n    aws_queue write to_be_downloaded \u003c urls.txt\n\nThen have each worker pull from the `to_be_downloaded` queue and\nrepeatedly run a command to download each URL. The queue supports many\nsimultaneous readers and prevents duplicate work. We save any errors to\na log named \"downloader\" which we can monitor remotely.\n\n    aws_queue read to_be_downloaded | xargs -L1 wget -nv 2\u003e \u003e(aws_log record downloader)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbegriffs%2Faws_pipes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbegriffs%2Faws_pipes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbegriffs%2Faws_pipes/lists"}