{"id":18832656,"url":"https://github.com/datto/pyper","last_synced_at":"2026-01-25T18:30:17.398Z","repository":{"id":79940125,"uuid":"48005326","full_name":"datto/pyper","owner":"datto","description":"Flexible pipelines for data storage and retrieval.","archived":false,"fork":false,"pushed_at":"2015-12-14T22:38:21.000Z","size":88,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-12-30T07:21:12.977Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/datto.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-12-14T22:37:32.000Z","updated_at":"2016-11-21T10:04:37.000Z","dependencies_parsed_at":"2023-03-09T03:31:15.983Z","dependency_job_id":null,"html_url":"https://github.com/datto/pyper","commit_stats":null,"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datto%2Fpyper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datto%2Fpyper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datto%2Fpyper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datto%2Fpyper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/datto","download_url":"https://codeload.github.com/datto/pyper/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239768927,"owners_count":19693763,"icon_url":"https://github.com/g
ithub.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-08T01:58:39.933Z","updated_at":"2026-01-25T18:30:17.325Z","avatar_url":"https://github.com/datto.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Pyper\n\nFlexible pipelines for content storage and retrieval.\n\nPyper allows the construction of pipelines to store and retrieve data. Each pipe in the pipeline modifies the\ninformation in the pipeline before passing it to the next step. By composing pipes in different ways, different\ndata access patterns can be created.\n\n## Usage\n\nRequire the pyper library and the pipes that you need:\n\n```ruby\nrequire 'pyper'\nrequire 'pyper/model'  # Import model-related pipes\nrequire 'pyper/cassandra'  # Import Cassandra-related pipes\nrequire 'pyper/content'    # Import content storage-related pipes\n```\n\nOr, import the entire library using `require 'pyper/all'`\n\nCreate a pipeline composed of a set of pipes:\n\n```ruby\nwrite_pipeline = Pyper::Pipeline.create do\n   add Pyper::Pipes::Write::AttributeSerializer.new\n   add Pyper::Pipes::FieldRename.new(:to =\u003e :to_emails, :from =\u003e :from_email)\n   add Pyper::Pipes::ModKey.new\n   add Pyper::Pipes::Cassandra::Writer.new(:table_1, metadata_client)\n   add Pyper::Pipes::Cassandra::Writer.new(:table_2, indexes_client)\n   add Pyper::Pipes::Cassandra::Writer.new(:table_3, indexes_client)\nend\n```\n\nThen, push data down the pipe:\n\n```ruby\nresult = write_pipeline.push(attributes)\n```\n\nView the value of the set of successive transformations performed by the 
pipe:\n\n```ruby\nresult.value\n```\n\nA pipeline applies a series of sequential transformations to the data passed down it. It may also have side\neffects, such as storing data. The pipes provided in this library are aimed at two uses: writing and\nreading data.\n\nA write pipeline takes an initial set of attributes and applies transformations such as serialization\nbefore storing the data in one or more storage outputs. For example, this gem provides storage pipes for Cassandra and\nAmazon S3, and pipes for other storage backends are easy to write.\n\nConversely, a read pipeline initially takes a set of options. The pipeline may transform these options before using them\nto read data from an external source. The retrieved data may then be transformed further by the pipeline, for example by\nperforming deserialization or data mapping operations.\n\n```ruby\nread_pipeline = Pyper::Pipeline.create do\n   add Pyper::Pipes::Cassandra::PaginationDecoding.new\n   add Pyper::Pipes::Cassandra::Reader.new(:table, indexes_client)\n   add Pyper::Pipes::FieldRename.new(:to_emails =\u003e :to, :from_email =\u003e :from)\n   add Pyper::Pipes::Cassandra::PaginationEncoding.new\n   add Pyper::Pipes::Model::VirtusDeserializer.new(message_attributes)\n   add Pyper::Pipes::Model::VirtusParser.new(MyModelClass)\nend\n\nresult = read_pipeline.push(:row =\u003e '1', :id =\u003e 'i', :page_token =\u003e 'sdf')\nresult.value # Enumerator with matching instances of MyModelClass\n```\n\nNote that pipe order matters. In the example read pipeline above, `Cassandra::PaginationDecoding` decodes pagination options, thus\noperating on the initial options provided. The `Cassandra::Reader` pipe uses the options to retrieve items from\nCassandra, and the subsequent pipes in the pipeline transform this retrieved data. 
Thus, it would make no\nsense to place the `Cassandra::PaginationDecoding` pipe after the `Cassandra::Reader` pipe.\n\n### Creating and using pipelines\n\nA pipeline is an instance of `Pyper::Pipeline`, to which pipes are appended using the `\u003c\u003c` operator or the `add` method.\n\n```ruby\nmy_pipeline = Pyper::Pipeline.new \u003c\u003c\n   Pyper::Pipes::Cassandra::PaginationDecoding.new \u003c\u003c\n   Pyper::Pipes::Cassandra::Reader.new(:table, indexes_client) \u003c\u003c\n   Pyper::Pipes::Cassandra::PaginationEncoding.new\n```\n\nThe `create` method offers a more convenient way to build a pipeline. Using it, the example above becomes:\n\n```ruby\nmy_pipeline = Pyper::Pipeline.create do\n   add Pyper::Pipes::Cassandra::PaginationDecoding.new\n   add Pyper::Pipes::Cassandra::Reader.new(:table, indexes_client)\n   add Pyper::Pipes::Cassandra::PaginationEncoding.new\nend\n```\n\nTo invoke the pipeline, call its `push` method with the data that should enter the pipeline:\n\n```ruby\npipe_status = my_pipeline.push(:row =\u003e '1', :id =\u003e 'i')\n```\n\nHere, `pipe_status` is a `Pyper::PipeStatus` object with two attributes: `pipe_status.value` and\n`pipe_status.status`. The value is the result of the series of transformations applied by the pipeline. The status\ncontains metadata about the push operation, which any pipe in the pipeline may add to.\n\n### Creating new pipes\n\nA pipe must implement the `call` method, which takes two arguments: the object entering the pipe and the status. It\nshould return the object leaving the pipe:\n\n```ruby\nclass MyPipe\n  def call(attributes, status = {})\n    attributes[:c] = attributes[:a] + attributes[:b]\n    status[:processed_by_my_pipe] = true\n    attributes\n  end\nend\n```\n\nThe example pipe above modifies `attributes` before returning it. 
It also sets a flag on the status object.\n\nNote that because a pipe need only respond to `call`, lambdas and procs are valid pipes.\n\nGenerally, pipes in a write pipeline operate on an attributes hash (containing the attributes to be written to a data\nstore). In a read pipeline, the first pipes typically modify the arguments. A data retrieval pipe then uses those arguments to\nfetch data, and subsequent pipes operate on the resulting enumeration of data items. Thus, a read pipe might look\nsomething like:\n\n```ruby\nclass Deserialize\n  def call(items, status = {})\n    items.map { |item| deserialize(item) }\n  end\n\n  def deserialize(item)\n    ...\n  end\nend\n```\n\n## Installation\n\nAdd this line to your application's Gemfile:\n\n```ruby\ngem 'pyper_rb', :git =\u003e 'git@github.com:backupify/pyper.git'\n```\n\nAnd then execute:\n\n    $ bundle\n\nOr install it yourself:\n\n    $ gem install pyper_rb\n\n## Contributing\n\n1. Fork it ( https://github.com/backupify/pyper/fork )\n2. Create your feature branch (`git checkout -b my-new-feature`)\n3. Commit your changes (`git commit -am 'Add some feature'`)\n4. Push to the branch (`git push origin my-new-feature`)\n5. Create a new Pull Request\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatto%2Fpyper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatto%2Fpyper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatto%2Fpyper/lists"}