{"id":17177264,"url":"https://github.com/lexi-lambda/decontaminate","last_synced_at":"2025-04-13T17:09:58.261Z","repository":{"id":62556911,"uuid":"44411198","full_name":"lexi-lambda/decontaminate","owner":"lexi-lambda","description":"A Ruby DSL for extracting data from complicated XML documents","archived":false,"fork":false,"pushed_at":"2015-10-27T18:26:06.000Z","size":192,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-13T17:09:50.696Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"isc","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lexi-lambda.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-10-16T21:20:14.000Z","updated_at":"2015-10-26T21:36:15.000Z","dependencies_parsed_at":"2022-11-03T06:15:22.392Z","dependency_job_id":null,"html_url":"https://github.com/lexi-lambda/decontaminate","commit_stats":null,"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lexi-lambda%2Fdecontaminate","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lexi-lambda%2Fdecontaminate/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lexi-lambda%2Fdecontaminate/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lexi-lambda%2Fdecontaminate/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lexi-lambda","download_url":"https://codeload.github.com/lexi-lambda/decontaminate/tar.gz/refs/heads/master","host":{"name
":"GitHub","url":"https://github.com","kind":"github","repositories_count":248750107,"owners_count":21155686,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-15T00:03:14.987Z","updated_at":"2025-04-13T17:09:58.239Z","avatar_url":"https://github.com/lexi-lambda.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Decontaminate [![Gem Version](https://badge.fury.io/rb/decontaminate.svg)](https://badge.fury.io/rb/decontaminate) [![Build Status](https://travis-ci.org/lexi-lambda/decontaminate.svg?branch=0.2.0)](https://travis-ci.org/lexi-lambda/decontaminate)\n\nDecontaminate is a tool for extracting information from large, potentially nested XML documents. It provides a simple Ruby DSL for selecting values from Nokogiri objects and storing the results in JSON-like Ruby hashes and arrays.\n\n## Installation\n\nAdd this line to your application's Gemfile:\n\n```ruby\ngem 'decontaminate'\n```\n\nAnd then execute:\n\n    $ bundle\n\nOr install it yourself as:\n\n    $ gem install decontaminate\n\n## Usage\n\nDecontaminate provides a DSL for creating *decontaminators*, which, when instantiated, accept XML nodes or documents and produce a hash as a result. 
To start, create a class that inherits from `Decontaminate::Decontaminator`:\n\n```ruby\nclass MyDecontaminator \u003c Decontaminate::Decontaminator\nend\n```\n\nWhen parsing an entire document, specify the name of the root element:\n\n```ruby\nclass MyDecontaminator \u003c Decontaminate::Decontaminator\n  self.root = 'User'\nend\n```\n\n### Scalar Values\n\nTo select values from the XML document, use the `scalar` class method:\n\n```ruby\nclass MyDecontaminator \u003c Decontaminate::Decontaminator\n  self.root = 'User'\n\n  scalar 'Name'\n  scalar 'Age', type: :integer\n  scalar 'DateRegistered', key: 'registered_at'\nend\n```\n\nThis might produce a result like the following:\n\n```ruby\n=\u003e MyDecontaminator.new(xml_document).as_json\n{\n  'name' =\u003e 'Jane Smith',\n  'age' =\u003e 28,\n  'registered_at' =\u003e '2013-08-16T20:51:34.236Z'\n}\n```\n\nThe first argument to `scalar` is the name of the node to extract data from. More generally, it can be any XPath expression relative to the document root. By default, the resulting JSON key is inferred from the provided path, but it can be overridden with the `key:` argument. The type of the scalar can be specified with the `type:` argument, which defaults to `:string`.\n\nAttributes can be specified with XPath syntax by prepending an `@` sign:\n\n```ruby\nscalar '@id', type: :integer\n```\n\n#### Scalar Transformers\n\nBeyond selecting a parser with the `type:` keyword argument, `scalar` can be given a block to transform the value further. The block receives the value as parsed according to `type:`, and its return value is what gets stored in the output.\n\n```ruby\nscalar 'RatingPercentage', key: 'rating_ratio', type: :float do |percentage|\n  percentage \u0026\u0026 percentage / 100.0\nend\n```\n\nTransformer blocks are evaluated in the context of the decontaminator instance, so instance methods can be called. 
Alternatively, instead of passing a block, an instance method can be used as the transformer directly by passing its name as the `transformer:` keyword argument.\n\n```ruby\nscalar 'RatingPercentage',\n       key: 'rating_ratio',\n       type: :float,\n       transformer: :percentage_to_ratio\n\ndef percentage_to_ratio(percentage)\n  percentage \u0026\u0026 percentage / 100.0\nend\n```\n\n### Nested Values\n\nNested hashes, even deeply nested ones, can be specified with the `hash` class method:\n\n```ruby\nhash 'UserProfile', key: 'profile' do\n  scalar 'Description'\n\n  hash 'Specialization' do\n    scalar 'Area'\n    scalar 'Expertise', type: :float\n  end\nend\n```\n\nThe `hash` method accepts a block, which works just like the class body, but all paths are scoped to the path passed to `hash`. The `key:` argument is optional, just like with `scalar`.\n\nSometimes it is useful to create an additional hash in the output as an organizational tool, even though there is no equivalent nesting in the input XML. In this case, the XPath argument may be omitted, specifying only `key:`.\n\n```ruby\nhash key: 'info' do\n  scalar 'Email'\nend\n```\n\nThis fetches the value of the `Email` node directly under the root, but stores it in a separate hash under the `'info'` key of the result.\n\n### Array Data\n\nIn addition to the `scalar` and `hash` methods, there are plural forms for extracting data that appears multiple times within a single document. These are named `scalars` and `hashes`, respectively. 
They work much like their singular counterparts, but each produces an array containing one entry per matched element.\n\nFor example, given the following decontaminator:\n\n```ruby\nclass ArticlesDecontaminator \u003c Decontaminate::Decontaminator\n  hashes 'Articles' do\n    scalar 'Name'\n    scalars 'Tags'\n  end\nend\n```\n\nAnd given the following XML document:\n\n```xml\n\u003cArticles\u003e\n  \u003cArticle\u003e\n    \u003cName\u003eArticle A\u003c/Name\u003e\n    \u003cTags\u003e\n      \u003cTag\u003eNews\u003c/Tag\u003e\n      \u003cTag\u003eTechnology\u003c/Tag\u003e\n    \u003c/Tags\u003e\n  \u003c/Article\u003e\n  \u003cArticle\u003e\n    \u003cName\u003eArticle B\u003c/Name\u003e\n    \u003cTags\u003e\n      \u003cTag\u003eSports\u003c/Tag\u003e\n      \u003cTag\u003eRecreation\u003c/Tag\u003e\n    \u003c/Tags\u003e\n  \u003c/Article\u003e\n\u003c/Articles\u003e\n```\n\nThe resulting object will have the following structure:\n\n```ruby\n{\n  'articles' =\u003e [\n    {\n      'name' =\u003e 'Article A',\n      'tags' =\u003e ['News', 'Technology']\n    },\n    {\n      'name' =\u003e 'Article B',\n      'tags' =\u003e ['Sports', 'Recreation']\n    }\n  ]\n}\n```\n\nA few things are worth noting in the above example:\n\n  - **The names of the individual elements are inferred from the parent path.**\n\n    In both cases, the parent element was the plural form of its children (`Articles`/`Article` and `Tags`/`Tag`). Since this is common, the plural forms perform this name inference automatically.\n\n    Since this behavior is sometimes unwanted, it can be disabled by passing the path as an explicit `path:` keyword argument.\n\n    ```ruby\n    scalars path: 'Tags/TagName', key: 'tags' # Performs no name inference\n    ```\n\n  - **No `root` element was specified because the root element is a plural.**\n\n    When using name inference for a plural element at the root, specifying the root element is an error. 
With the explicit `path:` form described above, `root` can still be specified.\n\n    ```ruby\n    self.root = 'Articles'\n    hashes path: 'Article', key: 'articles' do; ...; end\n    ```\n\n### Tuple Data\n\nComplementing `scalar` and `hash` is `tuple`, which accepts multiple paths and returns a fixed-length array containing one element for each path.\n\n```ruby\ntuple ['Height/text()', 'Height/@units'], key: 'height_with_units'\n```\n\nThe `tuple` method is most useful when supplied with a block, which works like `scalar`'s value transformer but receives one argument per path. This allows a single output value to be computed from several values in the source document.\n\n```ruby\ntuple ['Height/text()', 'Height/@units'], key: 'height_cm' do |height, units|\n  convert_units height.to_f, from: units, to: 'cm'\nend\n```\n\nTuples also support the shorthand `transformer:` argument that `scalar` and `scalars` support.\n\n### Flattening Nested Data\n\nSince source data is sometimes more deeply nested than desired, the `with` method scopes decontamination directives to a given XML element without increasing the nesting depth of the resulting object. Like `hash`, it accepts an XPath and a block, but the keys defined inside the block are added to the enclosing result instead of being wrapped in a new hash.\n\n```ruby\nwith 'Some/Nested/Data' do\n  scalar 'Value'\nend\n```\n\nThere is no plural form for `with`, since it would necessarily create duplicate keys.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flexi-lambda%2Fdecontaminate","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flexi-lambda%2Fdecontaminate","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flexi-lambda%2Fdecontaminate/lists"}