{"id":21567468,"url":"https://github.com/szajbus/saxy","last_synced_at":"2025-04-10T13:21:37.810Z","repository":{"id":4305759,"uuid":"5437997","full_name":"szajbus/saxy","owner":"szajbus","description":"Memory-efficient XML parser. Finds object definitions in XML and translates them into Ruby objects.","archived":false,"fork":false,"pushed_at":"2019-02-14T15:59:45.000Z","size":120,"stargazers_count":13,"open_issues_count":1,"forks_count":6,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-06T09:44:01.353Z","etag":null,"topics":["parser","ruby","sax","xml"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/szajbus.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2012-08-16T11:11:00.000Z","updated_at":"2019-07-08T18:17:34.000Z","dependencies_parsed_at":"2022-08-29T21:22:24.292Z","dependency_job_id":null,"html_url":"https://github.com/szajbus/saxy","commit_stats":null,"previous_names":["humante/saxy"],"tags_count":16,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/szajbus%2Fsaxy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/szajbus%2Fsaxy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/szajbus%2Fsaxy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/szajbus%2Fsaxy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/szajbus","download_url":"https://codeload.github.com/szajbus/saxy/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://git
hub.com","kind":"github","repositories_count":248225661,"owners_count":21068078,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["parser","ruby","sax","xml"],"created_at":"2024-11-24T10:31:05.963Z","updated_at":"2025-04-10T13:21:37.775Z","avatar_url":"https://github.com/szajbus.png","language":"Ruby","readme":"# Saxy\n\n[![Gem Version](https://badge.fury.io/rb/saxy.svg)](https://badge.fury.io/rb/saxy)\n[![Build Status](https://api.travis-ci.org/szajbus/saxy.svg)](http://travis-ci.org/szajbus/saxy)\n\nMemory-efficient XML parser. Finds object definitions in XML and translates them into Ruby hashes.\n\nIt uses a SAX parser (provided by the Nokogiri gem) under the hood, which means that it doesn't load the whole XML file into memory. It goes through the file once and yields hashes along the way.\n\nAs a result, the memory footprint of the parser remains small and more or less constant irrespective of the size of the XML file, be it a few KB or hundreds of GB.\n\n## Installation\n\nAdd this line to your application's Gemfile:\n\n    gem 'saxy'\n\nAnd then execute:\n\n    $ bundle\n\nOr install it yourself as:\n\n    $ gem install saxy\n\n## Requirements\n\nAs of version `0.5.0`, `saxy` requires Ruby 1.9.3 or higher. Previous versions of the gem work with Ruby 1.8 and 1.9.2 (see below), but they are no longer maintained.\n\n### Ruby 1.8 support\n\nSee the `ruby-1.8` branch. Install with:\n\n    gem 'saxy', '~\u003e 0.3.0'\n\n### Ruby 1.9.2 support\n\nSee the `ruby-1.9.2` branch. 
Install with:\n\n    gem 'saxy', '~\u003e 0.4.0'\n\n## Changelog\n\nSee the `CHANGELOG.md` file.\n\n## Usage\n\nInstantiate the parser by passing the path to an XML file (or an IO-like object), the object-identifying tag name and, optionally, an options hash.\n\n```ruby\nparser = Saxy.parse(path_or_io, object_tag, options = {})\n```\n\nThen iterate over it using `each` (or any of the convenient methods provided by the `Enumerable` mix-in).\n\n```ruby\nparser.each do |object|\n  ...\nend\n```\n\n### Options\n\n* `encoding` - Forces the parser to work in the given encoding.\n* `recovery` - Should this parser recover from structural errors? If set to `true`, it will not stop processing the file on structural errors.\n* `replace_entities` - Should this parser replace entities? If set to `true`, `\u0026amp;` will be converted to `\u0026`.\n* `error_handler` - If set to a callable, the parser will call it with any error it encounters instead of raising an exception.\n\nCombining the `error_handler` and `recovery` options allows processing to continue when recoverable errors are encountered (e.g. unescaped [predefined entities](https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Predefined_entities_in_XML)).\n\n```ruby\nerror_handler = proc { |e| $stderr.puts \"#{e.message} at line #{e.context.line}, column #{e.context.column}.\" }\nSaxy.parse(path_or_io, object_tag, error_handler: error_handler, recovery: true) { ... 
}\n```\n\n## Example\n\nAssume the XML file (an imaginary product feed):\n\n````xml\n\u003c?xml version='1.0' encoding='UTF-8'?\u003e\n\u003cwebstore\u003e\n  \u003cname\u003eAmazon\u003c/name\u003e\n  \u003cproducts\u003e\n    \u003cproduct\u003e\n      \u003cname\u003eKindle - The world's best-selling e-reader.\u003c/name\u003e\n      \u003cimages\u003e\n        \u003cthumbSize width=\"80\" height=\"60\"\u003ehttp://amazon.com/kindle_thumb.jpg\u003c/thumbSize\u003e\n      \u003c/images\u003e\n    \u003c/product\u003e\n    \u003cproduct\u003e\n      \u003cname\u003eKindle Touch - Simple-to-use touchscreen with built-in WIFI.\u003c/name\u003e\n      \u003cimages\u003e\n        \u003cthumbSize width=\"120\" height=\"90\"\u003ehttp://amazon.com/kindle_touch_thumb.jpg\u003c/thumbSize\u003e\n      \u003c/images\u003e\n    \u003c/product\u003e\n  \u003c/products\u003e\n\u003c/webstore\u003e\n````\n\nThe following will parse the XML, find product definitions (inside `\u003cproduct\u003e` and `\u003c/product\u003e` tags), build `Hash`es and yield them inside the block.\n\nUsage with a file path:\n\n````ruby\nSaxy.parse(\"filename.xml\", \"product\").each do |product|\n  puts product[\"name\"]\n  puts product[\"images\"][\"thumb_size\"][\"contents\"]\n  puts \"#{product[\"images\"][\"thumb_size\"][\"width\"]}x#{product[\"images\"][\"thumb_size\"][\"height\"]}\"\nend\n\n# =\u003e\n\"Kindle - The world's best-selling e-reader.\"\n\"http://amazon.com/kindle_thumb.jpg\"\n\"80x60\"\n\"Kindle Touch - Simple-to-use touchscreen with built-in WIFI.\"\n\"http://amazon.com/kindle_touch_thumb.jpg\"\n\"120x90\"\n````\n\nUsage with an IO-like object `ARGF` or `$stdin`:\n\n````ruby\n# \u003e cat filename.xml | ruby this_script.rb\nSaxy.parse(ARGF, \"product\").each do |product|\n  puts product[\"name\"]\nend\n\n# =\u003e\n\"Kindle - The world's best-selling e-reader.\"\n````\n\nSaxy supports Enumerable, so you can use its goodies to your comfort without building intermediate 
arrays:\n\n````ruby\nSaxy.parse(\"filename.xml\", \"product\").map do |object|\n  # map yielded Hash to ActiveRecord instances, etc.\nend\n````\n\nYou can also grab an Enumerator for external use (e.g. lazy evaluation, etc.):\n\n````ruby\nenumerator = Saxy.parse(\"filename.xml\", \"product\").each\nlazy       = Saxy.parse(\"filename.xml\", \"product\").lazy # Ruby 2.0\n````\n\nMultiple definitions of child objects are grouped in arrays:\n\n````ruby\nwebstore = Saxy.parse(\"filename.xml\", \"webstore\").first\nwebstore[\"products\"][\"product\"].size # =\u003e 2\n````\n\n## Debugging\n\nInvalid XML files are common and error messages are not always helpful. In case of a parsing error, some additional information can be retrieved from the parser's context.\n\n```ruby\n  begin\n    Saxy.parse(...) { ... }\n  rescue Saxy::ParsingError =\u003e e\n    puts \"#{e.message} at line #{e.context.line}, column #{e.context.column}\"\n  end\n```\n\n## Contributing\n\n1. Fork it\n2. Create your feature branch (`git checkout -b my-new-feature`)\n3. Commit your changes (`git commit -am 'Added some feature'`)\n4. Push to the branch (`git push origin my-new-feature`)\n5. Create new Pull Request\n\n## License\n\nSee the `LICENSE.txt` file.\n\n## Author\n\nMichał Szajbe, [@szajbus](https://twitter.com/szajbus), [szajbe.pl](http://szajbe.pl)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fszajbus%2Fsaxy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fszajbus%2Fsaxy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fszajbus%2Fsaxy/lists"}