{"id":16701655,"url":"https://github.com/coryodaniel/ingestor","last_synced_at":"2026-03-17T09:32:07.039Z","repository":{"id":6204116,"uuid":"7434993","full_name":"coryodaniel/ingestor","owner":"coryodaniel","description":"Tool for ingesting plain text files in ActiveRecord","archived":false,"fork":false,"pushed_at":"2014-01-09T19:42:23.000Z","size":276,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2026-02-27T11:43:57.651Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/coryodaniel.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2013-01-04T04:04:44.000Z","updated_at":"2023-03-30T17:58:43.000Z","dependencies_parsed_at":"2022-08-20T22:00:34.350Z","dependency_job_id":null,"html_url":"https://github.com/coryodaniel/ingestor","commit_stats":null,"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"purl":"pkg:github/coryodaniel/ingestor","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coryodaniel%2Fingestor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coryodaniel%2Fingestor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coryodaniel%2Fingestor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coryodaniel%2Fingestor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/coryodaniel","download_url":"https://codeload.github.com/coryodaniel/ingestor/tar.gz/refs/heads/master","sbom_url":"https://rep
os.ecosyste.ms/api/v1/hosts/GitHub/repositories/coryodaniel%2Fingestor/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30620673,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-17T08:10:05.930Z","status":"ssl_error","status_checked_at":"2026-03-17T08:10:04.972Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-12T18:45:03.955Z","updated_at":"2026-03-17T09:32:07.002Z","avatar_url":"https://github.com/coryodaniel.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Ingestor\n\nA simple DSL for importing data from text and csv files to ActiveRecord. 
This was originally designed to \ncontinually import changing data from EAN and Geonames.\n\nGreat for parsing JSON, XML, CSV and plain text into ActiveRecord. If you\nneed to scrape HTML into ActiveRecord, check out [klepto](http://github.com/coryodaniel/klepto).\n\n## Installation\n\nAdd this line to your application's Gemfile:\n\n    gem 'ingestor'\n\nAnd then execute:\n\n    $ bundle install\n\nOr install it yourself as:\n\n    $ gem install ingestor\n\nAdd the following to your Rakefile:\n\n    require 'ingestor/tasks'\n\n## Usage\n\n  Given a text file:\n\n    id|name|population\n    1|China|1,354,040,000\n    2|India|1,210,193,422\n    3|United States|315,550,000\n\n  And an AR class:\n  ```ruby\n  class Country \u003c ActiveRecord::Base\n    attr_accessible :name, :population\n  end\n  ```\n\n  Sync the file with AR:\n  ```ruby\n  ingest(\"path/to/countries.txt\") do\n    # The first line (id|name|population) is a header, so skip it\n    includes_header true\n\n    map_attributes do |values|\n      {\n        id:           values[0],\n        name:         values[1],\n        population:   values[2]\n      }\n    end\n\n    # Find the record for the current line's values, or build a new one\n    finder{|attrs| \n      Country.where(id: attrs[:id]).first || Country.new\n    }\n  end\n  ```\n\n  It can handle remote files and zip files as well:\n  ```ruby\n  ingest(\"http://example.com/a_lot_of_countries.zip\") do\n    compressed true\n    map_attributes do |values|\n      {\n        id:           values[0],\n        name:         values[1],\n        population:   values[2]\n      }\n    end\n\n    # Find the record for the current line's values, or build a new one\n    finder{|attrs| \n      Country.where(id: attrs[:id]).first || Country.new\n    }\n  end\n  ```\n\n  It can also handle XML, JSON, and more: 
\n  ```ruby\n  require 'ingestor/parser/xml'\n  ingest(\"http://example.com/books.xml\") do\n    parser :xml\n    parser_options xpath: '//book'\n    map_attributes do |values|\n      {\n        id:           values['id'],\n        title:        values['title'],\n        author:       {\n          name: values['author']\n        }\n      }\n    end\n\n    # Find the record for the current entry's values, or build a new one\n    finder{|attrs| \n      Book.where(id: attrs[:id]).first || Book.new\n    }\n\n    processor{|attrs,record|\n      record.update_attributes(attrs)\n      record.reviews.create({\n        stars: 5,\n        comment: \"Every book they sell is so great!\"\n      })\n    }\n  end  \n  ```\n\n  CSV Example\n  ```ruby\n  require 'ingestor/parser/csv'\n  ingest \"./samples/contracts.csv\" do\n    parser :csv\n    \n    # All options are passed straight through to Ruby's standard-library CSV class\n    parser_options :headers =\u003e true,\n      :col_sep            =\u003e \",\",\n      :row_sep            =\u003e :auto,\n      :quote_char         =\u003e '\"',\n      :field_size_limit   =\u003e nil,\n      :converters         =\u003e nil,\n      :unconverted_fields =\u003e nil,\n      :return_headers     =\u003e false,\n      :header_converters  =\u003e nil,\n      :skip_blanks        =\u003e false,\n      :force_quotes       =\u003e false\n\n    # How to map the columns from the text to AR attributes\n    map_attributes do |row|\n      {\n        id:                 row[0],\n        seller_name:        row[1],\n        customer_name:      row[2],\n        commencement_date:  row[7],\n        termination_date:   row[8]\n      }\n    end\n    \n    # before{|attrs| attrs}\n    \n    # Your strategy for finding or instantiating the object to be handled by the processor block\n    finder{|attrs|\n      Contract.new\n    }\n\n    processor{|attrs,record|\n      # ... 
custom processor here ...\n      record.update_attributes attrs\n    }\n    \n    after{|record| \n      puts \"Created: #{record.summary}\"\n    }\n  end  \n  ```\n\n  JSON Example\n  ```ruby\n  require 'ingestor/parser/json'\n  ingest(\"http://example.com/people.json\") do\n    parser :json\n    parser_options collection: lambda{|document|\n      document['people']\n    }\n    map_attributes do |values|\n      {\n        name:         values[\"first_name\"] + \" \" + values[\"last_name\"],\n        age:          values['age'],\n        address:      values['address']\n      }\n    end\n\n    # Find the record for the current entry's values, or build a new one\n    finder{|attrs| \n      Person.where(name: attrs[:name]).first || Person.new\n    }\n\n    processor{|attrs,record|\n      record.update_attributes(attrs)\n      record.send_junk_mail!\n    }\n  end\n  ```\n\n\n## Advanced Usage\nDSL Options\n  * parser - the parser to use on the file\n    * Symbol\n    * Optional\n    * Default: :plain_text\n    * Available Values: :plain_text, :xml, :json, :csv, :html\n    * See 'Included Parsers' below\n  * parser_options - options for a specific parser\n    * Hash\n    * Optional\n    * Default: set per parser\n    * See 'Included Parsers' below\n  * sample - dump a single raw entry from the file to STDOUT and exit\n    * Boolean\n    * Optional\n    * Default: false\n  * includes_header - tells the parser that the first line is a header and should be skipped\n    * Boolean\n    * Optional\n    * Default: false\n  * compressed - whether the file should be decompressed\n    * Boolean\n    * Optional\n    * Default: false\n  * working_directory - where to store remote or decompressed files for local processing\n    * String\n    * Optional\n    * Default: /tmp/ingestor\n  * before - callback that receives the attributes for each record before [finder] is called\n    * Proc(attributes)\n    * Optional\n    * Default: nil\n  * finder - finds or instantiates the record for each set of attributes\n    * Proc(attributes)\n    * 
Returns: ~ActiveModel\n    * Required\n  * processor - what to do with the attributes and the record\n    * Proc(attributes,record)\n    * Returns: ~ActiveModel\n    * Optional\n    * Default: a Proc that calls #update_attributes on the record without mass-assignment protection\n  * after - callback that receives each record after [processor]\n    * Proc(record)\n    * Optional\n\n\n## Included Parsers\n\nWriting parsers is simple ([see examples](https://github.com/coryodaniel/ingestor/tree/master/lib/ingestor/parser)).\n\n### Plain Text Parser\n  Parses a plain text document.\n\n  Options\n  * delimiter - how to split up each line\n    * String\n    * Default: '|'\n    * Optional\n  * line\\_processor - overrides default\\_line\\_processor, which simply splits the string using the delimiter\n    * Proc(string)\n    * Returns: Array\n    * Default: nil\n    * Optional\n\n### XML Parser\n  Parses an XML document.\n\n  Options\n  * selector - XPath selector for the node collection\n    * String\n    * Required\n  * encoding - XML encoding. See Nokogiri's encoding handling\n    * String\n    * Optional\n    * Default: libxml2's best guess\n\n### JSON Parser\n  Parses a JSON document.\n\n  Options\n  * collection - receives the document and narrows it down to the collection you are interested in\n    * Proc(Hash)\n    * Returns: Hash | Array\n    * Required\n\n### CSV Parser\nDocumentation coming soon. Options are passed straight through to Ruby's standard-library CSV class (see the CSV example above).\n\n### HTML Parser\nComing soon...\n\n\n## Contributing\n\n1. Fork it\n2. Create your feature branch (`git checkout -b my-new-feature`)\n3. Commit your changes (`git commit -am 'Add some feature'`)\n4. Push to the branch (`git push origin my-new-feature`)\n5. Create a new Pull Request\n\n## Running Tests\n\n  1. Copy spec/orm/database.example.yml to spec/orm/database.yml\n  2. Configure spec/orm/database.yml\n  3. 
bundle exec guard\n\n\n## Todos\n* Deprecate plain_text (this was the first thing I created)\n* RDoc documentation (http://rdoc.rubyforge.org/RDoc/Markup.html)\n* Move includes_header into the CSV and PlainText parsers\n* Mongoid support\n* Sort/limit options\n* Configure Travis CI\n* A way to sample a file without building an ingestor first\n  * bin/ingestor --sample --path=./my.xml --parser xml --parser_options_xpath '//book'","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoryodaniel%2Fingestor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcoryodaniel%2Fingestor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoryodaniel%2Fingestor/lists"}