{"id":13747653,"url":"https://github.com/tatey/conformist","last_synced_at":"2025-11-11T20:31:38.455Z","repository":{"id":59152332,"uuid":"1659959","full_name":"tatey/conformist","owner":"tatey","description":"Bend CSVs to your will with declarative schemas.","archived":false,"fork":false,"pushed_at":"2017-03-18T00:05:00.000Z","size":138,"stargazers_count":60,"open_issues_count":2,"forks_count":6,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-08-11T03:19:55.682Z","etag":null,"topics":["csv","ruby","scraping"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tatey.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2011-04-25T13:37:15.000Z","updated_at":"2024-06-25T14:00:57.000Z","dependencies_parsed_at":"2022-09-13T11:00:49.402Z","dependency_job_id":null,"html_url":"https://github.com/tatey/conformist","commit_stats":null,"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"purl":"pkg:github/tatey/conformist","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tatey%2Fconformist","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tatey%2Fconformist/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tatey%2Fconformist/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tatey%2Fconformist/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tatey","download_url":"https://codeload.github.com/tatey/conformist/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/tatey%2Fconformist/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272366808,"owners_count":24922221,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-27T02:00:09.397Z","response_time":76,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv","ruby","scraping"],"created_at":"2024-08-03T06:01:36.472Z","updated_at":"2025-11-11T20:31:38.424Z","avatar_url":"https://github.com/tatey.png","language":"Ruby","readme":"# Conformist\n\n[![Build Status](https://secure.travis-ci.org/tatey/conformist.png)](http://travis-ci.org/tatey/conformist)\n[![Code Climate](https://codeclimate.com/github/tatey/conformist.png)](https://codeclimate.com/github/tatey/conformist)\n\nBend CSVs to your will with declarative schemas. Map one or many columns, preprocess cells and lazily enumerate. Declarative schemas are easier to understand, quicker to set up and independent of I/O. Use [CSV](http://www.ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/CSV.html) (formerly [FasterCSV](https://rubygems.org/gems/fastercsv)), [Spreadsheet](https://rubygems.org/gems/spreadsheet) or any array-of-arrays-like data structure.\n\n![](http://f.cl.ly/items/00191n3O1J2E1a342F1L/conformist.jpg)\n\n## Quick and Dirty Examples\n\nOpen a CSV file and declare a schema. A schema comprises columns. A column takes an arbitrary name followed by its position in the input. 
A column may be derived from multiple positions.\n\n``` ruby\nrequire 'conformist'\nrequire 'csv'\n\ncsv    = CSV.open '~/transmitters.csv'\nschema = Conformist.new do\n  column :callsign, 1\n  column :latitude, 1, 2, 3\n  column :longitude, 3, 4, 5\n  column :name, 0 do |value|\n    value.upcase\n  end\nend\n```\n\nInsert the transmitters into a SQLite database.\n\n``` ruby\nrequire 'sqlite3'\n\ndb = SQLite3::Database.new 'transmitters.db'\nschema.conform(csv).each do |transmitter|\n  db.execute \"INSERT INTO transmitters (callsign, ...) VALUES ('#{transmitter.callsign}', ...);\"\nend\n```\n\nOnly insert the transmitters with the name \"Mount Coot-tha\" using ActiveRecord or DataMapper.\n\n``` ruby\ntransmitters = schema.conform(csv).select do |transmitter|\n  transmitter.name == 'Mount Coot-tha'\nend\ntransmitters.each do |transmitter|\n  Transmitter.create! transmitter.attributes\nend\n```\n\nSource from multiple, different input files and insert transmitters together into a single database.\n\n``` ruby\nrequire 'conformist'\nrequire 'csv'\nrequire 'sqlite3'\n\nau_schema = Conformist.new do\n  column :callsign, 8\n  column :latitude, 10\nend\nus_schema = Conformist.new do\n  column :callsign, 1\n  column :latitude, 1, 2, 3\nend\n\nau_csv = CSV.open '~/au/transmitters.csv'\nus_csv = CSV.open '~/us/transmitters.csv'\n\ndb = SQLite3::Database.new 'transmitters.db'\n\n[au_schema.conform(au_csv), us_schema.conform(us_csv)].each do |schema|\n  schema.each do |transmitter|\n    db.execute \"INSERT INTO transmitters (callsign, ...) 
VALUES ('#{transmitter.callsign}', ...);\"\n  end\nend\n```\n\nOpen a Microsoft Excel spreadsheet and declare a schema.\n\n``` ruby\nrequire 'conformist'\nrequire 'spreadsheet'\n\nbook   = Spreadsheet.open '~/states.xls'\nsheet  = book.worksheet 0\nschema = Conformist.new do\n  column :state, 0, 1 do |values|\n    \"#{values.first}, #{values.last}\"\n  end\n  column :capital, 2\nend\n```\n\nPrint each state's attributes to standard out.\n\n``` ruby\nschema.conform(sheet).each do |state|\n  $stdout.puts state.attributes\nend\n```\n\nFor more examples see [test/fixtures](https://github.com/tatey/conformist/tree/master/test/fixtures), [test/schemas](https://github.com/tatey/conformist/tree/master/test/schemas) and [test/unit/integration_test.rb](https://github.com/tatey/conformist/blob/master/test/unit/integration_test.rb).\n\n## Installation\n\nConformist is available as a gem. Install it at the command line.\n\n``` sh\n$ [sudo] gem install conformist\n```\n\nOr add it to your Gemfile and run `$ bundle install`.\n\n``` ruby\ngem 'conformist'\n```\n\n## Usage\n\n### Anonymous Schema\n\nAnonymous schemas are quick to declare and don't have the overhead of creating an explicit class.\n\n``` ruby\ncitizen = Conformist.new do\n  column :name, 0, 1\n  column :email, 2\nend\n\ncitizen.conform [['Tate', 'Johnson', 'tate@tatey.com']]\n```\n\n### Class Schema\n\nClass schemas are explicit. Class schemas were the only type available in earlier versions of Conformist.\n\n``` ruby\nclass Citizen\n  extend Conformist\n\n  column :name, 0, 1\n  column :email, 2\nend\n\nCitizen.conform [['Tate', 'Johnson', 'tate@tatey.com']]\n```\n\n### Implicit Indexing\n\nColumn indexes are implicitly incremented when the index argument is omitted. 
Implicit indexing is all or nothing.\n\n``` ruby\ncolumn :account_number                                # =\u003e 0\ncolumn(:date) { |v| Time.new(*v.split('/').reverse) } # =\u003e 1\ncolumn :description                                   # =\u003e 2\ncolumn :debit                                         # =\u003e 3\ncolumn :credit                                        # =\u003e 4\n```\n\n### Conform\n\nConform is the principal method for lazily applying a schema to the given input.\n\n``` ruby\nenumerator = schema.conform CSV.open('~/file.csv')\nenumerator.each do |row|\n  puts row.attributes\nend\n```\n\n#### Input\n\n`#conform` accepts any object that responds to `#each` and yields array-like rows.\n\n``` ruby\nCSV.open('~/file.csv').respond_to? :each # =\u003e true\n[[], [], []].respond_to? :each           # =\u003e true\n```\n\n#### Header Row\n\n`#conform` takes an option to skip the first row of input. Given a typical CSV document,\nthe first row is the header row and irrelevant for enumeration.\n\n``` ruby\nschema.conform CSV.open('~/file_with_headers.csv'), :skip_first =\u003e true\n```\n\n#### Named Columns\n\nStrings can be used as column indexes instead of integers. These strings will be matched\nagainst the first row to determine the appropriate numerical index.\n\n``` ruby\ncitizen = Conformist.new do\n  column :email, 'EM'\n  column :name, 'FN', 'LN'\nend\n\ncitizen.conform [['FN', 'LN', 'EM'], ['Tate', 'Johnson', 'tate@tatey.com']], :skip_first =\u003e true\n```\n\n#### Enumerator\n\n`#conform` is lazy, returning an [Enumerator](http://www.ruby-doc.org/core-1.9.3/Enumerator.html). Input is not parsed until you call `#each`, `#map` or any method defined in [Enumerable](http://www.ruby-doc.org/core-1.9.3/Enumerable.html). That means schemas can be assigned now and evaluated later. `#each` has the lowest memory footprint because it does not build a collection.\n\n#### Struct\n\nThe argument passed into the block is a struct-like object. 
You can access columns as methods or keys. Columns were only accessible as keys in earlier versions of Conformist. Methods are now the preferred syntax.\n\n``` ruby\ncitizen[:name] # =\u003e \"Tate Johnson\"\ncitizen.name   # =\u003e \"Tate Johnson\"\n```\n\nFor convenience the `#attributes` method returns a hash of key-value pairs suitable for creating ActiveRecord or DataMapper records.\n\n``` ruby\ncitizen.attributes # =\u003e {:name =\u003e \"Tate Johnson\", :email =\u003e \"tate@tatey.com\"}\n```\n\n### One Column\n\nMaps the first column in the input file to `:first_name`. Column indexing starts at zero.\n\n``` ruby\ncolumn :first_name, 0\n```\n\n### Many Columns\n\nMaps the first and second columns in the input file to `:name`.\n\n``` ruby\ncolumn :name, 0, 1\n```\n\nIndexing is completely arbitrary and you can map any combination.\n\n``` ruby\ncolumn :name_and_city, 0, 1, 2\n```\n\nMany columns are implicitly concatenated. Behaviour can be changed by passing a block. See *preprocessing*.\n\n### Preprocessing\n\nSometimes values need to be manipulated before they're conformed. A block passed to `column` receives the values. The return value of the block becomes the conformed output.\n\n``` ruby\ncolumn :name, 0, 1 do |values|\n  values.map(\u0026:upcase) * ' '\nend\n```\n\nWorks with one column too. Instead of getting a collection of objects, one object is passed to the block.\n\n``` ruby\ncolumn :first_name, 0 do |value|\n  value.upcase\nend\n```\n\nIt's also possible to provide a context object that is made available during preprocessing.\n\n``` ruby\ncitizen = Conformist.new do\n  column :name, 0, 1 do |values, context|\n    (context[:upcase?] ? values.map(\u0026:upcase) : values) * ' '\n  end\nend\n\ncitizen.conform [['tate', 'johnson']], context: {upcase?: true}\n```\n\n### Virtual Columns\n\nVirtual columns are not sourced from input. Omit the index to create a virtual column. 
Like real columns, virtual columns are included in the conformed output.\n\n``` ruby\ncolumn :day do\n  1\nend\n```\n\n### Inheritance\n\nInheriting from a schema gives access to all of the parent schema's columns.\n\n#### Anonymous Schema\n\nAnonymous inheritance takes inspiration from Ruby's syntax for [instantiating new classes](http://ruby-doc.org/core-1.9.3/Class.html#method-c-new).\n\n``` ruby\nparent = Conformist.new do\n  column :name, 0, 1\nend\n\nchild = Conformist.new parent do\n  column :category do\n    'Child'\n  end\nend\n```\n\n#### Class Schema\n\nClassical inheritance works as expected.\n\n``` ruby\nclass Parent\n  extend Conformist\n\n  column :name, 0, 1\nend\n\nclass Child \u003c Parent\n  column :category do\n    'Child'\n  end\nend\n```\n\n## Upgrading from \u003c= 0.0.3 to \u003e= 0.1.0\n\nWhere previously you had\n\n``` ruby\nclass Citizen\n  include Conformist::Base\n\n  column :name, 0, 1\nend\n\nCitizen.load('~/file.csv').foreach do |citizen|\n  # ...\nend\n```\n\nYou should now do\n\n``` ruby\nrequire 'fastercsv'\n\nclass Citizen\n  extend Conformist\n\n  column :name, 0, 1\nend\n\nCitizen.conform(FasterCSV.open('~/file.csv')).each do |citizen|\n  # ...\nend\n```\n\nSee CHANGELOG.md for a full list of changes.\n\n## Compatibility\n\n* MRI 2.4.0, 2.3.1, 2.2.0, 2.1.0, 2.0.0, 1.9.3\n* JRuby\n\n## Dependencies\n\nNo explicit dependencies, although `CSV` and `Spreadsheet` are commonly used.\n\n## Contributing\n\n1. Fork\n2. Install dependencies by running `$ bundle install`\n3. Write tests and code\n4. Make sure the tests pass locally by running `$ bundle exec rake`\n5. Push to GitHub and make sure continuous integration tests pass at\n   https://travis-ci.org/tatey/conformist/pull_requests\n6. 
Send a pull request on GitHub\n\nPlease do not increment the version number in `lib/conformist/version.rb`.\nThe version number will be incremented by the maintainer after the patch\nis accepted.\n\n## Motivation\n\nMotivation for this project came from the desire to simplify importing data from various government organisations into [Antenna Mate](http://antennamate.com). The data from each government was similar, but had completely different formatting. Some pieces of data needed preprocessing while others simply needed to be concatenated together. Not wanting to write a parser for each new government organisation, I created Conformist.\n\n## Copyright\n\nCopyright © 2016 Tate Johnson. Conformist is released under the MIT license. See LICENSE for details.\n","funding_links":[],"categories":["Ruby"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftatey%2Fconformist","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftatey%2Fconformist","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftatey%2Fconformist/lists"}