Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/tatey/conformist

Bend CSVs to your will with declarative schemas.
https://github.com/tatey/conformist

csv ruby scraping

Last synced: 3 months ago
JSON representation

Bend CSVs to your will with declarative schemas.

Awesome Lists containing this project

README

        

# Conformist

[![Build Status](https://secure.travis-ci.org/tatey/conformist.png)](http://travis-ci.org/tatey/conformist)
[![Code Climate](https://codeclimate.com/github/tatey/conformist.png)](https://codeclimate.com/github/tatey/conformist)

Bend CSVs to your will with declarative schemas. Map one or many columns, preprocess cells and lazily enumerate. Declarative schemas are easier to understand, quicker to setup and independent of I/O. Use [CSV](http://www.ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/CSV.html) (Formally [FasterCSV](https://rubygems.org/gems/fastercsv)), [Spreadsheet](https://rubygems.org/gems/spreadsheet) or any array of array-like data structure.

![](http://f.cl.ly/items/00191n3O1J2E1a342F1L/conformist.jpg)

## Quick and Dirty Examples

Open a CSV file and declare a schema. A schema compromises of columns. A column takes an arbitrary name followed by its position in the input. A column may be derived from multiple positions.

``` ruby
require 'conformist'
require 'csv'

csv = CSV.open '~/transmitters.csv'
schema = Conformist.new do
column :callsign, 1
column :latitude, 1, 2, 3
column :longitude, 3, 4, 5
column :name, 0 do |value|
value.upcase
end
end
```

Insert the transmitters into a SQLite database.

``` ruby
require 'sqlite3'

db = SQLite3::Database.new 'transmitters.db'
schema.conform(csv).each do |transmitter|
db.execute "INSERT INTO transmitters (callsign, ...) VALUES ('#{transmitter.callsign}', ...);"
end
```

Only insert the transmitters with the name "Mount Cooth-tha" using ActiveRecord or DataMapper.

``` ruby
transmitters = schema.conform(csv).select do |transmitter|
transmitter.name == 'Mount Coot-tha'
end
transmitters.each do |transmitter|
Transmitter.create! transmitter.attributes
end
```

Source from multiple, different input files and insert transmitters together into a single database.

``` ruby
require 'conformist'
require 'csv'
require 'sqlite3'

au_schema = Conformist.new do
column :callsign, 8
column :latitude, 10
end
us_schema = Conformist.new do
column :callsign, 1
column :latitude, 1, 2, 3
end

au_csv = CSV.open '~/au/transmitters.csv'
us_csv = CSV.open '~/us/transmitters.csv'

db = SQLite3::Database.new 'transmitters.db'

[au_schema.conform(au_csv), us_schema.conform(us_csv)].each do |schema|
schema.each do |transmitter|
db.execute "INSERT INTO transmitters (callsign, ...) VALUES ('#{transmitter.callsign}', ...);"
end
end
```

Open a Microsoft Excel spreadsheet and declare a schema.

``` ruby
require 'conformist'
require 'spreadsheet'

book = Spreadsheet.open '~/states.xls'
sheet = book.worksheet 0
schema = Conformist.new do
column :state, 0, 1 do |values|
"#{values.first}, #{values.last}"
end
column :capital, 2
end
```

Print each state's attributes to standard out.

``` ruby
schema.conform(sheet).each do |state|
$stdout.puts state.attributes
end
```

For more examples see [test/fixtures](https://github.com/tatey/conformist/tree/master/test/fixtures), [test/schemas](https://github.com/tatey/conformist/tree/master/test/schemas) and [test/unit/integration_test.rb](https://github.com/tatey/conformist/blob/master/test/unit/integration_test.rb).

## Installation

Conformist is available as a gem. Install it at the command line.

``` sh
$ [sudo] gem install conformist
```

Or add it to your Gemfile and run `$ bundle install`.

``` ruby
gem 'conformist'
```

## Usage

### Anonymous Schema

Anonymous schemas are quick to declare and don't have the overhead of creating an explicit class.

``` ruby
citizen = Conformist.new do
column :name, 0, 1
column :email, 2
end

citizen.conform [['Tate', 'Johnson', '[email protected]']]
```

### Class Schema

Class schemas are explicit. Class schemas were the only type available in earlier versions of Conformist.

``` ruby
class Citizen
extend Conformist

column :name, 0, 1
column :email, 2
end

Citizen.conform [['Tate', 'Johnson', '[email protected]']]
```

### Implicit Indexing

Column indexes are implicitly incremented when the index argument is omitted. Implicit indexing is all or nothing.

``` ruby
column :account_number # => 0
column :date { |v| Time.new *v.split('/').reverse } # => 1
column :description # => 2
column :debit # => 3
column :credit # => 4
```

### Conform

Conform is the principle method for lazily applying a schema to the given input.

``` ruby
enumerator = schema.conform CSV.open('~/file.csv')
enumerator.each do |row|
puts row.attributes
end
```

#### Input

`#conform` expects any object that responds to `#each` to return an array-like object.

``` ruby
CSV.open('~/file.csv').responds_to? :each # => true
[[], [], []].responds_to? :each # => true
```

#### Header Row

`#conform` takes an option to skip the first row of input. Given a typical CSV document,
the first row is the header row and irrelevant for enumeration.

``` ruby
schema.conform CSV.open('~/file_with_headers.csv'), :skip_first => true
```

#### Named Columns

Strings can be used as column indexes instead of integers. These strings will be matched
against the first row to determine the appropriate numerical index.

``` ruby
citizen = Conformist.new do
column :email, 'EM'
column :name, 'FN', 'LN'
end

citizen.conform [['FN', 'LN', 'EM'], ['Tate', 'Johnson', '[email protected]']], :skip_first => true
```

#### Enumerator

`#conform` is lazy, returning an [Enumerator](http://www.ruby-doc.org/core-1.9.3/Enumerator.html). Input is not parsed until you call `#each`, `#map` or any method defined in [Enumerable](http://www.ruby-doc.org/core-1.9.3/Enumerable.html). That means schemas can be assigned now and evaluated later. `#each` has the lowest memory footprint because it does not build a collection.

#### Struct

The argument passed into the block is a struct-like object. You can access columns as methods or keys. Columns were only accessible as keys in earlier versions of Conformist. Methods are now the preferred syntax.

``` ruby
citizen[:name] # => "Tate Johnson"
citizen.name # => "Tate Johnson"
```

For convenience the `#attributes` method returns a hash of key-value pairs suitable for creating ActiveRecord or DataMapper records.

``` ruby
citizen.attributes # => {:name => "Tate Johnson", :email => "[email protected]"}
```

### One Column

Maps the first column in the input file to `:first_name`. Column indexing starts at zero.

``` ruby
column :first_name, 0
```

### Many Columns

Maps the first and second columns in the input file to `:name`.

``` ruby
column :name, 0, 1
```

Indexing is completely arbitrary and you can map any combination.

``` ruby
column :name_and_city 0, 1, 2
```

Many columns are implicitly concatenated. Behaviour can be changed by passing a block. See *preprocessing*.

### Preprocessing

Sometimes values need to be manipulated before they're conformed. Passing a block gets access to values. The return value of the block becomes the conformed output.

``` ruby
column :name, 0, 1 do |values|
values.map(&:upcase) * ' '
end
```

Works with one column too. Instead of getting a collection of objects, one object is passed to the block.

``` ruby
column :first_name, 0 do |value|
value.upcase
end
```

It's also possible to provide a context object that is made available during preprocessing.

``` ruby
citizen = Conformist.new do
column :name, 0, 1 do |values, context|
(context[:upcase?] ? values.map(&:upcase) : values) * ' '
end
end

citizen.conform [['tate', 'johnson']], context: {upcase?: true}
```

### Virtual Columns

Virtual columns are not sourced from input. Omit the index to create a virtual column. Like real columns, virtual columns are included in the conformed output.

``` ruby
column :day do
1
end
```

### Inheritance

Inheriting from a schema gives access to all of the parent schema's columns.

#### Anonymous Schema

Anonymous inheritance takes inspiration from Ruby's syntax for [instantiating new classes](http://ruby-doc.org/core-1.9.3/Class.html#method-c-new).

``` ruby
parent = Conformist.new do
column :name, 0, 1
end

child = Conformist.new parent do
column :category do
'Child'
end
end
```

#### Class Schema

Classical inheritance works as expected.

``` ruby
class Parent
extend Conformist

column :name, 0, 1
end

class Child < Parent
column :category do
'Child'
end
end
```

## Upgrading from <= 0.0.3 to >= 0.1.0

Where previously you had

``` ruby
class Citizen
include Conformist::Base

column :name, 0, 1
end

Citizen.load('~/file.csv').foreach do |citizen|
# ...
end
```

You should now do

``` ruby
require 'fastercsv'

class Citizen
extend Conformist

column :name, 0, 1
end

Citizen.conform(FasterCSV.open('~/file.csv')).each do |citizen|
# ...
end
```

See CHANGELOG.md for a full list of changes.

## Compatibility

* MRI 2.4.0, 2.3.1, 2.2.0, 2.1.0, 2.0.0, 1.9.3
* JRuby

## Dependencies

No explicit dependencies, although `CSV` and `Spreadsheet` are commonly used.

## Contributing

1. Fork
2. Install dependancies by running `$ bundle install`
3. Write tests and code
4. Make sure the tests pass locally by running `$ bundle exec rake`
5. Push to GitHub and make sure continuous integration tests pass at
https://travis-ci.org/tatey/conformist/pull_requests
5. Send a pull request on GitHub

Please do not increment the version number in `lib/conformist/version.rb`.
The version number will be incremented by the maintainer after the patch
is accepted.

## Motivation

Motivation for this project came from the desire to simplify importing data from various government organisations into [Antenna Mate](http://antennamate.com). The data from each government was similar, but had completely different formatting. Some pieces of data needed preprocessing while others simply needed to be concatenated together. Not wanting to write a parser for each new government organisation, I created Conformist.

## Copyright

Copyright © 2016 Tate Johnson. Conformist is released under the MIT license. See LICENSE for details.