- Host: GitHub
- URL: https://github.com/buren/honey_format
- Owner: buren
- License: mit
- Created: 2015-10-01T22:40:55.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2024-04-16T08:58:22.000Z (9 months ago)
- Last Synced: 2024-10-13T13:36:58.661Z (3 months ago)
- Topics: csv, ruby, rubygem
- Language: Ruby
- Homepage:
- Size: 396 KB
- Stars: 14
- Watchers: 3
- Forks: 2
- Open Issues: 16
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.txt
- Code of conduct: CODE_OF_CONDUCT.md
# HoneyFormat [![Build Status](https://travis-ci.org/buren/honey_format.svg)](https://travis-ci.org/buren/honey_format) [![Code Climate](https://codeclimate.com/github/buren/honey_format/badges/gpa.svg)](https://codeclimate.com/github/buren/honey_format) [![Inline docs](http://inch-ci.org/github/buren/honey_format.svg)](https://www.rubydoc.info/gems/honey_format/)
> Makes working with CSVs as smooth as honey.
Proper objects for CSV headers and rows, convert column values, filter columns and rows, small(-ish) performance overhead, no dependencies other than Ruby stdlib.
## Features
- Proper objects for CSV header and rows
- Convert row and header column values
- Pass your own custom row builder
- Filter what columns and rows are included in CSV output
- Gracefully handle missing and duplicated header columns
- [CLI](#cli) - Simple command line interface
- Only ~5-10% overhead compared to using Ruby's CSV directly, see [benchmarks](#benchmark)
- Has no dependencies other than Ruby stdlib
- Supports Ruby >= 2.3

Read the [usage section](#usage), [RubyDoc](https://www.rubydoc.info/gems/honey_format/) or [examples/ directory](https://github.com/buren/honey_format/tree/master/examples) for how to use this gem.
## Quick use
```ruby
csv_string = <<-CSV
Id,Username,Email
1,buren,[email protected]
2,jacob,[email protected]
CSV
csv = HoneyFormat::CSV.new(csv_string, type_map: { id: :integer })
csv.columns # => [:id, :username, :email]
csv.rows # => [#<Row ...>, #<Row ...>]
user = csv.rows.first
user.id # => 1
user.username # => "buren"

csv.to_csv(columns: [:id, :username]) { |row| row.id < 2 }
# => "id,username\n1,buren\n"
```

## Installation
Add this line to your application's Gemfile:
```ruby
gem 'honey_format'
```

And then execute:
```
$ bundle
```

Or install it yourself as:
```
$ gem install honey_format
```

## Usage
By default, HoneyFormat assumes the CSV has a header row.
```ruby
csv_string = "Id,Username\n1,buren"
csv = HoneyFormat::CSV.new(csv_string)

# Header
header = csv.header
header.original # => ["Id", "Username"]
header.columns # => [:id, :username]

# Rows
rows = csv.rows # => [#<Row ...>]
user = rows.first
user.id # => "1"
user.username # => "buren"
```

Set delimiter & quote character
```ruby
csv_string = "name;id|'John Doe';42"
csv = HoneyFormat::CSV.new(
  csv_string,
  delimiter: ';',
  row_delimiter: '|',
  quote_character: "'",
)
```

__Type converters__
> Type converters are great if you want to convert column values, like numbers and dates.
There are a bunch of [default type converters](https://github.com/buren/honey_format/blob/master/lib/honey_format/converters/converters.rb).
```ruby
csv_string = "Id,Username\n1,buren"
type_map = { id: :integer }
csv = HoneyFormat::CSV.new(csv_string, type_map: type_map)
csv.rows.first.id # => 1
```

Pass your own
```ruby
csv_string = "Id,Username\n1,buren"
type_map = { username: proc { |v| v.upcase } }
csv = HoneyFormat::CSV.new(csv_string, type_map: type_map)
csv.rows.first.username # => "BUREN"
```

Combine multiple converters
```ruby
csv_string = "Id,Username\n1, BuRen "
type_map = { username: [:strip, :downcase] }
csv = HoneyFormat::CSV.new(csv_string, type_map: type_map)
csv.rows.first.username # => "buren"
```

Register your own converter
```ruby
HoneyFormat.configure do |config|
  config.converter_registry.register :upcased, proc { |v| v.upcase }
end

csv_string = "Id,Username\n1,buren"
type_map = { username: :upcased }
csv = HoneyFormat::CSV.new(csv_string, type_map: type_map)
csv.rows.first.username # => "BUREN"
```

Remove registered converter
```ruby
HoneyFormat.configure do |config|
  config.converter_registry.unregister :upcase
  # now you're free to register your own
  config.converter_registry.register :upcase, proc { |v| v.upcase if v }
end
```

Access registered converters
```ruby
decimal_converter = HoneyFormat.converter_registry[:decimal]
decimal_converter.call('1.1') # => 1.1
```

Default converter names
```ruby
HoneyFormat.config.default_converters.keys
```

See [`Converters::DEFAULT`](https://github.com/buren/honey_format/blob/master/lib/honey_format/converters.rb) for a complete list of the default converter names.
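
To make the idea of a type map more concrete, here is a minimal sketch of how symbol, proc and array entries could be resolved and applied to parsed rows, using only Ruby's stdlib `CSV`. This is not HoneyFormat's implementation: `CONVERTERS`, `resolve_converter` and `parse_with_type_map` are hypothetical names introduced purely for illustration.

```ruby
require 'csv'

# Hypothetical registry (name => callable). The names mirror converters used
# in this README, but these implementations are illustrative assumptions.
CONVERTERS = {
  integer:  ->(v) { Integer(v, 10) },
  strip:    ->(v) { v&.strip },
  downcase: ->(v) { v&.downcase },
  upcase:   ->(v) { v&.upcase }
}.freeze

# Turn one type_map entry (Symbol, callable, or Array of either)
# into a single callable.
def resolve_converter(spec)
  case spec
  when Symbol
    CONVERTERS.fetch(spec)
  when Array
    steps = spec.map { |s| resolve_converter(s) }
    ->(v) { steps.reduce(v) { |acc, step| step.call(acc) } }
  else
    spec # assumed to respond to #call
  end
end

# Parse the CSV with stdlib CSV and apply the converters column by column.
def parse_with_type_map(csv_string, type_map)
  converters = type_map.transform_values { |spec| resolve_converter(spec) }
  table = CSV.parse(csv_string, headers: true, header_converters: :symbol)
  table.map do |row|
    row.to_h.each_with_object({}) do |(column, value), out|
      converter = converters[column]
      out[column] = converter ? converter.call(value) : value
    end
  end
end

rows = parse_with_type_map(
  "Id,Username\n1, BuRen ",
  { id: :integer, username: [:strip, :downcase] }
)
rows.first[:id]       # => 1
rows.first[:username] # => "buren"
```

The sketch returns plain hashes; the gem itself returns row objects with one method per column.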
__Row builder__
> Pass your own row builder if you want more control of the entire row or if you want to return your own row object.
Custom row builder
```ruby
csv_string = "Id,Username\n1,buren"
upcaser = ->(row) { row.tap { |r| r.username.upcase! } }
csv = HoneyFormat::CSV.new(csv_string, row_builder: upcaser)
csv.rows # => [#<Row ...>]
```

As long as the row builder responds to `#call` you can pass anything you like
```ruby
require 'securerandom'

class Anonymizer
  def call(row)
    @cache ||= {}
    # Return an object you want to represent the row
    row.tap do |r|
      # given the same value make sure to return the same anonymized value every time
      @cache[r.email] ||= "#{SecureRandom.hex(6)}@example.com"
      r.email = @cache[r.email]
      r.payment_id = ''
    end
  end
end

csv_string = <<~CSV
Email,Payment ID
[email protected],123
[email protected],998
CSV
csv = HoneyFormat::CSV.new(csv_string, row_builder: Anonymizer.new)
csv.rows.to_csv(columns: [:email])
# => [email protected]
# [email protected]
# [email protected]
```

__Output CSV__
> Makes it super easy to output a subset of columns/rows.
Manipulate the rows before output
```ruby
csv_string = "Id,Username\n1,buren"
csv = HoneyFormat::CSV.new(csv_string)
csv.rows.each { |row| row.id = nil }
csv.to_csv # => "id,username\n,buren\n"
```

Output a subset of columns
```ruby
csv_string = "Id, Username, Country\n1,buren,Sweden"
csv = HoneyFormat::CSV.new(csv_string)
csv.to_csv(columns: [:id, :country]) # => "id,country\n1,Sweden\n"
```

Output a subset of rows
```ruby
csv_string = "Name, Country\nburen,Sweden\njacob,Denmark"
csv = HoneyFormat::CSV.new(csv_string)
csv.to_csv { |row| row.country == 'Sweden' } # => "name,country\nburen,Sweden\n"
```

__Headers__
> By default generates method-like names for each header column, but also gives you full control: define them or convert them.
By default assumes a header in the CSV file.
```ruby
csv_string = "Id,Username\n1,buren"
csv = HoneyFormat::CSV.new(csv_string)

# Header
header = csv.header
header.original # => ["Id", "Username"]
header.columns # => [:id, :username]
```

Define header
```ruby
csv_string = "1,buren"
csv = HoneyFormat::CSV.new(csv_string, header: ['Id', 'Username'])
csv.rows.first.username # => "buren"
```

Set default header converter
```ruby
HoneyFormat.configure do |config|
  config.header_converter = proc { |v| v.downcase }
end

# you can get the default one with
header_converter = HoneyFormat.converter_registry[:header_column]
header_converter.call('First name') # => "first_name"
```

Use any registered converter as the header converter
```ruby
csv_string = "Id,Username\n1,buren"
csv = HoneyFormat::CSV.new(csv_string, header_converter: :upcase)
csv.columns # => [:ID, :USERNAME]
```

Pass your own header converter
```ruby
# unmapped keys use the default header converter,
# mix simple key => value mapping with key => proc
converter = {
  'First^Name' => :first_name,
  'Username' => -> { :handle }
}

csv_string = "ID,Username,First^Name\n1,buren,Jacob"
user = HoneyFormat::CSV.new(csv_string, header_converter: converter).rows.first
user.first_name # => "Jacob"
user.handle # => "buren"
user.id # => "1"

# you can also pass a proc or any callable object
converter = Class.new do
  define_singleton_method(:call) { |value, index| "#{value}#{index}" }
end
# or
converter = ->(value, index) { "#{value}#{index}" }
csv = HoneyFormat::CSV.new(csv_string, header_converter: converter)
```

Missing header values are automatically set and deduplicated
```ruby
csv_string = "first,,third,third\nval0,val1,val2,val3"
csv = HoneyFormat::CSV.new(csv_string)
user = csv.rows.first
user.column1 # => "val1"
user.third # => "val2"
user.third1 # => "val3"
```

Duplicated header values
```ruby
csv_string = <<~CSV
email,email,name
[email protected],[email protected],John
CSV
# :deduplicate is the default value
csv = HoneyFormat::CSV.new(csv_string, header_deduplicator: :deduplicate)
user = csv.rows.first
user.email # => [email protected]
user.email1 # => [email protected]

# you can also choose to raise an error instead
HoneyFormat::CSV.new(csv_string, header_deduplicator: :raise)
# => HoneyFormat::DuplicateHeaderColumnError
```

If your header contains special characters and/or characters that can't be part of Ruby method names,
things can get a little awkward.
```ruby
csv_string = "ÅÄÖ\nSwedish characters"
user = HoneyFormat::CSV.new(csv_string).rows.first
# Note that these chars aren't "downcased" in Ruby 2.3 and older versions of Ruby,
# "ÅÄÖ".downcase # => "ÅÄÖ"
user.ÅÄÖ # => "Swedish characters"
# while on Ruby > 2.3
user.åäö

csv_string = "First^Name\nJacob"
user = HoneyFormat::CSV.new(csv_string).rows.first
user.public_send(:"first^name") # => "Jacob"
# or
user['first^name'] # => "Jacob"
```

Emoji characters
```ruby
csv_string = "😎⛷\nEmoji characters"
csv = HoneyFormat::CSV.new(csv_string)
csv.rows.first.😎⛷ # => "Emoji characters"
```

__Errors__
> When you need to be extra safe.
If you want to, there are some errors you can rescue
```ruby
begin
  HoneyFormat::CSV.new(csv_string)
rescue HoneyFormat::HeaderError => e
  puts 'there was a problem with the header'
  raise(e)
rescue HoneyFormat::RowError => e
  puts 'there was a problem with a row'
  raise(e)
end
```

You can see all [available errors here](https://www.rubydoc.info/gems/honey_format/HoneyFormat/Errors).
__Skip lines__
> Skip comments and/or other unwanted lines from being parsed.
```ruby
csv_string = <<~CSV
Id,Username
1,buren
# comment
2,jacob
CSV
regexp = %r{\A#} # Match all lines that start with "#"
csv = HoneyFormat::CSV.new(csv_string, skip_lines: regexp)
csv.rows.length # => 2
```

__Matrix__
> Use what's under the hood.
Actually `HoneyFormat::CSV` is a very thin wrapper around `HoneyFormat::Matrix`.
You can use `Matrix` directly; it supports all options that aren't specifically tied to parsing a CSV.

Example
```ruby
data = [
  %w[name id],
  %w[jacob 1]
]
type_map = {
  id: :integer,
  name: :upcase
}

matrix = HoneyFormat::Matrix.new(data, type_map: type_map)
matrix.columns # => [:name, :id]
matrix.rows.to_a # => [#<Row ...>]
matrix.to_csv # => "name,id\nJACOB,1\n"
```

If you want to see more usage examples check out the [`examples/`](https://github.com/buren/honey_format/tree/master/examples) and [`spec/`](https://github.com/buren/honey_format/tree/master/spec) directories and of course [on RubyDoc](https://www.rubydoc.info/gems/honey_format/).
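
The row objects shown throughout this section respond to one method per column. A rough way to picture that, using nothing but Ruby's `Struct`, is sketched below. It is an illustration under assumptions, not the gem's actual implementation: `normalize_header` and `build_rows` are hypothetical helpers, and the naming rules only approximate the behaviour shown in the header examples above.

```ruby
# Hypothetical helper: turn raw header strings into unique, method-like
# symbols, e.g. "First name" => :first_name, "" at index 1 => :column1.
# Collisions with an already generated name are not handled in this sketch.
def normalize_header(header)
  seen = Hash.new(0)
  header.each_with_index.map do |value, index|
    name = value.to_s.strip.downcase.gsub(/\s+/, '_')
    name = "column#{index}" if name.empty?
    count = seen[name]
    seen[name] += 1
    (count.zero? ? name : "#{name}#{count}").to_sym
  end
end

# Hypothetical helper: the first inner array is the header, the remaining
# arrays become Struct instances with one reader/writer per column.
def build_rows(data)
  header, *body = data
  row_class = Struct.new(*normalize_header(header))
  body.map { |values| row_class.new(*values) }
end

rows = build_rows([%w[name id], %w[jacob 1]])
rows.first.name # => "jacob"
rows.first.id   # => "1"

row = build_rows([['first', '', 'third', 'third'], %w[val0 val1 val2 val3]]).first
row.column1 # => "val1"
row.third1  # => "val3"
```

Because the rows are plain objects, assigning to a column (`row.id = nil`) before writing them back out works the same way as in the output examples earlier in this section.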
__SQL example__
When you want the result as an object, with certain columns converted to objects.
```ruby
require 'mysql2'

class DBClient
  def initialize(host:, username:, password:, port: 3306)
    @client = Mysql2::Client.new(
      host: host,
      username: username,
      password: password,
      port: port
    )
  end

  def query(sql, type_map: {})
    result = @client.query(sql)
    return if result.first.nil?

    matrix = HoneyFormat::Matrix.new(
      result.map(&:values),
      header: result.first.keys,
      type_map: type_map
    )
    matrix.rows
  end
end
```

Usage example with a fictional "users" database table (schema: `name`, `created_at`)
```ruby
client = DBClient.new(host: '127.0.0.1', username: 'root', password: nil)
users = client.query(
  'SELECT * FROM users',
  type_map: { created_at: :datetime! }
)
user = users.first
user.name # => "buren"
user.created_at.class # => Time
```

## Configuration
Configuration is optional
```ruby
HoneyFormat.configure do |config|
  config.header_converter = proc { |column| column.downcase }
  config.delimiter = ";"
  config.row_delimiter = "|"
  config.quote_character = "'"
  config.skip_lines = %r{\A#} # Match all lines that start with "#"
end
```

Default configuration values
```ruby
HoneyFormat.configure do |config|
  config.header_converter = HoneyFormat::Registry.new(Converters::DEFAULT)[:header_column]
  config.delimiter = ","
  config.row_delimiter = :auto
  config.quote_character = "\""
  config.skip_lines = nil
end
```

## CLI
> Perfect when you want to get something simple done quickly.
```
Usage: honey_format [options]
        --csv=input.csv              CSV file
        --columns=id,name            Select columns
        --output=output.csv          CSV output (STDOUT otherwise)
        --delimiter=,                CSV delimiter (default: ,)
        --skip-lines=,               Skip lines that match this pattern
        --[no-]header-only           Print only the header
        --[no-]rows-only             Print only the rows
    -h, --help                       How to use
        --version                    Show version
```

Output a subset of columns to a new file
```
# input.csv
id,name,username
1,jacob,buren
```

```
$ honey_format input.csv --columns=id,username > output.csv
```

## Benchmark
_Note_: This gem adds some overhead to parsing a CSV string, typically ~5-10%. I've included some benchmarks below; your mileage may vary. The benchmarks were run with Ruby 2.5.
204KB (1k lines)
```
CSV no options: 51.0 i/s
CSV with header: 36.1 i/s - 1.41x slower
HoneyFormat::CSV: 48.7 i/s - 1.05x slower
```

2MB (10k lines)
```
CSV no options: 5.1 i/s
CSV with header: 3.6 i/s - 1.42x slower
HoneyFormat::CSV: 4.9 i/s - 1.05x slower
```

You can run the benchmarks yourself
```
Usage: bin/benchmark [file.csv] [options]
        --csv=[file1.csv]            CSV file(s)
        --[no-]verbose               Verbose output
        --lines-multipliers=[1,2,10] Multiply the rows in the CSV file (default: 1)
        --time=[30]                  Benchmark time (default: 30)
        --warmup=[5]                 Benchmark warmup (default: 5)
    -h, --help                       How to use
```

## Development
After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
## Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/buren/honey_format. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.
## License
The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).