Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/intercom/hippie_csv
:v: Tolerant, liberal CSV parsing
https://github.com/intercom/hippie_csv
Last synced: about 1 month ago
JSON representation
:v: Tolerant, liberal CSV parsing
- Host: GitHub
- URL: https://github.com/intercom/hippie_csv
- Owner: intercom
- License: apache-2.0
- Created: 2015-06-11T01:15:22.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2023-08-01T02:22:40.000Z (over 1 year ago)
- Last Synced: 2024-08-07T08:13:17.763Z (5 months ago)
- Language: Ruby
- Homepage:
- Size: 436 KB
- Stars: 57
- Watchers: 178
- Forks: 7
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
Awesome Lists containing this project
README
# HippieCSV
[ ![Codeship Status for intercom/hippie_csv](https://codeship.com/projects/f3b188e0-f312-0132-75cb-5ed004d44c71/status?branch=master)](https://codeship.com/projects/85324)
Ruby's `CSV` is great. It complies with the [proposed CSV spec](https://www.ietf.org/rfc/rfc4180.txt)
pretty well. If you pass its methods bad or non-compliant CSVs, it’ll rightfully
and loudly complain. It’s great 👍Except…if you want to be able to deal with files from the real world. At
[Intercom](https://intercom.io), we’ve seen lots of problematic CSVs from
customers importing data to our system. You may want to support such cases.
You may not always know the delimiter, nor the chosen quote character, in
advance.HippieCSV is a ridiculously tolerant and liberal parser which aims to yield as
much usable data as possible out of such real-world CSVs.## Installation
Add this line to your application's Gemfile:
gem 'hippie_csv'
And then execute:
$ bundle
Or install it yourself as:
$ gem install hippie_csv
## Usage
Exposes three public methods:
1. `.read` a file path to an array. Reads from the file all at once, building the whole CSV object in memory.
2. `.parse` an in-memory string to an array.
3. `.stream` from a file path and parse line by line, calling a given block on each row.**Note**: Processing large files using read or parse is a memory intensive operation. Use stream for parsing a CSV file line by line from the file to save memory. This method will use less memory but take longer, as we run each line through parse.
```ruby
require 'hippie_csv'HippieCSV.read("path/to/data.csv")
HippieCSV.stream("path/to/data.csv") do |row|
# use row here...
endHippieCSV.parse(csv_string)
```## Features
- Deduces the delimiter (supports `,`, `;`, and `\t`)
- Deduces the quote character (supports `'`, `"`, and `|`)
- Forgives backslash escaped quotes in quoted CSVs
- Forgives invalid newlines in quoted CSVs
- Heals many encoding issues (and aggressively forces UTF-8)
- Deals with many miscellaneous malformed types of CSVs
- Works when a [byte order mark](https://en.wikipedia.org/wiki/Byte_order_mark) is present## Contributing
1. Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet.
2. Check out the issue tracker to make sure someone already hasn't requested it and/or contributed it.
3. Fork the project.
4. Start a feature/bugfix branch.
5. Commit and push until you are happy with your contribution.
6. Make sure to add tests for it. This is important so we don't break it in a future version unintentionally.
7. Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or is otherwise necessary, that is fine, but please isolate to its own commit so we can cherry-pick around it.