https://github.com/yykamei/strong_csv
Type checker for a CSV file.
- Host: GitHub
- URL: https://github.com/yykamei/strong_csv
- Owner: yykamei
- License: mit
- Created: 2022-04-20T14:00:07.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-08-19T01:45:33.000Z (6 months ago)
- Last Synced: 2024-10-10T08:35:58.310Z (4 months ago)
- Topics: csv, hacktoberfest, ruby
- Language: Ruby
- Homepage:
- Size: 177 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
# strong_csv
Type checker for a CSV file inspired by [strong_json](https://github.com/soutaro/strong_json).
## Motivation
Some applications accept CSV files uploaded by users,
and in general they need to validate each cell of the file.

How should applications validate them?
The answer depends on the application, but some validation logic is common to most CSV files.
For example, some columns may have to be integers because of database requirements.
Writing such validations by hand every time is cumbersome.

strong_csv relieves you of that drudgery by letting you declare the desired types beforehand.
## Installation
Add this line to your application's Gemfile:
```ruby
gem "strong_csv"
```

And then execute:

```console
bundle
```

Or install it yourself as:

```console
gem install strong_csv
```

## Usage
The most important APIs of strong_csv are `StrongCSV.new` and `StrongCSV#parse`.
`StrongCSV.new` lets you declare types for each CSV column with Ruby's block syntax.
Inside the block, you mainly use `let` to declare the type of a column.

After defining types, you can parse CSV content with `StrongCSV#parse`.
`StrongCSV#parse` avoids raising errors wherever possible and instead stores error messages in its rows.
It does not raise because CSV content may contain _invalid_ rows,
and sometimes it makes sense to skip them and process only the _valid_ rows.
If you want to stop all processing when any row is invalid,
check that every row is valid before proceeding.

Here is an example usage of this gem:

```ruby
require "strong_csv"

strong_csv = StrongCSV.new do
  let :stock, integer
  let :tax_rate, float
  let :name, string(within: 1..255)
  let :description, string?(within: 1..1000)
  let :active, boolean
  let :started_at, time?(format: "%Y-%m-%dT%H:%M:%S")
  let :price, integer, error_message: "This should be Integer"

  # Literal declaration
  let :status, 0..6
  let :priority, 10, 20, 30, 40, 50
  let :size, "S", "M", "L" do |value| # The input must be one of "S", "M", or "L", and it is cast to the value returned by the block.
    case value
    when "S"
      1
    when "M"
      2
    when "L"
      3
    end
  end

  # Regular expressions
  let :url, %r{\Ahttps://}

  # Custom validation
  #
  # This example queries the database for the `User` IDs that actually exist,
  # and checks that each `:user_id` cell refers to a row in the `users` table.
  # `pick` helps to avoid N+1 problems.
  pick :user_id, as: :user_ids do |ids|
    User.where(id: ids).ids
  end
  let :user_id, integer(constraint: ->(i) { user_ids.include?(i) })
end

data = <<~CSV
  stock,tax_rate,name,active,status,priority,size,url,price
  12,0.8,special item,True,4,20,M,https://example.com,PRICE
CSV

strong_csv.parse(data, field_size_limit: 2048) do |row|
  if row.valid?
    row[:tax_rate] # => 0.8
    row[:active] # => true
    # do something with row
  else
    row.errors # => {:price=>["This should be Integer"], :user_id=>["`nil` can't be casted to Integer"]}
    # do something with row.errors
  end
end
```

You can also define types without CSV headers by specifying column numbers.

> [!NOTE]
> The numbers must start from `0` (zero-based index).

```ruby
StrongCSV.new do
  let 0, integer
  let 1, string
  let 2, 1..10
end
```

This declaration expects CSV content like this:

```csv
123,abc,3
830,mno,10
```

## Available types

| Type | Description |
| --- | --- |
| `integer` and `integer?` | The value must be cast to `Integer`. |
| `float` and `float?` | The value must be cast to `Float`. |
| `boolean` and `boolean?` | The value must be cast to Boolean (`true` or `false`). |
| `string` and `string?` | The value must be cast to `String`. |
| `time` and `time?` | The value must be cast to `Time`. |
| `optional` | The value can be `nil`. If the value exists, it must satisfy the given type constraint. |
| `23` (Integer literal) | The value must be cast to the specific `Integer` literal. |
| `15.12` (Float literal) | The value must be cast to the specific `Float` literal. |
| `1..10` (Range literal) | The value must be cast to the type of the beginning of the `Range` and be covered by it. |
| `"abc"` (String literal) | The value must be cast to the specific `String` literal. |
| `%r{\Ahttps://}` (Regexp literal) | The value must be cast to a `String` that matches the specified Regexp. |
| `,` (Union type) | The value must satisfy one of the subtypes. |

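To make "must be cast to" in the table more concrete, here is a minimal plain-Ruby sketch of what such checks can look like. It does not use strong_csv and is not the gem's implementation; the helper names (`cast_integer`, `cast_float`, `cast_boolean`, `cast_time`) are hypothetical and exist only for this illustration. Each helper returns the cast value, or `nil` when the cell cannot be cast.

```ruby
require "time" # Time.strptime lives in the "time" standard library

# Hypothetical helpers for illustration only; strong_csv's internals may differ.
# Each returns the cast value, or nil when the cell cannot be cast.

def cast_integer(cell)
  Integer(cell, exception: false)
end

def cast_float(cell)
  Float(cell, exception: false)
end

# Accepts only the spellings this README lists for `boolean`.
def cast_boolean(cell)
  case cell
  when "true", "True", "TRUE" then true
  when "false", "False", "FALSE" then false
  end
end

# The default format mirrors the default documented for `time`.
def cast_time(cell, format: "%Y-%m-%d")
  Time.strptime(cell, format)
rescue ArgumentError, TypeError
  nil
end

cast_integer("12")          # => 12
cast_integer("non-integer") # => nil
cast_float("0.8")           # => 0.8
cast_boolean("True")        # => true
cast_time("05-03")          # => nil (does not match "%Y-%m-%d")
```

Writing and maintaining helpers like these for every column is the kind of repetition that the type declarations are meant to replace.
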
### `integer` and `integer?`
The value must be cast to `Integer`. `integer?` allows the value to be `nil`, so you can declare an optional integer type
for columns. You can also restrict the allowed values to those that satisfy a condition given through `:constraint`.

_Example_

```ruby
strong_csv = StrongCSV.new do
  let :stock, integer
  let :state, integer?
  let :user_id, integer(constraint: ->(v) { user_ids.include?(v) })
  pick :user_id, as: :user_ids do |values|
    User.where(id: values).ids
  end
end

result = strong_csv.parse(<<~CSV)
  stock,state,user_id
  12,0,1
  20,,2
  non-integer,1,4
CSV

result.map(&:valid?) # => [true, true, false]
result[0].slice(:stock, :state, :user_id) # => {:stock=>12, :state=>0, :user_id=>1}
result[1].slice(:stock, :state, :user_id) # => {:stock=>20, :state=>nil, :user_id=>2}
result[2].slice(:stock, :state, :user_id) # => {:stock=>"non-integer", :state=>1, :user_id=>"4"}
result[2].errors.slice(:stock, :user_id) # => {:stock=>["`\"non-integer\"` can't be casted to Integer"], :user_id=>["`\"4\"` does not satisfy the specified constraint"]}
```

### `float` and `float?`

The value must be cast to `Float`. `float?` allows the value to be `nil`, so you can declare an optional float type for
columns. You can also restrict the allowed values to those that satisfy a condition given through `:constraint`.

_Example_

```ruby
strong_csv = StrongCSV.new do
  let :tax_rate, float
  let :fail_rate, float?
end

result = strong_csv.parse(<<~CSV)
  tax_rate,fail_rate
  0.02,0.1
  0.05,
  ,0.8
CSV

result.map(&:valid?) # => [true, true, false]
result[0].slice(:tax_rate, :fail_rate) # => {:tax_rate=>0.02, :fail_rate=>0.1}
result[1].slice(:tax_rate, :fail_rate) # => {:tax_rate=>0.05, :fail_rate=>nil}
result[2].slice(:tax_rate, :fail_rate) # => {:tax_rate=>nil, :fail_rate=>0.8} (`nil` is not allowed for `tax_rate`)
```

### `boolean` and `boolean?`

The value must be cast to Boolean (`true` or `false`).
`"true"`, `"True"`, and `"TRUE"` are cast to `true`,
while `"false"`, `"False"`, and `"FALSE"` are cast to `false`.
`boolean?` allows the value to be `nil` as an optional boolean
value.

_Example_

```ruby
strong_csv = StrongCSV.new do
  let :enabled, boolean
  let :active, boolean?
end

result = strong_csv.parse(<<~CSV)
  enabled,active
  True,True
  False,
  ,
CSV

result.map(&:valid?) # => [true, true, false]
result[0].slice(:enabled, :active) # => {:enabled=>true, :active=>true}
result[1].slice(:enabled, :active) # => {:enabled=>false, :active=>nil}
result[2].slice(:enabled, :active) # => {:enabled=>nil, :active=>nil} (`nil` is not allowed for `enabled`)
```

### `string` and `string?`

The value must be cast to `String`. `string?` allows the value to be `nil` as an optional string value.
Both also accept a `:within` argument, which limits the length of the string value to the specified `Range`.

_Example_

```ruby
strong_csv = StrongCSV.new do
  let :name, string(within: 1..4)
  let :description, string?
end

result = strong_csv.parse(<<~CSV)
  name,description
  JB,Hello
  yykamei,
  ,🤷
CSV

result.map(&:valid?) # => [true, false, false]
result[0].slice(:name, :description) # => {:name=>"JB", :description=>"Hello"}
result[1].slice(:name, :description) # => {:name=>"yykamei", :description=>nil} ("yykamei" exceeds the `Range` specified with `:within`)
result[2].slice(:name, :description) # => {:name=>nil, :description=>"🤷"} (`nil` is not allowed for `name`)
```

### `time` and `time?`

The value must be cast to `Time`. `time?` allows the value to be `nil` as an optional time value.
Both accept a `:format` argument, which is used as the format
of [`Time.strptime`](https://rubydoc.info/stdlib/time/Time.strptime),
so you can ensure that the value matches the time format. The default value of `:format` is `"%Y-%m-%d"`.

_Example_

```ruby
strong_csv = StrongCSV.new do
  let :start_on, time
  let :updated_at, time?(format: "%FT%T")
end

result = strong_csv.parse(<<~CSV)
  start_on,updated_at
  2022-04-01,2022-04-30T15:30:59
  2022-05-03
  05-03,2021-09-03T09:48:23
CSV

result.map(&:valid?) # => [true, true, false]
result[0].slice(:start_on, :updated_at) # => {:start_on=>2022-04-01 00:00:00 +0900, :updated_at=>2022-04-30 15:30:59 +0900}
result[1].slice(:start_on, :updated_at) # => {:start_on=>2022-05-03 00:00:00 +0900, :updated_at=>nil}
result[2].slice(:start_on, :updated_at) # => {:start_on=>"05-03", :updated_at=>2021-09-03 09:48:23 +0900} ("05-03" does not satisfy the default format `"%Y-%m-%d"`)
```

### `optional`

While each type above has an optional variant suffixed with `?`, literals cannot be suffixed with `?`.
Sometimes, however, you need an optional literal type.
In that case, `optional` lets you declare one.

_Example_

```ruby
strong_csv = StrongCSV.new do
  let :foo, optional(123)
  let :bar, optional("test")
end

result = strong_csv.parse(<<~CSV)
  foo,bar
  123,test
  ,
  124
CSV

result.map(&:valid?) # => [true, true, false]
result[0].slice(:foo, :bar) # => {:foo=>123, :bar=>"test"}
result[1].slice(:foo, :bar) # => {:foo=>nil, :bar=>nil}
result[2].slice(:foo, :bar) # => {:foo=>"124", :bar=>nil} (124 is not equal to 123)
```

### Literal

You can declare literal values as types. The supported literals are `Integer`, `Float`, `String`, `Range`, and `Regexp`.

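As a rough mental model of literal types, the plain-Ruby sketch below casts a cell to the class of the literal (or, for a `Range`, to the class of its beginning) and then compares it. This is an assumption-based illustration of the descriptions in the types table, not strong_csv's implementation; `cast_like` and `match_literal` are hypothetical names.

```ruby
# Hypothetical helpers for illustration only; strong_csv's internals may differ.

# Cast the raw cell to the same class as `sample`; nil when the cast fails.
def cast_like(sample, cell)
  case sample
  when Integer then Integer(cell, exception: false)
  when Float   then Float(cell, exception: false)
  else cell # String samples keep the raw cell
  end
end

# Returns [true, cast_value] when the cell satisfies the literal,
# otherwise [false, raw_cell].
def match_literal(cell, literal)
  case literal
  when Range
    value = cast_like(literal.begin, cell)
    ok = !value.nil? && literal.cover?(value)
  when Regexp
    value = cell
    ok = literal.match?(cell)
  else
    value = cast_like(literal, cell)
    ok = value == literal
  end
  ok ? [true, value] : [false, cell]
end

match_literal("9", 1..10)        # => [true, 9]
match_literal("0", 1..10)        # => [false, "0"]
match_literal("Hey", "test")     # => [false, "Hey"]
match_literal("abc", /[a-z]+/)   # => [true, "abc"]
```

Note that in this sketch a failed match keeps the raw string, which matches how invalid cells appear in the `slice` output of the examples in this README.
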
_Example_
```ruby
strong_csv = StrongCSV.new do
  let 0, 123
  let 1, "test"
  let 2, 2.5
  let 3, 1..10
  let 4, /[a-z]+/
end

result = strong_csv.parse(<<~CSV)
  123,test,2.5,9,abc
  123,test,2.5,0,xyz
  123,Hey,2.5,10,!
CSV

result.map(&:valid?) # => [true, false, false]
result[0].slice(0, 1, 2, 3, 4) # => {0=>123, 1=>"test", 2=>2.5, 3=>9, 4=>"abc"}
result[1].slice(0, 1, 2, 3, 4) # => {0=>123, 1=>"test", 2=>2.5, 3=>"0", 4=>"xyz"} (0 is out of 1..10)
result[2].slice(0, 1, 2, 3, 4) # => {0=>123, 1=>"Hey", 2=>2.5, 3=>10, 4=>"!"} ("Hey" is not equal to "test", and "!" does not match /[a-z]+/)
```

### Union

Sometimes it is fine for a value to satisfy any one of several types.
Union types are useful in such cases.

_Example_

```ruby
strong_csv = StrongCSV.new do
  let :priority, 10, 20, 30
  let :size, "S", "M", "L"
end

result = strong_csv.parse(<<~CSV)
  priority,size
  10,M
  30,A
  11,S
CSV

result.map(&:valid?) # => [true, false, false]
result[0].slice(:priority, :size) # => {:priority=>10, :size=>"M"}
result[1].slice(:priority, :size) # => {:priority=>30, :size=>"A"} ("A" is not one of "S", "M", and "L")
result[2].slice(:priority, :size) # => {:priority=>"11", :size=>"S"} (11 is not one of 10, 20, and 30)
```

## Contributing

Bug reports and pull requests are welcome on the [GitHub repository](https://github.com/yykamei/strong_csv).
This project is intended to be a safe, welcoming space for collaboration,
and contributors are expected to adhere to the
[code of conduct](https://github.com/yykamei/strong_csv/blob/main/CODE_OF_CONDUCT.md).