Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.
Awesome Lists | Featured Topics | Projects
https://github.com/yykamei/strong_csv

Type checker for a CSV file.
https://github.com/yykamei/strong_csv
csv hacktoberfest ruby
Last synced: 4 months ago
JSON representation
Type checker for a CSV file.
Host: GitHub
URL: https://github.com/yykamei/strong_csv
Owner: yykamei
License: mit
Created: 2022-04-20T14:00:07.000Z (almost 3 years ago)
Default Branch: main
Last Pushed: 2024-08-19T01:45:33.000Z (6 months ago)
Last Synced: 2024-10-10T08:35:58.310Z (4 months ago)
Topics: csv, hacktoberfest, ruby
Language: Ruby
Homepage:
Size: 177 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project

README

        # strong_csv



Type checker for a CSV file inspired by [strong_json](https://github.com/soutaro/strong_json).

## Motivation

Some applications have a feature to receive a CSV file uploaded by a user,

and in general, it needs to validate each cell of the CSV file.

How should applications validate them?

Of course, it depends, but there would be common validation logic for CSV files.

For example, some columns may have to be integers because of database requirements.

It would be cumbersome to write such validations always.

strong_csv helps you to mitigate such a drudgery by letting you declare desired types beforehand.

## Installation

Add this line to your application's Gemfile:

```ruby

gem "strong_csv"

```

And then execute:

```console

bundle

```

Or install it yourself as:

```console

gem install strong_csv

```

## Usage

The most important APIs of strong_csv are `StrongCSV.new` and `StrongCSV#parse`.

`StrongCSV.new` lets you declare types for each CSV column with Ruby's block syntax.

Inside the block, you will mainly use `let` and declare types for a column.

After defining types, you can parse CSV content with `StrongCSV#parse`.

`StrongCSV#parse` won't raise errors as possible and just store error messages in its rows.

The reason why it won't raise errors is CSV content may contain _invalid_ rows,

but sometimes, it makes sense to ignore them and process something for _valid_ rows.

If you want to stop all processes with invalid rows,

check whether all rows are valid before proceeding with computation.

Here is an example usage of this gem:

```ruby

require "strong_csv"

strong_csv = StrongCSV.new do

  let :stock, integer

  let :tax_rate, float

  let :name, string(within: 1..255)

  let :description, string?(within: 1..1000)

  let :active, boolean

  let :started_at, time?(format: "%Y-%m-%dT%H:%M:%S")

  let :price, integer, error_message: "This should be Integer"

  # Literal declaration

  let :status, 0..6

  let :priority, 10, 20, 30, 40, 50

  let :size, "S", "M", "L" do |value| # The input must be one of "S", "M", or "L", and it will be casted as the returned value of the block.

    case value

    when "S"

      1

    when "M"

      2

    when "L"

      3

    end

  end

  # Regular expressions

  let :url, %r{\Ahttps://}

  # Custom validation

  #

  # This example sees the database to fetch exactly stored `User` IDs,

  # and it checks the `:user_id` cell really exists in the `users` table.

  # `pick` would be useful to avoid N+1 problems.

  pick :user_id, as: :user_ids do |ids|

    User.where(id: ids).ids

  end

  let :user_id, integer(constraint: ->(i) { user_ids.include?(i) })

end

data = <<~CSV

  stock,tax_rate,name,active,status,priority,size,url,price

  12,0.8,special item,True,4,20,M,https://example.com,PRICE

CSV

strong_csv.parse(data, field_size_limit: 2048) do |row|

  if row.valid?

    row[:tax_rate] # => 0.8

    row[:active] # => true

    # do something with row

  else

    row.errors # => {:price=>["This should be Integer"], :user_id=>["`nil` can't be casted to Integer"]}

    # do something with row.errors

  end

end

```

You can also define types without CSV headers by specifying column numbers.

> [!NOTE]

> The numbers must start from `0` (zero-based index).

```ruby

StrongCSV.new do

  let 0, integer

  let 1, string

  let 2, 1..10

end

```

This declaration expects a CSV has the contents like this:

```csv

123,abc,3

830,mno,10

```

## Available types

    

        Type

        Description

    

    

        integer and integer?

        The value must be casted to Integer.

    

    

        float and float?

        The value must be casted to Float.

    

    

        boolean and boolean?

        The value must be casted to Boolean (true or false).

    

    

        string and string?

        The value must be casted to String.

    

    

        time and time?

        The value must be casted to Time.

    

    

        optional

        The value can be nil. If the value exists, it must satisfy the given type constraint.

    

    

        23 (Integer literal)

        The value must be casted to the specific Integer literal.

    

    

        15.12 (Float literal)

        The value must be casted to the specific Float literal.

    

    

        1..10 (Range literal)

        The value must be casted to the beginning of Range and be covered with it.

    

    

        "abc" (String literal)

        The value must be casted to the specific String literal.

    

    

        %r{\Ahttps://} (Regexp literal)

        The value must be casted to a String that matches the specified Regexp.

    

    

        , (Union type)

        The value must satisfy one of the subtypes.

    

### `integer` and `integer?`

The value must be casted to Integer. `integer?` allows the value to be `nil`, so you can declare optional integer type

for columns. It also lets you allow values that satisfy the specified limitation through `:constraint`.

_Example_

```ruby

strong_csv = StrongCSV.new do

  let :stock, integer

  let :state, integer?

  let :user_id, integer(constraint: ->(v) { user_ids.include?(v)})

  pick :user_id, as: :user_ids do |values|

    User.where(id: values).ids

  end

end

result = strong_csv.parse(<<~CSV)

  stock,state,user_id

  12,0,1

  20,,2

  non-integer,1,4

CSV

result.map(&:valid?) # => [true, true, false]

result[0].slice(:stock, :state, :user_id) # => {:stock=>12, :state=>0, :user_id=>1}

result[1].slice(:stock, :state, :user_id) # => {:stock=>20, :state=>nil, :user_id=>2}

result[2].slice(:stock, :state, :user_id) # => {:stock=>"non-integer", :state=>1, :user_id=>"4"}

result[2].errors.slice(:stock, :user_id) # => {:stock=>["`\"non-integer\"` can't be casted to Integer"], :user_id=>["`\"4\"` does not satisfy the specified constraint"]}

```

### `float` and `float?`

The value must be casted to Float. `float?` allows the value to be `nil`, so you can declare optional float type for

columns. It also lets you allow values that satisfy the specified limitation through `:constraint`.

_Example_

```ruby

strong_csv = StrongCSV.new do

  let :tax_rate, float

  let :fail_rate, float?

end

result = strong_csv.parse(<<~CSV)

  tax_rate,fail_rate

  0.02,0.1

  0.05,

  ,0.8

CSV

result.map(&:valid?) # => [true, true, false]

result[0].slice(:tax_rate, :fail_rate) # => {:tax_rate=>0.02, :fail_rate=>0.1}

result[1].slice(:tax_rate, :fail_rate) # => {:tax_rate=>0.05, :fail_rate=>nil}

result[2].slice(:tax_rate, :fail_rate) # => {:tax_rate=>nil, :fail_rate=>0.8} (`nil` is not allowed for `tax_rate`)

```

### `boolean` and `boolean?`

The value must be casted to Boolean (`true` of `false`).

`"true"`, `"True"`, and `"TRUE"` are casted to `true`,

while `"false"`, `"False"`, and `"FALSE"` are casted to `false`.

`boolean?` allows the value to be `nil` as an optional boolean

value.

_Example_

```ruby

strong_csv = StrongCSV.new do

  let :enabled, boolean

  let :active, boolean?

end

result = strong_csv.parse(<<~CSV)

  enabled,active

  True,True

  False,

  ,

CSV

result.map(&:valid?) # => [true, true, false]

result[0].slice(:enabled, :active) # => {:enabled=>true, :active=>true}

result[1].slice(:enabled, :active) # => {:enabled=>false, :active=>nil}

result[2].slice(:enabled, :active) # => {:enabled=>nil, :active=>nil} (`nil` is not allowed for `enabled`)

```

### `string` and `string?`

The value must be casted to String. `string?` allows the value to be `nil` as an optional string value.

They also support `:within` in its arguments, and it limits the length of the string value within the specified `Range`.

_Example_

```ruby

strong_csv = StrongCSV.new do

  let :name, string(within: 1..4)

  let :description, string?

end

result = strong_csv.parse(<<~CSV)

  name,description

  JB,Hello

  yykamei,

  ,🤷

CSV

result.map(&:valid?) # => [true, false, false]

result[0].slice(:name, :description) # => {:name=>"JB", :description=>"Hello"}

result[1].slice(:name, :description) # => {:name=>"yykamei", :description=>nil} ("yykamei" exceeds the `Range` specified with `:within`)

result[2].slice(:name, :description) # => {:name=>nil, :description=>"🤷"} (`nil` is not allowed for `name`)

```

### `time` and `time?`

The value must be casted to Time. `time?` allows the value to be `nil` as an optional time value.

They have the `:format` argument, which is used as the format

of [`Time.strptime`](https://rubydoc.info/stdlib/time/Time.strptime);

it means you can ensure the value must satisfy the time format. The default value of `:format` is `"%Y-%m-%d"`.

_Example_

```ruby

strong_csv = StrongCSV.new do

  let :start_on, time

  let :updated_at, time?(format: "%FT%T")

end

result = strong_csv.parse(<<~CSV)

  start_on,updated_at

  2022-04-01,2022-04-30T15:30:59

  2022-05-03

  05-03,2021-09-03T09:48:23

CSV

result.map(&:valid?) # => [true, true, false]

result[0].slice(:start_on, :updated_at) # => {:start_on=>2022-04-01 00:00:00 +0900, :updated_at=>2022-04-30 15:30:59 +0900}

result[1].slice(:start_on, :updated_at) # => {:start_on=>2022-05-03 00:00:00 +0900, :updated_at=>nil}

result[2].slice(:start_on, :updated_at) # => {:start_on=>"05-03", :updated_at=>2021-09-03 09:48:23 +0900} ("05-03" does not satisfy the default format `"%Y-%m-%d"`)

```

### `optional`

While each type above has its optional type with `?`, literals cannot be suffixed with `?`.

However, there would be a case to have an optional literal type.

In this case, `optional` might be useful and lets you declare such types.

_Example_

```ruby

strong_csv = StrongCSV.new do

  let :foo, optional(123)

  let :bar, optional("test")

end

result = strong_csv.parse(<<~CSV)

  foo,bar

  123,test

  ,

  124

CSV

result.map(&:valid?) # => [true, true, false]

result[0].slice(:foo, :bar) # => {:foo=>123, :bar=>"test"}

result[1].slice(:foo, :bar) # => {:foo=>nil, :bar=>nil}

result[2].slice(:foo, :bar) # => {:foo=>"124", :bar=>nil} (124 is not equal to 123)

```

### Literal

You can declare literal value as types. The supported literals are `Integer`, `Float`, `String`, and `Range`.

_Example_

```ruby

strong_csv = StrongCSV.new do

  let 0, 123

  let 1, "test"

  let 2, 2.5

  let 3, 1..10

  let 4, /[a-z]+/

end

result = strong_csv.parse(<<~CSV)

  123,test,2.5,9,abc

  123,test,2.5,0,xyz

  123,Hey,2.5,10,!

CSV

result.map(&:valid?) # => [true, false, false]

result[0].slice(0, 1, 2, 3, 4) # => {0=>123, 1=>"test", 2=>2.5, 3=>9, 4=>"abc"}

result[1].slice(0, 1, 2, 3, 4) # => {0=>123, 1=>"test", 2=>2.5, 3=>"0", 4=>"xyz"} (0 is out of 1..10)

result[2].slice(0, 1, 2, 3, 4) # => {0=>123, 1=>"Hey", 2=>2.5, 3=>10, 4=>"!"} ("Hey" is not equal to "test", and "!" does not match /[a-z]+/)

```

### Union

There would be a case that it's alright if a value satisfies one of the types.

Union types are useful for such a case.

_Example_

```ruby

strong_csv = StrongCSV.new do

  let :priority, 10, 20, 30

  let :size, "S", "M", "L"

end

result = strong_csv.parse(<<~CSV)

  priority,size

  10,M

  30,A

  11,S

CSV

result.map(&:valid?) # => [true, false, false]

result[0].slice(:priority, :size) # => {:priority=>10, :size=>"M"}

result[1].slice(:priority, :size) # => {:priority=>30, :size=>"A"} ("A" is not one of "S", "M", and "L")

result[2].slice(:priority, :size) # => {:priority=>"11", :size=>"S"} (11 is not one of 10, 20, and 30)

```

## Contributing

Bug reports and pull requests are welcome on the [GitHub repository](https://github.com/yykamei/strong_csv).

This project is intended to be a safe, welcoming space for collaboration,

and contributors are expected to adhere to the

[code of conduct](https://github.com/yykamei/strong_csv/blob/main/CODE_OF_CONDUCT.md).