https://github.com/karolsluszniak/process-csv

Simple Elixir vs Ruby project for checking performance of both languages when it comes to processing large text files (as part of command line script or background job process).

Host: GitHub
URL: https://github.com/karolsluszniak/process-csv
Owner: karolsluszniak
Created: 2016-06-12T15:14:20.000Z (over 9 years ago)
Default Branch: master
Last Pushed: 2016-06-15T18:29:49.000Z (over 9 years ago)
Last Synced: 2025-08-26T17:52:22.829Z (about 1 month ago)
Language: Elixir
Homepage: http://cloudless.pl/articles/12-elixir-vs-ruby-file-i-o-performance
Size: 6.84 KB
Stars: 2
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

This is the source code for the article [Elixir vs Ruby: File I/O performance](http://cloudless.pl/articles/12-elixir-vs-ruby-file-i-o-performance), which you can find on the [Phoenix on Rails blog](http://cloudless.pl/articles?series=phoenix-on-rails). It's basically a sample text file processing script, implemented in both Elixir and Ruby, that does the following:

1. Loads the input CSV line by line.
2. Parses the first column, which has the format `Some text N`.
3. Keeps only those lines where N is divisible by 2 or 5.
4. Saves the filtered but otherwise unchanged lines to another CSV.

It does so in two modes: streaming, which is slower but works with files of any size, and a one-shot read, which is faster but less safe and less universal because the whole file has to fit in memory.
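To make the steps and the two modes concrete, here is a minimal Ruby sketch. It is not the repository's `lib/process_csv.rb`: the names `keep_line?`, `process_stream` and `process_read` are hypothetical, and it assumes plain comma-separated columns without quoting.

```ruby
# Hypothetical helper: true when the first column ends in a number
# divisible by 2 or 5. Assumes comma-separated columns with no quoting.
def keep_line?(line)
  first_column = line.split(",", 2).first.to_s
  match = first_column.match(/(\d+)\s*\z/)
  return false unless match

  n = match[1].to_i
  (n % 2).zero? || (n % 5).zero?
end

# Streaming mode: reads and writes one line at a time, so memory use
# does not grow with the size of the input file.
def process_stream(input_path, output_path)
  File.open(output_path, "w") do |out|
    File.foreach(input_path) do |line|
      out.write(line) if keep_line?(line)
    end
  end
end

# One-shot mode: loads the whole file into memory, filters it,
# and writes the result in a single call.
def process_read(input_path, output_path)
  kept = File.read(input_path).each_line.select { |line| keep_line?(line) }
  File.write(output_path, kept.join)
end
```

Both functions should produce identical output files; the difference is only in how much of the input is held in memory at once.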

**Disclaimer:** I'm not after proving that either Elixir or Ruby is "better" at reading files. This is just an exercise to better understand the practical consequences of running a simple command-line script on MRI versus running it in the more complex Erlang VM environment.

## Generating samples

You can generate a sample CSV file of a given size, compliant with the algorithm, like this:

```sh
ruby lib/generate.rb sample-500k.csv 500000
```

The syntax is: `ruby lib/generate.rb []`, where the optional argument in brackets defaults to 3.
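As a rough illustration of what a compliant sample looks like, the sketch below writes rows whose first column follows the `Some text N` format. It is not the repository's `lib/generate.rb`: the function name `generate_sample`, the two filler columns and their values are all assumptions made for the example.

```ruby
# Hypothetical generator: writes `line_count` rows whose first column is
# "Some text N" (N runs from 1 to line_count), followed by two made-up
# filler columns. The real lib/generate.rb may produce different columns.
def generate_sample(path, line_count)
  File.open(path, "w") do |out|
    1.upto(line_count) do |n|
      out.puts(["Some text #{n}", "value 1", "value 2"].join(","))
    end
  end
end
```

Calling `generate_sample("sample-500k.csv", 500_000)` would then produce a file of roughly the same shape as the one used in the benchmark commands.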

## Running benchmarks

Elixir version:

```sh
MIX_ENV=prod mix escript.build
time ./process_csv sample-500k.csv [read | stream]
```

Ruby version:

```sh
time ruby lib/process_csv.rb sample-500k.csv [read | stream]
```

## Improvements

Please see the [article](http://cloudless.pl/articles/12-elixir-vs-ruby-file-i-o-performance) for the optimizations I've tried. Open a pull request if you've found a better way.
