Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/erpuno/ecsv

💠 ECSV: Потоковий CSV парсер
https://github.com/erpuno/ecsv

Last synced: 2 months ago
JSON representation

💠 ECSV: Потоковий CSV парсер

Awesome Lists containing this project

README

        

# Erlang NIF CSV parser and writer #

[![Hex pm](http://img.shields.io/hexpm/v/ecsv.svg?style=flat&x=1)](https://hex.pm/packages/ecsv)

Copyright (c) 2016 Altworx

__Version:__ 0.2.0

__Authors:__ Hynek Vychodil ([`[email protected]`](mailto:[email protected])).

__See also:__ [ecsv](http://github.com/altworx/ecsv/blob/master/doc/ecsv.md).

`ecsv` is fast NIF parser and writer based on [`libcsv`](https://sourceforge.net/projects/libcsv/)

The main purpose of the module is the fast parsing of CSV data in GB volumes.
This requirement leads to the necessity of stream oriented API (see [`ecsv:parse_stream/4`](http://github.com/altworx/ecsv/blob/master/doc/ecsv.md#parse_stream-4)).

### Building ###

Application requires development files for [`libcsv`](https://sourceforge.net/projects/libcsv/) version 3.0.x
(tested with 3.0.3). For example in debian you need run

```
apt-get install libcsv3 libcsv-dev
```

For building you need installed [`rebar3`](https://www.rebar3.org/).

```
rebar3 compile
```

And run tests using

```
rebar3 eunit
```

### Running ###

For interactive Erlang shell use

```
rebar3 shell
```

Try as starter

```
1> ecsv:parse(<<"Hello,World">>).
[{<<"Hello">>,<<"World">>}]
```

See [`ecsv`](http://github.com/altworx/ecsv/blob/master/doc/ecsv.md) for API description.

### Known performance caveats ###

[`ecsv:write/1`](http://github.com/altworx/ecsv/blob/master/doc/ecsv.md#write-1) and [`ecsv:write_lines/1`](http://github.com/altworx/ecsv/blob/master/doc/ecsv.md#write_lines-1) doesn't perform as
expected. For an unknown reason, it is slower than pure Erlang implementation
when compiled using HiPE and used in a real application. On the other hand,
[`ecsv:parse_stream/5`](http://github.com/altworx/ecsv/blob/master/doc/ecsv.md#parse_stream-5) met expectations and performs around 100MB/s on
commodity HW (i7 2.6GHz).

Current implementation uses `enif_make_new_binary()` for parsed fields. From
our experience, this call allocates small binaries on the process heap in
contrast to `enif_alloc_binary()` always allocates on the binary heap which
is slightly slower. Fields from currently parsed line is kept between NIF
calls in an own environment which could lead to bad behavior when [`parse_raw/3`](http://github.com/altworx/ecsv/blob/master/doc/README.md#parse_raw-3) is called with very short binaries and there is a long row with
many fields. In an extreme case, one long line with short or in worst case
empty fields will lead to quadratic behavior if fed one byte a time. If you
would like parse CSV file with more than 20kB rows with thousands of fields
you should probably use another parser or fix the issue.

## Modules ##

ecsv