Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/wmfs/smithereens


https://github.com/wmfs/smithereens

package tymly

Last synced: 3 months ago
JSON representation

Awesome Lists containing this project

README

        

# smithereens
[![Tymly Package](https://img.shields.io/badge/tymly-package-blue.svg)](https://tymly.io/)
[![npm (scoped)](https://img.shields.io/npm/v/@wmfs/smithereens.svg)](https://www.npmjs.com/package/@wmfs/smithereens)
[![CircleCI](https://circleci.com/gh/wmfs/smithereens.svg?style=svg)](https://circleci.com/gh/wmfs/smithereens)
[![codecov](https://codecov.io/gh/wmfs/smithereens/branch/master/graph/badge.svg)](https://codecov.io/gh/wmfs/smithereens)
[![CodeFactor](https://www.codefactor.io/repository/github/wmfs/smithereens/badge)](https://www.codefactor.io/repository/github/wmfs/smithereens)
[![Dependabot badge](https://img.shields.io/badge/Dependabot-active-brightgreen.svg)](https://dependabot.com/)
[![Commitizen friendly](https://img.shields.io/badge/commitizen-friendly-brightgreen.svg)](http://commitizen.github.io/cz-cli/)
[![JavaScript Style Guide](https://img.shields.io/badge/code_style-standard-brightgreen.svg)](https://standardjs.com)
[![license](https://img.shields.io/github/license/mashape/apistatus.svg)](https://github.com/wmfs/tymly/blob/master/packages/pg-concat/LICENSE)

> Smash CSV files into more manageable files based on column values

## Install
```bash
$ npm install smithereens --save
```

## Usage

```javascript
const smithereens = require('smithereens')

smithereens(
[
'/some/input/csv/files/people.csv'

// people.csv:
//
// personNo,firstName,LastName,personType,action
// 10,"Lisa","Simpson","c","u"
// 20,"Homer","Simpson","a","u"
// 30,"Bart","Simpson","c","d"
// 40,"Marge","Simpson","a","d"
// 50,"Maggie","Simpson","c","x"
// 60,"Grampa","Simpson","x","u"
// 70,"Milhouse","Van Houten","c","u"

],
{

outputDirRootPath: '/some/output/dir',

parser: {
quote: '"',
delimiter: ',',
newline: '\n',
skipFirstLine: true,
trimWhitespace: true
},

dirSplits: [
{
columnIndex: 3,
valueToDirMap: {
'c': 'children',
'a': 'adults'
}
}
],

fileSplits: {
columnIndex: 4,
valueToFileMap: {
'u': {
filename: 'changes',
outputColumns: [
{name: 'person_no', columnIndex: 0},
{name: 'first_name', columnIndex: 1},
{name: 'last_name', columnIndex: 2}
]
},
'd': {
filename: 'deletes',
outputColumns: [
{name: 'person_no', columnIndex: 0}
]
}
}
}
},

function (err, manifest) {

// File output
// -----------
// /some/output/dir
// ./adults
// changes.csv:
// person_no,first_name,last_name
// 20,Homer,Simpson
// deletes.csv:
// person_no
// 40
// ./children
// changes.csv:
// person_no,first_name,last_name
// 10,Lisa,Simpson
// 70,Milhouse,Van Houten
// deletes.csv:
// person_no
// 30
// unknown.csv:
// 50,Maggie,Simpson,c,x
// ./unknown
// changes.csv:
// person_no,first_name,last_name
// 60,Grampa,Simpson

}
)

```

## smithereens(`sourceFilePaths`, `options`, `callback`)

| Arg | Type | Description |
| --- | ---- | ----------- |
| `sourceFilePaths` | `string` \| `[string]` | A string or an array of strings identifying one or more files. Uses `glob` so `/some/dir/*.csv` style patterns are supported, as is directory recursion via `/some/dir/**/*.csv` |
| `options` | `object` | An object configuring how output should be produced. See [Options](#options) for more information. |
| `callback` | `function` | To be of the form `function(err, manifest)`. Manifest contains a summary of the output files produced. |

## Options

| Property | Type | Description |
| --- | ---- | ----------- |
| `outputDirRootPath` | `string` | An absolute directory path where to write output to. All missing directories will be created. |
| `parser` | `object` | An `parser` object for configuring how input CSV files should be parsed. |
| `dirSplits` | `[object]` | An array of of `dirSplit` objects |
| `fileSplits` | `object` | A `fileSplit` object |

### `parser` object

Configures how to parse incoming CSV lines. Uses [csv-streamify](https://www.npmjs.com/package/csv-streamify) under the bonnet.

| Property | Type | Description |
| --- | ---- | ----------- |
| `skipFirstLine` | `boolean` | Should the first line of each file be ignored? Set to `true` if files include a header line, for example. |
| `delimiter` | `string` | Comma, semicolon, whatever - defaults to comma. |
| `newline` | `string` | Newline character (use \\r\\n for CRLF files). |
| `quote` | `string` | What's considered a quote. |
| `empty` | `string` | Empty fields are replaced by this value. |

### `dirSplit` object

Smithereens can break CSV files across a nested set of directories based on values defined in each line.

| Property | Type | Description |
| --- | ---- | ----------- |
| `columnIndex` | `integer` | Each line of each CSV file will be parsed into an array of strings. This value identifies which value to split on. |
| `valueToDirMap` | `object` | A simple mapping of an expected string value (as identified by `columnIndex`) and the directory name that this line should be routed to. |

### `fileSplit` object

In a similar way, Smithereens can route lines to different files, based on the contents of a parsed CSV column.

| Property | Type | Description |
| --- | ---- | ----------- |
| `columnIndex` | `integer` | Identifies which of the parsed string values from each CSV line should be used to determine a filename that a row should be routed to. |
| `valueToFileMap` | `object` | A key/value map where key is a string value that is expected via `columnIndex` and value is a `file` object. |

### `file` object

Defines which filename a CSV row should be routed to, along with some output-formatting configuration.

| Property | Type | Description |
| --- | ---- | ----------- |
| `filename` | `string` | The filename which a row should be routed to. All output files will be in CSV format. Note that the `.csv` extension is added automatically, so don't include it here. |
| `outputColumns` | `[object]` | An array of `outputColumn` objects - each defining a column that should appear in the output file. |

### `outputColumn` object

Defines the values for each column in the output.

| Property | Type | Description |
| --- | ---- | ----- |
| `name` | string | The name of the column, used in the first line of the CSV output |
| `columnIndex` | integer | Identifies the column in the corresponding row of the parsed incoming CSV array to copy to the output. If `columnIndex` is given any `type` parameter is ignored. |
| `type` | string | Describes other output - `hash`, `uuid`, `constant`. `hash` generates a hash value from the contents of the corresponding input row, `uuid` outputs a unique id, `constant` output a fix value, given by the `value` property. |
| `value` | number or string | A fixed value to put output when the `type` property is `constant`. |

## Testing

```bash
$ npm test
```

## License
[MIT](https://github.com/wmfs/smithereens/blob/master/LICENSE)