https://github.com/edwardsmit/csv_to_es
POC for loading a CSV into Elasticsearch
csv-importer elasticsearch elixir
- Host: GitHub
- URL: https://github.com/edwardsmit/csv_to_es
- Owner: edwardsmit
- License: mit
- Created: 2019-06-17T11:28:31.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2019-06-17T11:39:42.000Z (about 7 years ago)
- Last Synced: 2025-02-02T00:13:00.959Z (over 1 year ago)
- Topics: csv-importer, elasticsearch, elixir
- Language: Elixir
- Size: 5.86 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# CsvToEs
This is a POC that parses a CSV file in a streaming fashion and stores one JSON
object per CSV line in Elasticsearch, using the Elasticsearch Bulk API.
## Build
```shell
mix escript.build
```
## Load a CSV
```shell
./csv_to_es
```
## Limitations
* Currently only `;`-separated files are supported, and the file must start with
a header line, which is used to name the ES document fields. This project has
only been tested with the bagadres-full.csv file from NLExtract.nl
([download](https://data.nlextract.nl/bag/csv/bag-adressen-full-laatst.csv.zip))
* Elasticsearch is expected to run at [localhost](http://localhost:9200)
* As we don't create an `_id` field explicitly, multiple runs of the tool will
create duplicates
* The batch size is fixed at 1_000; this figure was picked arbitrarily, without
any testing or tuning
* The timeout of 60s was chosen as "large enough" to avoid timeouts
* No error-handling is implemented
* The target index is hardcoded to `elixir-csv`
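
To make the header-line and `_id` limitations above concrete, here is a minimal shell sketch of the kind of Bulk API body that can be derived from a `;`-separated CSV. It is not the project's Elixir code. The function name `csv_to_bulk` and its optional id-column argument are hypothetical, and the sketch assumes values contain no quotes, backslashes, or embedded separators.

```shell
# Hypothetical helper: read a `;`-separated CSV (header line first) on stdin
# and print an Elasticsearch Bulk API body (NDJSON) on stdout.
#   $1 = target index name
#   $2 = optional header name whose value is used as the document _id
# Assumption: values need no JSON escaping (no quotes or backslashes).
csv_to_bulk() {
  awk -F';' -v idx="$1" -v id_col="$2" '
    NR == 1 {
      # Header line: remember the field names and where the id column is.
      ncols = NF
      for (i = 1; i <= NF; i++) {
        name[i] = $i
        if (id_col != "" && $i == id_col) id_pos = i
      }
      next
    }
    {
      # Action line: with an explicit _id a second run overwrites the
      # document instead of adding a duplicate.
      if (id_pos) printf "{\"index\":{\"_index\":\"%s\",\"_id\":\"%s\"}}\n", idx, $id_pos
      else        printf "{\"index\":{\"_index\":\"%s\"}}\n", idx
      # Source line: one JSON object, all values kept as strings.
      doc = ""
      for (i = 1; i <= ncols; i++) {
        doc = doc (i > 1 ? "," : "") "\"" name[i] "\":\"" $i "\""
      }
      print "{" doc "}"
    }
  '
}

# Usage (assumes Elasticsearch at localhost:9200 and a small local file
# addresses.csv; both the file name and the id column are placeholders):
#   csv_to_bulk elixir-csv id < addresses.csv |
#     curl -s -H "Content-Type: application/x-ndjson" \
#       -X POST "http://localhost:9200/_bulk" --data-binary @-
```

Unlike the tool itself, this sketch does not split the input into batches, so it is only suitable for small files.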
## Tip
Before running this tool, it is best to set `number_of_replicas` to `0` and
`refresh_interval` to `-1` on the target Elasticsearch index `elixir-csv`.
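
One possible way to apply these settings is the Elasticsearch update-index-settings endpoint, sketched below with `curl`. The helper name `put_settings` and the restore values are assumptions, not part of this project. The sketch also assumes the index already exists; if it does not, create it first (for example with `curl -X PUT http://localhost:9200/elixir-csv`).

```shell
ES_URL="http://localhost:9200"   # Elasticsearch location expected by the tool
INDEX="elixir-csv"               # hardcoded target index

# Settings from the tip: no replicas and no periodic refresh while loading.
LOAD_SETTINGS='{"index":{"number_of_replicas":0,"refresh_interval":"-1"}}'
# Assumed values to restore afterwards (one replica, default refresh
# interval); adjust these to whatever your cluster normally uses.
RESTORE_SETTINGS='{"index":{"number_of_replicas":1,"refresh_interval":null}}'

# Hypothetical helper: apply a settings body ($1) to the existing index.
put_settings() {
  curl -s -X PUT "$ES_URL/$INDEX/_settings" \
    -H 'Content-Type: application/json' \
    -d "$1"
}
```

Call `put_settings "$LOAD_SETTINGS"` before the import and `put_settings "$RESTORE_SETTINGS"` once it has finished, so that the documents become searchable and replicated again.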
## Can I use this in production?
Probably not as is.