{"id":13783258,"url":"https://github.com/moshe/elasticsearch_loader","last_synced_at":"2025-05-16T19:04:02.691Z","repository":{"id":11423753,"uuid":"68472454","full_name":"moshe/elasticsearch_loader","owner":"moshe","description":"A tool for batch loading data files (json, parquet, csv, tsv) into ElasticSearch","archived":false,"fork":false,"pushed_at":"2022-07-03T17:41:27.000Z","size":134,"stargazers_count":401,"open_issues_count":5,"forks_count":82,"subscribers_count":21,"default_branch":"master","last_synced_at":"2025-05-13T04:07:38.659Z","etag":null,"topics":["csv","elasticsearch","elasticsearch-loader","json","logstash","parquet","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/moshe.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-09-17T19:02:44.000Z","updated_at":"2025-05-09T06:29:21.000Z","dependencies_parsed_at":"2022-09-19T06:50:44.970Z","dependency_job_id":null,"html_url":"https://github.com/moshe/elasticsearch_loader","commit_stats":null,"previous_names":[],"tags_count":32,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moshe%2Felasticsearch_loader","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moshe%2Felasticsearch_loader/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moshe%2Felasticsearch_loader/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moshe%2Felasticsearch_loader/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/moshe","download_url":"htt
ps://codeload.github.com/moshe/elasticsearch_loader/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254592367,"owners_count":22097010,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv","elasticsearch","elasticsearch-loader","json","logstash","parquet","python"],"created_at":"2024-08-03T19:00:17.343Z","updated_at":"2025-05-16T19:04:02.602Z","avatar_url":"https://github.com/moshe.png","language":"Python","readme":"# elasticsearch_loader [![Build Status](https://travis-ci.org/moshe/elasticsearch_loader.svg?branch=master)](https://travis-ci.org/moshe/elasticsearch_loader) [![Can I Use Python 3?](https://caniusepython3.com/project/elasticsearch-loader.svg)](https://caniusepython3.com/project/elasticsearch-loader) [![PyPI version](https://badge.fury.io/py/elasticsearch_loader.svg)](https://pypi.python.org/pypi/elasticsearch-loader)\n\n## Main features\n\n-   Batch upload CSV (actually any \*SV) files to Elasticsearch\n-   Batch upload JSON files / JSON lines to Elasticsearch\n-   Batch upload parquet files to Elasticsearch\n-   Pre-defining custom mappings\n-   Delete index before upload\n-   Index documents with \_id from the document itself\n-   Load data directly from a URL\n-   SSL and basic auth\n-   Unicode Support ✌️\n\n## Plugins\nTo install a plugin, run `pip install plugin-name`\n-   [esl-redis](https://pypi.org/project/esl-redis) - Read continuously from one or more Redis lists and index into Elasticsearch\n-   [esl-s3](https://pypi.org/project/esl-s3) - Plugin for listing and indexing files from 
S3\n\n### Test matrix\n\n| python / es | 5.6.16 | 6.8.0 | 7.1.1 | 8.1.2 |\n| ----------- | ----- | ----- | ----- | ----- |\n| 3.7         | V     | V     | V     | V     |\n\n### Installation\n\n`pip install elasticsearch-loader`  \n_In order to add parquet support run `pip install 'elasticsearch-loader[parquet]'`_\n\n### Usage\n\n```\n(venv)/tmp $ elasticsearch_loader --help\nUsage: elasticsearch_loader [OPTIONS] COMMAND [ARGS]...\n\nOptions:\n  -c, --config-file TEXT          Load default configuration file from esl.yml\n  --bulk-size INTEGER             How many docs to collect before writing to\n                                  Elasticsearch (default 500)\n  --es-host TEXT                  Elasticsearch cluster entry point. (default\n                                  http://localhost:9200)\n  --verify-certs                  Make sure we verify SSL certificates\n                                  (default false)\n  --use-ssl                       Turn on SSL (default false)\n  --ca-certs TEXT                 Provide a path to CA certs on disk\n  --http-auth TEXT                Provide username and password for basic auth\n                                  in the format of username:password\n  --index TEXT                    Destination index name  [required]\n  --delete                        Delete index before import? (default false)\n  --update                        Merge and update existing doc instead of\n                                  overwrite\n  --progress                      Enable progress bar - NOTICE: in order to\n                                  show progress the entire input should be\n                                  collected and can consume more memory than\n                                  without progress bar\n  --type TEXT                     Docs type. TYPES WILL BE DEPRECATED IN APIS\n                                  IN ELASTICSEARCH 7, AND COMPLETELY REMOVED\n                                  IN 8.  
[required]\n  --id-field TEXT                 Specify field name that be used as document\n                                  id\n  --as-child                      Insert _parent, _routing field, the value is\n                                  same as _id. Note: must specify --id-field\n                                  explicitly\n  --with-retry                    Retry if ES bulk insertion failed\n  --index-settings-file FILENAME  Specify path to json file containing index\n                                  mapping and settings, creates index if\n                                  missing\n  --timeout FLOAT                 Specify request timeout in seconds for\n                                  Elasticsearch client\n  --encoding TEXT                 Specify content encoding for input files\n  --keys TEXT                     Comma separated keys to pick from each\n                                  document\n  -h, --help                      Show this message and exit.\n\nCommands:\n  csv\n  json     FILES with the format of [{\"a\": \"1\"}, {\"b\": \"2\"}]\n  parquet\n  redis\n  s3\n\n```\n\n### Examples\n\n#### Load two CSV files into Elasticsearch\n\n`elasticsearch_loader --index incidents --type incident csv file1.csv file2.csv`\n\n#### Load JSON files into Elasticsearch\n\n`elasticsearch_loader --index incidents --type incident json *.json`\n\n#### Load all git commits into Elasticsearch\n\n`git log --pretty=format:'{\"sha\":\"%H\",\"author_name\":\"%aN\", \"author_email\": \"%aE\",\"date\":\"%ad\",\"message\":\"%f\"}' | elasticsearch_loader --type git --index git json --json-lines -`\n\n#### Load parquet files into Elasticsearch\n\n`elasticsearch_loader --index incidents --type incident parquet file1.parquet`\n\n#### Load JSON from a GitHub repo (any http/https URL works)\n\n`elasticsearch_loader --index data --type avg_height --id-field country json https://raw.githubusercontent.com/samayo/country-data/master/src/country-avg-male-height.json`\n\n#### Load data from 
stdin\n\n`generate_data | elasticsearch_loader --index data --type incident csv -`\n\n#### Read the id from the incident_id field\n\n`elasticsearch_loader --id-field incident_id --index incidents --type incident csv file1.csv file2.csv`\n\n#### Load custom mappings\n\n`elasticsearch_loader --index-settings-file samples/mappings.json --index incidents --type incident csv file1.csv file2.csv`\n\n### Tests and sample data\n\nEnd-to-end and regression tests are located under the test directory and can be run with `./test.py`.\nSample input files for each format can be found under samples.\n\n### Stargazers over time\n\n[![Stargazers over time](https://starcharts.herokuapp.com/moshe/elasticsearch_loader.svg)](https://starcharts.herokuapp.com/moshe/elasticsearch_loader)\n","funding_links":[],"categories":["Elasticsearch developer tools and utilities","Python"],"sub_categories":["Import and Export"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmoshe%2Felasticsearch_loader","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmoshe%2Felasticsearch_loader","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmoshe%2Felasticsearch_loader/lists"}