{"id":13654251,"url":"https://github.com/miku/esdump","last_synced_at":"2025-04-11T09:17:03.809Z","repository":{"id":57565207,"uuid":"254355654","full_name":"miku/esdump","owner":"miku","description":"Stream documents from elasticsearch with scroll (and HTTP GET only)","archived":false,"fork":false,"pushed_at":"2024-06-28T17:59:08.000Z","size":251,"stargazers_count":9,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-25T06:33:39.791Z","etag":null,"topics":["code4lib","command-line-tool","elasticsearch"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/miku.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-04-09T11:43:04.000Z","updated_at":"2024-06-28T17:59:11.000Z","dependencies_parsed_at":"2024-08-02T02:10:57.519Z","dependency_job_id":"b1251e3f-f106-4060-b41f-eecc6625c951","html_url":"https://github.com/miku/esdump","commit_stats":null,"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miku%2Fesdump","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miku%2Fesdump/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miku%2Fesdump/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miku%2Fesdump/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/miku","download_url":"https://codeload.github.com/miku/esdump/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248366541,"owners_count":21091983,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["code4lib","command-line-tool","elasticsearch"],"created_at":"2024-08-02T02:01:25.679Z","updated_at":"2025-04-11T09:17:03.788Z","avatar_url":"https://github.com/miku.png","language":"Go","funding_links":[],"categories":["Uncategorized"],"sub_categories":["Uncategorized"],"readme":"# esdump\n\nStream docs from Elasticsearch to stdout for ad-hoc data mangling using the\n[Scroll\nAPI](https://www.elastic.co/guide/en/elasticsearch/guide/master/scroll.html#scroll).\nJust like [solrdump](https://github.com/ubleipzig/solrdump), but for\n[elasticsearch](https://elastic.co/). Since esdump 0.1.11, the default operator can be set explicitly and changed from `OR` to `AND`.\n\nLibraries can use both GET and POST requests to issue scroll requests.\n\n* [elasticsearch-py](https://github.com/elastic/elasticsearch-py/blob/c0767a9569a719dcb15adec91a88afc32b27b1b0/elasticsearch/client/__init__.py#L1300-L1323) uses POST\n* [esapi](https://github.com/elastic/go-elasticsearch/blob/6f36a473b19f05f20933da8f59347b308ab46594/esapi/api.scroll.go#L65) uses GET\n\nThis tool uses HTTP GET only, and does not clear scrolls (which would probably\nuse\n[DELETE](https://github.com/elastic/go-elasticsearch/blob/6f36a473b19f05f20933da8f59347b308ab46594/esapi/api.clear_scroll.go#L60))\nso this tool works with read-only servers, that only allow GET.\n\n## Install\n\n```\n$ go install github.com/miku/esdump/cmd/esdump@latest\n```\n\nOr via a [release](https://github.com/miku/esdump/releases).\n\n## Usage\n\n```\nesdump uses the elasticsearch scroll API to stream documents to stdout.\n\nOriginally written to extract samples from https://search.fatcat.wiki (a\nscholarly communications preservation and discovery project).\n\n    $ esdump -s https://search.fatcat.wiki -i fatcat_release -q 'web archiving'\n\nUsage of ./esdump:\n  -i string\n        index name (default \"fatcat_release\")\n  -ids string\n        a path to a file with one id per line to fetch\n  -l int\n        limit number of documents fetched, zero means no limit\n  -mq string\n        path to file, one lucene query per line\n  -op string\n        default operator for query string queries (default \"AND\")\n  -q string\n        lucene syntax query to run, example: 'affiliation:\"alberta\"' (default \"*\")\n  -s string\n        elasticsearch server (default \"https://search.fatcat.wiki\")\n  -scroll string\n        context timeout (default \"5m\")\n  -size int\n        batch size (default 1000)\n  -v    show version\n  -verbose\n        be verbose\n```\n\n## Performance data point(s)\n\n```\n925636 docs in 4m47.460217252s (3220 docs/s)\n```\n\n## TODO\n\n* [ ] move to [`search_after`](https://www.elastic.co/guide/en/elasticsearch/reference/current/scroll-api.html)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmiku%2Fesdump","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmiku%2Fesdump","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmiku%2Fesdump/lists"}