{"id":21660192,"url":"https://github.com/slub/es2json","last_synced_at":"2025-05-08T00:22:06.580Z","repository":{"id":41279298,"uuid":"237158764","full_name":"slub/es2json","owner":"slub","description":"elasticsearch bulk harvester which is using the scroll-API","archived":false,"fork":false,"pushed_at":"2024-05-03T07:21:21.000Z","size":134,"stargazers_count":3,"open_issues_count":0,"forks_count":3,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-03-31T15:33:50.023Z","etag":null,"topics":["elasticsearch","elasticsearch-client","json","line-delimited-json","python3"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/slub.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-01-30T07:11:12.000Z","updated_at":"2024-05-03T07:21:25.000Z","dependencies_parsed_at":"2023-12-14T16:51:28.311Z","dependency_job_id":null,"html_url":"https://github.com/slub/es2json","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slub%2Fes2json","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slub%2Fes2json/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slub%2Fes2json/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slub%2Fes2json/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/slub","download_url":"https://codeload.github.com/slub
/es2json/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252974479,"owners_count":21834261,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["elasticsearch","elasticsearch-client","json","line-delimited-json","python3"],"created_at":"2024-11-25T09:32:28.533Z","updated_at":"2025-05-08T00:22:06.540Z","avatar_url":"https://github.com/slub.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cimg alt=\"EFRE-Lod logo\" src=\"https://raw.githubusercontent.com/slub/data.slub-dresden.de/master/assets/images/EFRE_EU_quer_2015_rgb_engl.svg\" width=\"300\" \u003e\n\n# es2json\n\nes2json is a simple Elasticsearch index download/search tool. You can supply your own queries via the -body switch or give it an idfile with \\n-delimited IDs. The idfile\\_consume switch consumes the idfile, leaving in the file only the IDs that could not be retrieved. 
Output is line-delimited JSON on STDOUT. Unless you use -headless, the Elasticsearch metadata is printed as well.\n\n## usage\n\n```\nusage: es2json [-h] [-server SERVER] [-ign-source] [-size N[:M]]\n               [-timeout TIMEOUT] [-includes INCLUDES] [-excludes EXCLUDES]\n               [-headless] [-body BODY] [-idfile IDFILE]\n               [-idfile_consume IDFILE_CONSUME] [-pretty] [-verbose]\n               [-chunksize CHUNKSIZE] [-auth [USER]]\n\nQuery elasticsearch indices/index/documents and print them formatted as JSON-Objects\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -server SERVER        use http://host:port/index/type/id.\n                        host:port - hostname or IP with port of the elasticsearch node to query\n                                    default: localhost:9200\n                        index     - index to query\n                                    default: None → queries across all available indices\n                        type      - elasticsearch doctype to use (optional)\n                        id        - identifier of one specific document to query (optional)\n  -use-ssl              use https instead of http\n  -ign-source           return the Document or just the Elasticsearch-Metadata\n  -size N[:M]           just return the first n-Records of the search,\n                        or return a python slice, e.g. 2:10 returns a list\n                        from the 2nd including the 9th element of the search\n                        only works with the ESGenerator\n                        Note: Not all slice variants may be supported\n  -timeout TIMEOUT      Set the time in seconds after when a ReadTimeoutError can occur.\n                        Default is 10 seconds. 
Raise for big/difficult querys \n  -includes INCLUDES    just include following _source field(s) in the _source object\n  -excludes EXCLUDES    exclude following _source field(s) from the _source object\n  -headless             don't print Elasticsearch metadata\n  -body BODY            Elasticsearch Query object that can be in the form of\n                        1) a JSON string (e.g. '{\"query\": {\"match\": {\"name\": \"foo\"}}}')\n                        2) a file containing the upper query string\n  -idfile IDFILE        path to a file with \\n-delimited IDs to process\n  -idfile_consume IDFILE_CONSUME\n                        path to a file with \\n-delimited IDs to process\n  -pretty               prettyprint the json output\n  -verbose              print progress for large dumps\n  -chunksize CHUNKSIZE  chunksize of the search window to use\n  -auth [USER]          Provide authentication, this can be done using:\n                        1) set environment variables E2J_USER and E2J_PASSWD. In\n                           this case there is no further argument needed here\n                        2) as a string \"username\". The password is then asked interactively\n                        3) as \"username:password\" (not recommended)\n\n```\n\n## tests\nThis package comes with tests, which need to be set up before they can run. See tests/Readme for setup instructions.\nOnce set up, run the tests with `python3 -m pytest tests`\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fslub%2Fes2json","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fslub%2Fes2json","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fslub%2Fes2json/lists"}