{"id":17091665,"url":"https://github.com/juarezr/solrcopy","last_synced_at":"2025-04-12T22:41:08.930Z","repository":{"id":43002700,"uuid":"242227262","full_name":"juarezr/solrcopy","owner":"juarezr","description":"Command line tool for backup and restore of information stored in cores of Apache Solr","archived":false,"fork":false,"pushed_at":"2025-03-19T18:38:16.000Z","size":400,"stargazers_count":8,"open_issues_count":11,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-26T16:47:38.276Z","etag":null,"topics":["apache-solr","dataimport","fulltext-search","solr","solr-cli","solr-client","solr-dataimporter","solr-setup"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/juarezr.png","metadata":{"files":{"readme":"README.md","changelog":"changelog.txt","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"liberapay":"juarezr"}},"created_at":"2020-02-21T20:43:52.000Z","updated_at":"2025-03-19T18:33:28.000Z","dependencies_parsed_at":"2025-03-19T12:15:38.262Z","dependency_job_id":null,"html_url":"https://github.com/juarezr/solrcopy","commit_stats":null,"previous_names":[],"tags_count":24,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/juarezr%2Fsolrcopy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/juarezr%2Fsolrcopy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/juarezr%2Fsolrcopy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/repositories/juarezr%2Fsolrcopy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/juarezr","download_url":"https://codeload.github.com/juarezr/solrcopy/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248643045,"owners_count":21138353,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-solr","dataimport","fulltext-search","solr","solr-cli","solr-client","solr-dataimporter","solr-setup"],"created_at":"2024-10-14T13:59:16.884Z","updated_at":"2025-04-12T22:41:08.924Z","avatar_url":"https://github.com/juarezr.png","language":"Rust","funding_links":["https://liberapay.com/juarezr","https://liberapay.com/juarezr/donate"],"categories":[],"sub_categories":[],"readme":"# solrcopy\n\nCommand line tool for backup and restore of documents stored in cores of [Apache Solr](https://lucene.apache.org/solr/).\n\n## Status\n\n![build-test-and-lint](https://github.com/juarezr/solrcopy/workflows/build-test-and-lint/badge.svg)\n\n[![Coverage Status](https://coveralls.io/repos/github/juarezr/solrcopy/badge.svg?branch=master)](https://coveralls.io/github/juarezr/solrcopy?branch=master)\n\n- solrcopy backup/restore\n  - Should work well in most common cases.\n  - Works for me... 
:)\n- Check the issues on GitHub\n- Patches welcome!\n\n\u003c!-- markdownlint-disable-next-line no-inline-html --\u003e\n[\u003cimg alt=\"Send some cookies\" src=\"http://img.shields.io/liberapay/receives/juarezr.svg?label=Send%20some%20cookies\u0026logo=liberapay\"\u003e](https://liberapay.com/juarezr/donate)\n\n## Usage\n\n1. Use the command `solrcopy backup` for dumping documents from a Solr core into local zip files.\n   1. Use the switch `--query` for filtering the documents extracted by using a [Solr Query](https://lucene.apache.org/solr/guide/8_4/the-standard-query-parser.html).\n   2. Use the switch `--order` for specifying the sorting of the documents extracted.\n   3. Use the switches `--limit` and `--skip` for restricting the number of documents extracted.\n   4. Use the switch `--select` for restricting the columns extracted.\n2. Use the command `solrcopy restore` for uploading the extracted documents from local zip files into the same Solr core or into another core with the same field names as extracted.\n   1. The documents are updated in the target core in the same format in which they were extracted.\n   2. The documents are inserted/updated based on the `uniqueKey` field defined in the core.\n   3. If you want to change the documents/columns, use the switches in `solrcopy backup` for extracting more than one slice of documents to be updated.\n\n### Huge cores\n\nExtracting and updating documents in huge cores can be challenging. It can take a long time and can fail at any moment.\n\nBelow are some tricks for dealing with such cores:\n\n1. To reduce the total time, use the switches `--readers` and `--writers` for executing operations in parallel.\n2. When the number of docs to extract is huge, the `backup` subcommand tends to slow down as time goes by and eventually fails. This is because Solr struggles to fetch batches of docs with high skip/start parameters. 
For dealing with this:\n   1. Use the parameters `--iterate-by`, `--between` and `--step` for iterating through the parameter `--query` with the variables `{begin}` and `{end}`.\n   2. This way solrcopy iterates over the docs being downloaded and restricts each query to one slice: a minute, an hour, a day or a numeric range.\n   3. For example: `--query 'date:[{begin} TO {end}]' --iterate-by day --between '2020-04-01' '2020-04-30T23:59:59'`\n3. Use the parameter `--params shards=shard1` for copying one shard at a time, by name, in the `backup` subcommand.\n4. Use the parameters `--delay-before`, `--delay-per-request` and `--delay-after` to avoid overloading the Solr server.\n\n## Invocation\n\n``` text\n$ solrcopy --help\nCommand line tool for backup and restore of documents stored in cores of Apache Solr.\n\nSolrcopy is a command for doing backup and restore of documents stored on Solr cores. It let you filter docs by using a expression, limit quantity, define order and desired columns to export. The data is stored as json inside local zip files. It is agnostic to data format, content and storage place. Because of this data is restored exactly as extracted and your responsible for extracting, storing and updating the correct data from and into correct cores.\n\nUsage: solrcopy \u003cCOMMAND\u003e\n\nCommands:\n  backup    Dumps documents from a Apache Solr core into local backup files\n  restore   Restore documents from local backup files into a Apache Solr core\n  commit    Perform a commit in the Solr core index for persisting documents in disk/memory\n  delete    Removes documents from the Solr core definitively\n  generate  Generates man page and completion scripts for different shells\n  help      Print this message or the help of the given subcommand(s)\n\nOptions:\n  -h, --help\n          Print help (see a summary with '-h')\n\n  -V, --version\n          Print version\n```\n\n``` text\n$ solrcopy backup --help\nDumps documents from a Apache Solr core into local backup files\n\nUsage: solrcopy backup [OPTIONS] --url \u003clocalhost:8983/solr\u003e --core \u003ccore\u003e --dir 
\u003c/path/to/output\u003e\n\nOptions:\n  -u, --url \u003clocalhost:8983/solr\u003e [env: SOLR_COPY_URL=]\n          Url pointing to the Solr cluster\n\n  -c, --core \u003ccore\u003e\n          Case sensitive name of the core in the Solr server\n\n  -d, --dir \u003c/path/to/output\u003e [env: SOLR_COPY_DIR=]\n          Existing folder where the zip backup files containing the extracted documents are stored\n\n  -q, --query \u003c'f1:vl1 AND f2:vl2'\u003e\n          Solr Query param 'q' for filtering which documents are retrieved See: https://lucene.apache.org/solr/guide/6_6/the-standard-query-parser.html\n\n  -o, --order \u003cf1:asc,f2:desc,...\u003e\n          Solr core fields names for sorting documents for retrieval\n\n  -k, --skip \u003cquantity\u003e\n          Skip this quantity of documents in the Solr Query [default: 0]\n\n  -l, --limit \u003cquantity\u003e\n          Maximum quantity of documents for retrieving from the core (like 100M)\n\n  -s, --select \u003cfield1,field2,...\u003e\n          Names of core fields retrieved in each document [default: all but _*]\n\n  -i, --iterate-by \u003cmode\u003e\n          Slice the queries by using the variables {begin} and {end} for iterating in `--query` Used in bigger solr cores with huge number of docs because querying the end of docs is expensive and fails frequently [default: day]\n\n          Possible values:\n          - none\n          - minute: Break the query in slices by a first ordered date field repeating between {begin} and {end} in the query parameters\n          - hour\n          - day\n          - range:  Break the query in slices by a first ordered integer field repeating between {begin} and {end} in the query parameters\n\n  -b, --between \u003cbegin\u003e \u003cend\u003e \u003cbegin\u003e \u003cend\u003e\n          The range of dates/numbers for iterating the queries throught slices. Requires that the query parameter contains the variables {begin} and {end} for creating the slices. 
Use numbers or dates in ISO 8601 format (yyyy-mm-ddTHH:MM:SS)\n\n      --step \u003cnum\u003e\n          Number to increment each step in iterative mode [default: 1]\n          \n  -p, --params \u003cuseParams=mypars\u003e\n          Extra parameter for Solr Update Handler. See: https://lucene.apache.org/solr/guide/transforming-and-indexing-custom-json.html\n\n  -m, --max-errors \u003ccount\u003e\n          How many times should continue on source document errors [default: 0]\n\n      --delay-before \u003ctime\u003e\n          Delay before any processing in solr server. Format as: 30s, 15min, 1h\n\n      --delay-per-request \u003ctime\u003e\n          Delay between each http operations in solr server. Format as: 3s, 500ms, 1min\n\n      --delay-after \u003ctime\u003e\n          Delay after all processing. Usefull for letting Solr breath\n\n      --num-docs \u003cquantity\u003e\n          Number of documents to retrieve from solr in each reader step [default: 4k]\n\n      --archive-files \u003cquantity\u003e\n          Max number of files of documents stored in each zip file [default: 40]\n\n      --zip-prefix \u003cname\u003e\n          Optional prefix for naming the zip backup files when storing documents\n\n      --workaround-shards \u003ccount\u003e\n          Use only when your Solr Cloud returns a distinct count of docs for some queries in a row. This may be caused by replication problems between cluster nodes of shard replicas of a core. Response with 'num_found' bellow the greatest value are ignored for getting all possible docs. 
Use with `--params shards=shard_name` for retrieving all docs for each shard of the core\n\n  -r, --readers \u003ccount\u003e\n          Number parallel threads exchanging documents with the solr core [default: 1]\n\n  -w, --writers \u003ccount\u003e\n          Number parallel threads syncing documents with the zip archives [default: 1]\n\n      --log-level \u003clevel\u003e\n          What level of detail should print messages [default: info]\n\n      --log-mode \u003cmode\u003e\n          Terminal output to print messages [default: mixed]\n          \n\n      --log-file-path \u003cpath\u003e\n          Write messages to a local file\n\n      --log-file-level \u003clevel\u003e\n          What level of detail should write messages to the file [default: debug]\n\n  -h, --help\n          Print help (see a summary with '-h')\n\n$ solrcopy backup --url http://localhost:8983/solr --core demo --query 'price:[1 TO 400] AND NOT popularity:10' --order price:desc,weight:asc --limit 10000 --select id,date,name,price,weight,popularity,manu,cat,store,features --dir ./tmp\n```\n\n``` text\n$ solrcopy restore --help\nRestore documents from local backup files into a Apache Solr core\n\nUsage: solrcopy restore [OPTIONS] --url \u003clocalhost:8983/solr\u003e --core \u003ccore\u003e --dir \u003c/path/to/output\u003e\n\nOptions:\n  -u, --url \u003clocalhost:8983/solr\u003e  Url pointing to the Solr cluster [env: SOLR_COPY_URL=]\n  -c, --core \u003ccore\u003e                Case sensitive name of the core in the Solr server\n  -d, --dir \u003c/path/to/output\u003e      Existing folder where the zip backup files containing the extracted documents are stored [env: SOLR_COPY_DIR=]\n  -f, --flush \u003cmode\u003e               Mode to perform commits of the documents transaction log while updating the core [possible values: none, soft, hard, \u003cinterval\u003e] [default: hard]\n      --no-final-commit            Do not perform a final hard commit before finishing\n      
--disable-replication        Disable core replication at start and enable again at end\n  -p, --params \u003cuseParams=mypars\u003e  Extra parameter for Solr Update Handler. See: https://lucene.apache.org/solr/guide/transforming-and-indexing-custom-json.html\n  -m, --max-errors \u003ccount\u003e         How many times should continue on source document errors [default: 0]\n      --delay-before \u003ctime\u003e        Delay before any processing in solr server. Format as: 30s, 15min, 1h\n      --delay-per-request \u003ctime\u003e   Delay between each http operations in solr server. Format as: 3s, 500ms, 1min\n      --delay-after \u003ctime\u003e         Delay after all processing. Usefull for letting Solr breath\n  -s, --search \u003ccore*.zip\u003e         Search pattern for matching names of the zip backup files\n      --order \u003casc | desc\u003e         Optional order for searching the zip archives\n  -r, --readers \u003ccount\u003e            Number parallel threads exchanging documents with the solr core [default: 1]\n  -w, --writers \u003ccount\u003e            Number parallel threads syncing documents with the zip archives [default: 1]\n      --log-level \u003clevel\u003e          What level of detail should print messages [default: info]\n      --log-mode \u003cmode\u003e            Terminal output to print messages [default: mixed]\n      --log-file-path \u003cpath\u003e       Write messages to a local file\n      --log-file-level \u003clevel\u003e     What level of detail should write messages to the file [default: debug]\n  -h, --help                       Print help\n\n$ solrcopy restore --url http://localhost:8983/solr  --dir ./tmp --core demo\n```\n\n``` text\n$ solrcopy delete --help\nRemoves documents from the Solr core definitively\n\nUsage: solrcopy delete [OPTIONS] --query \u003cf1:val1 AND f2:val2\u003e --url \u003clocalhost:8983/solr\u003e --core \u003ccore\u003e\n\nOptions:\n  -u, --url \u003clocalhost:8983/solr\u003e    Url pointing to the 
Solr cluster [env: SOLR_COPY_URL=]\n  -c, --core \u003ccore\u003e                  Case sensitive name of the core in the Solr server\n  -q, --query \u003cf1:val1 AND f2:val2\u003e  Solr Query for filtering which documents are removed in the core. \n                                     Use '*:*' for excluding all documents in the core. There are no way of recovering excluded docs.\n                                     Use with caution and check twice\n  -f, --flush \u003cmode\u003e                 Wether to perform a commits of transaction log after removing the documents [default: soft]\n      --log-level \u003clevel\u003e            What level of detail should print messages [default: info]\n      --log-mode \u003cmode\u003e              Terminal output to print messages [default: mixed]\n      --log-file-path \u003cpath\u003e         Write messages to a local file\n      --log-file-level \u003clevel\u003e       What level of detail should write messages to the file [default: debug]\n  -h, --help                         Print help\n\n$ solrcopy delete --url http://localhost:8983/solr --core demo --query '*:*'\n```\n\n``` text\n$ solrcopy commit --help\nPerform a commit in the Solr core index for persisting documents in disk/memory\n\nUsage: solrcopy commit [OPTIONS] --url \u003clocalhost:8983/solr\u003e --core \u003ccore\u003e\n\nOptions:\n  -u, --url \u003clocalhost:8983/solr\u003e  Url pointing to the Solr cluster [env: SOLR_COPY_URL=]\n  -c, --core \u003ccore\u003e                Case sensitive name of the core in the Solr server\n      --log-level \u003clevel\u003e          What level of detail should print messages [default: info]\n      --log-mode \u003cmode\u003e            Terminal output to print messages [default: mixed]\n      --log-file-path \u003cpath\u003e       Write messages to a local file\n      --log-file-level \u003clevel\u003e     What level of detail should write messages to the file [default: debug]\n  -h, --help                       Print 
help\n\n$ solrcopy commit --url http://localhost:8983/solr --core demo\n```\n\n## Known Issues\n\n- Error extracting documents from a Solr cloud cluster with corrupted shards or unreplicated replicas:\n  - Cause: Solr reports a different document count each time it answers the query.\n  - Fix: extract the data pointing directly to the shard instance address, not to the cloud address.\n  - You can also pass custom params to Solr, such as `--params timeAllowed=15000\u0026segmentTerminatedEarly=false\u0026cache=false\u0026shards=shard1`\n\n## Related\n\n1. [solrbulk](https://github.com/miku/solrbulk)\n2. [solrdump](https://github.com/ubleipzig/solrdump)\n3. [Solr documentation of backup/restore](https://lucene.apache.org/solr/guide/6_6/making-and-restoring-backups.html)\n\n---\n\n## Building\n\nFor compiling a version from source:\n\n1. Install Rust following the instructions on [https://rustup.rs](https://rustup.rs)\n2. Build with the command: `cargo build --release`\n3. Install locally with the command: `cargo install --path .`\n\n## Development\n\nFor setting up a development environment with Visual Studio Code:\n\n1. Install Rust following the instructions on [https://rustup.rs](https://rustup.rs)\n2. Install Visual Studio Code following the instructions on the Microsoft [site](https://code.visualstudio.com/download)\n3. Install the following extensions in VS Code:\n   - vadimcn.vscode-lldb\n   - rust-lang.rust\n   - swellaby.vscode-rust-test-adapter\n\nYou can also use IntelliJ IDEA, vim, emacs or your preferred IDE.\n\n## Testing\n\nFor setting up a testing environment you will need:\n\n1. A server instance of [Apache Solr](https://lucene.apache.org/solr/)\n2. A **source** core with some documents for testing the `solrcopy backup` command.\n3. A **target** core with the same schema for testing the `solrcopy restore` command.\n4. 
The server address and core names set as `solrcopy` parameters on the command line or in the IDE launch configuration.\n\n### Use an existing server\n\n 1. Select an existing **source** core on your Solr server, or create a new one and fill it with some documents.\n 2. Clone a new **target** core with the same schema as the source core but without documents.\n\n### Install a server in a docker container\n\nCheck the Solr docker [documentation](https://solr.apache.org/guide/solr/latest/deployment-guide/solr-in-docker.html) for help on how to create a Solr container.\n\n#### Using docker compose\n\n1. Install [docker stable](https://docs.docker.com/get-started/get-docker/) for your platform\n2. Create the container and the cores for testing with the commands below.\n3. Check the cores created in the admin UI at `http://localhost:8983/solr`\n\n``` bash\n# This command creates the container with a solr server with two cores: 'demo' and 'target'\n$ docker compose -f docker/docker-compose.yml up -d\n# Run this command to insert some data into the cores\n$ docker compose exec solr solr-ingest-all\n# Run this command to test backup\n$ cargo run -- backup --url http://localhost:8983/solr --core demo --dir $PWD\n# Run this command to test restoring the backup data into an existing empty core\n$ cargo run -- restore --url http://localhost:8983/solr --search demo --core target --dir $PWD\n```\n\n#### Using only docker tools\n\nIt's possible to create the solr container using just docker instead of docker compose.\n\nFollow these instructions if you prefer this way:\n\n``` bash\n$ cd docker\n# Pull the latest solr image from docker hub\n$ docker pull solr:slim\n...\n# 1. Create a container running solr and after\n# 2. Create the **source** core with the name 'demo'\n# 3. 
Import some docs into the 'demo' core\n$ docker run -d --name solr4test -p 8983:8983 solr:slim solr-demo\n...\n# Create an empty **target** core named 'target'\n$ docker exec -it solr4test solr create_core -c target\n```\n\n### Developing in Visual Studio Code\n\nThere are some pre-configured launch configurations in this repository for debugging\nsolrcopy.\n\n1. Start the Solr docker container with the procedures above.\n2. Run solrcopy using one of the predefined launch configurations.\n    1. You will be asked for program arguments like:\n        1. SolrURL\n        2. Query\n3. You can also edit the settings file `.vscode/launch.json` if you prefer:\n   1. Set the following parameters for specifying a query to extract documents:\n      - `--query`\n      - `--order`\n      - `--select`\n      - `--batch`\n      - `--skip`\n      - `--limit`\n   2. Check the [Solr Query](https://lucene.apache.org/solr/guide/8_4/the-standard-query-parser.html) docs for understanding these parameters.\n4. You can also run any query in the [Solr admin UI](http://localhost:8983/solr/#/demo)\n\n---\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjuarezr%2Fsolrcopy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjuarezr%2Fsolrcopy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjuarezr%2Fsolrcopy/lists"}