{"id":19653569,"url":"https://github.com/sematext/solr-reindexer","last_synced_at":"2025-04-28T17:31:38.731Z","repository":{"id":140122963,"uuid":"468790838","full_name":"sematext/solr-reindexer","owner":"sematext","description":"Reindexes documents from a Solr query to a destination collection","archived":false,"fork":false,"pushed_at":"2023-12-21T08:02:04.000Z","size":37,"stargazers_count":7,"open_issues_count":2,"forks_count":1,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-04-21T11:08:30.290Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sematext.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2022-03-11T14:53:03.000Z","updated_at":"2024-05-30T11:29:16.000Z","dependencies_parsed_at":null,"dependency_job_id":"ae7bc6fa-c7ec-4b05-a9e8-b9b3d21630e8","html_url":"https://github.com/sematext/solr-reindexer","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sematext%2Fsolr-reindexer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sematext%2Fsolr-reindexer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sematext%2Fsolr-reindexer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sematext%2Fsolr-reindexer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sematext","download_url":"https://codeload.github.com/sematext/solr-reindexe
r/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251355434,"owners_count":21576358,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-11T15:14:32.359Z","updated_at":"2025-04-28T17:31:38.457Z","avatar_url":"https://github.com/sematext.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# solr-reindexer\nReindexes documents from a Solr query to a destination collection. Quick tutorial [here](https://sematext.com/blog/solr-reindexer-quick-way-to-reindex-to-a-new-collection/).\n## Usage\nDownload the uber-jar from [releases](https://github.com/sematext/solr-reindexer/releases) and run it with Java (11+). Here's an example with most of the options:\n```\njava -jar solr-reindexer.jar\\\n -sourceCollection my_collection_v1\\\n -targetCollection my_collection_v2\\\n -uniqueKey id\\\n -sourceZkAddress localhost:9983,localhost:2181\\\n -targetZkAddress zoo1:2181,zoo2:2181\\\n -skipFields _version_,text\\\n -numWriteThreads 2\\\n -queueSize 10000\\\n -retries 7\\\n -retryInterval 2000\\\n -query \"isDeleted:false AND isIgnored:false\"\\\n -rows 100\n```\n\nOnly `sourceCollection` and `targetCollection` are mandatory.\nThe rest are optional:\n- `uniqueKey`: we use a cursor to go over the data. The cursor requires sorting on the `uniqueKey` defined in the schema, which defaults to `id`\n- `sourceZkAddress` and `targetZkAddress`: the ZooKeeper host:port for SolrCloud (source and destination). 
If there are more, comma-separate them\n- `skipFields`: we reindex all the stored and docValues fields by default. But some may be skipped, like the default `_version_` (which will break the reindex because it will cause a version conflict) or copyFields that are also stored (they'll duplicate the values, because you'll redo the copyField operation). Comma-separate multiple fields\n- `retries` and `retryInterval`: if we encounter an exception, we wait for `retryInterval` millis and retry up to `retries` times\n- `queueSize`: the reader thread writes into an in-memory queue of this size (in pages, see `rows` below for page size). Defaults to 100\n- `numWriteThreads`: this many threads consume from the in-memory queue, writing to the target collection. Defaults to 2\n- `query`: you may not want to reindex everything with the default `*:*`\n- `rows`: we read one page of this size at a time. We also write one batch of this size at a time. Typically, the best performance is around 1MB per batch. 
Default is 1000 rows per page/batch\n\n## SSL\n\nTo connect to Solr via SSL, you can pass system properties to CloudSolrClient as described [here](https://solr.apache.org/guide/solr/latest/deployment-guide/enabling-ssl.html#index-a-document-using-cloudsolrclient).\n\nHere's an example command. The SSL system properties go before `-jar`, so that the JVM (rather than the reindexer) receives them:\n```\njava\\\n  -Djavax.net.ssl.keyStore=/path/to/solr-ssl.keystore.p12\\\n  -Djavax.net.ssl.keyStorePassword=secret\\\n  -Djavax.net.ssl.keyStoreType=pkcs12\\\n  -Djavax.net.ssl.trustStore=/path/to/solr-ssl.keystore.p12\\\n  -Djavax.net.ssl.trustStorePassword=secret\\\n  -Djavax.net.ssl.trustStoreType=pkcs12\\\n  -jar solr-reindexer.jar\\\n  -sourceCollection sourceCollectionName\\\n  -targetCollection targetCollectionName\\\n  -sourceZkAddress localhost:9983\\\n  -targetZkAddress localhost:2181\n```\n\n## Parallelizing and other performance tips\n\nYou can start multiple instances of the reindexer, one per shard, by specifying `-sourceShards shard1` for one instance, `-sourceShards shard2` for another, etc.\n\nYou can also group several shards per reindexer by comma-separating them, e.g. `-sourceShards shard1,shard2`.\n\nTypically, the bottleneck is reading, so you'll want to run the reindexer close to the source. The default of 2 write threads should keep up, unless the destination (or the network to it) is slow.\n\n## Contributing\nFeel free to clone the repository, import it as a Gradle project, and add features.\n\nTo build the uber-jar, use `gradle jar`.\n\nTentative roadmap:\n- authentication support\n- supporting non-SolrCloud\n- using Export instead of Cursor\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsematext%2Fsolr-reindexer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsematext%2Fsolr-reindexer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsematext%2Fsolr-reindexer/lists"}