{"id":21620754,"url":"https://github.com/miku/solrbulk","last_synced_at":"2025-08-24T22:05:52.674Z","repository":{"id":25792940,"uuid":"29231498","full_name":"miku/solrbulk","owner":"miku","description":"SOLR bulk indexing utility for the command line.","archived":false,"fork":false,"pushed_at":"2025-07-10T10:30:39.000Z","size":196,"stargazers_count":44,"open_issues_count":0,"forks_count":7,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-07-10T18:03:13.631Z","etag":null,"topics":["code4lib","indexing","solr"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/miku.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2015-01-14T06:55:20.000Z","updated_at":"2025-05-14T14:07:15.000Z","dependencies_parsed_at":"2023-12-14T14:44:51.408Z","dependency_job_id":"ceaab8ed-1bde-40b4-8dae-ae4c25522ad8","html_url":"https://github.com/miku/solrbulk","commit_stats":null,"previous_names":[],"tags_count":36,"template":false,"template_full_name":null,"purl":"pkg:github/miku/solrbulk","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miku%2Fsolrbulk","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miku%2Fsolrbulk/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miku%2Fsolrbulk/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miku%2Fsolrbulk/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/miku","download_url":"https://co
deload.github.com/miku/solrbulk/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miku%2Fsolrbulk/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271470280,"owners_count":24765363,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-21T02:00:08.990Z","response_time":74,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["code4lib","indexing","solr"],"created_at":"2024-11-24T23:12:41.028Z","updated_at":"2025-08-24T22:05:52.615Z","avatar_url":"https://github.com/miku.png","language":"Go","funding_links":[],"categories":["Uncategorized"],"sub_categories":["Uncategorized"],"readme":"# solrbulk\n\nMotivation:\n\n\u003e Sometimes you need to index a bunch of documents really, really fast.\n  Even with Solr 4.0 and soft commits, if you send one document at a time\n  you will be limited by the network. The solution is two-fold: batching\n  and multi-threading. http://lucidworks.com/blog/high-throughput-indexing-in-solr/\n\nsolrbulk expects as input a file with [line-delimited\nJSON](https://en.wikipedia.org/wiki/JSON_Streaming#Line-delimited_JSON). Each\nline represents a single document. 
solrbulk takes care of reformatting the\ndocuments into the bulk JSON format, that [SOLR\nunderstands](https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-JSONFormattedIndexUpdates).\n\nsolrbulk will send documents in batches and in parallel. The number of\ndocuments per batch can be set via `-size`, the number of workers with `-w`.\n\n[![Project Status: Active – The project has reached a stable, usable state and\nis being actively\ndeveloped.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)\n![GitHub All Releases](https://img.shields.io/github/downloads/miku/solrbulk/total.svg)\n\nThis tool has been developed for [project finc](https://finc.info) at [Leipzig\nUniversity Library](https://ub.uni-leipzig.de).\n\n## Installation\n\nInstallation via Go tools.\n\n    $ go install github.com/miku/solrbulk/cmd/solrbulk@latest\n\nThere are also DEB and RPM packages\navailable at [https://github.com/miku/solrbulk/releases/](https://github.com/miku/solrbulk/releases/).\n\n## Usage\n\nFlags.\n\n    $ solrbulk\n    Usage of solrbulk:\n      -auth string\n            username:password pair for basic auth\n      -commit int\n            commit after this many docs (default 1000000)\n      -cpuprofile string\n            write cpu profile to file\n      -memprofile string\n            write heap profile to file\n      -no-final-commit\n            omit final commit\n      -optimize\n            optimize index\n      -purge\n            remove documents from index before indexing (use purge-query to selectively clean)\n      -purge-pause duration\n            insert a short pause after purge (default 2s)\n      -purge-query string\n            query to use, when purging (default \"*:*\")\n      -server string\n            url to SOLR server, including host, port and path to collection, e.g. 
http://localhost:8983/solr/biblio\n      -size int\n            bulk batch size (default 1000)\n      -update-request-handler-name string\n            where solr.UpdateRequestHandler is mounted on the server, https://is.gd/s0eirv (default \"/update\")\n      -v\tprints current program version\n      -verbose\n            output basic progress\n      -w int\n            number of workers to use (default 8)\n      -z\tunzip gz'd file on the fly\n\n## Example\n\nGiven a [newline delimited JSON](http://jsonlines.org/) file:\n\n    $ cat file.ldj\n    {\"id\": \"1\", \"state\": \"Alaska\"}\n    {\"id\": \"2\", \"state\": \"California\"}\n    {\"id\": \"3\", \"state\": \"Oregon\"}\n    ...\n\n    $ solrbulk -verbose -server https://192.168.1.222:8085/collection1 file.ldj\n\nThe server parameter contains host, port and path up to, but excluding the\ndefault [*update*\nroute](https://lucene.apache.org/solr/guide/6_6/uploading-data-with-index-handlers.html)\nfor search (since 0.3.4, this can be adjusted via\n`-update-request-handler-name` flag).\n\nFor example, if you usually update via `https://192.168.1.222:8085/solr/biblio/update`, the server parameter would be:\n\n    $ solrbulk -server https://192.168.1.222:8085/solr/biblio file.ldj\n\n## Some performance observations\n\n* Having as many workers as cores is generally a good idea. However, the returns seem to diminish fast with more cores.\n* Disable `autoCommit`, `autoSoftCommit` and the transaction log in `solrconfig.xml`.\n* Use some high number for `-commit`. 
solrbulk will issue a final commit request at the end of the processing anyway.\n* For some use cases, the bulk indexing approach is about twice as fast as a standard request to `/solr/update`.\n* On machines with more cores, try to increase [maxIndexingThreads](https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig).\n\n## Elasticsearch?\n\nTry [esbulk](https://github.com/miku/esbulk).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmiku%2Fsolrbulk","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmiku%2Fsolrbulk","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmiku%2Fsolrbulk/lists"}