{"id":13651147,"url":"https://github.com/jermdavis/SearchIndexBuilder","last_synced_at":"2025-04-22T22:30:34.631Z","repository":{"id":38048523,"uuid":"211378644","full_name":"jermdavis/SearchIndexBuilder","owner":"jermdavis","description":"A tool for rebuilding search indexes from outside the Sitecore web app - for very long-running builds...","archived":false,"fork":false,"pushed_at":"2022-12-08T06:14:39.000Z","size":89,"stargazers_count":4,"open_issues_count":3,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-01-25T00:16:12.797Z","etag":null,"topics":["sitecore"],"latest_commit_sha":null,"homepage":null,"language":"C#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jermdavis.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-09-27T18:23:31.000Z","updated_at":"2021-11-10T06:26:59.000Z","dependencies_parsed_at":"2023-01-24T16:19:22.342Z","dependency_job_id":null,"html_url":"https://github.com/jermdavis/SearchIndexBuilder","commit_stats":null,"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jermdavis%2FSearchIndexBuilder","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jermdavis%2FSearchIndexBuilder/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jermdavis%2FSearchIndexBuilder/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jermdavis%2FSearchIndexBuilder/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jermdavis","download_url":"https://codeload.github.com
/jermdavis/SearchIndexBuilder/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250333862,"owners_count":21413470,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["sitecore"],"created_at":"2024-08-02T02:00:45.715Z","updated_at":"2025-04-22T22:30:34.260Z","avatar_url":"https://github.com/jermdavis.png","language":"C#","funding_links":[],"categories":["Content Search"],"sub_categories":[],"readme":"\u003cpre\u003e\n  _____                     _     _____           _           \n / ____|   Sitecore        | |   |_   _|         | |          \n| (___   ___  __ _ _ __ ___| |__   | |  _ __   __| | _____  _ \n \\___ \\ / _ \\/ _` | '__/ __| '_ \\  | | | '_ \\ / _` |/ _ \\ \\/ /\n ____) |  __/ (_| | | | (__| | | |_| |_| | | | (_| |  __/\u003e  \u003c \n|_____/ \\___|\\__,_|_|  \\___|_| |_|_____|_| |_|\\__,_|\\___/_/\\_\\\n                                                   Builder    \n\u003c/pre\u003e\n[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)\n\nIf you've dealt with older Sitecore projects that use large search indexes, then you've almost certainly hit the\nissue of \"My search index rebuild takes so long, that the IIS process recycles before it finishes\"...\n\nThis tool tries to help with that by managing the indexing operation from outside the ASP.Net website process. If\nsomething causes the web app to recycle, this tool will detect the error and back off before retrying and continuing\nthe process. 
You can also stop the process and restart it later if necessary.\n\nIt will also try to manage errors raised by the Sitecore indexing process - but this behaviour is somewhat limited\nby the data returned from an indexing job by Sitecore. As far as I can tell, most internal failures return a message\nthat still looks like success - even if, say, a computed field threw an exception. So you will need to check\nyour crawler log for any errors that Sitecore did not report.\n\nThis hasn't been exhaustively tested, as it was something I hacked together to help with a work problem. But it's been\ntried against both Solr and Lucene indexes, with Sitecore v7.1, v7.2 \u0026 v9.0 - and in theory it should work with v7.0 and up.\n\nGrab a [release](/jermdavis/SearchIndexBuilder/releases) and then make use of the options it provides...\n\n## Step 1: Deploying the endpoint\n\nThe first step in running the tool is to deploy the special endpoint it uses into your Sitecore application. The tool can\ndo this with the `Deploy` verb:\n\n`SearchIndexBuilder.exe deploy -w \u003cyour website folder\u003e [-o] [-t \u003ctoken\u003e]` \n\nThe parameters are:\n\n* `-w` / `--website` (Required, string) : The Sitecore website's web root folder. This is where the endpoint file will be deployed to.\n  Remember to put quotes around this string if it includes spaces. e.g. `-w \"c:\\inetpub\\wwwroot\\mysite\\Website\"`\n* `-o` / `--overwrite` (Optional) : By default the tool will not overwrite an existing endpoint file if one is found. If you do want\n  to overwrite the existing file, add this parameter.\n* `-t` / `--token` (Optional, string) : To do anything, requests to the endpoint file must include a security token. By default a\n  random Guid will be used. But if you want to specify your own, then pass it with this parameter. Remember to use quotes if it includes\n  spaces. The value used will be echoed to the screen, as you will need it in the next step. e.g. 
`-t MySuperSecretToken94!`\n\nYou must complete this step before proceeding.\n\n## Step 2: Setting up some config\n\nTo run an indexing job, the tool relies on a JSON file which specifies job configuration and settings. You can write this file manually if\nyou want to, but the tool will generate it for you using the `Setup` verb.\n\n`SearchIndexBuilder.exe setup -u \u003curl of the endpoint\u003e -d \u003cdatabase\u003e -t \u003ctoken\u003e [-q \u003cquery for items\u003e] [-c \u003cconfig file name\u003e] [-o]`\n\nThe parameters are:\n\n* `-u` / `--url` (Required, string) : The base URL for the website you want the tool to talk to. It should be the web path that matches the\n  website folder you specified in the deploy step above. You do not need to specify the name of the endpoint file. That will be added by\n  the tool. e.g. `-u https://mysite.domain.com/`\n* `-d` / `--database` (Required, string) : The Sitecore database name that you want to extract item data from. When the tool generates the\n  list of items to process it will use this for the query. e.g. `-d web`\n* `-t` / `--token` (Required, string) : You must supply the same security token that you set up above when you deployed the endpoint. If you\n  supplied your own you can just type it here. If you let the tool generate one, you will need to copy it from the output that the\n  previous step generated. e.g. `-t MySuperSecretToken94!`\n* `-q` / `--query` (Optional, string) : By default, the tool will reindex all the items in the database you specify. If you omit this parameter the code\n  will go directly to the underlying Items database table for maximum performance. If you specify a query then this will be run against the Sitecore\n  database object for the database name you provided. That means that a query will have more of a performance hit on the target server. Take care when\n  using this option, as a query like `\\\\*` will potentially process many thousands of items. e.g. 
`-q \"/sitecore/Content/*//[@@templatename='Homepage']\"`\n* `-c` / `--config` (Optional, string) : By default the tool will write the results of this operation to a file called `config.json` in the current folder.\n  If you want to write to a different name or location, specify it with this parameter. e.g. `-c mySite.json`\n* `-o` / `--overwrite` (Optional) : By default the tool will not overwrite an existing config file if one is found. If you do want\n  to overwrite the existing file, add this parameter.\n* `-t` / `--timeout` (Optional, integer) : The default timeout for HTTP operations with Sitecore is 60 seconds. You can specify a longer timeout (in seconds) using this flag.\n\nThe config file will include all the Sitecore indexes defined on your site by default. If you only want to build certain indexes, use a text editor to\nremove the unwanted ones from the JSON data. Just remember not to break the format of the file.\n\nYou can use the `-z` parameter from the global options below to change the file format at this point. The compressed formats are useful for large config files\nwhen you don't have a lot of disk space to play with, as the files tend to compress by 50-75%. However, you cannot easily edit the files in these formats. If\nyou want to make changes before running processing, use the `convert` verb to convert the file to `Text` format first.\n\n## Step 3.5: Converting the format of a config file\n\nIf you want to convert a config file from one format to another, you can make use of the `convert` verb.\n\n`SearchIndexBuilder.exe convert -s \u003cconfig file\u003e -t \u003cconfig file\u003e -w \u003cformat\u003e [-o]`\n\nThe parameters are:\n\n* `-s` / `--source` (Required, string): The source config file to read in.\n* `-t` / `--target` (Required, string): The target config file to write out in the new format.\n* `-w` / `--writeformat` (Required, string): The format to use for the target config file. 
One of `Text`, `Archive` or `GZip`.\n* `-o` / `--overwrite` (Optional) : If the target config file exists, overwrite it.\n\nThe code will try to determine the format of the source file using its extension, or you can override this using the `-z` global option. The\ntarget file format is set by the `-w` parameter.\n\nThis option exists to save disk space on constrained systems - as a zip/GZip stream will reduce a config file\nby as much as 75% in some cases. But it does this at the expense of performance, as it takes longer to read and write these\nfiles due to the processing for compression.\n\n## Step 3: Running an index build\n\nTo start the process of re-indexing, you use the `index` verb. This will take a config file created by the `setup` step, and process each of the content\nitems it specifies. Using the endpoint you've deployed, the tool will ask Sitecore to reindex each of the items, using each of the indexes\nyou have specified. \n\n`SearchIndexBuilder.exe index [-c \u003cconfig file\u003e] [-o \u003coutput Every X items\u003e] [-r \u003cretries in case of error\u003e] [-p \u003cms to pause for\u003e] [-t \u003cseconds\u003e]`\n\nThe parameters are:\n\n* `-c` / `--config` (Optional, string) : The tool will try to load configuration from a file named `config.json` in the current directory by default. If you want to use\n  a different config file, specify it with this parameter. e.g. `-c ..\\testing\\mySite.json`\n* `-o` / `--outputEvery` (Optional, integer) : The code tries to estimate the time remaining for the rebuild operation by using a rolling average over the last 50 items that\n  have been processed. This flag specifies how often the estimates should be displayed on the screen. It defaults to once every 10 items processed. e.g. 
`-o 35`\n* `-r` / `--retries` (Optional, integer) : If an indexing call to the endpoint returns an error (either because the endpoint could not be accessed, or because Sitecore returned\n  an error) then the operation will be retried this number of times before the tool decides the error is permanent and moves on to the next item. The tool will back off an\n  increasing amount after each error. The default value is five retries. e.g. `-r 10`\n* `-p` / `--pause` (Optional, integer) : If you want to lower the impact of the indexing process on your target server then you can use this\n  parameter to add a pause between each item indexing request. The value is in milliseconds. e.g. `-p 250`\n* `-t` / `--timeout` (Optional, integer) : For convenience, you can override the timeout taken from the `setup` config above, with this optional parameter.\n\nYou can stop the tool safely with `Ctrl-C`. It will finish its current operation, and then end. The current state (specifically what items are left to process, and what errors\nhave been recorded - both transient and permanent) will be written to disk in the config file. The previous state of the config file will be preserved in a backup file named with\nthe format `backup-\u003cdate\u003e-\u003ctime\u003e-\u003cconfig\u003e.json` so that you can revert to this previous state if necessary.\n\nTo try and help with situations where the tool fails unexpectedly, it will also write (and overwrite) a file named `RuntimeBackup-\u003cconfig\u003e.json` each time the tool outputs\nstatistics as part of the `--outputEvery` option. This is the current state of the job configuration. 
It will also pay attention to remaining disk space - and if it gets down\nto less than 1.25 times the size of the last backup written, the indexing job will be cancelled in order to prevent data loss due to running out of disk space.\n\nThe updated config is also saved to disk when the tool finishes normally - giving a record of items which caused problems.\n\nYou can use `-z` from the global options below to specify the file format to use. However the code will try and work out the correct format automatically,\nbased on the file extension you specify.\n\n## Step 4: Retrying errored items\n\nIf you have a config file with errors recorded in it, and you want to re-process those items, you can use the `retry` verb to generate a new config file\nfrom the processed one. It will clear the processed items, elapsed time and attempts count data, and add any errors into the items list. You can then re-run\nthe `index` verb.\n\n`SearchIndexBuilder.exe retry [-s \u003csource config file\u003e] [-t \u003ctarget config file\u003e] [-o]` \n\nThe parameters are:\n\n* `-s` / `--source` (Optional, string) : The config file you've already run, that you want to retry the errors from. Defaults to `config.json`. e.g. `-s currentSite.json`\n* `-t` / `--target` (Optional, string) : The new config file you want to save the results to. Defaults to `retry-config.json`. e.g. `-t \"retry currentSite.json\"`\n* `-o` / `--overwrite` (Optional) : If the target file exists, it can be overwritten if this flag is provided.\n\n## Step 5: Removing the endpoint\n\nOnce you're finished, you should remove the endpoint file from the target website. You can do that by just deleting the file, but the tool\ncan do this for you with the `Remove` verb.\n\n`SearchIndexBuilder.exe remove -w \u003cyour website folder\u003e`\n\nThe parameters are:\n\n* `-w` / `--website` (Required, string) : The Sitecore website's web root folder. 
This is where the endpoint file will be removed from.\n  Remember to put quotes around this string if it includes spaces. e.g. `-w \"c:\\inetpub\\wwwroot\\mysite\\Website\"`\n\n## Global parameters\n\nThe system also supports some global parameters, which will affect all of the verbs:\n\n* `-a` / `--attach` (Optional) : Causes the processing to pause between parsing the command line options and starting any processing.\n  This allows you to attach a debugger if you need to. It will wait for a keypress, or for the debugger to attach before proceeding with execution.\n* `-f` / `--fake` (Optional) : Makes the code use a \"fake\" object as the endpoint proxy class - allowing it to process some data without a\n  connection to Sitecore being possible. It generates random results for whatever data is being processed.\n* `-z` / `--ziptype` (Optional) : Allows you to apply zip compression to the config files saved by the tool. The default option is `Text`\n  to save a raw text JSON file. You can also specify `GZip` to write a raw zip stream, or `Archive` to write a normal zip file containing\n  the config data. These options will add appropriate extensions to the config file specified.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjermdavis%2FSearchIndexBuilder","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjermdavis%2FSearchIndexBuilder","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjermdavis%2FSearchIndexBuilder/lists"}