{"id":18710731,"url":"https://github.com/apify/actor-algolia-website-indexer","last_synced_at":"2025-11-03T16:32:56.411Z","repository":{"id":42990515,"uuid":"203551953","full_name":"apify/actor-algolia-website-indexer","owner":"apify","description":"Apify actor that crawls website and indexes selected web pages to Algolia index. It's used to power the search on https://help.apify.com","archived":false,"fork":false,"pushed_at":"2022-12-11T02:49:10.000Z","size":297,"stargazers_count":1,"open_issues_count":10,"forks_count":3,"subscribers_count":4,"default_branch":"master","last_synced_at":"2023-03-09T04:01:25.477Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/apify.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-08-21T09:31:07.000Z","updated_at":"2022-10-04T05:27:31.000Z","dependencies_parsed_at":"2023-01-26T14:31:21.214Z","dependency_job_id":null,"html_url":"https://github.com/apify/actor-algolia-website-indexer","commit_stats":null,"previous_names":[],"tags_count":null,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apify%2Factor-algolia-website-indexer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apify%2Factor-algolia-website-indexer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apify%2Factor-algolia-website-indexer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apify%2Factor-algolia-website-indexer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/owners/apify","download_url":"https://codeload.github.com/apify/actor-algolia-website-indexer/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223515369,"owners_count":17158361,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-07T12:35:27.204Z","updated_at":"2025-11-03T16:32:56.381Z","avatar_url":"https://github.com/apify.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Algolia Website Indexer\n\nThe Indexer crawls a website using the Puppeteer browser (headless Chrome) and indexes the selected pages in an Algolia index.\nIt was designed to run as an Apify actor.\n\n## Usage\n\nYou can find instructions on how to run it in the Apify cloud on its Apify Store page.\nIf you want to run it in your own environment, you can use the Apify CLI.\n\n## Input\n\nThe input of the actor is JSON with the following parameters.\n\n| Field | Type | Description |\n| ----- | ---- | ----------- |\n| algoliaAppId | String | Your Algolia Application ID |\n| algoliaApiKey | String | Your Algolia API key |\n| algoliaIndexName | String | Your Algolia index name |\n| crawlerName | String | Crawler name. Pages are added, updated, and removed in the index under this name, so a single index can hold pages from more than one website. |\n| startUrls | Array | URLs where the crawler starts crawling |\n| selectors | Array | Selectors whose text content you want to index. The key is the name of the attribute and the value is the CSS selector. 
|\n| waitForElement | String | Selector of an element to wait for on each page. |\n| additionalPageAttrs | Object | Additional attributes you want to attach to each record in the index. |\n| skipIndexUpdate | Boolean | Option to switch off updating the Algolia index. |\n\n### Advanced\n\nThere are a few parameters not shown in the UI. These parameters change the crawling behaviour, and you can set them using the API or in a local environment.\n\n| Field | Type | Description |\n| ----- | ---- | ----------- |\n| pageFunction | String | Overrides the default pageFunction |\n| pseudoUrls | Array | Overrides the default pseudoUrls |\n| clickableElements | String | Overrides the default clickableElements |\n| keepUrlFragment | Boolean | Option to switch on enqueueing URLs with URL fragments |\n| omitSearchParamsFromUrl | Boolean | Option to switch off enqueueing URLs with search params |\n\n## Debug indexed pages\n\nYou can find all the pages that will be indexed in the default dataset of a specific actor run.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapify%2Factor-algolia-website-indexer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fapify%2Factor-algolia-website-indexer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapify%2Factor-algolia-website-indexer/lists"}