{"id":19002561,"url":"https://github.com/signcl/docsearch-scraper-action","last_synced_at":"2025-04-23T13:27:11.109Z","repository":{"id":65155699,"uuid":"383156910","full_name":"signcl/docsearch-scraper-action","owner":"signcl","description":"Algolia DocSearch Scraper in Docker for GitHub Actions","archived":false,"fork":false,"pushed_at":"2023-08-29T10:22:54.000Z","size":12,"stargazers_count":17,"open_issues_count":1,"forks_count":1,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-12-01T17:16:47.762Z","etag":null,"topics":["actions","algolia","algolia-docsearch","algolia-search","docker","docusaurus","github-actions"],"latest_commit_sha":null,"homepage":"","language":"Makefile","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/signcl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-07-05T13:49:50.000Z","updated_at":"2024-10-26T09:37:26.000Z","dependencies_parsed_at":"2024-06-21T14:14:51.772Z","dependency_job_id":"1db338fb-4b76-4f3e-9cf4-3cc933b82322","html_url":"https://github.com/signcl/docsearch-scraper-action","commit_stats":{"total_commits":8,"total_committers":1,"mean_commits":8.0,"dds":0.0,"last_synced_commit":"504242b8a087c976da66eaad64a95d98ff90dab5"},"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/signcl%2Fdocsearch-scraper-action","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/signcl%2Fdocsearch-scraper-action/tags","releases_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/signcl%2Fdocsearch-scraper-action/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/signcl%2Fdocsearch-scraper-action/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/signcl","download_url":"https://codeload.github.com/signcl/docsearch-scraper-action/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":229430077,"owners_count":18071657,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["actions","algolia","algolia-docsearch","algolia-search","docker","docusaurus","github-actions"],"created_at":"2024-11-08T18:15:26.698Z","updated_at":"2024-12-12T18:07:01.043Z","avatar_url":"https://github.com/signcl.png","language":"Makefile","funding_links":[],"categories":["Makefile"],"sub_categories":[],"readme":"# Algolia DocSearch Scraper in Docker for GitHub Actions\n\nRun the self-hosted Algolia [DocSearch scraper](https://github.com/algolia/docsearch-scraper) in Docker with GitHub Actions.\n\n- The prebuilt base image speeds up the GitHub Actions workflow (image build time under 1 minute) and keeps the scraper up to date automatically through Docker Hub's automated base image builds.\n- Example configs are available at [algolia/docsearch-configs](https://github.com/algolia/docsearch-configs).\n\n## Usage\n\nBasic usage:\n\n```yaml\n- name: Push indices to Algolia\n  uses: signcl/docsearch-scraper-action@master\n  env:\n    APPLICATION_ID: ${{ secrets.ALGOLIA_APPLICATION_ID }}\n    API_KEY: ${{ secrets.ALGOLIA_API_KEY }}\n    CONFIG: '{\"index_name\": 
\"docs\",\"start_urls\": [\"https://example.com/\"],\"sitemap_urls\": [\"https://example.com/sitemap.xml\"],\"sitemap_alternate_links\": true,\"stop_urls\": [],\"selectors\": {\"lvl1\": \"header h1\",\"lvl2\": \"article h2\",\"lvl3\": \"article h3\",\"lvl4\": \"article h4\",\"lvl5\": \"article h5, article td:first-child\",\"lvl6\": \"article h6\",\"text\": \"article p, article li, article td:last-child\"},\"strip_chars\": \" .,;:#\",\"custom_settings\": {\"separatorsToIndex\": \"_\",\"attributesForFaceting\": [\"language\",\"version\",\"type\",\"docusaurus_tag\"],\"attributesToRetrieve\": [\"hierarchy\",\"content\",\"anchor\",\"url\",\"url_without_anchor\",\"type\"]}}'\n```\n\nThe tricky part is passing `CONFIG` to the scraper. The example above won't work if your configuration contains an XPath selector such as `ul[contains(@class,'menu__list')]`, because its single quotes end the single-quoted YAML string early. A more robust approach is to commit your config as `algolia.json` to the repository, check the repository out in the workflow, and read the file into a step output:\n\n```yaml\n- uses: actions/checkout@v2\n\n- name: Get the content of algolia.json as config\n  id: algolia_config\n  run: echo \"config=$(cat algolia.json | jq -r tostring)\" \u003e\u003e $GITHUB_OUTPUT\n\n- name: Push indices to Algolia\n  uses: signcl/docsearch-scraper-action@master\n  env:\n    APPLICATION_ID: ${{ secrets.ALGOLIA_APPLICATION_ID }}\n    API_KEY: ${{ secrets.ALGOLIA_API_KEY }}\n    CONFIG: ${{ steps.algolia_config.outputs.config }}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsigncl%2Fdocsearch-scraper-action","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsigncl%2Fdocsearch-scraper-action","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsigncl%2Fdocsearch-scraper-action/lists"}