{"id":13838291,"url":"https://github.com/typesense/typesense-docsearch-scraper","last_synced_at":"2025-04-04T14:06:58.022Z","repository":{"id":43289521,"uuid":"327114908","full_name":"typesense/typesense-docsearch-scraper","owner":"typesense","description":"A fork of Algolia's awesome DocSearch Scraper, customized to index data in Typesense (an open source alternative to Algolia)","archived":false,"fork":false,"pushed_at":"2025-03-18T00:54:15.000Z","size":979,"stargazers_count":114,"open_issues_count":21,"forks_count":44,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-28T13:07:59.150Z","etag":null,"topics":["algolia","docsearch","documentation","search","typesense"],"latest_commit_sha":null,"homepage":"https://typesense.org/docs/guide/docsearch.html","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/typesense.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-01-05T20:44:09.000Z","updated_at":"2025-03-22T15:04:56.000Z","dependencies_parsed_at":"2023-10-12T13:00:32.766Z","dependency_job_id":"75832573-2730-4cb7-bfaa-4a7ce403ee54","html_url":"https://github.com/typesense/typesense-docsearch-scraper","commit_stats":null,"previous_names":[],"tags_count":39,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/typesense%2Ftypesense-docsearch-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/typesense%2Ftypesense-docsearch-scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/repositories/typesense%2Ftypesense-docsearch-scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/typesense%2Ftypesense-docsearch-scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/typesense","download_url":"https://codeload.github.com/typesense/typesense-docsearch-scraper/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247190250,"owners_count":20898702,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algolia","docsearch","documentation","search","typesense"],"created_at":"2024-08-04T15:01:48.697Z","updated_at":"2025-04-04T14:06:58.002Z","avatar_url":"https://github.com/typesense.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Typesense DocSearch scraper\n\n\u003ca href=\"https://hub.docker.com/r/typesense/docsearch-scraper/tags\"\u003e\u003cimg src=\"https://img.shields.io/docker/pulls/typesense/docsearch-scraper\"\u003e\u003c/a\u003e\n\nThis is a maintained fork of Algolia's awesome [DocSearch Scraper](https://github.com/algolia/docsearch-scraper), customized to index data in [Typesense](https://typesense.org). \n\nYou'd typically set up this scraper to run on your documentation site, and then use [typesense-docsearch.js](https://github.com/typesense/typesense-docsearch.js) to add a search bar to your site. \n\n#### What is Typesense? 
\n\nIf you're new to Typesense, it is an **open source** search engine that is simple to use, run and scale, with clean APIs and documentation. \n\nThink of it as an open source alternative to Algolia and an easier-to-use, batteries-included alternative to Elasticsearch. Get a quick overview from [this guide](https://typesense.org/guide/).\n\n## Usage\n\nRead detailed step-by-step instructions on how to configure and set up the scraper on Typesense's dedicated documentation site: https://typesense.org/docs/guide/docsearch.html\n\n## Changelog\n\nWe use git tags to identify every release. \n\nSo to view the changelog for a release, you can compare tags using a GitHub link like this:\n\n[https://github.com/typesense/typesense-docsearch-scraper/compare/0.8.0...0.9.0](https://github.com/typesense/typesense-docsearch-scraper/compare/0.8.0...0.9.0).\n\nRemember to change the version numbers in the URL as needed. \n\n## Compatibility\n\n| typesense-docsearch-scraper | typesense-server |\n| --- | --- |\n| 0.5.0 | \u003e= 0.22.1 |\n| 0.4.x and below | \u003e= 0.21.0  |\n\n## Development Workflow\n\nThis section only applies if you're making changes to this scraper itself. 
If you only need to run the scraper, see the Usage instructions above.\n\n#### Running the code locally\n\n```shellsession\n$ pipenv shell\n$ ./docsearch run configs/public/typesense_docs.json\n```\n\n#### Releasing a new version\n\nBasic/abbreviated instructions:\n\n```shellsession\n$ pipenv shell\n$ ./docsearch docker:build\n$ git tag -a 0.2.1 -m \"0.2.1\"\n$ ./docsearch deploy:scraper\n$ git push --follow-tags\n```\n\nDetailed instructions starting from a fresh Ubuntu Server 22.04:\n\n```bash\n# Install Docker:\n# https://docs.docker.com/engine/install/ubuntu/\nsudo apt update\nsudo apt remove docker docker-engine docker.io containerd runc --yes\nsudo apt install \\\n    ca-certificates \\\n    curl \\\n    gnupg \\\n    lsb-release \\\n    --yes\nsudo mkdir -m 0755 -p /etc/apt/keyrings\ncurl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg\necho \\\n  \"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \\\n  $(lsb_release -cs) stable\" | sudo tee /etc/apt/sources.list.d/docker.list \u003e /dev/null\nsudo apt update\nsudo apt install \\\n  docker-ce \\\n  docker-ce-cli \\\n  containerd.io \\\n  docker-buildx-plugin \\\n  docker-compose-plugin \\\n  --yes\nsudo docker run hello-world\n\n# Run Docker as a non-root user:\n# https://www.digitalocean.com/community/questions/how-to-fix-docker-got-permission-denied-while-trying-to-connect-to-the-docker-daemon-socket\nsudo usermod -aG docker ${USER}\nexit\n# (Relogin.)\ndocker run hello-world\n\n# Install dependencies for pyenv:\n# https://github.com/pyenv/pyenv/wiki#suggested-build-environment\nsudo apt update\nsudo apt install \\\n  build-essential \\\n  curl \\\n  libbz2-dev \\\n  libffi-dev \\\n  liblzma-dev \\\n  libncursesw5-dev \\\n  libreadline-dev \\\n  libsqlite3-dev \\\n  libssl-dev \\\n  libxml2-dev \\\n  libxmlsec1-dev \\\n  llvm \\\n  make \\\n  tk-dev \\\n  wget \\\n  xz-utils \\\n  
zlib1g-dev \\\n  --yes\n\n# Install pyenv:\n# https://github.com/pyenv/pyenv#automatic-installer\ncurl https://pyenv.run | bash\n\n# Add pyenv to path:\necho \u003e\u003e ~/.bashrc\necho '# Adding pyenv' \u003e\u003e ~/.bashrc\necho 'export PYENV_ROOT=\"$HOME/.pyenv\"' \u003e\u003e ~/.bashrc\necho 'command -v pyenv \u003e/dev/null || export PATH=\"$PYENV_ROOT/bin:$PATH\"' \u003e\u003e ~/.bashrc\necho 'eval \"$(pyenv init -)\"' \u003e\u003e ~/.bashrc\nsource ~/.bashrc\n\n# Install Python 3.11 inside pyenv:\npyenv install 3.11\n\n# Set the active version of Python:\npyenv local 3.11\n\n# Upgrade pip:\npip install --upgrade pip\n\n# Install pipenv:\npip install --user pipenv\n\n# There will be a warning:\n# \"The script virtualenv-clone is installed in '/home/[username]/.local/bin' which is not on PATH.\"\n# Fix the warning by adding it to the PATH:\necho \u003e\u003e ~/.bashrc\necho '# Fixing pip warning' \u003e\u003e ~/.bashrc\necho 'PATH=$PATH:~/.local/bin' \u003e\u003e ~/.bashrc\nsource ~/.bashrc\n\n# Ensure that you are in the \"typesense-docsearch-scraper\" directory.\n# Then, install the Python dependencies for this project:\npipenv --python 3.11\npipenv lock --clear\npipenv install\n\n# Then, open a shell with the Python environment:\npipenv shell\n\n# Enable containerd image store in Docker Engine: https://docs.docker.com/engine/storage/containerd/\n# This allows building the cross-platform images below\n# Add the following to\n# /etc/docker/daemon.json\n# {\n#  \"features\": {\n#     \"containerd-snapshotter\": true\n#  }\n# }\n# sudo systemctl restart docker\n\n# The following should say containerd, if not, follow the instructions above\ndocker info -f '{{ .DriverStatus }}'\n\n# Build a new version of the base Docker container - ONLY NEEDED WHEN WE CHANGE DEPENDENCIES\nexport SCRAPER_BASE_VERSION=\"0.9.0\" # Only need to change this when we update dependencies\ndocker buildx use typesense-builder || docker buildx create --name typesense-builder --driver 
docker-container --use --bootstrap # use same buildx context for all containers to build\ndocker buildx build --platform linux/amd64,linux/arm64 --load -f ./scraper/dev/docker/Dockerfile.base -t typesense/docsearch-scraper-base:${SCRAPER_BASE_VERSION} .\ndocker push typesense/docsearch-scraper-base:${SCRAPER_BASE_VERSION}\ndocker tag typesense/docsearch-scraper-base:${SCRAPER_BASE_VERSION} typesense/docsearch-scraper-base:latest\ndocker push typesense/docsearch-scraper-base:latest\n\n# Build a new version of the scraper Docker container\nexport SCRAPER_VERSION=\"0.11.0.rc1\"\nexport SCRAPER_BASE_VERSION=\"latest\"\ndocker buildx use typesense-builder || docker buildx create --name typesense-builder --driver docker-container --use --bootstrap # use same buildx context for all containers to build\ndocker buildx build --platform linux/amd64,linux/arm64 --load -f ./scraper/dev/docker/Dockerfile --build-arg SCRAPER_BASE_VERSION=${SCRAPER_BASE_VERSION} -t typesense/docsearch-scraper:${SCRAPER_VERSION} .\ndocker push typesense/docsearch-scraper:${SCRAPER_VERSION}\ndocker tag typesense/docsearch-scraper:${SCRAPER_VERSION} typesense/docsearch-scraper:latest\ndocker push typesense/docsearch-scraper:latest\n\n# Add a new Git tag.\ngit tag -a \"${SCRAPER_VERSION}\" -m \"${SCRAPER_VERSION}\"\n\n# Sync with GitHub.\ngit push --follow-tags\n\n\n```\n\n## Help\n\nIf you have any questions or run into any problems, please create a GitHub issue and we'll try our best to help.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftypesense%2Ftypesense-docsearch-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftypesense%2Ftypesense-docsearch-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftypesense%2Ftypesense-docsearch-scraper/lists"}