{"id":21426100,"url":"https://github.com/bcdh/exist-algolia-index","last_synced_at":"2026-05-03T10:05:32.492Z","repository":{"id":57739381,"uuid":"73559611","full_name":"BCDH/exist-algolia-index","owner":"BCDH","description":"Uses eXist-db's internal mechanisms to upload and sync indexes with Algolia's cloud services","archived":false,"fork":false,"pushed_at":"2023-12-21T21:30:58.000Z","size":236,"stargazers_count":4,"open_issues_count":0,"forks_count":2,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-04-24T12:26:46.719Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BCDH.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2016-11-12T15:07:25.000Z","updated_at":"2023-08-14T13:14:48.000Z","dependencies_parsed_at":"2023-12-21T22:28:46.147Z","dependency_job_id":"b76a1bef-1095-4732-a643-970d13e15607","html_url":"https://github.com/BCDH/exist-algolia-index","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BCDH%2Fexist-algolia-index","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BCDH%2Fexist-algolia-index/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BCDH%2Fexist-algolia-index/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BCDH%2Fexist-algolia-index/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BCDH","download_url":"https://c
odeload.github.com/BCDH/exist-algolia-index/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225970066,"owners_count":17553391,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-22T21:39:56.858Z","updated_at":"2026-05-03T10:05:32.486Z","avatar_url":"https://github.com/BCDH.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"# eXist-db Indexer for Algolia\n\n![Build Status](https://github.com/BCDH/exist-algolia-index/actions/workflows/ci.yml/badge.svg) ![Java 17+](https://img.shields.io/badge/java-17%2B-007396.svg) ![eXist-db 6.4.1](https://img.shields.io/badge/eXist--db-6.4.1-6f42c1.svg) [![License GPL 3](https://img.shields.io/badge/license-GPL%203-blue.svg)](https://www.gnu.org/licenses/gpl-3.0.html)\n\neXist Indexer for Algolia is a configurable index plug-in for the [eXist-db](https://github.com/eXist-db/exist) native XML database. 
It uses eXist's own indexing mechanisms to create, upload and incrementally sync local indexes with [Algolia's](http://www.algolia.com) cloud services.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://i.imgur.com/yqIlRI0.png\"\u003e\n  \u003cspan style=\"color:gray; font-size:0.8rem\"\u003eExample deployment: autocomplete search on \u003ca href=\"http://raskovnik.org\"\u003ehttp://raskovnik.org\u003c/a\u003e\u003c/span\u003e\n\u003c/p\u003e\n\n## Installation\n\nThis README covers the build, manual installation, and general configuration of the plugin.\n\n### Build\n\nRequirements: Java 17, `sbt`.\n\n```bash\nsbt assembly\n```\n\nThe assembly is written to:\n\n```bash\ntarget/scala-2.13/exist-algolia-index-assembly-\u003cversion\u003e.jar\n```\n\n### Manual install\n\nThe plugin JAR must be built and then installed into eXist manually.\n\n1. Build the assembly:\n\n   ```bash\n   sbt assembly\n   ```\n\n2. Copy the resulting JAR into eXist's plugin/library directory.\n\n3. Add the Algolia module to `conf.xml` inside `indexer/modules`:\n\n   ```xml\n   \u003cmodule id=\"algolia-index\"\n       class=\"org.humanistika.exist.index.algolia.AlgoliaIndex\"\n       application-id=\"YOUR-ALGOLIA-APPLICATION-ID\"\n       admin-api-key=\"YOUR-ALGOLIA-ADMIN-API-KEY\"\n       batch-size=\"1000\"/\u003e\n   ```\n\n4. Add the dependency entry to `startup.xml`:\n\n   ```xml\n   \u003cdependency\u003e\n       \u003cgroupId\u003eorg.humanistika.exist.index.algolia\u003c/groupId\u003e\n       \u003cartifactId\u003eexist-algolia-index\u003c/artifactId\u003e\n       \u003cversion\u003eVERSION_FROM_VERSION_SBT\u003c/version\u003e\n       \u003crelativePath\u003eexist-algolia-index-assembly-VERSION_FROM_VERSION_SBT.jar\u003c/relativePath\u003e\n   \u003c/dependency\u003e\n   ```\n\n5. Restart eXist.\n\n6. Reindex the configured collections so already-present data is pushed into Algolia.\n   The correct reindex target depends on your own collection structure. 
Reindex the collection or subcollection whose `collection.xconf` contains the Algolia index configuration.\n\n## Configuration\n\nFor a single collection in eXist, you can put data into one or more indexes in Algolia. Create an \"index\" element inside the \"algolia\" element for each index and give it the name of the Algolia index. If the index doesn't exist in Algolia, it will be created for you automatically.\n\nFor incremental indexing to work, you need two sets of unique ids: one for each document in the collection (documentId) and one for each rootObject (nodeId).\n\nAlgolia writes are sent in batches. The global `batch-size` module attribute defaults to `1000` operations per request. A collection-level `\u003cindex\u003e` can override it with `batchSize` if a specific Algolia index needs smaller or larger chunks.\n\n```xml\n\u003ccollection xmlns=\"http://exist-db.org/collection-config/1.0\"\u003e\n    \u003cindex\u003e\n        \u003calgolia\u003e\n            \u003cnamespaceMappings\u003e\n                \u003cnamespaceMapping\u003e\n                    \u003cprefix\u003exml\u003c/prefix\u003e\n                    \u003cnamespace\u003ehttp://www.w3.org/XML/1998/namespace\u003c/namespace\u003e\n                \u003c/namespaceMapping\u003e\n            \u003c/namespaceMappings\u003e\n            \u003cindex name=\"my-algolia-index-1\" documentId=\"/path/to/unique-id/@xml:id\" visibleBy=\"/path/to/unique-id\" batchSize=\"1000\"\u003e\n                \u003crootObject path=\"/path/to/element\" nodeId=\"@xml:id\"\u003e\n                    \u003cattribute name=\"f1\" path=\"/further/patha\"/\u003e\n                    \u003cattribute name=\"f2\" path=\"/further/pathb\" type=\"integer\"/\u003e\n                    \u003cobject name=\"other\" path=\"/further/pathc\"\u003e\n                        \u003cmap path=\"/x\" type=\"boolean\"/\u003e\n                    \u003c/object\u003e\n                \u003c/rootObject\u003e\n            
\u003c/index\u003e\n        \u003c/algolia\u003e\n    \u003c/index\u003e\n\u003c/collection\u003e\n```\n\nAn optional `visibleBy` attribute can be used to restrict data access when searching the Algolia index.\n\nA `rootObject` is equivalent to an object inside an Algolia Index. We create one \"rootObject\" for each document or, if you specify a path attribute on the rootObject, for each matching document fragment.\n\nAn `attribute` (which represents a JSON object attribute, not to be confused with an XML attribute) is a simple key/value pair that is extracted from the XML and placed into the Algolia object (\"rootObject\" as we call it). All of the text nodes or attribute values indicated by the \"path\" on the \"attribute\" element will be serialized to a string (and then converted if you set an explicit \"type\" attribute).\n\nThe path for an \"attribute\" may point to either an XML element or XML attribute node. Paths must be simple. You can use namespace prefixes in the path, but you must also set the namespaceMappings element in the `collection.xconf`.\n\nThe XML Schema file [exist-algolia-index-config.xsd](https://github.com/BCDH/exist-algolia-index/blob/master/src/main/resources/xsd/exist-algolia-index-config.xsd) defines and documents the index configuration.\n\nAn `object` represents a JSON object, and this is where things become fun: we basically serialize the XML node pointed to by the \"path\" attribute on the \"object\" element to a JSON equivalent. This allows you to create highly complex and structured objects in the Algolia index from your XML.\n\nThe `name` attribute that is available on the \"attribute\" and \"object\" elements allows you to set the name of the field in the JSON object of the Algolia index. This means that the names of your data fields can differ between Algolia and eXist if you wish.\n\n### Reindexing Existing Data\n\nInstalling or updating the plugin does not by itself upload already-present XML documents to Algolia. 
After installation, reindex each configured collection in eXist so the configured `rootObject`s are serialized and pushed to Algolia.\n\nIn general:\n\n- reindex the full configured collection for a first-time backfill\n- reindex a narrower subcollection if your deployment replaced only part of the XML corpus and that subcollection has the relevant Algolia collection config\n- avoid reindexing broad parent collections unless they are the intended scope of the Algolia configuration\n\n### Indexing Status\n\nThe plugin writes deployment-readable indexing status to `algolia-index/status.json` under eXist's configured data directory. The status records are keyed by Algolia index and collection path where a collection is known.\n\nStatus states:\n\n- `current`: the latest tracked operation for that index or collection completed successfully\n- `degraded`: Algolia rejected or failed a terminal operation such as a batch write, document delete, collection delete, or index drop\n- `stale_local_store`: the plugin could not derive collection-delete object IDs from the local Algolia store, usually because the collection was removed before a successful backfill created local state\n\nThe local and staging helper scripts fail verification when `status.json` contains `degraded` or `stale_local_store` records. Resolve those states before treating a deployment as successful. 
In practice, check the failure message in `status.json` and the Algolia/eXist logs, then retry the targeted reindex or run a wider backfill if the local store is missing the needed collection state.\n\n### Limiting Object Access\n\nYou can limit data access by setting the `visibleBy` attribute in `collection.xconf` and mapping it to the corresponding path in your XML data, preferably in the document header.\n\nSee the test fixture examples:\n\n- XML: [VSK.TEST.xml](https://github.com/BCDH/exist-algolia-index/tree/master/src/test/resources/integration/user-specified-visibleBy/VSK.TEST.xml)\n- Configuration: [collection.xconf](https://github.com/BCDH/exist-algolia-index/tree/master/src/test/resources/integration/user-specified-visibleBy/collection.xconf)\n\n## Enable logging in eXist (optional)\n\nYou can see what we are sending to Algolia by adding the following to your `$EXIST_HOME/log4j2.xml` file:\n\nAdd this as a child of the `\u003cAppenders\u003e` element:\n\n```xml\n\u003cRollingRandomAccessFile name=\"algolia.index\"\n        filePattern=\"${logs}/algolia-index.${rollover.file.pattern}.log.gz\"\n        fileName=\"${logs}/algolia-index.log\"\u003e\n    \u003cPolicies\u003e\n        \u003cSizeBasedTriggeringPolicy size=\"${rollover.max.size}\"/\u003e\n    \u003c/Policies\u003e\n    \u003cDefaultRolloverStrategy max=\"${rollover.max}\"/\u003e\n    \u003cPatternLayout pattern=\"${exist.file.pattern}\"/\u003e\n\u003c/RollingRandomAccessFile\u003e\n```\n\nAnd add this as a child of the `\u003cLoggers\u003e` element:\n\n```xml\n\u003cLogger name=\"org.humanistika.exist.index.algolia\" additivity=\"false\" level=\"trace\"\u003e\n    \u003cAppenderRef ref=\"algolia.index\"/\u003e\n\u003c/Logger\u003e\n```\n\nThe log output will then appear in eXist's configured log directory, usually `logs/algolia-index.log` under the active eXist home or container layout, the next time eXist is started.\n\n## Current limitations\n\nWhen you back up eXist, you should also back up 
the `algolia-index` directory inside eXist's configured data directory, because it holds the local representation of what is stored on the remote Algolia server. Support for integrating that local store into a native backup/restore workflow may be added later.\n\n## Acknowledgements\n\nHats off to [Adam Retter](https://github.com/adamretter) for sharing his superb programming skills with us in this project.\n\nThis tool was developed in the context of ongoing work at [BCDH](http://www.humanistika.org), including Raskovnik, a Serbian dictionary platform built together with the Institute of Serbian Language.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbcdh%2Fexist-algolia-index","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbcdh%2Fexist-algolia-index","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbcdh%2Fexist-algolia-index/lists"}