{"id":13555469,"url":"https://github.com/science-periodicals/web-verse","last_synced_at":"2025-04-03T08:31:16.856Z","repository":{"id":25727347,"uuid":"29164471","full_name":"science-periodicals/web-verse","owner":"science-periodicals","description":"Toolbox for deep, resilient, markup-invariant linking into HTML documents without their cooperation","archived":false,"fork":false,"pushed_at":"2022-12-08T18:01:13.000Z","size":2390,"stargazers_count":26,"open_issues_count":22,"forks_count":4,"subscribers_count":8,"default_branch":"master","last_synced_at":"2024-04-27T11:31:55.942Z","etag":null,"topics":["annotation","publishing","science"],"latest_commit_sha":null,"homepage":"https://sci.pe","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/science-periodicals.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-01-13T00:32:53.000Z","updated_at":"2022-09-21T17:01:57.000Z","dependencies_parsed_at":"2023-01-14T03:18:01.577Z","dependency_job_id":null,"html_url":"https://github.com/science-periodicals/web-verse","commit_stats":null,"previous_names":["scienceai/web-verse"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/science-periodicals%2Fweb-verse","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/science-periodicals%2Fweb-verse/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/science-periodicals%2Fweb-verse/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/science-periodicals%2Fweb-verse/manifests","owner_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/owners/science-periodicals","download_url":"https://codeload.github.com/science-periodicals/web-verse/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246965708,"owners_count":20861912,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["annotation","publishing","science"],"created_at":"2024-08-01T12:03:13.802Z","updated_at":"2025-04-03T08:31:15.581Z","avatar_url":"https://github.com/science-periodicals.png","language":"JavaScript","funding_links":[],"categories":["JavaScript","publishing"],"sub_categories":[],"readme":"# Web Verse\n\n[![styled with prettier](https://img.shields.io/badge/styled_with-prettier-ff69b4.svg)](https://github.com/prettier/prettier)\n\nWeb Verse enables deep-linking into HTML text, without requiring specific coöperation from the\ncontent (such as adding `id` attributes everywhere). It can be used to generate locator keys for\ncontent inside of a page that are reasonably resilient to markup modifications as well as to edits\nto the text itself. As such, it can be used to build an annotation system for text that is likely\nto be edited over time. 
Obviously it is not\n[altogether unstoppable][no-power-in-the-verse] but it offers good enough resilience to\nbe used in production systems.\n\nIt was inspired by [Emphasis][nyt] by Michael Donohoe and [Ted Nelson parallel markup](https://www.xml.com/pub/a/w3j/s3.nelson.html), but leverages the [Range interface][ranges]\nand [selection object][selections].\n\nWe do not provide direct support for, say, mapping a URL's hash containing a Web Verse key\ninto a specific paragraph or the like. Rather, the expectation is that one can build one's own\npreferred annotation system (or more generally deep, resilient linking system) very easily on top\nof Web Verse.\n\nWe fingerprint a block-level element (e.g. a paragraph) by:\n\n1. Normalising the text to abstract away from markup and formatting differences.\n2. Breaking the text into sentences. We attempt to be smart about handling full-stops. We'll ignore\n   things like \"Dr. Who\" and a number of similar cases. It is generally enough to avoid getting\n   single word nonsense for our sentences.\n3. Taking the first and last sentences. It's OK if the first and last sentences are the same; the\n   key is still meaningful.\n4. Taking the first character from the first three words of each sentence. Words are defined as\n   tokens composed of a run of non-white-space characters.\n\nThese fingerprints [have been shown][jsconf] to provide reasonable uniqueness for reasonably-sized\ndocuments. Since it's deterministic yet not dependent on all the content, this method is tolerant to\nsmaller changes in the content. Furthermore, finding keys can take edit-distance into account, which\nenables additional resilience to change.\n\nRegions of text more specific than a block-level element can be referenced from within a block using\ncharacter ranges. 
For instance, in the following paragraph:\n\n\u003ccode\u003e\n`I` `a`m `a` paragraph with 2 **sentences**.\n`I` `a`m `t`he second sentence.\n\u003c/code\u003e\n\nWe can refer to the word `sentences` in the first sentence by using the range `24-33`. Together\nwith the paragraph's fingerprint, this gives us an address that selects just that word:\n`IaaIat:24-33`. (Note that the text offsets are zero-based, and apply to normalised text.)\n\n## Installation\n\n`npm install web-verse`\n\n### In the browser\n\nThis is primarily a client-side library (~7k minizipped); just include the `web-verse.min.js` script\nthat comes with the distribution.\n\n### In Node\n\nWeb Verse works with Node, but you have to bring your own DOM. Currently, the best option is likely\nto be `jsdom`, but it has limitations due to it not supporting `Range`s.\n\nThe following subset of methods works with Node and `jsdom`:\n\n* `createKey()`\n* `createHash()`\n* `getScope()`, but only with a `node` argument\n* `serializeNode()`\n* `findKey()`\n* `getChildOffsets()`\n* `normalizeText()`\n* `normalizeOffset()`\n* `denormalizeOffset()`\n\nThese should normally be more than enough to carry out the sort of operations that you are likely to\nwant to do on the server (as opposed to, say, getting the user's selection and producing a link from\nit).\n\n## API\n\nWhen loaded in a Web context, Web Verse exposes itself as a global `WebVerse` object, on which the\nfollowing methods are available.\n\n### `key = WebVerse.createKey($el)`\n\nGiven an element, returns a 6-char key that summarises it for the purposes of deep, resilient\nlinking.\n\n### `result = WebVerse.findKey(targetKey, candidateKeys)`\n\nGiven a key that is being searched for, and a list of candidate keys (for instance, all the keys for\nblock elements in the document), this will return the best match it can find.\n\nThe returned object has fields for `index` (the index in `candidateKeys` that best matched), `value`\n(the value that actually 
matched, which may differ slightly from the `targetKey`), and `lev` (an\nindication of the Levenshtein edit distance of the match). If no match was found, all of those\nfields will be `undefined`.\n\nThe match works by first attempting an exact match, then by choosing the candidate with the smallest\nedit distance. No edit distance can be greater than or equal to 3.\n\n### `hash = WebVerse.createHash($el)`\n\nGiven an element, it will return a hash for it that is invariant to numerous markup changes inside\nof it, looking only at its normalised text content. Such a hash can also be used to generate\nresilient identifiers.\n\n### `details = WebVerse.serializeRange(range, $el)`\n\nGiven a range and optionally a scoping element (which defaults to `getScope(range)`), it will return\nthe details one needs in order to create a resilient pointer to that range. The returned object\ncontains:\n\n* `$scope`: The scoping element (which was used for key and hash generation).\n* `hash`: The hash of the scoping element; can be used as an ID that is resilient to markup and\n  white space changes but not to text edits.\n* `key`: The key for the scoping element; can *also* be used as an ID. It is resilient to markup and\n  white space changes, as well as to a certain amount of text editing; but it is less unique than\n  the `hash`.\n* `startOffset`, `endOffset`: The normalised offsets into the text for that range.\n\nSo if you wish to use the key+offsets fingerprint that is discussed in this README's\nintroduction in order to obtain a resilient pointer into what a given range captures, you would write:\n\n```js\n// Serialise the range, scoped to its closest citeable element.\nvar details = WebVerse.serializeRange(range);\n// Combine the key and the normalised offsets, giving something like 'IaaIat:24-33'.\nvar fingerprint = details.key + ':' + details.startOffset + '-' + details.endOffset;\n```\n\n### `details = WebVerse.serializeSelection()`\n\nReturns the same details as `serializeRange()` but for the current selection. 
If there is no\nselection (or if it is collapsed) it returns `undefined`.\n\n### `details = WebVerse.serializeNode($node, $el)`\n\nThe same as `serializeRange()` but instead of a `Range` it uses a node, taking the node's own text\ncontent to determine the offsets into the given scope. If no scoping `$el` is given, it will use `getScope($node)`.\n\n### `range = WebVerse.rangeFromOffsets($scope, startOffset, endOffset)`\n\nGiven a scope and normalised start/end offsets (that you may have stored in a fingerprint), returns\na `Range` object suitable to use directly on the DOM (i.e. applying to the raw content).\n\nIf you start with a fingerprint such as the `IaaIat:24-33` example you would use the `IaaIat` part\nto find the `$scope` (typically with `findKey()`) and then call this method with the scope and the\noffsets. It returns a `Range` that you could wrap to highlight, etc.\n\n### `ranges = WebVerse.getRangesFromText($el, searchText)`\n\nGiven an element to scope the search in, and a string, it will find all instances of that string\n(in a normalised, white-space-invariant manner) inside the textual content of that element, and\nreturn an array of `Range` objects pointing into the matches.\n\nThis can be used to find and highlight a specific string. Or, for instance, if a user is creating a\nlink around a given string in a text this can offer the option of linking all other occurrences of\nthe same string.\n\nSince it returns `Range`s, it can be easily used with [`Range.surroundContents`][surround-contents].\n\n### `WebVerse.citeable`\n\nAn array of element `tagName`s (i.e. uppercase) that are considered acceptable scopes (block-level\nelements). You can modify this to alter Web Verse's behaviour.\n\n### `text = WebVerse.normalizeText(text)`\n\nGiven a string, returns a version normalised according to Web Verse's internal normalising\nalgorithm. 
This is essentially `str.trim().replace(/\\s+/g, ' ')` but with its behaviour made\nresistant to browser vagaries.\n\n### `offset = WebVerse.normalizeOffset(rawOffset, rawText)`\n\nWeb Verse hides away a lot of the complexity involved in dealing with normalised text internally while\nhaving to manipulate a DOM that has raw, unnormalised text content (obviously, without changing the\nDOM).\n\nThis method returns the offset in the normalised text equivalent to the given raw offset into the\nunadulterated text. So calling it with `4, ' a  b'` (which has the offset right before the `b`) will\nreturn `2`, since the normalised text is `a b`.\n\nThis may seem cryptic, and in many ways it is. You should only need this if you are trying to\nmanipulate the text in the same manner as Web Verse does, for instance to extend its functionality.\n\n### `offset = WebVerse.denormalizeOffset(normalisedOffset, rawText)`\n\nDoes the reverse of `normalizeOffset()`: given a normalised offset and the *raw* text, it will return\nthe matching raw offset.\n\n### `$element = WebVerse.getScope(range|$node)`\n\nGiven a range or a `$node`, will return the closest enclosing element that may scope it (i.e. a\nblock-level element from `citeable`). This can be the range's `commonAncestorContainer` or any of its\nparents. If it goes up the tree without finding a valid candidate, it will return `undefined`.\n\n### `details = WebVerse.getOffsets(range, $el)`\n\nGiven a range and an element scope, return an object with `startOffset` and `endOffset` that are the\noffsets into the normalised text equivalent to that range, for that scope. Mostly for internal use.\n\n### `details = WebVerse.getChildOffsets($parent, $child)`\n\nSame as `getOffsets()` but uses a `$child` text node (or a `$child` element containing text) to\ndetermine the offsets inside a `$parent` element. 
Returns `startOffset` and `endOffset` fields\nbeing the offsets normalised to the content of the `$parent`.\n\n\n## Development\n\nThe best thing when developing is to `npm run watch`. This will build both Node and browser versions\ncontinuously. It is also a good idea to `npm run test-local`, which will keep the Karma tests\nrunning (just in Chrome, so as not to be too invasive) whenever you make changes.\n\n---\n[jsconf]: http://2014.jsconf.eu/speakers/michael-donohoe-deeplink-to-anything-on-the-web.html\n[nyt]: https://github.com/NYTimes/Emphasis\n[ranges]: https://developer.mozilla.org/en-US/docs/Web/API/Range\n[selections]: https://developer.mozilla.org/en-US/docs/Web/API/Selection\n[surround-contents]: https://developer.mozilla.org/en-US/docs/Web/API/Range/surroundContents\n[no-power-in-the-verse]: https://youtu.be/uRdbEY_YfV4?t=24s\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscience-periodicals%2Fweb-verse","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fscience-periodicals%2Fweb-verse","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscience-periodicals%2Fweb-verse/lists"}