{"id":13809812,"url":"https://github.com/ines/spacy-js","last_synced_at":"2025-04-06T03:10:18.306Z","repository":{"id":39707853,"uuid":"154664219","full_name":"ines/spacy-js","owner":"ines","description":"🎀 JavaScript API for spaCy with Python REST API","archived":false,"fork":false,"pushed_at":"2023-09-16T21:31:07.000Z","size":658,"stargazers_count":186,"open_issues_count":27,"forks_count":23,"subscribers_count":15,"default_branch":"master","last_synced_at":"2024-05-14T10:23:27.085Z","etag":null,"topics":["javascript","natural-language-processing","nlp","python","rest-api","spacy"],"latest_commit_sha":null,"homepage":"https://spacy.io","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ines.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2018-10-25T12:06:55.000Z","updated_at":"2024-05-12T23:56:48.000Z","dependencies_parsed_at":"2024-01-13T14:40:24.586Z","dependency_job_id":"682ef016-2090-49bb-9bc6-d50a01eac49d","html_url":"https://github.com/ines/spacy-js","commit_stats":{"total_commits":56,"total_committers":1,"mean_commits":56.0,"dds":0.0,"last_synced_commit":"5b7a86cb0d1099285e01252f7e1d44a36ad9a07f"},"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ines%2Fspacy-js","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ines%2Fspacy-js/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ines%2Fspacy-js/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ines%2Fspacy-js/
manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ines","download_url":"https://codeload.github.com/ines/spacy-js/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247427006,"owners_count":20937201,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["javascript","natural-language-processing","nlp","python","rest-api","spacy"],"created_at":"2024-08-04T02:00:36.610Z","updated_at":"2025-04-06T03:10:18.286Z","avatar_url":"https://github.com/ines.png","language":"JavaScript","funding_links":[],"categories":["JavaScript"],"sub_categories":[],"readme":"\u003ca href=\"https://explosion.ai\"\u003e\u003cimg src=\"https://explosion.ai/assets/img/logo.svg\" width=\"125\" height=\"125\" align=\"right\" /\u003e\u003c/a\u003e\n\n# spaCy JS\n\n[![travis](https://img.shields.io/travis/ines/spacy-js/master.svg?style=flat-square\u0026logo=travis)](https://travis-ci.org/ines/spacy-js)\n[![npm](https://img.shields.io/npm/v/spacy.svg?style=flat-square)](https://www.npmjs.com/package/spacy)\n[![GitHub](https://img.shields.io/github/release/ines/spacy-js/all.svg?style=flat-square)](https://github.com/ines/spacy-js)\n[![unpkg](https://img.shields.io/badge/unpkg-dist/index.js-brightgreen.svg?style=flat-square)](https://unpkg.com/spacy)\n\n\nJavaScript interface for accessing linguistic annotations provided by\n[spaCy](https://spacy.io). 
This project is mostly experimental and was\ndeveloped for fun to play around with different ways of mimicking spaCy's\nPython API.\n\nThe results will still be computed in Python and made available via a REST API.\nThe JavaScript API resembles spaCy's Python API as closely as possible (with\na few exceptions, as the values are all pre-computed and it's tricky to express\ncomplex recursive relationships).\n\n```javascript\nconst spacy = require('spacy');\n\n(async function() {\n    const nlp = spacy.load('en_core_web_sm');\n    const doc = await nlp('This is a text about Facebook.');\n    for (let ent of doc.ents) {\n        console.log(ent.text, ent.label);\n    }\n    for (let token of doc) {\n        console.log(token.text, token.pos, token.head.text);\n    }\n})();\n```\n\n## ⌛️ Installation\n\n### Installing the JavaScript library\n\nYou can install the JavaScript package via npm:\n\n```bash\nnpm install spacy\n```\n\n### Setting up the Python server\n\nFirst, clone this repo and install the requirements. If you've installed the\npackage via npm, you can also use the `api/server.py` and `requirements.txt` in\nyour `./node_modules/spacy` directory. It's recommended to use a virtual\nenvironment.\n\n```bash\npip install -r requirements.txt\n```\n\nYou can then run the REST API. By default, this will serve the API via\n`0.0.0.0:8080`:\n\n```bash\npython api/server.py\n```\n\nIf you like, you can install more [models](https://spacy.io/models) and specify\na comma-separated list of models to load as the first argument when you run\nthe server. All models need to be installed in the same environment.\n\n```bash\npython api/server.py en_core_web_sm,de_core_news_sm\n```\n\n| Argument | Type | Description | Default |\n| --- | --- | --- | --- |\n| `models` | positional (str) | Comma-separated list of models to load and make available. | `en_core_web_sm` |\n| `--host`, `-ho` | option (str) | Host to serve the API. 
| `0.0.0.0` |\n| `--port`, `-p` | option (int) | Port to serve the API. | `8080` |\n\n## 🎛 API\n\n### `spacy.load`\n\n\"Load\" a spaCy model. This method mostly exists for consistency with the Python\nAPI. It sets up the REST API and `nlp` object, but doesn't actually load\nanything, since the models are already available via the REST API.\n\n```javascript\nconst nlp = spacy.load('en_core_web_sm');\n```\n\n| Argument | Type | Description |\n| --- | --- | --- |\n| `model` | String | Name of model to load, e.g. `'en_core_web_sm'`. Needs to be available via the REST API. |\n| `api` | String | Alternative URL of REST API. Defaults to `http://0.0.0.0:8080`. |\n| **RETURNS** | [`Language`](src/language.js) | The `nlp` object. |\n\n### `nlp` \u003ckbd\u003easync\u003c/kbd\u003e\n\nThe `nlp` object created by `spacy.load` can be called on a string of text\nand makes a request to the REST API. The easiest way to use it is to wrap the\ncall in an `async` function and use `await`:\n\n```javascript\n(async function() {\n    const nlp = spacy.load('en_core_web_sm');\n    const doc = await nlp('This is a text.');\n})();\n```\n\n| Argument | Type | Description |\n| --- | --- | --- |\n| `text` | String | The text to process. |\n| **RETURNS** | [`Doc`](src/tokens.js) | The processed `Doc`. |\n\n### `Doc`\n\nJust like [in the original API](https://spacy.io/api/doc), the `Doc` object can\nbe constructed with an array of `words` and `spaces`. 
It also takes an\nadditional `attrs` object, which corresponds to the JSON-serialized linguistic\nannotations created in [`doc2json` in `api/server.py`](api/server.py).\n\nThe `Doc` behaves just like the regular spaCy `Doc` – you can iterate over its\ntokens, index into individual tokens, access the `Doc` attributes and properties\nand also use native JavaScript methods like `map` and `slice` (since there's no\nreal way to make Python's slice notation like `doc[2:4]` work).\n\n#### Construction\n\n```javascript\nimport { Doc } from 'spacy';\n\nconst words = ['Hello', 'world', '!'];\nconst spaces = [true, false, false];\nconst doc = Doc(words, spaces);\nconsole.log(doc.text); // 'Hello world!'\n```\n\n| Argument | Type | Description |\n| --- | --- | --- |\n| `words` | Array | The individual token texts. |\n| `spaces` | Array | Whether the token at this position is followed by a space or not. |\n| `attrs` | Object | JSON-serialized attributes, see [`doc2json`](api/server.py). |\n| **RETURNS** | [`Doc`](src/tokens.js) | The newly constructed `Doc`. |\n\n#### Symbol iterator and token indexing\n\n```javascript\n(async function() {\n    const nlp = spacy.load('en_core_web_sm');\n    const doc = await nlp('Hello world');\n\n    for (let token of doc) {\n        console.log(token.text);\n    }\n    // Hello\n    // world\n\n    const token1 = doc[0];\n    console.log(token1.text);\n    // Hello\n})();\n```\n\n#### Properties and Attributes\n\n| Name | Type | Description |\n| --- | --- | --- |\n| `text` | String | The `Doc` text. |\n| `length` | Number | The number of tokens in the `Doc`. |\n| `ents` | Array | A list of [`Span`](src/tokens.js) objects, describing the named entities in the `Doc`. |\n| `sents` | Array | A list of [`Span`](src/tokens.js) objects, describing the sentences in the `Doc`. |\n| `nounChunks` | Array | A list of [`Span`](src/tokens.js) objects, describing the base noun phrases in the `Doc`. 
|\n| `cats` | Object | The document categories predicted by the text classifier, if available in the model. |\n| `isTagged` | Boolean | Whether the part-of-speech tagger has been applied to the `Doc`. |\n| `isParsed` | Boolean | Whether the dependency parser has been applied to the `Doc`. |\n| `isSentenced` | Boolean | Whether the sentence boundary detector has been applied to the `Doc`. |\n\n### `Span`\n\nA `Span` object is a slice of a `Doc` and consists of one or more tokens. Just\nlike [in the original API](https://spacy.io/api/span), it can be constructed\nfrom a `Doc`, a start and end index and an optional label, or by slicing a `Doc`.\n\n#### Construction\n\n```javascript\nimport { Doc, Span } from 'spacy';\n\nconst doc = Doc(['Hello', 'world', '!'], [true, false, false]);\nconst span = Span(doc, 1, 3);\nconsole.log(span.text) // 'world!'\n```\n\n| Argument | Type | Description |\n| --- | --- | --- |\n| `doc` | `Doc` | The reference document. |\n| `start` | Number | The start token index. |\n| `end` | Number | The end token index. This is *exclusive*, i.e. \"up to token X\". |\n| `label` | String | Optional label. |\n| **RETURNS** | [`Span`](src/tokens.js) | The newly constructed `Span`. |\n\n#### Properties and Attributes\n\n| Name | Type | Description |\n| --- | --- | --- |\n| `text` | String | The `Span` text. |\n| `length` | Number | The number of tokens in the `Span`. |\n| `doc` | `Doc` | The parent `Doc`. |\n| `start` | Number | The `Span`'s start index in the parent document. |\n| `end` | Number | The `Span`'s end index in the parent document. |\n| `label` | String | The `Span`'s label, if available. |\n\n### `Token`\n\nFor token attributes that exist as string and ID versions (e.g. 
`Token.pos` vs.\n`Token.pos_`), only the string versions are exposed.\n\n#### Usage Examples\n\n```javascript\n(async function() {\n    const nlp = spacy.load('en_core_web_sm');\n    const doc = await nlp('Hello world');\n\n    for (let token of doc) {\n        console.log(token.text, token.pos, token.isLower);\n    }\n    // Hello INTJ false\n    // world NOUN true\n})();\n```\n\n#### Properties and Attributes\n\n| Name | Type | Description |\n| --- | --- | --- |\n| `text` | String | The token text. |\n| `whitespace` | String | Whitespace character following the token, if available. |\n| `textWithWs` | String | Token text with trailing whitespace. |\n| `length` | Number | The length of the token text. |\n| `orth` | Number | ID of the token text. |\n| `doc` | `Doc` | The parent `Doc`. |\n| `head` | `Token` | The syntactic parent, or \"governor\", of this token. |\n| `i` | Number | Index of the token in the parent document. |\n| `entType` | String | The token's named entity type. |\n| `entIob` | String | IOB code of the token's named entity tag. |\n| `lemma` | String | The token's lemma, i.e. the base form. |\n| `norm` | String | The normalised form of the token. |\n| `lower` | String | The lowercase form of the token. |\n| `shape` | String | Transform of the token's string, to show orthographic features. For example, \"Xxxx\" or \"dd\". |\n| `prefix` | String | A length-N substring from the start of the token. Defaults to `N=1`. |\n| `suffix` | String | Length-N substring from the end of the token. Defaults to `N=3`. |\n| `pos` | String | The token's coarse-grained part-of-speech tag. |\n| `tag` | String | The token's fine-grained part-of-speech tag. |\n| `isAlpha` | Boolean | Does the token consist of alphabetic characters? |\n| `isAscii` | Boolean | Does the token consist of ASCII characters? |\n| `isDigit` | Boolean | Does the token consist of digits? |\n| `isLower` | Boolean | Is the token lowercase? |\n| `isUpper` | Boolean | Is the token uppercase? 
|\n| `isTitle` | Boolean | Is the token titlecase? |\n| `isPunct` | Boolean | Is the token punctuation? |\n| `isLeftPunct` | Boolean | Is the token left punctuation? |\n| `isRightPunct` | Boolean | Is the token right punctuation? |\n| `isSpace` | Boolean | Is the token a whitespace character? |\n| `isBracket` | Boolean | Is the token a bracket? |\n| `isCurrency` | Boolean | Is the token a currency symbol? |\n| `likeUrl` | Boolean | Does the token resemble a URL? |\n| `likeNum` | Boolean | Does the token resemble a number? |\n| `likeEmail` | Boolean | Does the token resemble an email address? |\n| `isOov` | Boolean | Is the token out-of-vocabulary? |\n| `isStop` | Boolean | Is the token a stop word? |\n| `isSentStart` | Boolean | Does the token start a sentence? |\n\n## 🔔 Run Tests\n\n### Python\n\nFirst, make sure you have `pytest` and all dependencies installed. You can then\nrun the tests by pointing `pytest` to [`/tests`](/tests):\n\n```bash\npython -m pytest tests\n```\n\n### JavaScript\n\nThis project uses [Jest](https://jestjs.io) for testing. Make sure you have\nall dependencies and development dependencies installed. You can then run:\n\n```bash\nnpm run test\n```\n\nTo allow testing the code without a REST API providing the data, the test suite\ncurrently uses a [mock of the `Language` class](src/__mocks__), which returns\nstatic data located in [`tests/util.js`](tests/util.js).\n\n## ✅ Ideas and Todos\n\n- [ ] Improve JavaScript tests.\n- [ ] Experiment with NodeJS bindings to make Python integration easier. To be fair, running a separate API in an environment controlled by the user and *not* hiding it a few levels deep is often much easier. 
But maybe there are some modern Node tricks that this project could benefit from.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fines%2Fspacy-js","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fines%2Fspacy-js","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fines%2Fspacy-js/lists"}