{"id":25622574,"url":"https://github.com/unfoldingword/string-punctuation-tokenizer","last_synced_at":"2025-06-11T12:37:51.994Z","repository":{"id":29922983,"uuid":"122679395","full_name":"unfoldingWord/string-punctuation-tokenizer","owner":"unfoldingWord","description":"Small library that provides functions to tokenize a string into an array of words with or without punctuation","archived":false,"fork":false,"pushed_at":"2023-08-09T14:35:40.000Z","size":2249,"stargazers_count":8,"open_issues_count":21,"forks_count":1,"subscribers_count":8,"default_branch":"develop","last_synced_at":"2025-05-15T12:11:36.806Z","etag":null,"topics":["javascript","nlp","nlp-library","scripture-open-components","segmentation","tokenizers"],"latest_commit_sha":null,"homepage":"https://string-punctuation-tokenizer.netlify.app/#/Tokenize","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/unfoldingWord.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-02-23T22:29:54.000Z","updated_at":"2023-08-09T13:57:50.000Z","dependencies_parsed_at":"2024-06-18T22:41:04.499Z","dependency_job_id":"f370293a-99ea-42a6-b409-380bbb1f54f3","html_url":"https://github.com/unfoldingWord/string-punctuation-tokenizer","commit_stats":{"total_commits":100,"total_committers":10,"mean_commits":10.0,"dds":0.6799999999999999,"last_synced_commit":"c4c088e867716643e755b6493b8298a3f760c2ee"},"previous_names":[],"tags_count":11,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unfoldingW
ord%2Fstring-punctuation-tokenizer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unfoldingWord%2Fstring-punctuation-tokenizer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unfoldingWord%2Fstring-punctuation-tokenizer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unfoldingWord%2Fstring-punctuation-tokenizer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/unfoldingWord","download_url":"https://codeload.github.com/unfoldingWord/string-punctuation-tokenizer/tar.gz/refs/heads/develop","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unfoldingWord%2Fstring-punctuation-tokenizer/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259265832,"owners_count":22831266,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["javascript","nlp","nlp-library","scripture-open-components","segmentation","tokenizers"],"created_at":"2025-02-22T10:19:52.172Z","updated_at":"2025-06-11T12:37:51.964Z","avatar_url":"https://github.com/unfoldingWord.png","language":"JavaScript","readme":"[![npm](https://img.shields.io/npm/dt/string-punctuation-tokenizer.svg)](https://www.npmjs.com/package/string-punctuation-tokenizer)\n[![npm](https://img.shields.io/npm/v/string-punctuation-tokenizer.svg)](https://www.npmjs.com/package/string-punctuation-tokenizer)\n\n# string-punctuation-tokenizer\nSmall library that provides functions to tokenize a string into an array of words with or without punctuation\n\n## Setup\n`npm install 
string-punctuation-tokenizer`\n\n## Usage\n`var stringTokenizer = require('string-punctuation-tokenizer');`\n\nor, with ES6 modules:\n\n`import {tokenize} from 'string-punctuation-tokenizer';`\n\n#### Tokenize with punctuation\n```js\nimport {tokenize} from 'string-punctuation-tokenizer';\n// includePunctuation: true keeps punctuation marks as separate tokens\nlet words = tokenize({text: 'Hello world, my name is Manny!', includePunctuation: true});\n// words = [\"Hello\", \"world\", \",\", \"my\", \"name\", \"is\", \"Manny\", \"!\"]\n```\n#### Tokenize without punctuation\n```js\nimport {tokenize} from 'string-punctuation-tokenizer';\n// without includePunctuation, only the words are returned\nlet words = tokenize({text: 'Hello world, my name is Manny!'});\n// words = [\"Hello\", \"world\", \"my\", \"name\", \"is\", \"Manny\"]\n```\n\n### Documentation\nSee the detailed documentation and a live WYSIWYG playground at https://string-punctuation-tokenizer.netlify.app/#/Tokenize\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funfoldingword%2Fstring-punctuation-tokenizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Funfoldingword%2Fstring-punctuation-tokenizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funfoldingword%2Fstring-punctuation-tokenizer/lists"}