{"id":17291887,"url":"https://github.com/eklem/words-n-numbers","last_synced_at":"2025-09-02T15:05:22.826Z","repository":{"id":38826955,"uuid":"199393541","full_name":"eklem/words-n-numbers","owner":"eklem","description":"Tokenizing strings of text. Regex extracting arrays of words and optionally numbers, emojis, tags, usernames and email addresses from strings. For Node.js and the browser. When you need more than just [a-z] regular expressions.","archived":false,"fork":false,"pushed_at":"2024-09-28T04:43:17.000Z","size":1344,"stargazers_count":12,"open_issues_count":2,"forks_count":0,"subscribers_count":1,"default_branch":"trunk","last_synced_at":"2025-08-31T19:39:23.679Z","etag":null,"topics":["nlp","offline-first","regex","tokenization","tokenizer"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/eklem.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-07-29T06:38:32.000Z","updated_at":"2025-04-15T16:01:23.000Z","dependencies_parsed_at":"2024-01-12T10:25:24.780Z","dependency_job_id":"0f809cc9-019e-4316-8ff9-a20a2a1d7ec9","html_url":"https://github.com/eklem/words-n-numbers","commit_stats":{"total_commits":230,"total_committers":2,"mean_commits":115.0,"dds":0.4869565217391304,"last_synced_commit":"3e7b041c80bb75e2ab4d9832469f09c31db717f4"},"previous_names":["eklem/words-and-numbers"],"tags_count":16,"template":false,"template_full_name":null,"purl":"pkg:github/eklem/words-n-numbers","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eklem%2
Fwords-n-numbers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eklem%2Fwords-n-numbers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eklem%2Fwords-n-numbers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eklem%2Fwords-n-numbers/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/eklem","download_url":"https://codeload.github.com/eklem/words-n-numbers/tar.gz/refs/heads/trunk","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eklem%2Fwords-n-numbers/sbom","scorecard":{"id":370670,"data":{"date":"2025-08-11","repo":{"name":"github.com/eklem/words-n-numbers","commit":"3d7ed3658d065e9bb43da9acd30f5d472d6dcf49"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3.6,"checks":[{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Code-Review","score":0,"reason":"Found 0/18 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source 
repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Token-Permissions","score":0,"reason":"detected GitHub workflow tokens with excessive permissions","details":["Warn: no topLevel permission defined: .github/workflows/codeql-analysis.yml:1","Warn: no topLevel permission defined: .github/workflows/tests.yml:1","Info: no jobLevel write permissions found"],"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Pinned-Dependencies","score":0,"reason":"dependency not pinned by hash detected -- score normalized to 0","details":["Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/codeql-analysis.yml:19: update your workflow using https://app.stepsecurity.io/secureworkflow/eklem/words-n-numbers/codeql-analysis.yml/trunk?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/codeql-analysis.yml:32: update your workflow using https://app.stepsecurity.io/secureworkflow/eklem/words-n-numbers/codeql-analysis.yml/trunk?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: 
.github/workflows/codeql-analysis.yml:40: update your workflow using https://app.stepsecurity.io/secureworkflow/eklem/words-n-numbers/codeql-analysis.yml/trunk?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/codeql-analysis.yml:54: update your workflow using https://app.stepsecurity.io/secureworkflow/eklem/words-n-numbers/codeql-analysis.yml/trunk?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/tests.yml:12: update your workflow using https://app.stepsecurity.io/secureworkflow/eklem/words-n-numbers/tests.yml/trunk?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/tests.yml:13: update your workflow using https://app.stepsecurity.io/secureworkflow/eklem/words-n-numbers/tests.yml/trunk?enable=pin","Warn: npmCommand not pinned by hash: .github/workflows/tests.yml:18","Info:   0 out of   6 GitHub-owned GitHubAction dependencies pinned","Info:   0 out of   1 npmCommand dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security 
policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: MIT License: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'trunk'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":7,"reason":"SAST tool detected but not run on all commits","details":["Info: SAST configuration detected: CodeQL","Warn: 0 commits out of 12 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Vulnerabilities","score":7,"reason":"3 existing vulnerabilities 
detected","details":["Warn: Project is vulnerable to: GHSA-v6h2-p8h4-qcjw","Warn: Project is vulnerable to: GHSA-3xgq-45jj-v275","Warn: Project is vulnerable to: GHSA-952p-6rrq-rcjv"],"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}}]},"last_synced_at":"2025-08-18T12:50:56.507Z","repository_id":38826955,"created_at":"2025-08-18T12:50:56.507Z","updated_at":"2025-08-18T12:50:56.507Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273301876,"owners_count":25081105,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-02T02:00:09.530Z","response_time":77,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["nlp","offline-first","regex","tokenization","tokenizer"],"created_at":"2024-10-15T10:42:10.177Z","updated_at":"2025-09-02T15:05:22.803Z","avatar_url":"https://github.com/eklem.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Words'n'numbers\nTokenizing strings of text. Extracting arrays of words and optionally numbers, emojis, tags, usernames and email addresses from strings. For Node.js and the browser. When you need more than just [a-z] regular expressions. 
Part of document processing for [search-index](https://github.com/fergiemcdowall/search-index) and [nowsearch.xyz](https://github.com/eklem/nowsearch.xyz).\n\nInspired by [extractwords](https://github.com/f-a-r-a-z/extractwords).\n\n[![NPM version][npm-version-image]][npm-url]\n[![NPM downloads][npm-downloads-image]][npm-url]\n[![](https://data.jsdelivr.com/v1/package/npm/words-n-numbers/badge?style=rounded)](https://www.jsdelivr.com/package/npm/words-n-numbers)\n[![Build Status][build-image]][build-url]\n[![JavaScript Style Guide][standardjs-image]][standardjs-url]\n[![MIT License][license-image]][license-url]\n\n## Breaking change\n\nFrom `v8.0.0`, the `emojis` regular expression extracts single emojis, so there are no more \"words\" formed by several emojis. This is because each emoji is, in a sense, a word. You can still make a custom regular expression to grab several emojis in a row as one item with `const customEmojis = '\\\\p{Emoji_Presentation}+'` and then use it as your custom regex.\n\nThis means that instead of:\n\n```javaScript\nextract('A ticket to 大阪 costs ¥2000 👌😄 😢', { regex: emojis})\n// ['👌😄', '😢']\n```\n\n...you will get:\n\n```javaScript\nextract('A ticket to 大阪 costs ¥2000 👌😄 😢', { regex: emojis})\n// ['👌', '😄', '😢']\n```\n\n## Initiating\n\n### CJS\n\n```javascript\nconst { extract, words, numbers, emojis, tags, usernames, email } = require('words-n-numbers')\n// extract, words, numbers, emojis, tags, usernames, email available\n```\n\n### ESM\n\n```javascript\nimport { extract, words, numbers, emojis, tags, usernames, email } from 'words-n-numbers'\n// extract, words, numbers, emojis, tags, usernames, email available\n```\n\n### Browser\n\n```html\n\u003cscript src=\"https://cdn.jsdelivr.net/npm/words-n-numbers/dist/words-n-numbers.umd.min.js\"\u003e\u003c/script\u003e\n\n\u003cscript\u003e\n  //wnn.extract, wnn.words, wnn.numbers, wnn.emojis, wnn.tags, wnn.usernames, wnn.email available\n\u003c/script\u003e\n```\n\n## Browser demo\nA [simple browser demo of 
wnn](https://eklem.github.io/words-n-numbers/demo/) to show how it works.\n\n[![Screenshot of the words-n-numbers demo](./demo/wnn-demo-screenshot.png)](https://eklem.github.io/words-n-numbers/demo/)\n\n## Use\n\nThe default regex should catch every Unicode character from every language. Default regex flags are `giu`. The `emojisCustom` regex won't work with the `u` flag (unicode).\n\n### Only words\n```javaScript\nconst stringOfWords = 'A 1000000 dollars baby!'\nextract(stringOfWords)\n// returns ['A', 'dollars', 'baby']\n```\n\n### Only words, converted to lowercase\n```javaScript\nconst stringOfWords = 'A 1000000 dollars baby!'\nextract(stringOfWords, { toLowercase: true })\n// returns ['a', 'dollars', 'baby']\n```\n\n### Combining predefined regex for words and numbers, converted to lowercase\n```javaScript\nconst stringOfWords = 'A 1000000 dollars baby!'\nextract(stringOfWords, { regex: [words, numbers], toLowercase: true })\n// returns ['a', '1000000', 'dollars', 'baby']\n```\n\n### Combining predefined regex for words and emojis, converted to lowercase\n```javaScript\nconst stringOfWords = 'A ticket to 大阪 costs ¥2000 👌😄 😢'\nextract(stringOfWords, { regex: [words, emojis], toLowercase: true })\n// returns [ 'a', 'ticket', 'to', '大阪', 'costs', '👌', '😄', '😢' ]\n```\n\n### Combining predefined regex for numbers and emojis\n```javaScript\nconst stringOfWords = 'A ticket to 大阪 costs ¥2000 👌😄 😢'\nextract(stringOfWords, { regex: [numbers, emojis], toLowercase: true })\n// returns [ '2000', '👌', '😄', '😢' ]\n```\n\n### Combining predefined regex for words, numbers and emojis, converted to lowercase\n```javaScript\nconst stringOfWords = 'A ticket to 大阪 costs ¥2000 👌😄 😢'\nextract(stringOfWords, { regex: [words, numbers, emojis], toLowercase: true })\n// returns [ 'a', 'ticket', 'to', '大阪', 'costs', '2000', '👌', '😄', '😢' ]\n```\n\n### Predefined regex for `#tags`\n```javaScript\nconst stringOfWords = 'A #49ticket to #大阪 or two#tickets costs ¥2000 👌😄😄 
😢'\nextract(stringOfWords, { regex: tags, toLowercase: true })\n// returns [ '#49ticket', '#大阪' ]\n```\n\n### Predefined regex for `@usernames`\n```javaScript\nconst stringOfWords = 'A #ticket to #大阪 costs bob@bob.com, @alice and @美林 ¥2000 👌😄😄 😢'\nextract(stringOfWords, { regex: usernames, toLowercase: true })\n// returns [ '@alice', '@美林' ]\n```\n\n### Predefined regex for email addresses\n```javaScript\nconst stringOfWords = 'A #ticket to #大阪 costs bob@bob.com, alice.allison@alice123.com, some-name.nameson.nameson@domain.org and @美林 ¥2000 👌😄😄 😢'\nextract(stringOfWords, { regex: email, toLowercase: true })\n// returns [ 'bob@bob.com', 'alice.allison@alice123.com', 'some-name.nameson.nameson@domain.org' ]\n```\n\n### Predefined custom regex for all Unicode emojis\n```javaScript\nconst stringOfWords = 'A #ticket to #大阪 costs bob@bob.com, alice.allison@alice123.com, some-name.nameson.nameson@domain.org and @美林 ¥2000 👌😄😄 😢👩🏽‍🤝‍👨🏻 👩🏽‍🤝‍👨🏻'\nextract(stringOfWords, { regex: emojisCustom, flags: 'g' })\n// returns [ '👌', '😄', '😄', '😢', '👩🏽‍🤝‍👨🏻', '👩🏽‍🤝‍👨🏻' ]\n```\n\n### Custom regex\nSome characters need to be escaped, like `\\` and `'`. You escape them with a backslash, `\\`.\n```javaScript\nconst stringOfWords = 'This happens at 5 o\\'clock !!!'\nextract(stringOfWords, { regex: '[a-z\\'0-9]+' })\n// returns ['This', 'happens', 'at', '5', 'o\\'clock']\n```\n\n## API\n\n### Extract function\n\nReturns an array of words and optionally numbers.\n```javascript\nextract(stringOfText, \u003coptions-object\u003e)\n```\n\n### Options object\n```javascript\n{\n  regex: 'custom or predefined regex',  // defaults to words\n  toLowercase: [true / false],            // defaults to false\n  flags: 'gmixsuUAJD' // regex flags, defaults to giu - /[regexPattern]/[regexFlags]\n}\n```\n\n### Order of combined regexes\n\nYou can add an array of different regexes or just a string. If you add an array, they will be joined with a `|`-separator, making it an OR-regex. 
Put `email`, `usernames` and `tags` before `words` to get the extraction right.\n\n```javaScript\n// email addresses before usernames before words can give a different outcome\nextract(oldString, { regex: [email, usernames, words] })\n\n// than words before usernames before email addresses\nextract(oldString, { regex: [words, usernames, email] })\n```\n\n### Predefined regexes\n```javaScript\nwords              // only words, any language \u003c-- default\nnumbers            // only numbers\nemojis             // only emojis\nemojisCustom       // only emojis. Works with the `g`-flag, not `giu`. Based on custom emoji extractor from https://github.com/mathiasbynens/rgi-emoji-regex-pattern\ntags               // #tags (any language)\nusernames          // @usernames (any language)\nemail              // email addresses. Most valid addresses,\n                   //   but not to be used as a validator\n```\n\n### Flags for regexes\n\nAll but one of the regexes use the `giu` flags. The exception is `emojisCustom`, which needs only the `g` flag. `emojisCustom` was added because the standard `emojis` regex, based on `\\\\p{Emoji_Presentation}`, isn't able to grab all emojis. When browsers support `\\\\p{RGI_Emoji}` under the `giu` flags, the library will be changed.\n\n### Languages supported\nSupports most languages supported by [stopword](https://github.com/fergiemcdowall/stopword#language-code), and others too. Some languages, like Japanese and simplified Chinese, need to be tokenized. 
Tokenizers may be added at a later stage.\n\n#### PRs welcome\nPRs and issues are more than welcome =)\n\n[license-image]: http://img.shields.io/badge/license-MIT-blue.svg?style=flat\n[license-url]: LICENSE\n[npm-url]: https://npmjs.org/package/words-n-numbers\n[npm-version-image]: http://img.shields.io/npm/v/words-n-numbers.svg?style=flat\n[npm-downloads-image]: http://img.shields.io/npm/dm/words-n-numbers.svg?style=flat\n[build-url]: https://github.com/eklem/words-n-numbers/actions/workflows/tests.yml\n[build-image]: https://github.com/eklem/words-n-numbers/actions/workflows/tests.yml/badge.svg\n[standardjs-url]: https://standardjs.com\n[standardjs-image]: https://img.shields.io/badge/code_style-standard-brightgreen.svg?style=flat-square\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feklem%2Fwords-n-numbers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feklem%2Fwords-n-numbers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feklem%2Fwords-n-numbers/lists"}