{"id":18718300,"url":"https://github.com/worldbrain/remove-stopwords","last_synced_at":"2025-04-12T13:33:36.631Z","repository":{"id":53345151,"uuid":"103812780","full_name":"WorldBrain/remove-stopwords","owner":"WorldBrain","description":"A simple repository to remove 'irrelevant for search' words, support for 51 languages ","archived":false,"fork":false,"pushed_at":"2017-09-21T16:03:23.000Z","size":65,"stargazers_count":27,"open_issues_count":1,"forks_count":3,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-04-09T21:30:19.276Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/WorldBrain.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-09-17T08:02:53.000Z","updated_at":"2024-09-03T02:21:03.000Z","dependencies_parsed_at":"2022-08-29T10:10:35.501Z","dependency_job_id":null,"html_url":"https://github.com/WorldBrain/remove-stopwords","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WorldBrain%2Fremove-stopwords","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WorldBrain%2Fremove-stopwords/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WorldBrain%2Fremove-stopwords/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WorldBrain%2Fremove-stopwords/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/WorldBrain","download_url":"https://codeload.github.com/WorldBrain/remove-stopword
s/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248573640,"owners_count":21126876,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-07T13:20:27.685Z","updated_at":"2025-04-12T13:33:36.595Z","avatar_url":"https://github.com/WorldBrain.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# remove-stopwords\n`remove-stopwords` is a Node module that lets you strip stopwords from an\ninput text. [In natural language processing, \"stopwords\" are words\nthat are so frequent that they can safely be removed from a text\nwithout altering its\nmeaning](https://en.wikipedia.org/wiki/Stop_words).\n\nThis library is designed specifically for WorldBrain's use case: stripping as many words as possible from every webpage to make search indexing faster across several thousand documents of varying content.\n\n**Credits:**\n\nThis module was essentially copied directly from [@fergiemcdowall's stopword library](https://github.com/fergiemcdowall/stopword). 
\nThe only difference is that support for more languages was added from this [stopwords json lib](https://github.com/6/stopwords-json).\nThere are also minor tweaks to several languages specifically for WorldBrain's use case.\nUnless otherwise specified, all the stopwords came from the [stopwords json lib](https://github.com/6/stopwords-json).\n\n[![MIT License][license-image]][license-url]\n\n## Usage\n\n### Default (English)\nBy default, `stopword` strips \"meaningless\" English words from an array of words:\n\n```javascript\nconst sw = require('stopword')\nconst oldString = 'a really Interesting string with some words'.split(' ')\nconst newString = sw.removeStopwords(oldString)\n// newString is now [ 'really', 'Interesting', 'string', 'words' ]\n```\n\n### Other languages\nYou can also specify a language other than English, as a string:\n```javascript\nconst sw = require('stopword')\nconst oldString = 'Trädgårdsägare är beredda att pröva vad som helst för att bli av med de hatade mördarsniglarna åäö'.split(' ')\n// sw.sv contains Swedish stopwords\nconst newString = sw.removeStopwords(oldString, 'sv')\n// newString is now [ 'Trädgårdsägare', 'beredda', 'pröva', 'helst', 'hatade', 'mördarsniglarna', 'åäö' ]\n```\n\n### All languages\nYou can also remove stopwords from all languages by specifying `'all'`:\n```javascript\nconst sw = require('stopword')\nconst oldString = 'Trädgårdsägare är beredda att a really Interesting string with some words ciao'.split(' ')\n// 'all' iterates over every stopword list in the lib\nconst newString = sw.removeStopwords(oldString, 'all')\n// newString is now [ 'Trädgårdsägare', 'beredda', 'really', 'Interesting', 'string', 'words' ]\n```\n\n### Custom list of stopwords\nLast but not least, you can use your own custom list of stopwords:\n```javascript\nconst sw = require('stopword')\nconst oldString = 'you can even roll your own custom stopword list'.split(' ')\n// Just add your own list/array of stopwords\nconst newString = 
sw.removeStopwords(oldString, [ 'even', 'a', 'custom', 'stopword', 'list', 'is', 'possible'])\n// newString is now [ 'you', 'can', 'roll', 'your', 'own']\n```\n\n## API\n\n### Language List\n\nArrays of stopwords for the following languages are supplied:\n\n* `af` - Afrikaans\n* `ar` - Modern Standard Arabic\n* `hy` - Armenian\n* `eu` - Basque\n* `bn` - Bengali\n* `br` - Brazilian Portuguese\n* `bg` - Bulgarian\n* `ca` - Catalan\n* `zh` - Chinese\n* `hr` - Croatian\n* `cs` - Czech\n* `da` - Danish\n* `nl` - Dutch\n* `en` - English\n* `eo` - Esperanto\n* `et` - Estonian\n* `fa` - Farsi\n* `fi` - Finnish\n* `fr` - French\n* `gl` - Galician\n* `de` - German\n* `el` - Greek\n* `ha` - Hausa\n* `he` - Hebrew\n* `hi` - Hindi\n* `hu` - Hungarian\n* `id` - Indonesian\n* `ga` - Irish\n* `it` - Italian\n* `ja` - Japanese\n* `ko` - Korean\n* `la` - Latin\n* `lv` - Latvian\n* `mr` - Marathi\n* `no` - Norwegian\n* `fa` - Persian\n* `pl` - Polish\n* `pt` - Portuguese\n* `ro` - Romanian\n* `ru` - Russian\n* `sk` - Slovak\n* `sl` - Slovenian\n* `so` - Somali\n* `st` - Southern Sotho\n* `es` - Spanish\n* `sw` - Swahili\n* `sv` - Swedish\n* `th` - Thai\n* `yo` - Yoruba\n* `zu` - Zulu\n\n```javascript\nconst sw = require('stopword')\nconst norwegianStopwords = sw.no\n// norwegianStopwords now contains an Array of Norwegian stopwords\n```\n\n#### Languages with no space between words\n`ja` Japanese and `zh` Chinese Simplified have no space between words. For these languages you need to split the text into words before feeding it to the `stopword` module. 
You can check out [TinySegmenter](http://chasen.org/%7Etaku/software/TinySegmenter/) for Japanese and [chinese-tokenizer](https://github.com/yishn/chinese-tokenizer) for Chinese.\n\n### removeStopwords\n\nReturns an Array that represents the text with the specified stopwords removed.\n\n* `text` An array of words\n* `stopwords` (optional) An array of stopwords, a language code such as `'sv'`, or `'all'`; defaults to English\n\n```javascript\nconst sw = require('stopword')\nconst newText = sw.removeStopwords(text[, stopwords])\n// newText is an array of the given words minus the specified stopwords\n```\n\n\n## Release Notes:\n\n[license-image]: http://img.shields.io/badge/license-MIT-blue.svg?style=flat\n[license-url]: LICENSE\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fworldbrain%2Fremove-stopwords","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fworldbrain%2Fremove-stopwords","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fworldbrain%2Fremove-stopwords/lists"}