{"id":13822424,"url":"https://github.com/cmccomb/rust-stop-words","last_synced_at":"2025-12-12T15:26:11.497Z","repository":{"id":57668784,"uuid":"289415491","full_name":"cmccomb/rust-stop-words","owner":"cmccomb","description":"Common stop words in a variety of languages","archived":false,"fork":false,"pushed_at":"2025-02-23T20:01:21.000Z","size":265,"stargazers_count":22,"open_issues_count":2,"forks_count":6,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-09T23:12:20.604Z","etag":null,"topics":["languages","natural-language-procressing","nlp","nltk","rust-crate","stopwords"],"latest_commit_sha":null,"homepage":"https://crates.io/crates/stop-words","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cmccomb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-08-22T03:59:58.000Z","updated_at":"2025-04-07T12:10:32.000Z","dependencies_parsed_at":"2024-08-04T08:08:25.660Z","dependency_job_id":"1efc88bf-5849-439a-8e1f-5236dcd863ad","html_url":"https://github.com/cmccomb/rust-stop-words","commit_stats":null,"previous_names":["cmccomb/stop-words"],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmccomb%2Frust-stop-words","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmccomb%2Frust-stop-words/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmccomb%2Frust-stop-words/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmcco
mb%2Frust-stop-words/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cmccomb","download_url":"https://codeload.github.com/cmccomb/rust-stop-words/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248125588,"owners_count":21051770,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["languages","natural-language-procressing","nlp","nltk","rust-crate","stopwords"],"created_at":"2024-08-04T08:01:59.680Z","updated_at":"2025-12-12T15:26:11.465Z","avatar_url":"https://github.com/cmccomb.png","language":"Rust","funding_links":[],"categories":["Rust"],"sub_categories":[],"readme":"[![Github CI](https://github.com/cmccomb/rust-stop-words/actions/workflows/tests.yml/badge.svg)](https://github.com/cmccomb/rust-stop-words/actions)\n[![Crates.io](https://img.shields.io/crates/v/stop-words.svg)](https://crates.io/crates/stop-words)\n[![docs.rs](https://img.shields.io/docsrs/stop-words/latest?logo=rust)](https://docs.rs/stop-words)\n# About\n\nStop words are words that don't carry much meaning, and are typically removed as a preprocessing step before text\nanalysis or natural language processing. This crate contains common stop words for a variety of languages. 
This crate uses stop word\nlists from [Stopwords ISO](https://github.com/stopwords-iso) and also from [NLTK](https://www.nltk.org/).\n\n# Usage\nUsing this crate is straightforward:\n```rust, ignore\n// Get the stop words\nlet words = stop_words::get(stop_words::LANGUAGE::English);\n\n// Print them\nfor word in words {\n    println!(\"{}\", word);\n}\n```\nThe function `get` accepts either a member of the `LANGUAGE` enum or a two-letter ISO 639-1 language code, given as a string slice (`&str`) or a `String`.\n\nYou can find a complete example of how to read in a text file and remove stop words [here](https://github.com/cmccomb/rust-stop-words/blob/main/examples/remove_stop_words_with_regex.rs).\n\n# ISO Language Availability\nThis crate supports every language covered by [Stopwords ISO](https://github.com/stopwords-iso) or [NLTK](https://www.nltk.org/). Expand the table below to see per-language coverage.\n\u003cdetails\u003e\n    \u003csummary\u003eLanguage Coverage Table\u003c/summary\u003e\n\n| ISO 639-1 Code | Language                                                                         | Stopwords ISO | NLTK |\n|----------------|----------------------------------------------------------------------------------|---------------|------|\n| aa             | Afar                                                                             |               |      |\n| ab             | Abkhazian                                                                        |               |      |\n| af             | Afrikaans                                                                        | ✓             |      |\n| ak             | Akan                                                                             |               |      |\n| sq             | Albanian                                                                         |               |      |\n| am             | Amharic                                                                       
   |               |      |\n| ar             | Arabic                                                                           | ✓             | ✓    |\n| an             | Aragonese                                                                        |               |      |\n| hy             | Armenian                                                                         | ✓             |      |\n| as             | Assamese                                                                         |               |      |\n| av             | Avaric                                                                           |               |      |\n| ae             | Avestan                                                                          |               |      |\n| ay             | Aymara                                                                           |               |      |\n| az             | Azerbaijani                                                                      |               | ✓    |\n| ba             | Bashkir                                                                          |               |      |\n| bm             | Bambara                                                                          |               |      |\n| eu             | Basque                                                                           | ✓             |      |\n| be             | Belarusian                                                                       |               |      |\n| bn             | Bengali                                                                          | ✓             |      |\n| bh             | Bihari languages                                                                 |               |      |\n| bi             | Bislama                                                                          |               |      |\n| bo             | Tibetan                                                       
                   |               |      |\n| bs             | Bosnian                                                                          |               |      |\n| br             | Breton                                                                           | ✓             |      |\n| bg             | Bulgarian                                                                        | ✓             |      |\n| my             | Burmese                                                                          |               |      |\n| ca             | Catalan; Valencian                                                               | ✓             |      |\n| cs             | Czech                                                                            | ✓             |      |\n| ch             | Chamorro                                                                         |               |      |\n| ce             | Chechen                                                                          |               |      |\n| zh             | Chinese                                                                          | ✓             |      |\n| cu             | Church Slavic; Old Slavonic; Church Slavonic; Old Bulgarian; Old Church Slavonic |               |      |\n| cv             | Chuvash                                                                          |               |      |\n| kw             | Cornish                                                                          |               |      |\n| co             | Corsican                                                                         |               |      |\n| cr             | Cree                                                                             |               |      |\n| cy             | Welsh                                                                            |               |      |\n| da             | Danish                                        
                                   | ✓             | ✓    |\n| de             | German                                                                           | ✓             | ✓    |\n| dv             | Divehi; Dhivehi; Maldivian                                                       |               |      |\n| nl             | Dutch; Flemish                                                                   | ✓             | ✓    |\n| dz             | Dzongkha                                                                         |               |      |\n| el             | Greek, Modern (1453-)                                                            | ✓             | ✓    |\n| en             | English                                                                          | ✓             | ✓    |\n| eo             | Esperanto                                                                        | ✓             |      |\n| et             | Estonian                                                                         | ✓             |      |\n| ee             | Ewe                                                                              |               |      |\n| fo             | Faroese                                                                          |               |      |\n| fa             | Persian                                                                          | ✓             |      |\n| fj             | Fijian                                                                           |               |      |\n| fi             | Finnish                                                                          | ✓             | ✓    |\n| fr             | French                                                                           | ✓             | ✓    |\n| fy             | Western Frisian                                                                  |               |      |\n| ff             | Fulah                         
                                                   |               |      |\n| ka             | Georgian                                                                         |               |      |\n| gd             | Gaelic; Scottish Gaelic                                                          |               |      |\n| ga             | Irish                                                                            | ✓             |      |\n| gl             | Galician                                                                         | ✓             |      |\n| gv             | Manx                                                                             |               |      |\n| gn             | Guarani                                                                          |               |      |\n| gu             | Gujarati                                                                         | ✓             |      |\n| ht             | Haitian; Haitian Creole                                                          |               |      |\n| ha             | Hausa                                                                            | ✓             |      |\n| he             | Hebrew                                                                           | ✓             |      |\n| hz             | Herero                                                                           |               |      |\n| hi             | Hindi                                                                            | ✓             |      |\n| ho             | Hiri Motu                                                                        |               |      |\n| hr             | Croatian                                                                         | ✓             |      |\n| hu             | Hungarian                                                                        | ✓             | ✓    |\n| ig             | Igbo          
                                                                   |               |      |\n| is             | Icelandic                                                                        |               |      |\n| io             | Ido                                                                              |               |      |\n| ii             | Sichuan Yi; Nuosu                                                                |               |      |\n| iu             | Inuktitut                                                                        |               |      |\n| ie             | Interlingue; Occidental                                                          |               |      |\n| ia             | Interlingua (International Auxiliary Language Association)                       |               |      |\n| id             | Indonesian                                                                       | ✓             | ✓    |\n| ik             | Inupiaq                                                                          |               |      |  \n| it             | Italian                                                                          | ✓             | ✓    |\n| jv             | Javanese                                                                         |               |      |\n| ja             | Japanese                                                                         | ✓             |      |\n| kl             | Kalaallisut; Greenlandic                                                         |               |      |\n| kn             | Kannada                                                                          |               |      |\n| ks             | Kashmiri                                                                         |               |      |\n| kr             | Kanuri                                                                           |               |      |\n| kk           
  | Kazakh                                                                           |               | ✓    |\n| km             | Central Khmer                                                                    |               |      |\n| ki             | Kikuyu; Gikuyu                                                                   |               |      |\n| rw             | Kinyarwanda                                                                      |               |      |\n| ky             | Kirghiz; Kyrgyz                                                                  |               |      |\n| kv             | Komi                                                                             |               |      |\n| kg             | Kongo                                                                            |               |      |\n| ko             | Korean                                                                           | ✓             |      |\n| kj             | Kuanyama; Kwanyama                                                               |               |      |\n| ku             | Kurdish                                                                          | ✓             |      |\n| lo             | Lao                                                                              |               |      |\n| la             | Latin                                                                            | ✓             |      |\n| lv             | Latvian                                                                          | ✓             |      |\n| li             | Limburgan; Limburger; Limburgish                                                 |               |      |\n| ln             | Lingala                                                                          |               |      |\n| lt             | Lithuanian                                                                       | ✓             |      
|\n| lb             | Luxembourgish; Letzeburgesch                                                     |               |      |\n| lu             | Luba-Katanga                                                                     |               |      |\n| lg             | Ganda                                                                            |               |      |\n| mk             | Macedonian                                                                       |               |      |\n| mh             | Marshallese                                                                      |               |      |\n| ml             | Malayalam                                                                        |               |      |\n| mi             | Maori                                                                            |               |      |\n| mr             | Marathi                                                                          | ✓             |      |\n| ms             | Malay                                                                            | ✓             |      |\n| mg             | Malagasy                                                                         |               |      |\n| mt             | Maltese                                                                          |               |      |\n| mn             | Mongolian                                                                        |               |      |\n| na             | Nauru                                                                            |               |      |\n| nv             | Navajo; Navaho                                                                   |               |      |\n| nr             | Ndebele, South; South Ndebele                                                    |               |      |\n| nd             | Ndebele, North; North Ndebele                                                    |      
         |      |\n| ng             | Ndonga                                                                           |               |      |\n| ne             | Nepali                                                                           |               | ✓    |\n| nn             | Norwegian Nynorsk; Nynorsk, Norwegian                                            |               |      |\n| nb             | Bokmål, Norwegian; Norwegian Bokmål                                              |               |      |\n| no             | Norwegian                                                                        | ✓             | ✓    |\n| ny             | Chichewa; Chewa; Nyanja                                                          |               |      |\n| oc             | Occitan (post 1500)                                                              |               |      |\n| oj             | Ojibwa                                                                           |               |      |\n| or             | Oriya                                                                            |               |      |\n| om             | Oromo                                                                            |               |      |\n| os             | Ossetian; Ossetic                                                                |               |      |\n| pa             | Panjabi; Punjabi                                                                 |               |      |\n| pi             | Pali                                                                             |               |      |\n| pl             | Polish                                                                           | ✓             |      |\n| pt             | Portuguese                                                                       | ✓             | ✓    |\n| ps             | Pushto; Pashto                                                          
         |               |      |\n| qu             | Quechua                                                                          |               |      |\n| rm             | Romansh                                                                          |               |      |\n| ro             | Romanian; Moldavian; Moldovan                                                    | ✓             | ✓    |\n| rn             | Rundi                                                                            |               |      |\n| ru             | Russian                                                                          | ✓             | ✓    |\n| sg             | Sango                                                                            |               |      |\n| sa             | Sanskrit                                                                         |               |      |\n| si             | Sinhala; Sinhalese                                                               |               |      |\n| sk             | Slovak                                                                           | ✓             |      |\n| sl             | Slovenian                                                                        | ✓             | ✓    |\n| se             | Northern Sami                                                                    |               |      |\n| sm             | Samoan                                                                           |               |      |\n| sn             | Shona                                                                            |               |      |\n| sd             | Sindhi                                                                           |               |      |\n| so             | Somali                                                                           | ✓             |      |\n| st             | Sotho, Southern                                         
                         | ✓             |      |\n| es             | Spanish; Castilian                                                               | ✓             | ✓    |\n| sc             | Sardinian                                                                        |               |      |\n| sr             | Serbian                                                                          |               |      |\n| ss             | Swati                                                                            |               |      |\n| su             | Sundanese                                                                        |               |      |\n| sw             | Swahili                                                                          | ✓             |      |\n| sv             | Swedish                                                                          | ✓             | ✓    |\n| ty             | Tahitian                                                                         |               |      |\n| ta             | Tamil                                                                            |               |      |\n| tt             | Tatar                                                                            |               |      |\n| te             | Telugu                                                                           |               |      |\n| tg             | Tajik                                                                            |               | ✓    |\n| tl             | Tagalog                                                                          | ✓             |      |\n| th             | Thai                                                                             | ✓             |      |\n| ti             | Tigrinya                                                                         |               |      |\n| to             | Tonga (Tonga Islands)                   
                                         |               |      |\n| tn             | Tswana                                                                           |               |      |\n| ts             | Tsonga                                                                           |               |      |\n| tk             | Turkmen                                                                          |               |      |\n| tr             | Turkish                                                                          | ✓             | ✓    |\n| tw             | Twi                                                                              |               |      |\n| ug             | Uighur; Uyghur                                                                   |               |      |\n| uk             | Ukrainian                                                                        | ✓             |      |\n| ur             | Urdu                                                                             | ✓             |      |\n| uz             | Uzbek                                                                            |               |      |\n| ve             | Venda                                                                            |               |      |\n| vi             | Vietnamese                                                                       | ✓             |      |\n| vo             | Volapük                                                                          |               |      |\n| wa             | Walloon                                                                          |               |      |\n| wo             | Wolof                                                                            |               |      |\n| xh             | Xhosa                                                                            |               |      |\n| yi             | Yiddish                 
                                                         |               |      |\n| yo             | Yoruba                                                                           | ✓             |      |\n| za             | Zhuang; Chuang                                                                   |               |      |\n| zu             | Zulu                                                                             | ✓             |      |\n\n\u003c/details\u003e\n\n# Constructed Language Availability\nWe also support some constructed (fictional/fantasy) languages! Expand the table below to see a comprehensive description. ChatGPT was used to generate these lists quickly, so they are incomplete and approximate. Help welcome! To use these languages, add the `constructed` feature.\n\n\u003cdetails\u003e\n    \u003csummary\u003eLanguage Coverage Table\u003c/summary\u003e\n\n| ISO 639-3 Code           | Language                                                                                    |\n|--------------------------|---------------------------------------------------------------------------------------------|\n| qya                      | [Quenya](https://en.wikipedia.org/wiki/Quenya)                                              |\n| sjn                      | [Sindarin](https://en.wikipedia.org/wiki/Sindarin)                                          |\n| tlh                      | [Klingon](https://en.wikipedia.org/wiki/Klingon)                                            |\n| mis (_dot_ is used here) | [Dothraki](https://en.wikipedia.org/wiki/Dothraki_language)                                 |\n| mis (_dov_ is used here) | [Dovahzul](https://www.thuum.org/library/Dovahzul%20Print%20Dictionary%204th%20Edition.pdf) |\n| mis (_nav_ is used here) | [Navi](https://en.wikipedia.org/wiki/Na%CA%BCvi_language)                                   | \n| mis (_val_ is used here) | [High Valyrian](https://en.wikipedia.org/wiki/Valyrian_languages)             
              |\n\nThe following prompt was used with the Mar 14, 2023 version of ChatGPT:\n```text\nPlease give me one list of 20+ stop words for each of the following languages: Sindarin, Quenya, Dothraki, Na'vi, \nDovahzul, Klingon, and High Valyrian. I'd like the lists to be formatted as follows:\nSindarin\n1. [word goes here]\n2. [word goes here]\n...\nQuenya\n1. [word goes here]\n...\n\nAnd so on.\n```\n\n\u003c/details\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcmccomb%2Frust-stop-words","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcmccomb%2Frust-stop-words","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcmccomb%2Frust-stop-words/lists"}