{"id":19204366,"url":"https://github.com/vxern/wiktionary-scraper","last_synced_at":"2025-05-12T15:45:33.002Z","repository":{"id":191215274,"uuid":"684288591","full_name":"vxern/wiktionary-scraper","owner":"vxern","description":"🇬🇧 An extensible, robust and lightweight (45kB) Wiktionary.org scraper to fetch detailed information about words in various languages.","archived":false,"fork":false,"pushed_at":"2024-12-02T19:54:40.000Z","size":99,"stargazers_count":6,"open_issues_count":0,"forks_count":3,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-05-08T19:50:15.301Z","etag":null,"topics":["definitions","dictionary","english","etymology","javascript","language","parser","scraper","typescript","wiktionary","words"],"latest_commit_sha":null,"homepage":"https://npmjs.com/package/wiktionary-scraper","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vxern.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-08-28T20:45:37.000Z","updated_at":"2025-03-20T21:50:19.000Z","dependencies_parsed_at":"2024-11-09T12:22:56.229Z","dependency_job_id":"5e7ec40d-6bf0-4ff3-b1f5-f656d5f6d495","html_url":"https://github.com/vxern/wiktionary-scraper","commit_stats":null,"previous_names":["vxern/wiktionary-scraper"],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vxern%2Fwiktionary-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vxern%2Fwiktionary-scraper/tags","releases_url":
"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vxern%2Fwiktionary-scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vxern%2Fwiktionary-scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vxern","download_url":"https://codeload.github.com/vxern/wiktionary-scraper/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253767626,"owners_count":21961144,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["definitions","dictionary","english","etymology","javascript","language","parser","scraper","typescript","wiktionary","words"],"created_at":"2024-11-09T13:07:34.197Z","updated_at":"2025-05-12T15:45:32.980Z","avatar_url":"https://github.com/vxern.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"A lightweight scraper to fetch information about words in various languages from Wiktionary.\n\n## Table of contents\n\n- [Table of contents](#table-of-contents)\n- [Usage](#usage)\n- [Completeness](#completeness)\n    - [Features](#features)\n    - [Section support](#section-support)\n    - [Recognised parts of speech](#recognised-parts-of-speech)\n        - [Parts of speech](#parts-of-speech)\n        - [Morphemes](#morphemes)\n        - [Symbols](#symbols)\n        - [Phrases](#phrases)\n        - [Han characters and language-specific varieties](#han-characters-and-language-specific-varieties)\n        - [Other](#other)\n        - [Explicitly disallowed parts of 
speech](#explicitly-disallowed-parts-of-speech)\n        - [Library additions](#library-additions)\n\n## Usage\n\nTo start using the scraper, first install it with the following command:\n\n```shell\nnpm install wiktionary-scraper\n```\n\nThe simplest way to use the scraper is as follows:\n\n```ts\nimport * as Wiktionary from \"wiktionary-scraper\";\n\nconst results = await Wiktionary.get(\"word\");\n```\n\nTo change the language of the target word, set `lemmaLanguage`:\n\n```ts\nimport * as Wiktionary from \"wiktionary-scraper\";\n\nconst results = await Wiktionary.get(\"o\", {\n  lemmaLanguage: \"Romanian\",\n});\n```\n\nTo follow redirects, set `followRedirects` to `true`:\n\n```ts\nimport * as Wiktionary from \"wiktionary-scraper\";\n\n// Redirects to and returns results for \"Germany\".\nconst results = await Wiktionary.get(\"germany\", {\n  followRedirects: true,\n});\n```\n\nBy default, requests include a `User-Agent` header with a value that mentions `wiktionary-scraper`.\n\nTo omit the header, set `userAgent` to `undefined`.\n\nTo use a custom value, set `userAgent`:\n\n```ts\nimport * as Wiktionary from \"wiktionary-scraper\";\n\nconst results = await Wiktionary.get(\"word\", {\n  userAgent: \"Your App (https://example.com)\",\n});\n```\n\nYou can also parse the HTML of a Wiktionary page directly, skipping the fetch step.\n\nℹ️ Note that, unlike `get()`, `parse()` is synchronous:\n\n```ts\nimport * as Wiktionary from \"wiktionary-scraper\";\n\n// `html` holds the HTML of a Wiktionary entry page.\nconst results = Wiktionary.parse(html);\n```\n\n## Completeness\n\nThis library currently supports only the English-language edition of Wiktionary.\n\n#### Features\n\n- Parses both single- and multiple-etymology entries.\n- Recognises standard, non-standard and some explicitly disallowed parts of speech, as defined in Wiktionary's [entry layout guidelines](https://en.wiktionary.org/wiki/Wiktionary:Entry_layout#Part_of_speech). 
In total, there are 60+ recognised parts of speech, which should cover the vast majority of definitions.\n  - Note, however, that the library may fail to recognise some niche, non-standard parts of speech. If you come across any, please open an issue.\n\n#### Section support\n\n- [ ] Description\n- [ ] Glyph origin\n- [x] Etymology\n- [ ] Pronunciation\n- [ ] Production\n- [x] Definitions\n- [ ] Usage notes\n- [ ] Reconstruction notes\n- [ ] _Inflection sections_:\n  - [ ] Inflection\n  - [ ] Conjugation\n  - [ ] Declension\n- [ ] Mutation\n- [ ] Quotations\n- [ ] Alternative forms\n- [ ] Alternative reconstructions\n- [ ] _Relations_:\n  - [ ] Synonyms\n  - [ ] Antonyms\n  - [ ] Hypernyms\n  - [ ] Hyponyms\n  - [ ] Meronyms\n  - [ ] Holonyms\n  - [ ] Comeronyms\n  - [ ] Troponyms\n  - [ ] Parasynonyms\n  - [ ] Coordinate terms\n  - [ ] Derived terms\n  - [ ] Related terms\n- [ ] Translations\n- [ ] Trivia\n- [ ] See also\n- [ ] References\n- [ ] Further reading\n- [ ] Anagrams\n- [ ] Examples\n\n#### Recognised parts of speech\n\n###### Parts of speech\n\n- Adjective\n- Adverb\n- Ambiposition\n- Article\n- Circumposition\n- Classifier\n- Conjunction\n- Contraction\n- Counter\n- Determiner\n- Ideophone\n- Interjection\n- Noun\n- Numeral\n- Participle\n- Particle\n- Postposition\n- Preposition\n- Pronoun\n- Proper noun\n- Verb\n\n###### Morphemes\n\n- Circumfix\n- Combining form\n- Infix\n- Interfix\n- Prefix\n- Root\n- Suffix\n\n###### Symbols\n\n- Diacritical mark\n- Letter\n- Ligature\n- Number\n- Punctuation mark\n- Syllable\n- Symbol\n\n###### Phrases\n\n- Phrase\n- Proverb\n- Prepositional phrase\n\n###### Han characters and language-specific varieties\n\n- Han character\n- Hanzi\n- Kanji\n- Hanja\n\n###### Other\n\n- Romanization\n- Logogram\n- Determinative\n\n###### Explicitly disallowed parts of speech\n\nYou know, just in case somebody didn't follow the rules on Wiktionary.\n\n- Abbreviation\n- Acronym\n- Initialism\n- 
Cardinal-number\n- Ordinal-number\n- Cardinal-numeral\n- Ordinal-numeral\n- Clitic\n- Gerund\n- Idiom\n\n###### Library additions\n\n- Adposition\n- Affix\n- Character","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvxern%2Fwiktionary-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvxern%2Fwiktionary-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvxern%2Fwiktionary-scraper/lists"}