{"id":13502179,"url":"https://github.com/alvations/pywsd","last_synced_at":"2025-05-15T17:01:29.368Z","repository":{"id":12928636,"uuid":"15606309","full_name":"alvations/pywsd","owner":"alvations","description":"Python Implementations of Word Sense Disambiguation (WSD) Technologies.","archived":false,"fork":false,"pushed_at":"2022-07-29T17:01:53.000Z","size":136548,"stargazers_count":746,"open_issues_count":19,"forks_count":132,"subscribers_count":42,"default_branch":"master","last_synced_at":"2025-03-31T20:05:36.266Z","etag":null,"topics":["lesk","nlp","python","wordnet","wsd"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alvations.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-01-03T09:48:52.000Z","updated_at":"2025-02-16T16:43:31.000Z","dependencies_parsed_at":"2022-08-25T07:50:36.380Z","dependency_job_id":null,"html_url":"https://github.com/alvations/pywsd","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alvations%2Fpywsd","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alvations%2Fpywsd/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alvations%2Fpywsd/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alvations%2Fpywsd/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alvations","download_url":"https://codeload.github.com/alvations/pywsd/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"htt
ps://github.com","kind":"github","repositories_count":247730069,"owners_count":20986404,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["lesk","nlp","python","wordnet","wsd"],"created_at":"2024-07-31T22:02:05.064Z","updated_at":"2025-04-07T21:12:37.810Z","avatar_url":"https://github.com/alvations.png","language":"Python","readme":"[![Build Status](https://travis-ci.org/alvations/pywsd.svg?branch=master)](https://travis-ci.org/alvations/pywsd)\n[![PyPI license](https://img.shields.io/pypi/l/ansicolortags.svg)](https://pypi.python.org/pypi/ansicolortags/)\n[![FOSSA Status](https://app.fossa.io/api/projects/git%2Bgithub.com%2Falvations%2Fpywsd.svg?type=shield)](https://app.fossa.io/projects/git%2Bgithub.com%2Falvations%2Fpywsd?ref=badge_shield)\n\npywsd\n=====\n\nPython Implementations of Word Sense Disambiguation (WSD) technologies:\n\n* **Lesk algorithms**\n  * Original Lesk (Lesk, 1986)\n  * Adapted/Extended Lesk (Banerjee and Pedersen, 2002/2003)\n  * Simple Lesk (with definition, example(s) and hyper+hyponyms)\n  * Cosine Lesk (use cosines to calculate overlaps instead of using raw counts)\n  \u003c!-- * Enhanced Lesk (Basile et al. 2014) (in wishlist) --\u003e\n\n* **Maximizing Similarity** (see also, [Pedersen et al. 
(2003)](http://www.d.umn.edu/~tpederse/Pubs/max-sem-relate.pdf))\n\n  * Path similarity (Wu-Palmer, 1994; Leacock and Chodorow, 1998)\n  * Information Content (Resnik, 1995; Jiang and Conrath, 1997; Lin, 1998)\n\n\u003c!--\n* **Supervised WSD** (in progress)\n  * SVM WSD (Lee, Ng and Chia 2004)\n  * It Makes Sense (IMS) (Zhong and Ng, 2010)\n\n* **Vector Space Models** (in wishlist)\n  * LSI/LSA\n  * Topic Models, LDA (Li et al. 2012)\n  * NMF\n\n* **Graph based Models** (in wishlist)\n  * Babelfy (Moro et al. 2014)\n  * UKB (Agirre and Soroa, 2009)\n--\u003e\n\n* **Baselines**\n  * Random sense\n  * First NLTK sense\n  * Highest lemma counts\n\n**NOTE**: PyWSD only supports Python 3 now (`pywsd\u003e=1.2.0`).\nIf you're using Python 2, the last possible version is `pywsd==1.1.7`.\n\nInstall\n====\n\n```\npip install -U nltk\npython -m nltk.downloader 'popular'\npip install -U pywsd\n```\n\nUsage\n=====\n\n```python\n$ python\n\u003e\u003e\u003e from pywsd.lesk import simple_lesk\n\u003e\u003e\u003e sent = 'I went to the bank to deposit my money'\n\u003e\u003e\u003e ambiguous = 'bank'\n\u003e\u003e\u003e answer = simple_lesk(sent, ambiguous, pos='n')\n\u003e\u003e\u003e print(answer)\nSynset('depository_financial_institution.n.01')\n\u003e\u003e\u003e print(answer.definition())\na financial institution that accepts deposits and channels the money into lending activities\n```\n\nFor all-words WSD, try:\n\n```python\n\u003e\u003e\u003e from pywsd import disambiguate\n\u003e\u003e\u003e from pywsd.similarity import max_similarity as maxsim\n\u003e\u003e\u003e disambiguate('I went to the bank to deposit my money')\n[('I', None), ('went', Synset('run_low.v.01')), ('to', None), ('the', None), ('bank', Synset('depository_financial_institution.n.01')), ('to', None), ('deposit', Synset('deposit.v.02')), ('my', None), ('money', Synset('money.n.03'))]\n\u003e\u003e\u003e disambiguate('I went to the bank to deposit my money', algorithm=maxsim, similarity_option='wup', 
keepLemmas=True)\n[('I', 'i', None), ('went', u'go', Synset('sound.v.02')), ('to', 'to', None), ('the', 'the', None), ('bank', 'bank', Synset('bank.n.06')), ('to', 'to', None), ('deposit', 'deposit', Synset('deposit.v.02')), ('my', 'my', None), ('money', 'money', Synset('money.n.01'))]\n```\n\nTo read pre-computed signatures per synset:\n\n```python\n\u003e\u003e\u003e from pywsd.lesk import cached_signatures\n\u003e\u003e\u003e cached_signatures['dog.n.01']['simple']\nset([u'canid', u'belgian_griffon', u'breed', u'barker', ... , u'genus', u'newfoundland'])\n\u003e\u003e\u003e cached_signatures['dog.n.01']['adapted']\nset([u'canid', u'belgian_griffon', u'breed', u'leonberg', ... , u'newfoundland', u'pack'])\n\n\u003e\u003e\u003e from nltk.corpus import wordnet as wn\n\u003e\u003e\u003e wn.synsets('dog')[0]\nSynset('dog.n.01')\n\u003e\u003e\u003e dog = wn.synsets('dog')[0]\n\u003e\u003e\u003e dog.name()\nu'dog.n.01'\n\u003e\u003e\u003e cached_signatures[dog.name()]['simple']\nset([u'canid', u'belgian_griffon', u'breed', u'barker', ... , u'genus', u'newfoundland'])\n```\n\n***\n\nCite\n====\n\nTo cite `pywsd`:\n\nLiling Tan. 2014. Pywsd: Python Implementations of Word Sense Disambiguation (WSD) Technologies [software]. 
Retrieved from  https://github.com/alvations/pywsd\n\nIn `bibtex`:\n\n```\n@misc{pywsd14,\nauthor =   {Liling Tan},\ntitle =    {Pywsd: Python Implementations of Word Sense Disambiguation (WSD) Technologies [software]},\nhowpublished = {https://github.com/alvations/pywsd},\nyear = {2014}\n}\n```\n\n***\n\n\u003c!--\n| Algorithm  | Citations | Status | Comment |\n|:--|:--|:--|:--|\n| Original Lesk | (Lesk, 1986) | `pywsd.lesk.original_lesk` | - |\n| Adapted/Extended Lesk |  (Banerjee and Pedersen, 2002/2003) | `pywsd.lesk.adapted_lesk` | - |\n| Simple Lesk | (Tan, 2014) | `pywsd.lesk.simple_lesk` | Uses definitions, examples, lemma_names|\n| Cosine Lesk | (Tan, 2014) | `pywsd.lesk.cosine_lesk` | use cosines to calculate overlaps instead of using raw counts|\n| Enhanced Lesk | (Basile et al. 2014) | (in wishlist) | - |\n\n--\u003e\n\nReferences\n=========\n\n* Michael Lesk. 1986. Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. In Proceedings of the 5th annual international conference on Systems documentation (SIGDOC '86), Virginia DeBuys (Ed.). ACM, New York, NY, USA, 24-26. DOI=10.1145/318723.318728 http://doi.acm.org/10.1145/318723.318728\n\n* Satanjeev Banerjee and Ted Pedersen. 2002. An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet. In Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing (CICLing '02), Alexander F. Gelbukh (Ed.). Springer-Verlag, London, UK, 136-145.\n\n* Satanjeev Banerjee and Ted Pedersen. 2003. Extended gloss overlaps as a measure of semantic relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pages 805–810, Acapulco.\n\n* Jay J. Jiang and David W. Conrath. 1997. Semantic similarity based on corpus statistics and lexical taxonomy. 
In Proceedings of International Conference on Research in Computational Linguistics, Taiwan.\n\n* Claudia Leacock and Martin Chodorow. 1998. Combining local context and WordNet similarity for word sense identification. In Fellbaum 1998, pp. 265–283.\n\n* Lee, Yoong Keok, Hwee Tou Ng, and Tee Kiah Chia. \"Supervised word sense disambiguation with support vector machines and multiple knowledge sources.\" Senseval-3: Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text. 2004.\n\n* Dekang Lin. 1998. An information-theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning, Madison, WI.\n\n* Linlin Li, Benjamin Roth and Caroline Sporleder. 2010. Topic Models for Word Sense Disambiguation and Token-based Idiom Detection. The 48th Annual Meeting of the Association for Computational Linguistics (ACL). Uppsala, Sweden.\n\n* Andrea Moro, Roberto Navigli, Francesco Maria Tucci and Rebecca J. Passonneau. 2014. Annotating the MASC Corpus with BabelNet. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). Reykjavik, Iceland.\n\n* Zhi Zhong and Hwee Tou Ng. 2010. It makes sense: a wide-coverage word sense disambiguation system for free text. In Proceedings of the ACL 2010 System Demonstrations (ACLDemos '10). Association for Computational Linguistics, Stroudsburg, PA, USA, 78-83.\n\n* Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python (1st ed.). O'Reilly Media, Inc.\n\n* Eneko Agirre and Aitor Soroa. 2009. Personalizing PageRank for Word Sense Disambiguation. Proceedings of the 12th conference of the European chapter of the Association for Computational Linguistics (EACL-2009). 
Athens, Greece.\n","funding_links":[],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falvations%2Fpywsd","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falvations%2Fpywsd","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falvations%2Fpywsd/lists"}