{"id":27426949,"url":"https://github.com/hyper-node/word_sense_induction","last_synced_at":"2025-06-28T12:32:31.116Z","repository":{"id":82615125,"uuid":"197623547","full_name":"Hyper-Node/word_sense_induction","owner":"Hyper-Node","description":"Implementation(s) of word sense induction method(s)","archived":false,"fork":false,"pushed_at":"2019-07-18T23:37:32.000Z","size":24,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-04-14T12:57:28.693Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Hyper-Node.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-07-18T16:41:02.000Z","updated_at":"2019-07-18T23:37:34.000Z","dependencies_parsed_at":null,"dependency_job_id":"4b0f1abc-b56f-43cf-adc4-954c5caed77e","html_url":"https://github.com/Hyper-Node/word_sense_induction","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Hyper-Node/word_sense_induction","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Hyper-Node%2Fword_sense_induction","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Hyper-Node%2Fword_sense_induction/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Hyper-Node%2Fword_sense_induction/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Hyper-Node%2Fword_sense_induct
ion/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Hyper-Node","download_url":"https://codeload.github.com/Hyper-Node/word_sense_induction/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Hyper-Node%2Fword_sense_induction/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262431426,"owners_count":23310040,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-04-14T12:45:38.110Z","updated_at":"2025-06-28T12:32:31.109Z","avatar_url":"https://github.com/Hyper-Node.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# word_sense_induction\nImplementation(s) of word sense induction (WSI) method(s)\n\n## Description\nAt the moment, the code in this repository implements the word sense induction method of Brody and Lapata, \nwhich is described in detail in their [2009 paper](https://dl.acm.org/citation.cfm?id=1609078), Bayesian Word Sense Induction. \n\nThe implementation covers the base 'Bayesian sense induction model' shown in figure 1 on page 105 of the paper. \n\nIt was written to understand the basic concepts of WSI; it does not necessarily aim to be a full implementation of the method \nintroduced in the paper. \n\nIn essence, it fits several Latent Dirichlet Allocation (LDA) topics to the contexts around a given word. \nThe sets of associated words in these topics are indicators of the senses in which the word is used. 
\n\n## Data used\nTo create the LDA model, a corpus of 1 million German words in the form of full sentences from Uni-Leipzig (2011-mixed typical)\nwas used. This corpus and corpora for several other languages can be found [here](http://wortschatz.uni-leipzig.de/en/download/).\nA description of the corpus format can be found [here](http://pcai056.informatik.uni-leipzig.de/downloads/corpora/Format_Download_File-eng.pdf)\n\nFor later evaluation on word sense disambiguation (WSD), i.e. deciding which meaning applies, a pre-annotated set of German words from Uni-Heidelberg can be used; \nit can be found [here](http://projects.cl.uni-heidelberg.de/dewsd/files.shtml#gold)\n\nSeveral Word Sense Disambiguation tasks were run as part of the SemEval workshop, which is probably also a good source \nof annotated corpora. \n\n## Basic Workflow\n1. Sentences that contain the relevant word are collected; in these sentences a context window is applied and special characters are filtered out\n2. An LDA model is created with the sentences as input \n3. The top 10 associated words detected for each sense are logged to the output \n\n## Future Improvements\n- implement more than one feature layer and combine the results, as depicted in figure 2 on page 106 of the paper\n- implement an evaluation method with F-scores and an annotated set\n- implement automated parameter optimization\n- implement a dictionary lookup for the obtained senses \n- only use nouns and relevant words for context?","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhyper-node%2Fword_sense_induction","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhyper-node%2Fword_sense_induction","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhyper-node%2Fword_sense_induction/lists"}