{"id":17665829,"url":"https://github.com/darenr/wordnet-clusters","last_synced_at":"2025-05-07T23:38:26.128Z","repository":{"id":79273893,"uuid":"52686906","full_name":"darenr/wordnet-clusters","owner":"darenr","description":"Clustering a set of word/tags using K-Means with word2vec or wordnet distance","archived":false,"fork":false,"pushed_at":"2019-03-05T06:08:09.000Z","size":221,"stargazers_count":26,"open_issues_count":2,"forks_count":5,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-05-07T23:38:13.557Z","etag":null,"topics":["clustering","k-means-clustering","k-means-implementation-in-python","tags","word2vec","wordnet-clusters"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/darenr.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-02-27T20:20:45.000Z","updated_at":"2025-01-18T13:20:01.000Z","dependencies_parsed_at":null,"dependency_job_id":"b656c7c6-85d1-4e96-9dda-bd750cbe845d","html_url":"https://github.com/darenr/wordnet-clusters","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/darenr%2Fwordnet-clusters","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/darenr%2Fwordnet-clusters/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/darenr%2Fwordnet-clusters/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/darenr%2Fwordnet-clusters/manif
ests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/darenr","download_url":"https://codeload.github.com/darenr/wordnet-clusters/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252973607,"owners_count":21834104,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clustering","k-means-clustering","k-means-implementation-in-python","tags","word2vec","wordnet-clusters"],"created_at":"2024-10-23T21:08:01.328Z","updated_at":"2025-05-07T23:38:26.104Z","avatar_url":"https://github.com/darenr.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Tag Clustering using `wordnet` and `word2vec` distance metrics\n\nClusters a set of `wordnet` synsets using `k-means`. The `wordnet` pair-wise distance (semantic relatedness) between word senses, computed with the [edge-counting method of Wu \u0026 Palmer (1994)](https://pdfs.semanticscholar.org/6eff/221e1cf5ae28ce7dcb60515d028b98e37aa5.pdf), is mapped to Euclidean distance so that K-means can converge while preserving the original pair-wise relationships.\n\nBy toggling `use_wordnet = True` to `False`, the distance metric between words instead uses a `GloVe` model, `glove.6B.300d_word2vec.txt` (this must be in the [word2vec format](https://radimrehurek.com/gensim/scripts/glove2word2vec.html)), and the `word2vec` similarity value.\n\nThe `extras` folder contains proof-of-concept experiments.\n\n# To Use:\n\n- create a newline delimited file with a list of `wordnet` senses (e.g. 
data/example_tags.txt)\n- to use `wordnet`, set `use_wordnet=True`; to use `word2vec`, set `use_wordnet=False`\n- ```python generate-tag-clusters.py data/example_tags.txt 25 0.7```\n  - 25 is the number of clusters to segment the list of `wordnet` senses into.\n  - 0.7 is the similarity threshold; below this value, words are considered not similar.\n- results are placed in the `results` folder as a JSON file\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdarenr%2Fwordnet-clusters","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdarenr%2Fwordnet-clusters","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdarenr%2Fwordnet-clusters/lists"}