{"id":17104529,"url":"https://github.com/emfomy/hwe","last_synced_at":"2025-03-23T20:12:43.520Z","repository":{"id":98759429,"uuid":"148136188","full_name":"emfomy/hwe","owner":"emfomy","description":"Heterogeneous Word Embedding","archived":false,"fork":false,"pushed_at":"2018-12-30T13:30:59.000Z","size":1228,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"ver.C","last_synced_at":"2025-01-29T03:36:01.540Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/emfomy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-09-10T10:06:37.000Z","updated_at":"2019-05-09T00:22:30.000Z","dependencies_parsed_at":"2023-05-01T05:01:22.950Z","dependency_job_id":null,"html_url":"https://github.com/emfomy/hwe","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emfomy%2Fhwe","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emfomy%2Fhwe/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emfomy%2Fhwe/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emfomy%2Fhwe/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/emfomy","download_url":"https://codeload.github.com/emfomy/hwe/tar.gz/refs/heads/ver.C","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":24
5162194,"owners_count":20570692,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-14T15:37:10.167Z","updated_at":"2025-03-23T20:12:43.506Z","avatar_url":"https://github.com/emfomy.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Heterogeneous Word Embedding\n\n# Information\n\nThis library is a C implementation of Heterogeneous Word Embedding (HWE), a general and flexible framework for incorporating contextual features of various types (e.g. word sense, part-of-speech, topic) to learn feature-specific word embeddings in an explicit fashion.\n\n# Data format\n\n## Training for HWE-POS or HWE-Topic\n- Parameter setting: ```-fmode 1```\n- Corpus file: Each word is followed by its corresponding feature.\n\t- Format: `\u003cWORD\u003e(\u003cFEATURE\u003e)`\n\t- Example:\n\t\t- Original sentence: ```my dog also likes eating sausage.```\n\t\t- Modified sentence: ```my(PRP$) dog(NN) also(RB) likes(VBZ) eating(VBG) sausage(NN)```\n\n## Training for HWE-Sense\n\n- Parameter setting: ```-fmode 2 -knfile \u003cknowledge file\u003e```\n- Knowledge file: Each line contains a sense followed by its corresponding words.\n\t- Format: `\u003cSense\u003e \u003cWord List\u003e`\n\t- Example:\n\t\t- Line1: ```SENSE_FRUIT apple banana grape```\n\t\t- Line2: ```SENSE_ANIMAL tiger monkey```\n\n## Attention\n- Words are written in lowercase and features in uppercase.\n\n# Usage\n\n## Compile\n\n```\nmake hwe\n```\n\n## Setting\n```\n-train \u003cfile\u003e\n    Use text data from \u003cfile\u003e to train the model\n-output 
\u003cfile\u003e\n    Use \u003cfile\u003e to save the resulting word vectors / word clusters\n-size \u003cint\u003e\n    Set size of word vectors; default is 100\n-window \u003cint\u003e\n    Set max skip length between words; default is 5\n-sample \u003cfloat\u003e\n    Set threshold for occurrence of words. Those that appear with higher frequency in the training data\n    will be randomly down-sampled; default is 1e-3, useful range is (0, 1e-5)\n-negative \u003cint\u003e\n    Number of negative examples; default is 5, common values are 3 - 10 (0 = not used)\n-threads \u003cint\u003e\n    Use \u003cint\u003e threads (default 12)\n-iter \u003cint\u003e\n    Run more training iterations (default 5)\n-min-count \u003cint\u003e\n    This will discard words that appear fewer than \u003cint\u003e times; default is 5\n-alpha \u003cfloat\u003e\n    Set the starting learning rate; default is 0.025\n-debug \u003cint\u003e\n    Set the debug mode (default = 2 = more info during training)\n-binary \u003cint\u003e\n    Save the resulting vectors in binary mode; default is 0 (off)\n-save-vocab \u003cfile\u003e\n    The vocabulary will be saved to \u003cfile\u003e\n-read-vocab \u003cfile\u003e\n    The vocabulary will be read from \u003cfile\u003e, not constructed from the training data\n-fmode \u003cint\u003e\n    Set the feature mode (default = 0)\n        0 = only using skip-gram\n        1 = predicting self-feature of sequential feature tag\n        2 = predicting self-feature of global feature table\n-knfile \u003cfile\u003e\n    The sense-words knowledge file will be read from \u003cfile\u003e (used with -fmode 2)\n```\n\n## Example\n\n```\nwget http://cs.fit.edu/~mmahoney/compression/enwik8.zip\nunzip enwik8.zip\n\n./hwe -train enwik8 -output enwik8.emb -size 100 -window 5 -sample 1e-4 -negative 5 -binary 0 -fmode 2 -knfile demo/wordnetlower.tree -iter 2 -threads 32\n```\n\n## Author\n* Fan Jhih-Sheng \u003c\u003cfann1993814@gmail.com\u003e\u003e\n* Mu Yang 
\u003c\u003cemfomy@gmail.com\u003e\u003e\n\n# Reference\n\n* [Jhih-Sheng Fan, Mu Yang, Peng-Hsuan Li and Wei-Yun Ma, “HWE: Word Embedding with Heterogeneous Features”, ICSC2019](https://muyang.pro/file/paper/icsc_2019_hwe.pdf)\n\n# License\n[![License: CC BY-NC-SA 4.0](https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png)](https://creativecommons.org/licenses/by-nc-sa/4.0/) Copyright (c) 2017-2018 Fan Jhih-Sheng \u0026 Mu Yang under the [CC-BY-NC-SA 4.0 License](https://creativecommons.org/licenses/by-nc-sa/4.0/). All rights reserved.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Femfomy%2Fhwe","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Femfomy%2Fhwe","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Femfomy%2Fhwe/lists"}