{"id":13601357,"url":"https://github.com/ynqa/wego","last_synced_at":"2025-04-05T08:05:50.858Z","repository":{"id":41117496,"uuid":"80434833","full_name":"ynqa/wego","owner":"ynqa","description":"Word Embeddings in Go!","archived":false,"fork":false,"pushed_at":"2023-04-02T16:29:22.000Z","size":7321,"stargazers_count":474,"open_issues_count":5,"forks_count":41,"subscribers_count":17,"default_branch":"master","last_synced_at":"2024-10-29T16:42:07.452Z","etag":null,"topics":["glove","go","machine-learning","nlp","word-embeddings","word2vec"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ynqa.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-01-30T15:38:58.000Z","updated_at":"2024-10-27T18:34:05.000Z","dependencies_parsed_at":"2023-07-14T16:00:50.146Z","dependency_job_id":null,"html_url":"https://github.com/ynqa/wego","commit_stats":{"total_commits":251,"total_committers":7,"mean_commits":"35.857142857142854","dds":"0.11155378486055778","last_synced_commit":"4ce56c0b4c6d46d414f2a59c9d7c351dc14c8945"},"previous_names":["ynqa/word-embedding"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ynqa%2Fwego","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ynqa%2Fwego/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ynqa%2Fwego/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ynqa%2Fwego/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/own
ers/ynqa","download_url":"https://codeload.github.com/ynqa/wego/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247305933,"owners_count":20917208,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["glove","go","machine-learning","nlp","word-embeddings","word2vec"],"created_at":"2024-08-01T18:01:01.234Z","updated_at":"2025-04-05T08:05:50.820Z","avatar_url":"https://github.com/ynqa.png","language":"Go","funding_links":[],"categories":["Go","Repositories"],"sub_categories":[],"readme":"# Word Embeddings in Go\n\n[![Go](https://github.com/ynqa/wego/actions/workflows/go.yml/badge.svg)](https://github.com/ynqa/wego/actions/workflows/go.yml)\n[![GoDoc](https://godoc.org/github.com/ynqa/wego?status.svg)](https://godoc.org/github.com/ynqa/wego)\n[![Go Report Card](https://goreportcard.com/badge/github.com/ynqa/wego)](https://goreportcard.com/report/github.com/ynqa/wego)\n\n*wego* implements word embedding (a.k.a. word representation) models **from scratch** in Go.\n\n## What are word embeddings?\n\n[Word embeddings](https://en.wikipedia.org/wiki/Word_embeddings) map the meaning, structure, and concepts of words into a low-dimensional vector space. 
A representative example:\n```\nVector(\"King\") - Vector(\"Man\") + Vector(\"Woman\") = Vector(\"Queen\")\n```\nAs in this example, the models generate word vectors whose arithmetic combinations reflect relationships between word meanings.\n\n## Features\n\n*wego* supports the following models for learning word vectors:\n\n- Word2Vec: Distributed Representations of Words and Phrases and their Compositionality [[pdf]](https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf)\n\n- GloVe: Global Vectors for Word Representation [[pdf]](http://nlp.stanford.edu/pubs/glove.pdf)\n\n- LexVec: Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations [[pdf]](http://anthology.aclweb.org/P16-2068)\n\n*wego* also provides nearest neighbor search tools that compute the distances between word vectors and find the words nearest to a target word. For word vectors, \"near\" means that the words are \"similar\".\n\nSee the [Usage](#Usage) section for details on how to use these.\n\n## Why Go?\n\nInspired by [Data Science in Go](https://speakerdeck.com/chewxy/data-science-in-go) by @chewxy.\n\n## Installation\n\nUse the `go` command to get this package.\n\n```\n$ go get -u github.com/ynqa/wego\n$ bin/wego -h\n```\n\n## Usage\n\n*wego* provides a CLI and a Go SDK for word embeddings.\n\n### CLI\n\n```\nUsage:\n  wego [flags]\n  wego [command]\n\nAvailable Commands:\n  console     Console to investigate word vectors\n  glove       GloVe: Global Vectors for Word Representation\n  help        Help about any command\n  lexvec      Lexvec: Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations\n  query       Query similar words\n  word2vec    Word2Vec: Continuous Bag-of-Words and Skip-gram model\n```\n\nThe `word2vec`, `glove`, and `lexvec` commands execute the following workflow to generate word vectors:\n1. 
Build a vocabulary dictionary and count word frequencies by scanning the given corpus.\n2. Train the model. The execution time depends on the size of the corpus, the hyperparameters (flags), and so on.\n3. Save the words and their vectors as a text file.\n\n`query` and `console` are commands for nearest neighbor search over the trained word vectors.\n\n`query` outputs the words most similar to a given word, using word vectors generated by the above models.\n\ne.g. `wego query -i word_vector.txt microsoft`:\n```\n  RANK |   WORD    | SIMILARITY\n-------+-----------+-------------\n     1 | hypercard |   0.791492\n     2 | xp        |   0.768939\n     3 | software  |   0.763369\n     4 | freebsd   |   0.761084\n     5 | unix      |   0.749563\n     6 | linux     |   0.747327\n     7 | ibm       |   0.742115\n     8 | windows   |   0.731136\n     9 | desktop   |   0.715790\n    10 | linspire  |   0.711171\n```\n\n*wego* does not reproduce the same word vectors across runs because it adopts the HogWild! 
algorithm, which updates the parameters (in this case, the word vectors) asynchronously.\n\n`console` starts a REPL for performing basic arithmetic operations (`+` and `-`) on word vectors.\n\n### Go SDK\n\nModel hyperparameters are set through functional options.\n\n```go\nmodel, err := word2vec.New(\n\tword2vec.Window(5),\n\tword2vec.Model(word2vec.Cbow),\n\tword2vec.Optimizer(word2vec.NegativeSampling),\n\tword2vec.NegativeSampleSize(5),\n\tword2vec.Verbose(),\n)\n```\n\nThe models provide the following methods:\n\n```go\ntype Model interface {\n\tTrain(io.ReadSeeker) error\n\tSave(io.Writer, vector.Type) error\n\tWordVector(vector.Type) *matrix.Matrix\n}\n```\n\n### Formats\n\n*wego* requires the following file formats for training inputs and outputs.\n\n#### Input\n\nThe input corpus must be plain text with words separated by spaces, like [text8](http://mattmahoney.net/dc/textdata.html).\n\n```\nword1 word2 word3 ...\n```\n\n#### Output\n\nAfter training, *wego* saves the word vectors to a text file in the following format (`N` is the dimension of the word vectors that you specified):\n\n```\n\u003cword\u003e \u003cvalue_1\u003e \u003cvalue_2\u003e ... \u003cvalue_N\u003e\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fynqa%2Fwego","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fynqa%2Fwego","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fynqa%2Fwego/lists"}