{"id":50244744,"url":"https://github.com/JuliaText/Word2Vec.jl","last_synced_at":"2026-06-12T14:01:07.812Z","repository":{"id":46090828,"uuid":"41885513","full_name":"JuliaText/Word2Vec.jl","owner":"JuliaText","description":"Julia interface to word2vec","archived":false,"fork":false,"pushed_at":"2021-11-15T07:45:43.000Z","size":2674,"stargazers_count":62,"open_issues_count":7,"forks_count":15,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-11-21T17:23:14.912Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Julia","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JuliaText.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-09-03T21:49:18.000Z","updated_at":"2025-03-25T11:17:55.000Z","dependencies_parsed_at":"2022-09-05T17:40:45.262Z","dependency_job_id":null,"html_url":"https://github.com/JuliaText/Word2Vec.jl","commit_stats":null,"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/JuliaText/Word2Vec.jl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuliaText%2FWord2Vec.jl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuliaText%2FWord2Vec.jl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuliaText%2FWord2Vec.jl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuliaText%2FWord2Vec.jl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JuliaText","download_url":"https://codeload.github.com/JuliaText/Word2Vec.jl/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/JuliaText%2FWord2Vec.jl/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34247461,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-12T02:00:06.859Z","response_time":109,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-05-26T23:00:19.765Z","updated_at":"2026-06-12T14:01:07.806Z","avatar_url":"https://github.com/JuliaText.png","language":"Julia","funding_links":[],"categories":["Libraries"],"sub_categories":["Books"],"readme":"# Word2Vec\n\n[![License](http://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat)](LICENSE.md)\n[![CI](https://github.com/juliatext/Word2Vec.jl/workflows/CI/badge.svg?event=push\u0026branch=master)](https://github.com/JuliaText/Word2Vec.jl/actions?query=workflow%3ACI)\n[![version](https://juliahub.com/docs/Word2Vec/version.svg)](https://juliahub.com/ui/Packages/Word2Vec/x04dc)\n[![pkgeval](https://juliahub.com/docs/Word2Vec/pkgeval.svg)](https://juliahub.com/ui/Packages/Word2Vec/x04dc)\n[![deps](https://juliahub.com/docs/Word2Vec/deps.svg)](https://juliahub.com/ui/Packages/Word2Vec/x04dc?t=2)\n\n\n\nJulia interface to [word2vec](https://code.google.com/p/word2vec/)\n\nWord2Vec takes a text corpus as input and produces the word vectors as\noutput. 
Training is done using the original C code; other\nfunctionalities are pure Julia. See the [demo](http://nbviewer.ipython.org/github/JuliaText/Word2Vec.jl/blob/master/examples/demo.ipynb) for more details.\n\n* [Release Notes](https://github.com/JuliaText/Word2Vec.jl/blob/master/NEWS.md)\n\n## Installation\n\n```julia\nusing Pkg\nPkg.add(\"Word2Vec\")\n```\n\n**Note**: Only Linux and OS X are supported.\n\n## Functions\n\nAll exported functions are documented: type `? functionname` in the REPL\nto get help. For a list of functions, see [here](https://github.com/JuliaText/Word2Vec.jl/blob/master/doc/README.md).\n\n## Examples\n\nWe first download a text corpus, for example http://mattmahoney.net/dc/text8.zip, and unzip it.\n\nSuppose the file ``text8`` is stored in the current working directory.\nWe can train the model with the function ``word2vec``.\n\n```julia\njulia\u003e word2vec(\"text8\", \"text8-vec.txt\", verbose = true)\nStarting training using file text8\nVocab size: 71291\nWords in train file: 16718843\nAlpha: 0.000002  Progress: 100.04%  Words/thread/sec: 350.44k  \n```\n\nNow we can import the word vectors from ``text8-vec.txt`` into Julia.\n\n```julia\njulia\u003e model = wordvectors(\"./text8-vec.txt\")\nWordVectors 71291 words, 100-element Float64 vectors\n```\n\nThe vector representation of a word can be obtained using\n``get_vector``.\n\n```julia\njulia\u003e get_vector(model, \"book\")\n100-element Array{Float64,1}:\n -0.05446138539336186\n  0.001090934639284009\n  0.06498087707990222\n  ⋮\n -0.0024113040415322516\n  0.04755140828570571\n  0.039764719065723826\n```\n\nThe words closest to ``book`` by cosine similarity, for example, can be found using\n``cosine_similar_words``.\n\n```julia\njulia\u003e cosine_similar_words(model, \"book\")\n10-element Array{String,1}:\n \"book\"\n \"books\"\n \"diary\"\n \"story\"\n \"chapter\"\n \"novel\"\n \"preface\"\n \"poem\"\n \"tale\"\n \"bible\"\n```\n\nWord vectors have many interesting properties. 
For example,\n``vector(\"king\") - vector(\"man\") + vector(\"woman\")`` is close to\n``vector(\"queen\")``.\n\n```julia\njulia\u003e analogy_words(model, [\"king\", \"woman\"], [\"man\"])\n5-element Array{String,1}:\n \"queen\"\n \"empress\"\n \"prince\"\n \"princess\"\n \"throne\"\n```\n\n## References\n\n- Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean,\n  \"Efficient Estimation of Word Representations in Vector Space\",\n  *In Proceedings of Workshop at ICLR*, 2013.\n  [[pdf]](http://arxiv.org/pdf/1301.3781.pdf)\n\n- Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean,\n  \"Distributed Representations of Words and Phrases and their\n  Compositionality\", *In Proceedings of NIPS*, 2013.\n  [[pdf]](http://arxiv.org/pdf/1310.4546.pdf)\n\n- Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig,\n  \"Linguistic Regularities in Continuous Space Word Representations\",\n  *In Proceedings of NAACL HLT*, 2013.\n  [[pdf]](http://research.microsoft.com/pubs/189726/rvecs.pdf)\n\n## Acknowledgements\n\nThe design of the package is inspired by Daniel Rodriguez\n(@danielfrg)'s\n[Python word2vec interface](https://github.com/danielfrg/word2vec).\n\n## Reporting Bugs\n\nPlease [file an issue](https://github.com/JuliaText/Word2Vec.jl/issues/new) to report a bug or request a feature.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJuliaText%2FWord2Vec.jl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FJuliaText%2FWord2Vec.jl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJuliaText%2FWord2Vec.jl/lists"}