{"id":26833223,"url":"https://github.com/madcato/word2vec-rb","last_synced_at":"2025-06-29T13:38:06.044Z","repository":{"id":56898391,"uuid":"361673733","full_name":"madcato/word2vec-rb","owner":"madcato","description":"Ruby interface gem to use word2vec arithmetics.","archived":false,"fork":false,"pushed_at":"2022-05-06T06:03:17.000Z","size":48,"stargazers_count":8,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-30T10:49:58.316Z","etag":null,"topics":["machine-learning","ml","nlp","ruby","word2vec"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/madcato.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-04-26T08:26:35.000Z","updated_at":"2025-01-07T12:18:23.000Z","dependencies_parsed_at":"2022-08-21T02:20:28.588Z","dependency_job_id":null,"html_url":"https://github.com/madcato/word2vec-rb","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/madcato/word2vec-rb","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/madcato%2Fword2vec-rb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/madcato%2Fword2vec-rb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/madcato%2Fword2vec-rb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/madcato%2Fword2vec-rb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/madcato","download_url":"https://codeload.github.com/madcato/word2vec-rb/tar.gz/refs/heads/main","sbom_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/madcato%2Fword2vec-rb/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262601311,"owners_count":23335234,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning","ml","nlp","ruby","word2vec"],"created_at":"2025-03-30T15:28:23.115Z","updated_at":"2025-06-29T13:38:06.023Z","avatar_url":"https://github.com/madcato.png","language":"C","readme":"# word2vec-rb\n\nGem using word2vec functionality from https://code.google.com/archive/p/word2vec/\n\nThis gem was developed using the `.c` files of the Google word2vec as base. 
Mostly by applying copy-and-paste.\n\n## Installation\n\nAdd this line to your application's Gemfile:\n\n```ruby\ngem 'word2vec-rb'\n```\n\nAnd then execute:\n\n    $ bundle install\n\nOr install it yourself as:\n\n    $ gem install word2vec-rb\n\n## Usage\n\n### Distance arithmetic: to find the nearest words, try:\n\n```ruby\nrequire 'word2vec'\n\nmodel = Word2vec::Model.load(\"./data/minimal.bin\")\nwords = model.distance(\"from\")\nwords.each do |w|\n  puts \"#{w.first} #{w.last}\"\nend\n```\n\n### Analogy arithmetic: to find the analogy with three words, try:\n\n```ruby\nrequire 'word2vec'\n\nmodel = Word2vec::Model.load(\"./data/minimal.bin\")\nwords = model.analogy(\"spain\", \"madrid\", \"france\")\n# With a well-prepared (high-quality) vectors file, the first word would be \"Paris\"\nwords.each do |w|\n  puts \"#{w.first} #{w.last}\"\nend\n```\n\n### Accuracy: test accuracy of the vectors:\n\nDefine a file with the analogies to test, using this format:\n\n    : section heading\n    Word1 Word2 Word3 Word4\n\nSample:\n\n    : capital-common-countries\n    Athens Greece Baghdad Iraq\n    Athens Greece Bangkok Thailand\n\n```ruby\nrequire 'word2vec'\n\nmodel = Word2vec::Model.load(file_name)\nmodel.accuracy(\"./data/questions-words.txt\")\n\n# Outputs the results on terminal\n```\n\n### Vocabulary: create a vocabulary file from a training file:\n\n```ruby\nrequire 'word2vec'\n\nWord2vec::Model.build_vocab(\"./data/text7\", \"./data/vocab.txt\")\n```\n\nThe output file will contain a list of words and their numbers of appearances, separated by line breaks.\n\n### Tokenizer: create a binary file by tokenizing an input file\n\nThis method requires a previously created vocabulary file.\n\n```ruby\nrequire 'word2vec'\n\nWord2vec::Model.tokenize(\"./data/text7\", \"./data/vocab.txt\", \"./data/tokenized.bin\")\n```\n\nThe output file will contain a sequence of binary identifiers, one for each word of the input file.\n\nRead the output file with:\n\n    long long id;\n    fread(\u0026id, sizeof(id), 1, fi);\n\n### 
Load the **word2vec** output bin file (*vectors.bin*) into a Ruby array\n\n```ruby\nrequire 'word2vec'\n\nvector_array = Word2vec::Model.load_vectors(\"./data/minimal.bin\")\n```\n\nThe `vector_array` variable will contain an array of pairs, each holding a vocabulary word and the float values of its vector.\n\nSet the parameter `normalize: true` to normalize the vectors.\n\n```ruby\nrequire 'word2vec'\n\nvector_array = Word2vec::Model.load_vectors(\"./data/minimal.bin\", normalize: true)\n```\n\n## Development\n\nAfter checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.\n\nTo install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).\n\n### Build gem\n\n    $ rake build\n\n### Launch tests\n\n    $ rake spec\n\n### Build extension\n\n    $ rake compile\n\n## Contributing\n\nBug reports and pull requests are welcome on GitHub at https://github.com/madcato/word2vec-rb\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmadcato%2Fword2vec-rb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmadcato%2Fword2vec-rb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmadcato%2Fword2vec-rb/lists"}