{"id":19162843,"url":"https://github.com/centre-for-humanities-computing/glovpy","last_synced_at":"2026-06-11T21:34:01.639Z","repository":{"id":196838099,"uuid":"697156166","full_name":"centre-for-humanities-computing/glovpy","owner":"centre-for-humanities-computing","description":"Package for interfacing Stanford's C GloVe implementation from Python.","archived":false,"fork":false,"pushed_at":"2023-11-16T10:49:39.000Z","size":15,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-09T23:59:58.607Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/centre-for-humanities-computing.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-27T07:04:54.000Z","updated_at":"2024-11-03T15:42:59.000Z","dependencies_parsed_at":"2025-01-03T21:40:50.919Z","dependency_job_id":"c71219a2-b753-42ce-b8a9-8ea0033d8ab8","html_url":"https://github.com/centre-for-humanities-computing/glovpy","commit_stats":null,"previous_names":["centre-for-humanities-computing/glopy"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/centre-for-humanities-computing/glovpy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/centre-for-humanities-computing%2Fglovpy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/centre-for-humanities-computing%2Fglovpy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/centr
e-for-humanities-computing%2Fglovpy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/centre-for-humanities-computing%2Fglovpy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/centre-for-humanities-computing","download_url":"https://codeload.github.com/centre-for-humanities-computing/glovpy/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/centre-for-humanities-computing%2Fglovpy/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34219510,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-11T02:00:06.485Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-09T09:13:17.383Z","updated_at":"2026-06-11T21:34:01.622Z","avatar_url":"https://github.com/centre-for-humanities-computing.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# glovpy\nPackage for interfacing Stanford's C GloVe implementation from Python.\n\n## Installation\n\nInstall glovpy from PyPI:\n\n```bash\npip install glovpy\n```\n\nAdditionally, the first time you import glovpy, it will build GloVe from scratch on your system.\n\n## Requirements\nWe highly recommend that you use a Unix-based system, preferably a variant of Debian.\nThe package needs `git`, `make` and a C 
compiler (`clang` or `gcc`) installed.\n\nOtherwise, the implementation is as barebones as it gets: only the standard library and gensim are used (gensim only for producing KeyedVectors).\n\n## Example Usage\nHere's a quick example of how to train GloVe on 20newsgroups using Gensim's tokenizer.\n\n```python\nfrom gensim.utils import tokenize\nfrom sklearn.datasets import fetch_20newsgroups\n\nfrom glovpy import GloVe\n\ntexts = fetch_20newsgroups().data\ncorpus = [list(tokenize(text, lowercase=True, deacc=True)) for text in texts]\n\nmodel = GloVe(vector_size=25)\nmodel.train(corpus)\n\nfor word, similarity in model.wv.most_similar(\"god\"):\n    print(f\"{word}, sim: {similarity}\")\n```\n\n|   word     |   similarity   |\n|------------|---------------|\n| existence  |  0.9156746864 |\n| jesus      |  0.8746870756 |\n| lord       |  0.8555182219 |\n| christ     |  0.8517201543 |\n| bless      |  0.8298447728 |\n| faith      |  0.8237065077 |\n| saying     |  0.8204566240 |\n| therefore  |  0.8177698255 |\n| desires    |  0.8094088435 |\n| telling    |  0.8083973527 |\n\n## API Reference \n\n### `class glovpy.GloVe(vector_size, window_size, symmetric, distance_weighting, alpha, min_count, iter, initial_learning_rate, threads, memory)`\n\nWrapper around the original C implementation of GloVe.\n\n### Parameters\n\n| Parameter                   | Type              | Description                                                                                      | Default          |\n|------------------------|-------------------|--------------------------------------------------------------------------------------------------|------------------|\n| vector_size            | _int_             | Number of dimensions the trained word vectors should have.                                      | *50*           |\n| window_size            | _int_             | Number of context words to the left (and to the right, if symmetric is True).                   
| *15*           |\n| alpha                  | _float_           | Parameter in exponent of weighting function; default 0.75                                       | *0.75*         |\n| symmetric              | _bool_            | If true, both future and past words will be used as context, otherwise only past words will be used. | *True*       |\n| distance_weighting     | _bool_            | If False, do not weight cooccurrence count by distance between words. If True (default), weight the cooccurrence count by inverse of distance between the target word and the context word. | *True* |\n| min_count              | _int_             | Minimum number of times a token has to appear to be kept in the vocabulary.                       | *5*            |\n| iter                   | _int_             | Number of training iterations.                                                                    | *25*           |\n| initial_learning_rate  | _float_           | Initial learning rate for training.                                                               | *0.05*         |\n| threads                | _int_             | Number of threads to use for training.                                                            | *8*            |\n| memory                 | _float_           | Soft limit for memory consumption, in GB. (based on simple heuristic, so not extremely accurate)  | *4.0*           |\n\n### Attributes\n\n| Name | Type | Description |\n|------|------|-------------|\n| wv   | _KeyedVectors_ | Token embeddings in the form of [Gensim keyed vectors](https://radimrehurek.com/gensim/models/keyedvectors.html). |\n\n### Methods\n\n#### `glovpy.GloVe.train(tokens)`\nTrain the model on a stream of texts.\n\n| Parameter | Type | Description |\n|-----------|------|-------------|\n| tokens    | _Iterable[list[str]]_ | Stream of documents in the form of lists of tokens. The stream has to be reusable, as the model needs at least two passes over the corpus. 
|\n\n### `glovpy.utils.reusable(gen_func)`\nFunction decorator that turns your generator function into an\niterator, thereby making it reusable.\nYou can use this if you want to reuse a generator function so that multiple passes can be made.\n\n### Parameters\n\n| Parameter | Type     | Description                                  |\n|-----------|----------|----------------------------------------------|\n| gen_func  | _Callable_ | Generator function that you want to be reusable. |\n\n### Returns\n\n|  Returns  | Type     | Description                                            |\n|-----------|----------|--------------------------------------------------------|\n| _multigen | _Callable_ | Iterator class wrapping the generator function. |\n\n### Example usage\n\nHere's how to stream a very long file line by line in a reusable manner.\n\n```python\nfrom gensim.utils import tokenize\nfrom glovpy.utils import reusable\nfrom glovpy import GloVe\n\n@reusable\ndef stream_lines():\n    with open(\"very_long_text_file.txt\") as f:\n        for line in f:\n            yield list(tokenize(line))\n\nmodel = GloVe()\nmodel.train(stream_lines())\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcentre-for-humanities-computing%2Fglovpy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcentre-for-humanities-computing%2Fglovpy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcentre-for-humanities-computing%2Fglovpy/lists"}