{"id":13503658,"url":"https://github.com/vzhong/embeddings","last_synced_at":"2025-04-04T18:06:13.619Z","repository":{"id":46949844,"uuid":"83266708","full_name":"vzhong/embeddings","owner":"vzhong","description":"Fast, DB Backed pretrained word embeddings for natural language processing.","archived":false,"fork":false,"pushed_at":"2023-10-23T11:09:11.000Z","size":48,"stargazers_count":222,"open_issues_count":1,"forks_count":31,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-28T17:09:20.819Z","etag":null,"topics":["deep-learning","neural-network","nlp"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vzhong.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-02-27T04:05:12.000Z","updated_at":"2025-01-23T06:00:16.000Z","dependencies_parsed_at":"2022-09-14T12:30:30.574Z","dependency_job_id":"f9cb95de-0ae2-44d8-9bc0-5e6b9b7ae385","html_url":"https://github.com/vzhong/embeddings","commit_stats":{"total_commits":34,"total_committers":4,"mean_commits":8.5,"dds":0.08823529411764708,"last_synced_commit":"868b117bca4d9ac3e967bba5d895625db02cb2f3"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vzhong%2Fembeddings","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vzhong%2Fembeddings/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vzhong%2Fembeddings/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vzhong%2Fembeddings/manifests","owne
r_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vzhong","download_url":"https://codeload.github.com/vzhong/embeddings/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247226213,"owners_count":20904465,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","neural-network","nlp"],"created_at":"2024-07-31T23:00:42.738Z","updated_at":"2025-04-04T18:06:13.611Z","avatar_url":"https://github.com/vzhong.png","language":"Python","funding_links":[],"categories":["Table of Contents"],"sub_categories":[],"readme":"Embeddings\n==========\n\n.. image:: https://readthedocs.org/projects/embeddings/badge/?version=latest\n    :target: http://embeddings.readthedocs.io/en/latest/?badge=latest\n    :alt: Documentation Status\n.. image:: https://travis-ci.org/vzhong/embeddings.svg?branch=master\n    :target: https://travis-ci.org/vzhong/embeddings\n\nEmbeddings is a Python package that provides pretrained word embeddings for natural language processing and machine learning.\n\nNOTE: several people are looking to take over the pip `embeddings` package on PyPI for their own projects. As a result, please install from GitHub instead.\n\nInstead of loading a large file to query for embeddings, ``embeddings`` is backed by a database and is fast to load and query:\n\n.. 
code-block:: python\n\n    \u003e\u003e\u003e %timeit GloveEmbedding('common_crawl_840', d_emb=300)\n    100 loops, best of 3: 12.7 ms per loop\n    \n    \u003e\u003e\u003e %timeit GloveEmbedding('common_crawl_840', d_emb=300).emb('canada')\n    100 loops, best of 3: 12.9 ms per loop\n    \n    \u003e\u003e\u003e g = GloveEmbedding('common_crawl_840', d_emb=300)\n    \n    \u003e\u003e\u003e %timeit -n1 g.emb('canada')\n    1 loop, best of 3: 38.2 µs per loop\n\n\nInstallation\n------------\n\n.. code-block:: sh\n\n    pip install git+https://github.com/vzhong/embeddings.git  # from github\n\n\nUsage\n-----\n\nUpon first use, the embeddings are downloaded to disk in the form of a SQLite database.\nThis may take a long time for large embeddings such as GloVe.\nSubsequent lookups are queried directly against the database.\nEmbedding databases are stored in the ``$EMBEDDINGS_ROOT`` directory (defaults to ``~/.embeddings``). Note that this location is probably **undesirable** if your home directory is on NFS, as it would slow down database queries significantly.\n\n\n.. code-block:: python\n\n    from embeddings import GloveEmbedding, FastTextEmbedding, KazumaCharEmbedding, ConcatEmbedding\n    \n    g = GloveEmbedding('common_crawl_840', d_emb=300, show_progress=True)\n    f = FastTextEmbedding()\n    k = KazumaCharEmbedding()\n    c = ConcatEmbedding([g, f, k])\n    for w in ['canada', 'vancouver', 'toronto']:\n        print('embedding {}'.format(w))\n        print(g.emb(w))\n        print(f.emb(w))\n        print(k.emb(w))\n        print(c.emb(w))\n\n\nDocker\n------\n\nIf you use Docker, an image prepopulated with the Common Crawl 840 GloVe embeddings and Kazuma Hashimoto's character n-gram embeddings is available at `vzhong/embeddings \u003chttps://hub.docker.com/r/vzhong/embeddings\u003e`_.\nTo mount volumes from this container, set ``$EMBEDDINGS_ROOT`` in your container to ``/opt/embeddings``.\n\nFor example:\n\n.. 
code-block:: bash\n\n    docker run --volumes-from vzhong/embeddings -e EMBEDDINGS_ROOT='/opt/embeddings' myimage python train.py\n\n\nContribution\n------------\n\nPull requests welcome!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvzhong%2Fembeddings","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvzhong%2Fembeddings","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvzhong%2Fembeddings/lists"}