{"id":13564539,"url":"https://github.com/src-d/wmd-relax","last_synced_at":"2025-05-16T14:07:36.157Z","repository":{"id":56415306,"uuid":"85208106","full_name":"src-d/wmd-relax","owner":"src-d","description":"Calculates Word Mover's Distance Insanely Fast","archived":false,"fork":false,"pushed_at":"2023-08-17T14:53:09.000Z","size":146,"stargazers_count":462,"open_issues_count":19,"forks_count":79,"subscribers_count":18,"default_branch":"master","last_synced_at":"2025-05-05T05:05:08.194Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/src-d.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-03-16T14:56:00.000Z","updated_at":"2025-05-04T13:34:09.000Z","dependencies_parsed_at":"2024-06-18T18:38:05.596Z","dependency_job_id":null,"html_url":"https://github.com/src-d/wmd-relax","commit_stats":{"total_commits":93,"total_committers":13,"mean_commits":7.153846153846154,"dds":"0.20430107526881724","last_synced_commit":"419f10e163c2a54a48be723da95914a4a99a3cfb"},"previous_names":[],"tags_count":12,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/src-d%2Fwmd-relax","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/src-d%2Fwmd-relax/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/src-d%2Fwmd-relax/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/src-d%2Fwmd-relax/mani
fests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/src-d","download_url":"https://codeload.github.com/src-d/wmd-relax/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254544146,"owners_count":22088807,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T13:01:32.720Z","updated_at":"2025-05-16T14:07:36.123Z","avatar_url":"https://github.com/src-d.png","language":"Python","readme":"Fast Word Mover's Distance [![Build Status](https://travis-ci.com/src-d/wmd-relax.svg?branch=master)](https://travis-ci.com/src-d/wmd-relax) [![PyPI](https://img.shields.io/pypi/v/wmd.svg)](https://pypi.python.org/pypi/wmd) [![codecov](https://codecov.io/github/src-d/wmd-relax/coverage.svg)](https://codecov.io/gh/src-d/wmd-relax)\n==========================\n\nCalculates Word Mover's Distance as described in\n[From Word Embeddings To Document Distances](http://www.cs.cornell.edu/~kilian/papers/wmd_metric.pdf)\nby Matt Kusner, Yu Sun, Nicholas Kolkin and Kilian Weinberger.\n\n\u003cimg src=\"doc/wmd.png\" alt=\"Word Mover's Distance\" width=\"200\"/\u003e\n\nThe high-level logic is written in Python; the low-level functions related to\nlinear programming are offloaded to the bundled native extension. 
The native\nextension can also be built as a generic shared library that is independent of Python.\n**Python 2.7 and older are not supported.** The heavy lifting is done by\n[google/or-tools](https://github.com/google/or-tools).\n\n### Installation\n\n```\npip3 install wmd\n```\nTested on Linux and macOS.\n\n### Usage\n\nYou need the embeddings as a numpy array and the nbow model: every sample\nis a weighted set of items, and every item is embedded.\n\n```python\nimport numpy\nfrom wmd import WMD\n\nembeddings = numpy.array([[0.1, 1], [1, 0.1]], dtype=numpy.float32)\nnbow = {\"first\":  (\"#1\", [0, 1], numpy.array([1.5, 0.5], dtype=numpy.float32)),\n        \"second\": (\"#2\", [0, 1], numpy.array([0.75, 0.15], dtype=numpy.float32))}\ncalc = WMD(embeddings, nbow, vocabulary_min=2)\nprint(calc.nearest_neighbors(\"first\"))\n```\n```\n[('second', 0.10606599599123001)]\n```\n\n`embeddings` must support `__getitem__`, which returns an item by its\nidentifier; in particular, `numpy.ndarray` matches that interface.\n`nbow` must be iterable (yielding sample identifiers) and must support\n`__getitem__` by those identifiers, returning tuples of length 3.\nThe first element is the human-readable name of the sample, the\nsecond is an iterable of item identifiers, and the third is a `numpy.ndarray`\nwith the corresponding weights. All numpy arrays must be float32. 
The return\nvalue of `nearest_neighbors()` is a list of tuples, each holding a sample identifier and a relevance\nscore (lower is better).\n\nIt is possible to use this package with [spaCy](https://github.com/explosion/spaCy); the snippet below uses the spaCy 2.x `add_pipe` API:\n\n```python\nimport spacy\nimport wmd\n\nnlp = spacy.load('en_core_web_md')\nnlp.add_pipe(wmd.WMD.SpacySimilarityHook(nlp), last=True)\ndoc1 = nlp(\"Politician speaks to the media in Illinois.\")\ndoc2 = nlp(\"The president greets the press in Chicago.\")\nprint(doc1.similarity(doc2))\n```\n\nSee also [another example](spacy_example.py), which finds similar Wikipedia\npages.\n\n### Building from source\n\nEither build it as a Python package:\n\n```\npip3 install git+https://github.com/src-d/wmd-relax\n```\n\nor use CMake:\n\n```\ngit clone --recursive https://github.com/src-d/wmd-relax\ncd wmd-relax\ncmake -D CMAKE_BUILD_TYPE=Release .\nmake -j\n```\n\nNote the `--recursive` flag for `git clone`: this project uses source{d}'s\nfork of [google/or-tools](https://github.com/google/or-tools) as a git submodule.\n\n### Tests\n\nTests are in `test.py` and use the standard `unittest` package.\n\n### Documentation\n\n```\ncd doc\nmake html\n```\n\nThe generated files are placed in the `doc/doxyhtml` and `doc/html` directories.\n\n### Contributions\n\n...are welcome! See [CONTRIBUTING](CONTRIBUTING.md) and [code of conduct](CODE_OF_CONDUCT.md).\n\n### License\n[Apache 2.0](LICENSE.md)\n\n#### README {#ignore_this_doxygen_anchor}\n","funding_links":[],"categories":["Python","Software"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsrc-d%2Fwmd-relax","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsrc-d%2Fwmd-relax","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsrc-d%2Fwmd-relax/lists"}