{"id":23409425,"url":"https://github.com/france-travail/embcompare","last_synced_at":"2026-03-07T17:03:42.608Z","repository":{"id":108905164,"uuid":"567166911","full_name":"France-Travail/embcompare","owner":"France-Travail","description":"A simple python tool for embedding comparison","archived":false,"fork":false,"pushed_at":"2024-03-25T04:06:53.000Z","size":29246,"stargazers_count":7,"open_issues_count":2,"forks_count":0,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-03-30T10:04:31.539Z","etag":null,"topics":["comparison-tool","embedding-python","embedding-vectors","embeddings","embeddings-similarity","embeddings-word2vec","pypi-package","python-package","python3","streamlit-dashboard"],"latest_commit_sha":null,"homepage":"https://oss-pole-emploi.github.io/embcompare/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/France-Travail.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-11-17T08:02:17.000Z","updated_at":"2024-06-28T09:48:25.000Z","dependencies_parsed_at":null,"dependency_job_id":"54f00c58-dde6-4584-81e2-eade4b628b47","html_url":"https://github.com/France-Travail/embcompare","commit_stats":null,"previous_names":["oss-pole-emploi/emcompare","france-travail/embcompare"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/France-Travail%2Fembcompare","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/France-Travail%2Fembcompare/tags","
releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/France-Travail%2Fembcompare/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/France-Travail%2Fembcompare/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/France-Travail","download_url":"https://codeload.github.com/France-Travail/embcompare/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251319791,"owners_count":21570456,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["comparison-tool","embedding-python","embedding-vectors","embeddings","embeddings-similarity","embeddings-word2vec","pypi-package","python-package","python3","streamlit-dashboard"],"created_at":"2024-12-22T15:54:30.732Z","updated_at":"2026-03-07T17:03:42.574Z","avatar_url":"https://github.com/France-Travail.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n    \u003ca href=\"https://pypi.org/project/embcompare\"\u003e\n        \u003cimg src=\"https://img.shields.io/pypi/v/embcompare\" alt=\"EmbCompare package version\" /\u003e\n    \u003c/a\u003e\n    \u003cimg src=\"https://img.shields.io/pypi/pyversions/embcompare\" alt=\"Python supported versions\" /\u003e\n    \u003ca href=\"https://github.com/OSS-Pole-Emploi/embcompare/actions/workflows/package-unit-tests.yml\"\u003e\n        \u003cimg src=\"https://github.com/OSS-Pole-Emploi/embcompare/actions/workflows/package-unit-tests.yml/badge.svg\" alt=\"unit tests status\" /\u003e\n    
\u003c/a\u003e\n    \u003cimg src=\"https://img.shields.io/badge/coverage-%3E90%25-green\" alt=\"Badge indicating more than 90% coverage\" /\u003e\n    \u003ca href=\"https://www.gnu.org/licenses/agpl-3.0\"\u003e\n        \u003cimg src=\"https://img.shields.io/pypi/l/embcompare\" alt=\"License AGPL-3\" /\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://github.com/PyCQA/bandit\"\u003e\n        \u003cimg src=\"https://img.shields.io/badge/security-bandit-yellow.svg\" alt=\"Security thanks to bandit package\" /\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://github.com/psf/black\"\u003e\n        \u003cimg src=\"https://img.shields.io/badge/formatting-black-black\" alt=\"Black formatting\" /\u003e\n    \u003c/a\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n    \u003ch1\u003eEmbCompare\u003c/h1\u003e\n    \u003cp \u003e\u003cb\u003eA simple python tool for embedding comparison\u003c/b\u003e\u003c/p\u003e\n\u003c/div\u003e\n\nEmbCompare is a small Python package heavily inspired by the [Embedding Comparator tool](https://github.com/mitvis/embedding-comparator) \nthat helps you compare your embeddings both visually and numerically.\n\n### Key features : \n- **Visual comparison** : GUI for comparing two embeddings\n- **Numerical comparison** : Calculation of comparison indicators between two embeddings for monitoring purposes\n\n\u003e EmbCompare keeps things simple. 
All computations are made in memory and the package does not provide any embedding storage management.\n\u003e\n\u003e If you need a tool to store, compare and track your experiments, you may like the [vectory](https://github.com/pentoai/vectory) project.\n\n## Table of contents \u003c!-- omit from toc --\u003e\n- [🛠️ Installation](#️-installation)\n- [👩‍💻 Usage](#-usage)\n  - [Config file](#config-file)\n  - [JSON comparison report generation](#json-comparison-report-generation)\n  - [GUI](#gui)\n- [🐍 Python API](#-python-api)\n  - [Embedding](#embedding)\n  - [EmbeddingComparison](#embeddingcomparison)\n  - [JSON reports](#json-reports)\n    - [EmbeddingReport](#embeddingreport)\n    - [EmbeddingComparisonReport](#embeddingcomparisonreport)\n- [📊 Create your custom streamlit app](#-create-your-custom-streamlit-app)\n\n## 🛠️ Installation\n\n```bash\n# basic install\npip install embcompare\n\n# installation with the gui tool\npip install embcompare[gui]\n```\n\n## 👩‍💻 Usage\n\nEmbCompare provides a CLI with three sub-commands : \n\n- `embcompare add` is used to create or update a YAML file containing all the embedding information : path, format, labels, term frequencies, ... ;\n- `embcompare report` is used to generate JSON reports containing comparison metrics ;\n- `embcompare gui` is used to start a [streamlit](https://streamlit.io/) webapp to compare your embeddings visually.\n\n### Config file\n\nEmbCompare uses a YAML file to reference embeddings and related information. 
By default, EmbCompare looks\nfor a file named embcompare.yaml in the current working directory.\n\n```yaml\nembeddings: \n    first_embedding:\n        name: My first embedding\n        path: /abspath/to/firstembedding.json\n        format: json\n        frequencies: /abspath/to/freqs.json\n        frequencies_format: json\n        labels: /abspath/to/labels.pkl\n        labels_format: pkl\n    second_embedding:\n        name: My second embedding\n        path: /abspath/to/secondembedding.json\n        format: word2vec\n        frequencies: /abspath/to/freqs.pkl\n        frequencies_format: pkl\n        labels: /abspath/to/labels.json\n        labels_format: json\n```\n\nThe `embcompare add` command lets you update this file programmatically (and creates it if it does not exist).\n\n### JSON comparison report generation\n\nEmbCompare helps you compare embeddings using numerical metrics, which can be used to check whether a newly\ngenerated embedding differs substantially from the previous one. The command `embcompare report` can be used in two ways : \n- With a single embedding. In this case it generates a small report about the embedding : \n  ```bash\n  embcompare report first_embedding\n  # creates a first_embedding_report.json file containing some information about the embedding\n  ```\n- With two embeddings. In this case it generates a comparison report about the two embeddings : \n  ```bash\n  embcompare report first_embedding second_embedding\n  # creates a first_embedding_second_embedding_report.json file containing comparison metrics\n  ```\n\n### GUI\n\n![A short video overview of embcompare graphical user interface](.assets/overview.webp)\n\nThe GUI is also handy for comparing embeddings. To start it, use the command `embcompare gui`. \nIt will launch a streamlit app that lets you visually compare the embeddings you added to the configuration file.\n\n## 🐍 Python API\n\nEmbCompare provides several classes to load and compare embeddings. 
\n### Embedding\n\nThe `Embedding` class is a child of the [`gensim.models.KeyedVectors`](https://radimrehurek.com/gensim/models/keyedvectors.html) class.\n\nIt adds a few features : \n- You can provide term frequencies so you can filter the elements later\n- You can easily compute the nearest neighbors of all elements (thanks to \n  [sklearn.neighbors.NearestNeighbors](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html))\n\n```python\nimport json\nimport gensim.downloader as api\nfrom embcompare import Embedding\n\nword_vectors = api.load(\"glove-wiki-gigaword-100\")\nwith open(\"frequencies.json\", \"r\") as f:\n  word_frequencies = json.load(f)\n\nembedding = Embedding.load_from_keyedvectors(word_vectors, frequencies=word_frequencies)\nneigh_dist, neigh_ind = embedding.compute_neighborhoods()\n```\n\n### EmbeddingComparison\n\nThe `EmbeddingComparison` class is meant to compare two `Embedding` objects : \n\n```python\nfrom embcompare import EmbeddingComparison, load_embedding\n\nemb1 = load_embedding(\"first_emb.bin\", embedding_format=\"fasttext\", frequencies_path=\"freqs.pkl\")\nemb2 = load_embedding(\"second_emb.bin\", embedding_format=\"word2vec\", frequencies_path=\"freqs.pkl\")\n\ncomparison = EmbeddingComparison({\"emb1\": emb1, \"emb2\": emb2}, n_neighbors=25)\ncomparison.neighborhoods_similarities[\"word\"]\n# 0.867\n```\n\n### JSON reports\n#### EmbeddingReport\nThe `EmbeddingReport` class is used to generate a small report about an embedding : \n\n```python\nfrom embcompare import EmbeddingReport, load_embedding\n\nemb1 = load_embedding(\"first_emb.bin\", embedding_format=\"fasttext\", frequencies_path=\"freqs.pkl\")\nreport = EmbeddingReport(emb1)\nreport.to_dict()\n# { \n#   \"vector_size\": 300,\n#   \"mean_frequency\": 0.00012,\n#   \"mean_distance_neighbors\": 0.023,\n#   ...\n# }\n```\n\n#### EmbeddingComparisonReport\nThe `EmbeddingComparisonReport` class is used to generate a small comparison report from two embeddings : 
\n\n```python\nfrom embcompare import EmbeddingComparison, EmbeddingComparisonReport, load_embedding\n\nemb1 = load_embedding(\"first_emb.bin\", embedding_format=\"fasttext\", frequencies_path=\"freqs.pkl\")\nemb2 = load_embedding(\"second_emb.bin\", embedding_format=\"word2vec\", frequencies_path=\"freqs.pkl\")\n\ncomparison = EmbeddingComparison({\"emb1\": emb1, \"emb2\": emb2})\nreport = EmbeddingComparisonReport(comparison)\n\nreport.to_dict()\n# {\n#   \"embeddings\" : [\n#     { \n#       \"vector_size\": 300,\n#       \"mean_frequency\": 0.00012,\n#       \"mean_distance_neighbors\": 0.023,\n#       ...\n#     },\n#     ...\n#   ],\n#   \"neighborhoods_similarities_median\": 0.012,\n#   ...\n# }\n```\n\n## 📊 Create your custom streamlit app\n\nThe GUI is built with [streamlit](https://streamlit.io/). We tried to modularize the app so that you can \nmore easily reuse some of its features in your own streamlit app : \n\n```python\n# embcompare/gui/app.py\n\nimport streamlit as st\n\nfrom embcompare.gui.features import (\n    display_custom_elements_comparison,\n    display_elements_comparison,\n    display_embeddings_config,\n    display_frequencies_comparison,\n    display_neighborhoods_similarities,\n    display_numbers_of_elements,\n    display_parameters_selection,\n    display_spaces_comparison,\n    display_statistics_comparison,\n)\nfrom embcompare.gui.helpers import create_comparison\n\ndef main():\n    \"\"\"Streamlit app for embeddings comparison\"\"\"\n    config_embeddings = config[CONFIG_EMBEDDINGS]\n\n    (\n        tab_infos,\n        tab_stats,\n        tab_spaces,\n        tab_neighbors,\n        tab_compare,\n        tab_compare_custom,\n        tab_frequencies,\n    ) = st.tabs(\n        [\n            \"Infos\",\n            \"Statistics\",\n            \"Spaces\",\n            \"Similarities\",\n            \"Elements\",\n            \"Search elements\",\n            \"Frequencies\",\n        ]\n    )\n\n    # Embedding selection (inside the sidebar)\n    with 
st.sidebar:\n        parameters = display_parameters_selection(config_embeddings)\n\n    # Display information about the embeddings\n    with tab_infos:\n        display_embeddings_config(\n            config_embeddings, parameters.emb1_id, parameters.emb2_id\n        )\n\n    comparison = create_comparison(\n        config_embeddings,\n        emb1_id=parameters.emb1_id,\n        emb2_id=parameters.emb2_id,\n        n_neighbors=parameters.n_neighbors,\n        max_emb_size=parameters.max_emb_size,\n        min_frequency=parameters.min_frequency,\n    )\n\n    # Display the number of elements in each embedding and the number of common elements\n    with tab_infos:\n        display_numbers_of_elements(comparison)\n\n    # Display statistics\n    with tab_stats:\n        display_statistics_comparison(comparison)\n\n    if not comparison.common_keys:\n        st.warning(\"The embeddings have no element in common\")\n        st.stop()\n\n    # The comparisons below are based on the common elements\n    with tab_spaces:\n        display_spaces_comparison(comparison)\n\n    with tab_neighbors:\n        display_neighborhoods_similarities(comparison)\n\n    with tab_compare:\n        display_elements_comparison(comparison)\n\n    with tab_compare_custom:\n        display_custom_elements_comparison(comparison)\n\n    with tab_frequencies:\n        display_frequencies_comparison(comparison)\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffrance-travail%2Fembcompare","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffrance-travail%2Fembcompare","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffrance-travail%2Fembcompare/lists"}