{"id":19162755,"url":"https://github.com/centre-for-humanities-computing/embedding-explorer","last_synced_at":"2025-08-20T04:32:37.584Z","repository":{"id":170901068,"uuid":"610330460","full_name":"centre-for-humanities-computing/embedding-explorer","owner":"centre-for-humanities-computing","description":"Tools for interactive visual exploration of semantic embeddings.","archived":false,"fork":false,"pushed_at":"2024-09-06T13:05:08.000Z","size":3160,"stargazers_count":29,"open_issues_count":0,"forks_count":2,"subscribers_count":0,"default_branch":"main","last_synced_at":"2024-12-11T15:47:20.856Z","etag":null,"topics":["clustering","embedding","embeddings","interactive","knowledge-graph","machine-learning","networks","nlp","projection","semantic"],"latest_commit_sha":null,"homepage":"https://centre-for-humanities-computing.github.io/embedding-explorer/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/centre-for-humanities-computing.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-06T14:59:38.000Z","updated_at":"2024-11-16T21:34:38.000Z","dependencies_parsed_at":"2024-11-16T10:03:26.377Z","dependency_job_id":null,"html_url":"https://github.com/centre-for-humanities-computing/embedding-explorer","commit_stats":null,"previous_names":["centre-for-humanities-computing/embedding-explorer"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/centre-for-humanities-computing%2Fembedding-explorer","tags_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/repositories/centre-for-humanities-computing%2Fembedding-explorer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/centre-for-humanities-computing%2Fembedding-explorer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/centre-for-humanities-computing%2Fembedding-explorer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/centre-for-humanities-computing","download_url":"https://codeload.github.com/centre-for-humanities-computing/embedding-explorer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230394228,"owners_count":18218707,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clustering","embedding","embeddings","interactive","knowledge-graph","machine-learning","networks","nlp","projection","semantic"],"created_at":"2024-11-09T09:13:04.163Z","updated_at":"2024-12-19T07:06:43.019Z","avatar_url":"https://github.com/centre-for-humanities-computing.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cimg align=\"left\" width=\"82\" height=\"82\" src=\"assets/logo.svg\"\u003e\n\n# embedding-explorer\nTools for interactive visual exploration of semantic embeddings.\n\n[Documentation](https://centre-for-humanities-computing.github.io/embedding-explorer/index.html)\n\n### New in version 0.6.0\n\nYou can now pass a custom Neofuzz process to the explorer if you have specific requirements.\n\n```python\nfrom embedding_explorer import show_network_explorer\nfrom 
neofuzz import char_ngram_process\n\nprocess = char_ngram_process()\nshow_network_explorer(corpus=corpus, embeddings=embeddings, fuzzy_search=process)\n```\n\n## Installation\n\nInstall embedding-explorer from PyPI:\n\n```bash\npip install embedding-explorer\n```\n\n## [Semantic Explorer](https://centre-for-humanities-computing.github.io/embedding-explorer/semantic_networks.html)\n\nembedding-explorer comes with a web application built for exploring semantic relations in a corpus with the help of embeddings.\nIn this section I will show a couple of examples of running the app with different embedding models and corpora.\n\n### [Static Word Embeddings](https://centre-for-humanities-computing.github.io/embedding-explorer/semantic_networks.html#exploring-associations-in-static-word-embedding-models)\nLet's say that you would like to explore semantic relations by investigating word embeddings generated with a static model such as Word2Vec or GloVe.\nYou can do this by passing the vocabulary of the model and the embedding matrix to embedding-explorer.\n\nFor this example I will use Gensim, which can be installed from PyPI:\n\n```bash\npip install gensim\n```\n\nWe will download GloVe Twitter 25 from gensim's repositories. 
\n```python\nfrom gensim import downloader\nfrom embedding_explorer import show_network_explorer\n\nmodel = downloader.load(\"glove-twitter-25\")\nvocabulary = model.index_to_key\nembeddings = model.vectors\nshow_network_explorer(corpus=vocabulary, embeddings=embeddings)\n```\n\nThis will open a new browser window with the Explorer, where you can enter seed words and set the number of associations that you would\nlike to see on the screen.\n\n![Screenshot of the Explorer](assets/glove_screenshot.png)\n\n### [Dynamic Embedding Models](https://centre-for-humanities-computing.github.io/embedding-explorer/semantic_networks.html#exploring-corpora-with-dynamic-embedding-models)\n\nIf you want to explore relations in a corpus using, let's say, a sentence transformer, which creates contextually aware embeddings,\nyou can do so by specifying a scikit-learn compatible vectorizer model instead of passing along an embedding matrix.\n\nOne clear advantage here is that you can input arbitrary sequences as seeds instead of being limited to a predetermined set of texts.\n\nWe are going to use the package `embetter` for embedding documents.\n\n```bash\npip install embetter[sentence-trf]\n```\n\nI decided to examine four-grams in the 20newsgroups dataset. 
We will limit the number of four-grams to 4000 so we only see the most frequent ones.\n\n```python\nfrom embetter.text import SentenceEncoder\nfrom embedding_explorer import show_network_explorer\nfrom sklearn.datasets import fetch_20newsgroups\nfrom sklearn.feature_extraction.text import CountVectorizer\n\ncorpus = fetch_20newsgroups(\n    remove=(\"headers\", \"footers\", \"quotes\"),\n).data\n# We will use CountVectorizer for obtaining the possible n-grams\nfour_grams = (\n    CountVectorizer(\n        stop_words=\"english\", ngram_range=(4, 4), max_features=4000\n    )\n    .fit(corpus)\n    .get_feature_names_out()\n)\n\nmodel = SentenceEncoder()\nshow_network_explorer(corpus=four_grams, vectorizer=model)\n```\n\n![Screenshot of the Explorer](assets/trf_screenshot.png)\n\n## [Projection and Clustering](https://centre-for-humanities-computing.github.io/embedding-explorer/projection_clustering.html#projection-and-clustering)\n:star2: New in version 0.5.0 \n\nIn embedding-explorer you can now inspect corpora or embeddings by projecting them into 2D space,\nand optionally clustering observations.\n\nIn this example I'm going to demonstrate how to visualize 20 Newsgroups using various projection and clustering methods in embedding-explorer.\nWe are going to use sentence transformers to encode texts.\n\n```python\nfrom embetter.text import SentenceEncoder\nfrom sklearn.datasets import fetch_20newsgroups\n\nfrom embedding_explorer import show_clustering\n\nnewsgroups = fetch_20newsgroups(\n    remove=(\"headers\", \"footers\", \"quotes\"),\n)\ncorpus = newsgroups.data\n\nshow_clustering(corpus=corpus, vectorizer=SentenceEncoder())\n```\n\nIn the app you can choose whether and how you want to reduce embedding dimensionality, how you want to cluster the embeddings, and also how you intend to project them onto the 2D plane.\n\n![Screenshot of the Clustering parameters](assets/clustering_params.png)\n\nAfter this you can investigate the semantic structure of your corpus 
interactively.\n\n![Screenshot of the Clustering](assets/clustering_app.png)\n\n## [Dashboard](https://centre-for-humanities-computing.github.io/embedding-explorer/dashboards.html)\n\nIf you have multiple models to examine the same corpus, or multiple corpora that you want to examine with the same model,\nyou can create a dashboard containing all of these options. Users can click on an option to open the corresponding explorer page.\n\nFor this we will have to assemble these options into a list of `Card` objects, each of which contains the information about one page.\n\nIn the following example I will set up two different sentence transformers with the four-gram corpus from the Dynamic Embedding Models example.\n\n```python\nfrom embetter.text import SentenceEncoder\nfrom embedding_explorer import show_dashboard\nfrom embedding_explorer.cards import NetworkCard, ClusteringCard\n\ncards = [\n    NetworkCard(\"MiniLM\", corpus=four_grams, vectorizer=SentenceEncoder(\"all-MiniLM-L12-v2\")),\n    NetworkCard(\"MPNET\", corpus=four_grams, vectorizer=SentenceEncoder(\"all-mpnet-base-v2\")),\n]\nshow_dashboard(cards)\n```\n\n![Screenshot of the Dashboard](assets/dashboard_screenshot.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcentre-for-humanities-computing%2Fembedding-explorer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcentre-for-humanities-computing%2Fembedding-explorer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcentre-for-humanities-computing%2Fembedding-explorer/lists"}