{"id":13577375,"url":"https://github.com/MaartenGr/Concept","last_synced_at":"2025-04-05T11:32:37.503Z","repository":{"id":37500360,"uuid":"421798263","full_name":"MaartenGr/Concept","owner":"MaartenGr","description":"Concept Modeling: Topic Modeling on Images and Text","archived":false,"fork":false,"pushed_at":"2024-11-04T13:40:47.000Z","size":5093,"stargazers_count":196,"open_issues_count":12,"forks_count":16,"subscribers_count":5,"default_branch":"main","last_synced_at":"2024-11-04T14:35:48.805Z","etag":null,"topics":["computer-vision","image-processing","nlp","topic-modeling"],"latest_commit_sha":null,"homepage":"https://maartengr.github.io/Concept/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MaartenGr.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-10-27T11:55:22.000Z","updated_at":"2024-11-04T13:40:51.000Z","dependencies_parsed_at":"2023-01-17T14:45:56.742Z","dependency_job_id":null,"html_url":"https://github.com/MaartenGr/Concept","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaartenGr%2FConcept","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaartenGr%2FConcept/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaartenGr%2FConcept/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaartenGr%2FConcept/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MaartenGr","download_url":"https://codeload.github.com/MaartenGr/Concept/tar.gz/refs/heads/main",
"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223186598,"owners_count":17102495,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","image-processing","nlp","topic-modeling"],"created_at":"2024-08-01T15:01:20.921Z","updated_at":"2024-11-05T14:31:02.109Z","avatar_url":"https://github.com/MaartenGr.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"[![PyPI - Python](https://img.shields.io/badge/python-v3.6+-blue.svg)](https://pypi.org/project/concept/)\n[![PyPI - PyPi](https://img.shields.io/pypi/v/Concept)](https://pypi.org/project/concept/)\n[![docs](https://img.shields.io/badge/docs-Passing-green.svg)](https://maartengr.github.io/concept/)\n[![PyPI - License](https://img.shields.io/badge/license-MIT-green.svg)](https://github.com/MaartenGr/concept/blob/master/LICENSE)\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1XHwQPT2itZXu1HayvGoj60-xAXxg9mqe?usp=sharing)\n\n# Concept\n\n\u003cimg src=\"images/logo.png\" width=\"25%\" height=\"25%\" align=\"right\" /\u003e\n\n**Concept** is a technique that leverages CLIP and BERTopic-based techniques to perform Concept Modeling on images.\n\nSince topics are part of conversations and text, they do not represent the context of images well. 
Therefore, these clusters of images are \nreferred to as 'Concepts' instead of the traditional 'Topics'.\n\nThus, **Concept Modeling** takes inspiration from topic modeling techniques \nto cluster images, find common concepts and model them both visually \nusing images and textually using topic representations.\n\n## Installation\n\nInstallation, with sentence-transformers, can be done using [PyPI](https://pypi.org/project/concept/):\n\n```bash\npip install concept\n```\n\n## Quick Start\nFirst, we need to download and extract the 25,000 Unsplash images used in the sentence-transformers \nexample:\n\n```python\nimport os\nimport glob\nimport zipfile\nfrom tqdm import tqdm\nfrom sentence_transformers import util\n\n# 25k images from Unsplash\nimg_folder = 'photos/'\nif not os.path.exists(img_folder) or len(os.listdir(img_folder)) == 0:\n    os.makedirs(img_folder, exist_ok=True)\n    \n    photo_filename = 'unsplash-25k-photos.zip'\n    if not os.path.exists(photo_filename):   # Download the dataset if it does not exist\n        util.http_get('http://sbert.net/datasets/'+photo_filename, photo_filename)\n        \n    # Extract all images\n    with zipfile.ZipFile(photo_filename, 'r') as zf:\n        for member in tqdm(zf.infolist(), desc='Extracting'):\n            zf.extract(member, img_folder)\nimg_names = list(glob.glob('photos/*.jpg'))\n```\n\nNext, we only need to pass the image paths to **Concept**:\n\n```python\nfrom concept import ConceptModel\nconcept_model = ConceptModel()\nconcepts = concept_model.fit_transform(img_names)\n```\n\nThe resulting concepts can be visualized through `concept_model.visualize_concepts()`:\n\n\u003cimg src=\"images/concepts_without_topics.jpg\" width=\"100%\" height=\"100%\" align=\"center\" /\u003e\n\nHowever, to get the full experience, we need to label the concept clusters with topics. To do this, \nwe need to create a vocabulary. 
We are going to feed our model with 50,000 nouns from the English \nvocabulary: \n\n```python\nimport random\nimport nltk\nnltk.download(\"wordnet\")\nfrom nltk.corpus import wordnet as wn\n\nall_nouns = [word for synset in wn.all_synsets('n') for word in synset.lemma_names() if \"_\" not in word]\nselected_nouns = random.sample(all_nouns, 50_000)\n```\n\nThen, we can pass in the resulting `selected_nouns` to **Concept**:\n\n```python\nfrom concept import ConceptModel\n\nconcept_model = ConceptModel()\nconcepts = concept_model.fit_transform(img_names, docs=selected_nouns)\n```\n\nAgain, the resulting concepts can be visualized. This time, however, we can also see the generated topics \nthrough `concept_model.visualize_concepts()`:\n\n\u003cimg src=\"images/concepts.jpg\" width=\"100%\" height=\"100%\" align=\"center\" /\u003e\n\n**NOTE**: Use `ConceptModel(embedding_model=\"clip-ViT-B-32-multilingual-v1\")` to select a model that supports 50+ languages.\n\n## Search Concepts\nWe can quickly search for specific concepts by embedding a search term and finding the cluster embeddings \nthat best represent it. As an example, let us search for the term `beach` and see what we can find. \nTo do this, we simply run the following:\n\n```python\n\u003e\u003e\u003e concept_model.find_concepts(\"beach\")\n[(100, 0.277577825349102),\n (53, 0.27431058773894657),\n (95, 0.25973751319723837),\n (77, 0.2560122597417548),\n (97, 0.25361988261846297)]\n```\n\nEach tuple contains two values: the first is the concept cluster and the second is its similarity to the \nsearch term. The 5 most similar concepts are returned. 
\n\nNow, let us visualize those concepts to see how well the search function works:\n\n```python\nconcept_model.visualize_concepts(concepts=[100, 53, 95, 77, 97])\n``` \n\n\u003cimg src=\"images/search.jpg\" width=\"100%\" height=\"100%\" align=\"center\" /\u003e\n\n ","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMaartenGr%2FConcept","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FMaartenGr%2FConcept","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMaartenGr%2FConcept/lists"}