{"id":18682711,"url":"https://github.com/maxent-ai/zeroshot_topics","last_synced_at":"2025-06-24T19:40:36.571Z","repository":{"id":43661137,"uuid":"430299751","full_name":"maxent-ai/zeroshot_topics","owner":"maxent-ai","description":"Topic Inference with Zeroshot models","archived":false,"fork":false,"pushed_at":"2023-06-12T21:32:59.000Z","size":58,"stargazers_count":61,"open_issues_count":7,"forks_count":7,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-04-12T04:37:32.603Z","etag":null,"topics":["bert","data-science","huggingface","hypernymy-extraction","keybert","keyword-extraction","knowledge-graph","labelled-data","labelling","linguistics","machine-learning","nli","nlp","taxonomy","text","text-classification","transformers","weak-supervision","weakly-supervised-learning","zeroshot-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/maxent-ai.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE-APACHE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2021-11-21T07:20:11.000Z","updated_at":"2024-05-21T13:46:07.000Z","dependencies_parsed_at":"2023-09-26T10:18:05.151Z","dependency_job_id":null,"html_url":"https://github.com/maxent-ai/zeroshot_topics","commit_stats":{"total_commits":17,"total_committers":3,"mean_commits":5.666666666666667,"dds":0.3529411764705882,"last_synced_commit":"f9a58ae5d10c938784ca5e74f0d445d3cb313fb6"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/maxent-ai/zeroshot_topics","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxent-ai%2Fzeroshot_topics","tags_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxent-ai%2Fzeroshot_topics/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxent-ai%2Fzeroshot_topics/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxent-ai%2Fzeroshot_topics/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/maxent-ai","download_url":"https://codeload.github.com/maxent-ai/zeroshot_topics/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxent-ai%2Fzeroshot_topics/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261744848,"owners_count":23203283,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert","data-science","huggingface","hypernymy-extraction","keybert","keyword-extraction","knowledge-graph","labelled-data","labelling","linguistics","machine-learning","nli","nlp","taxonomy","text","text-classification","transformers","weak-supervision","weakly-supervised-learning","zeroshot-learning"],"created_at":"2024-11-07T10:12:37.988Z","updated_at":"2025-06-24T19:40:36.544Z","avatar_url":"https://github.com/maxent-ai.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"zeroshot_topics\n===============\n\n.. image:: https://static.pepy.tech/personalized-badge/zeroshot_topics?period=total\u0026units=international_system\u0026left_color=black\u0026right_color=orange\u0026left_text=Downloads\n\n.. 
contents:: **Table of Contents**\n    :backlinks: none\n\n\nIntroduction\n------------\n\nHand-labelled training sets are usually expensive and time-consuming to create. \nSome datasets call for domain expertise (e.g. medical or finance datasets). \nGiven the cost and inflexibility of hand-labelling, it would be nice \nto have tools that help us get started quickly with a minimal labelled dataset: enter weak supervision. \n\n**But what if you do not have any labelled data at all? Is there still a way to label your data automatically?**\nThat's where **zeroshot_topics** might be useful, helping you get up and running quickly. \n\n*zeroshot_topics* lets you do exactly that. It leverages zeroshot classifiers, transformers \u0026 knowledge graphs to automatically suggest labels/topics from your text data. All you need to do is point it at your data. \n\nAlgorithm\n---------\n\nThe algorithm has 4 stages: \n\n.. image:: assets/zstm.png\n\n1. **Keyword \u0026 Keyphrase extraction**: This is done with the help of `KeyBERT \u003chttps://github.com/MaartenGr/KeyBERT\u003e`_, but really any sort of keyword extractor can be used.\n2. **Keyword/Keyphrase expansion via knowledge graphs/Taxonomy**: The important keywords discovered in stage 1 are expanded using a taxonomy or knowledge graph such as WordNet or ConceptNet. \n3. **Trace the Hypernyms for the keywords**: Identify the hypernyms (the root/parent words) and use them as pseudo-labels for the zeroshot classifier. \n4. **Zeroshot classification**: Use the hypernyms as candidate labels and classify the documents with a zeroshot classifier. \n\nNote: Currently, this tends to work well on short texts. In the future I intend to experiment and see how long texts can be supported as well. 
\n\nInstallation\n------------\n\nzeroshot_topics is distributed on `PyPI \u003chttps://pypi.org\u003e`_ as a universal\nwheel. It is available on Linux, macOS and Windows and supports\nPython 3.7+ and PyPy.\n\n.. code-block:: bash\n\n    $ pip install zeroshot_topics\n\nUsage\n-----\n\n.. code-block:: python \n\n    from zeroshot_topics import ZeroShotTopicFinder\n\n    zsmodel = ZeroShotTopicFinder()\n    \n    text = \"\"\"can you tell me anything else okay great tell me everything you know about George_Washington. \n    he was the first president he was well he I'm trying to well he fought in the Civil_War he was a general \n    in the Civil_War and chopped down his father's cherry tree when he was a little boy he that's it.\"\"\"\n    \n    zsmodel.find_topic(text, n_topic=2)\n\n    # Output - Topics: ['War', 'Head Of State']\n    \n\nRoadmap\n-------\n\nSome things that I plan to add in the coming days, if the community shows interest in this work. \n\n- Support custom keyword extractors.\n- Support custom knowledge graphs \u0026 taxonomies.\n- Support custom zeroshot classifiers in the pipeline.\n- Add use-case examples \u0026 improve documentation.\n- Optimise the overall library and make it faster.\n- Support long text documents.\n\nLicense\n-------\n\nzeroshot_topics is distributed under the terms of\n\n- `MIT License \u003chttps://choosealicense.com/licenses/mit\u003e`_\n- `Apache License, Version 2.0 \u003chttps://choosealicense.com/licenses/apache-2.0\u003e`_\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxent-ai%2Fzeroshot_topics","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmaxent-ai%2Fzeroshot_topics","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxent-ai%2Fzeroshot_topics/lists"}