{"id":15175474,"url":"https://github.com/cohere-ai/sandbox-topically","last_synced_at":"2025-04-13T10:58:18.236Z","repository":{"id":62644420,"uuid":"550786386","full_name":"cohere-ai/sandbox-topically","owner":"cohere-ai","description":"Topic modeling helpers using managed language models from Cohere. Name text clusters using large GPT models.","archived":false,"fork":false,"pushed_at":"2022-12-15T07:08:21.000Z","size":3338,"stargazers_count":219,"open_issues_count":4,"forks_count":18,"subscribers_count":11,"default_branch":"main","last_synced_at":"2025-03-27T02:11:50.093Z","etag":null,"topics":["machine-learning","nlp","python","topic-modeling"],"latest_commit_sha":null,"homepage":"https://discord.gg/co-mmunity","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cohere-ai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-10-13T10:22:26.000Z","updated_at":"2025-01-23T15:48:47.000Z","dependencies_parsed_at":"2023-01-29T02:15:13.904Z","dependency_job_id":null,"html_url":"https://github.com/cohere-ai/sandbox-topically","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cohere-ai%2Fsandbox-topically","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cohere-ai%2Fsandbox-topically/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cohere-ai%2Fsandbox-topically/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cohere-ai%2Fsandbox-topically/manifests","owner_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/owners/cohere-ai","download_url":"https://codeload.github.com/cohere-ai/sandbox-topically/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248703196,"owners_count":21148117,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning","nlp","python","topic-modeling"],"created_at":"2024-09-27T12:39:14.616Z","updated_at":"2025-04-13T10:58:18.217Z","avatar_url":"https://github.com/cohere-ai.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"```\n################################################################################\n#    ____      _                     ____                  _ _                 #\n#   / ___|___ | |__   ___ _ __ ___  / ___|  __ _ _ __   __| | |__   _____  __  #\n#  | |   / _ \\| '_ \\ / _ \\ '__/ _ \\ \\___ \\ / _` | '_ \\ / _` | '_ \\ / _ \\ \\/ /  #\n#  | |__| (_) | | | |  __/ | |  __/  ___) | (_| | | | | (_| | |_) | (_) \u003e  \u003c   #\n#   \\____\\___/|_| |_|\\___|_|  \\___| |____/ \\__,_|_| |_|\\__,_|_.__/ \\___/_/\\_\\  #\n#                                                                              #\n# This project is part of Cohere Sandbox, Cohere's Experimental Open Source    #\n# offering. This project provides a library, tooling, or demo making use of    #\n# the Cohere Platform. You should expect (self-)documented, high quality code  #\n# but be warned that this is EXPERIMENTAL. 
Therefore, also expect rough edges, #\n# non-backwards compatible changes, or potential changes in functionality as   #\n# the library, tool, or demo evolves. Please consider referencing a specific   #\n# git commit or version if depending upon the project in any mission-critical  #\n# code as part of your own projects.                                           #\n#                                                                              #\n# Please don't hesitate to raise issues or submit pull requests, and thanks    #\n# for checking out this project!                                               #\n#                                                                              #\n################################################################################\n```\n\n**Maintainer:** [jalammar](https://github.com/jalammar) \\\n**Project maintained until at least:** 2023-04-30\n\n# A picture is worth a thousand sentences\n\n\u003cimg src=\"./assets/topic-modeling-picture-thousand-texts.png\" /\u003e\nWhen you want to explore thousands or millions of texts (messages, emails, news headlines), topic modeling tools help you make sense of them rapidly and visually.\n\n# Topically\n\nTopically is a \\[work-in-progress\\] suite of tools that help make sense of text collections (messages, articles, emails, news headlines) using large language models.\n\nTopically's first feature is to name clusters of short texts based on their content. 
For example, here are news headlines from the machinelearning and investing subreddits, and the names suggested for them by topically:\n\n\u003cimg src=\"./assets/topically-name_cluster.png\" /\u003e\n\n\n# Usage Example\nUse Topically to name clusters in the course of topic modeling.\n\n```python\nimport topically\n\napp = topically.Topically('cohere_api_key')\n\nexample_texts = [\n# Three headlines from the machine learning subreddit\n\"[Project] From books to presentations in 10s with AR + ML\",\n\"[D] A Demo from 1993 of 32-year-old Yann LeCun showing off the World's first Convolutional Network for Text Recognition\",\n\"[R] First Order Motion Model applied to animate paintings\",\n\n# Three headlines from the investing subreddit\n\"Robinhood and other brokers literally blocking purchase of $GME, $NOK, $BB, $AMC; allow sells\",\n\"United Airlines stock down over 5% premarket trading\",\n\"Bitcoin was nearly $20,000 a year ago today\"]\n\n# We know the first three texts belong to one topic (topic 0), the last three belong to another topic (topic 1)\nexample_topics = [0, 0, 0, 1, 1, 1]\n\ntopics_of_examples, topic_names_dict = app.name_topics((example_texts, example_topics)) # Optional: num_generations=5\ntopics_of_examples # Run again to get new suggested names. More text examples should result in better names.\n\n```\n\nOutput:\n```\n['Text recognition',\n 'Text recognition',\n 'Text recognition',\n 'Stock Market Closing Bell',\n 'Stock Market Closing Bell',\n 'Stock Market Closing Bell']\n```\n\nIn this simple example, we know the cluster assignments. In actual applications, a topic modeling library like BERTopic can cluster the texts for us, and then we can name them with topically.\n\n# Usage Example: Topically + BERTopic\nUse Topically to name clusters in the course of topic modeling with tools like BERTopic. Get the cluster assignments from BERTopic, and name the clusters with topically. 
This improves on the keyword topic labels (and can build upon them).\n\n\n\u003cimg src=\"./assets/topically_name_topics.png\" /\u003e\n\n\nHere's example code and a Colab notebook demonstrating this.\n\n\u003ca href=\"https://colab.research.google.com/github/cohere-ai/sandbox-topically/blob/main/notebooks/Intro%20-%20Topically%20with%20BERTopic.ipynb\" target=\"_parent\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/\u003e\u003c/a\u003e\n\nCode excerpt:\n\n```python\n\nfrom bertopic import BERTopic\nfrom sklearn.cluster import KMeans\nfrom topically import Topically\n\n# Load and initialize BERTopic to use KMeans clustering with 8 clusters only.\ncluster_model = KMeans(n_clusters=8)\ntopic_model = BERTopic(hdbscan_model=cluster_model)\n\n# df is a dataframe. df['title'] is the column of text we're modeling.\n# embeds is assumed to hold precomputed embeddings of df['title'] (not defined in this excerpt).\ndf['topic'], probabilities = topic_model.fit_transform(df['title'], embeds)\n\n# Load topically\napp = Topically('cohere_api_key')\n\n# name clusters\ndf['topic_name'], topic_names = app.name_topics((df['title'], df['topic']))\n\ndf[['title', 'topic', 'topic_name']]\n```\n\n\n\u003cimg src=\"./assets/topically-name_topics-example.png\" /\u003e\n\n# Installation\n\nYou can install topically from PyPI:\n\n`pip install topically`\n\nOptionally, you can also install topically with BERTopic:\n\n`pip install topically[bertopic]`\n\n\n# How it works\n\nTopically uses a generative language model (GPT) to assign a name to the text cluster. It sends a request to [Cohere](https://cohere.ai/)'s managed model (get an [API key](https://dashboard.cohere.ai/welcome/register?utm_source=github\u0026utm_medium=content\u0026utm_campaign=sandbox\u0026utm_content=topically) and use it for free for prototyping).\n\nTo generate the titles, topically uses a couple of bundled prompts. 
To get the best names for your use case, it's best to edit the prompt to add more information about the context, and add good cluster names for 3-5 of your clusters.\n\nThis works best on short texts (given the context length limitations of GPT models). If you're working with long texts, you may experiment with excerpts or summaries of the texts.\n\n# Architecture Overview\nTopically is pretty simple and early in its life. At the moment, it's made up of two main classes:\n\n### `Topically`\nThis class maintains the client to the [Cohere](https://cohere.ai/) platform, and exposes the main interaction point with Topically (name_topics, at the moment). It lives in app.py.\n\n### `ClusterNamer`\nThis class deals with preparing the prompts and calling the Generate endpoint to generate suggested topic names. It lives in cluster_namers.py.\n\n# Get support\nIf you have any questions or comments, please file an issue or reach out to us on [Discord](https://discord.gg/co-mmunity).\n\n# Contributors\nIf you would like to contribute to this project, please read `CONTRIBUTORS.md`\nin this repository, and sign the Contributor License Agreement before submitting\nany pull requests. A link to sign the Cohere CLA will be generated the first time \nyou make a pull request to a Cohere repository.\n\n# License\nTopically has an MIT license, as found in the LICENSE file.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcohere-ai%2Fsandbox-topically","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcohere-ai%2Fsandbox-topically","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcohere-ai%2Fsandbox-topically/lists"}