{"id":32647568,"url":"https://github.com/jacobmarks/clustering-plugin","last_synced_at":"2025-10-31T05:55:32.406Z","repository":{"id":222902317,"uuid":"734209115","full_name":"jacobmarks/clustering-plugin","owner":"jacobmarks","description":"Compute clustering on your data in a visual, intuitive way with FiftyOne and Sklearn!","archived":false,"fork":false,"pushed_at":"2024-04-05T21:44:31.000Z","size":61,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-04-16T07:21:07.621Z","etag":null,"topics":["clustering-algorithm","computer-vision","data-centric-ai","data-visualization","embeddings","fiftyone","python","unsupervised-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jacobmarks.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-21T06:07:49.000Z","updated_at":"2024-04-05T03:48:48.000Z","dependencies_parsed_at":null,"dependency_job_id":"5ca42a4e-d369-4a2a-acf9-ab56a344e529","html_url":"https://github.com/jacobmarks/clustering-plugin","commit_stats":null,"previous_names":["jacobmarks/clustering-runs-plugin","jacobmarks/clustering-plugin"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/jacobmarks/clustering-plugin","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacobmarks%2Fclustering-plugin","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacobmarks%2Fclustering-plugin/tags","releases_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/repositories/jacobmarks%2Fclustering-plugin/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacobmarks%2Fclustering-plugin/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jacobmarks","download_url":"https://codeload.github.com/jacobmarks/clustering-plugin/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacobmarks%2Fclustering-plugin/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":281937758,"owners_count":26586774,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-31T02:00:07.401Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clustering-algorithm","computer-vision","data-centric-ai","data-visualization","embeddings","fiftyone","python","unsupervised-learning"],"created_at":"2025-10-31T05:55:26.819Z","updated_at":"2025-10-31T05:55:32.394Z","avatar_url":"https://github.com/jacobmarks.png","language":"Python","readme":"# Clustering Plugin for FiftyOne\n\n![GPT4V_labels](https://github.com/jacobmarks/clustering-runs-plugin/assets/12500356/27b0ace4-ecd8-4f9d-821a-614fd597be7f)\n\nThis plugin provides a FiftyOne App that allows you to cluster your dataset using a variety of algorithms:\n\n- 
[K-Means](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans)\n- [Birch](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.Birch.html#sklearn.cluster.Birch)\n- [Agglomerative](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html#sklearn.cluster.AgglomerativeClustering)\n- [HDBSCAN](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.HDBSCAN.html#sklearn.cluster.HDBSCAN)\n\nIt also serves as a proof of concept for adding new \"types\" of runs to FiftyOne!!!\n\n## Installation\n\n```bash\nfiftyone plugins download https://github.com/jacobmarks/clustering-plugin\n```\n\nYou will also need to have `scikit-learn` installed:\n\n```bash\npip install -U scikit-learn\n```\n\n## Usage\n\n### Clustering\n\nOnce you have the plugin installed, you can generate clusters for your dataset using the\n`compute_clusters` operator:\n\n![compute_clusters_from_scratch](https://github.com/jacobmarks/clustering-runs-plugin/assets/12500356/c701d40a-ddf4-47a7-bb5d-8f026a54bb6e)\n\nThe specific arguments depend on the `method` you choose: `kmeans`, `birch`, or `agglomerative`.\n\nHere, we are generating clusters at the same time as we are generating the embeddings, but you can also generate clusters from existing embeddings:\n\n![compute_clusters_from_embeddings](https://github.com/jacobmarks/clustering-runs-plugin/assets/12500356/950c10d7-9d7e-4876-a2ea-66574e594607)\n\nYou can generate clusters for:\n\n- Your entire dataset\n- A view of your dataset\n- Currently selected samples in the App\n\nAdditionally, you can run the operator in:\n\n- Real-time, or\n- In the background, as a delegated operation\n\nOnce you have generated clusters, you can view information about the clusters in the App with the `get_clustering_run_info` 
operator:\n\n![get_cluster_info](https://github.com/jacobmarks/clustering-runs-plugin/assets/12500356/63660858-091f-4a94-865e-a3fb41c2c2c6)\n\n### Visualizing Clusters\n\nIt can be insightful to use clustering in conjunction with `compute_visualization` to visualize the clusters:\n\n![visualize_clusters](https://github.com/jacobmarks/clustering-runs-plugin/assets/12500356/2c48fdcb-c59c-4b46-a27f-a248a6974d4c)\n\n### Labeling Clusters\n\nOnce you have generated clusters, you can also use the magic of multimodal AI to automatically\nassign short descriptions, or labels, to each cluster!\n\nThis is achieved by randomly selecting a few samples from each cluster, and prompting\nGPT-4V to generate a description for the cluster from the samples.\n\nTo use this functionality, you must have an API key for OpenAI's GPT-4V API, and you must set it in your environment as `OPENAI_API_KEY`.\n\n```bash\nexport OPENAI_API_KEY=your-api-key\n```\n\nThen, you can label the clusters using the `label_clusters_with_gpt4v` operator.\nThis might take a minute or so, depending on the number of clusters, but it is worth it!\nIt is recommended to delegate the execution of this operation, and then launch it via\n\n```bash\nfiftyone delegated launch\n```\n\nThen you can view the labels in the App!\n\n![GPT4V_labels](https://github.com/jacobmarks/clustering-runs-plugin/assets/12500356/27b0ace4-ecd8-4f9d-821a-614fd597be7f)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjacobmarks%2Fclustering-plugin","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjacobmarks%2Fclustering-plugin","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjacobmarks%2Fclustering-plugin/lists"}