{"id":25423926,"url":"https://github.com/dref360/spectral-metric","last_synced_at":"2025-09-01T07:32:53.091Z","repository":{"id":37202031,"uuid":"192968494","full_name":"Dref360/spectral-metric","owner":"Dref360","description":"Code for the CVPR 2019 paper : Spectral Metric for Dataset Complexity Assessment","archived":false,"fork":false,"pushed_at":"2024-03-21T15:35:41.000Z","size":1795,"stargazers_count":45,"open_issues_count":11,"forks_count":4,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-04-07T20:50:12.701Z","etag":null,"topics":["dataset-analysis","spectral-clustering"],"latest_commit_sha":null,"homepage":"https://dref360.github.io/spectral-metric","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Dref360.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2019-06-20T18:25:26.000Z","updated_at":"2024-11-28T16:49:52.000Z","dependencies_parsed_at":"2024-03-19T22:26:46.677Z","dependency_job_id":"ad154831-2839-48f5-bf87-3f77a970bed4","html_url":"https://github.com/Dref360/spectral-metric","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/Dref360/spectral-metric","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dref360%2Fspectral-metric","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dref360%2Fspectral-metric/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/r
epositories/Dref360%2Fspectral-metric/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dref360%2Fspectral-metric/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Dref360","download_url":"https://codeload.github.com/Dref360/spectral-metric/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dref360%2Fspectral-metric/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273088753,"owners_count":25043556,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-01T02:00:09.058Z","response_time":120,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dataset-analysis","spectral-clustering"],"created_at":"2025-02-16T22:47:13.727Z","updated_at":"2025-09-01T07:32:53.044Z","avatar_url":"https://github.com/Dref360.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n    \u003cbr\u003e\n    \u003ch1 align=\"center\"\u003e\n      Spectral Metric\n    \u003c/h1\u003e\n    \u003cbr\u003e\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://dref360.github.io/spectral-metric\"\u003e\n    Documentation\n  \u003c/a\u003e\n\u003c/p\u003e\n\nThis library provides an implementation of CSG, from CVPR 2019 paper: [Spectral Metric for Dataset Complexity 
Assessment](https://arxiv.org/abs/1905.07299).\n\n\u003e [!NOTE]  \n\u003e CSG is a measure that estimates the complexity of a dataset by combining the probability product kernel (Jebara et al.) with graph theory. By doing so, one can estimate the complexity of their dataset without training a model.\n\nFor the experiment part of the repo, please see [./experiments/README.md](./experiments/README.md).\n\n**Spectral metric in action**:\n\n1. [🤗 HuggingFace Space](https://huggingface.co/spaces/Dref360/spectral-metric)\n2. [In-depth analysis of CLINC-150](https://github.com/Dref360/spectral-metric/blob/master/notebooks/clinc_oos.ipynb)\n\n**Installation**\n\n`pip install spectral-metric`\n\n## How to use\n\nThis library works with two arrays: the features and the labels. The features should ideally be normalized and\nlow-dimensional. In the paper, we use t-SNE to reduce the dimensionality.\n\n```python\nfrom spectral_metric.estimator import CumulativeGradientEstimator\nfrom spectral_metric.visualize import make_graph\n\nX, y = ...  
# Your dataset with shape [N, ?], [N]\nestimator = CumulativeGradientEstimator(M_sample=250, k_nearest=5)\nestimator.fit(data=X, target=y)\ncsg = estimator.csg  # The actual complexity values.\nestimator.evals, estimator.evecs  # The eigenvalues and eigenvectors.\n\n# You can plot the dataset with:\nmake_graph(estimator.difference, title=\"Your dataset\", classes=[\"A\", \"B\", \"C\"])\n```\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"./images/example.png\" width=\"50%\"\u003e\n\u003c/p\u003e\n\n## Results\n\nWe can compare multiple datasets without training any classifier.\nFor example, we can plot the eigenvalues of each dataset:\nthe higher the values, the harder the dataset.\n\n![](./images/evals.png)\n\n**Note:** The actual CSG is based on the gradient of the eigenvalues.\nThis overcomes cases where the first classes are easy to separate but the last ones are not.\n\nPlease refer to the paper for more details.\n\n## Support\n\nFor support, please submit an issue!\n\n\n## Contributing\n\nWe are open to contributions; please submit an issue or a pull request.\n\nTo set up a development environment, you will need [Poetry](https://python-poetry.org/), our package manager.\n\n```bash\n# Install the package and the development dependencies\npoetry install \n\n# Format the code\nmake format\n\n# Test with flake8, mypy and pytest\nmake test\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdref360%2Fspectral-metric","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdref360%2Fspectral-metric","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdref360%2Fspectral-metric/lists"}