{"id":13693359,"url":"https://github.com/sdv-dev/SDMetrics","last_synced_at":"2025-05-02T21:31:53.891Z","repository":{"id":38235016,"uuid":"248772756","full_name":"sdv-dev/SDMetrics","owner":"sdv-dev","description":"Metrics to evaluate quality and efficacy of synthetic datasets.","archived":false,"fork":false,"pushed_at":"2025-04-11T17:43:05.000Z","size":2820,"stargazers_count":229,"open_issues_count":67,"forks_count":47,"subscribers_count":10,"default_branch":"main","last_synced_at":"2025-04-14T01:52:06.229Z","etag":null,"topics":["metrics","quality","synthetic-data"],"latest_commit_sha":null,"homepage":"https://docs.sdv.dev/sdmetrics","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sdv-dev.png","metadata":{"files":{"readme":"README.md","changelog":"HISTORY.md","contributing":"CONTRIBUTING.rst","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.rst","dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-03-20T14:15:48.000Z","updated_at":"2025-04-11T17:43:09.000Z","dependencies_parsed_at":"2024-03-14T12:52:27.100Z","dependency_job_id":"c9790477-b984-44c2-bc79-e3c074f3857f","html_url":"https://github.com/sdv-dev/SDMetrics","commit_stats":{"total_commits":568,"total_committers":17,"mean_commits":"33.411764705882355","dds":0.8045774647887324,"last_synced_commit":"93c311cce8ee4a0df47fdff7df200c278feed96b"},"previous_names":[],"tags_count":42,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sdv-dev%2FSDMetrics","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sdv-dev%2FSDMetrics/tags","releases_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/sdv-dev%2FSDMetrics/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sdv-dev%2FSDMetrics/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sdv-dev","download_url":"https://codeload.github.com/sdv-dev/SDMetrics/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252108900,"owners_count":21696156,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["metrics","quality","synthetic-data"],"created_at":"2024-08-02T17:01:08.813Z","updated_at":"2025-05-02T21:31:53.883Z","avatar_url":"https://github.com/sdv-dev.png","language":"Python","funding_links":[],"categories":["Metrics and dataset evaluation","Evaluation \u0026 Benchmarking"],"sub_categories":["Tabular"],"readme":"\u003cdiv align=\"center\"\u003e\n\u003cbr/\u003e\n\u003cp align=\"center\"\u003e\n    \u003ci\u003eThis repository is part of \u003ca href=\"https://sdv.dev\"\u003eThe Synthetic Data Vault Project\u003c/a\u003e, a project from \u003ca href=\"https://datacebo.com\"\u003eDataCebo\u003c/a\u003e.\u003c/i\u003e\n\u003c/p\u003e\n\n[![Development Status](https://img.shields.io/badge/Development%20Status-2%20--%20Pre--Alpha-yellow)](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)\n[![PyPI 
Shield](https://img.shields.io/pypi/v/sdmetrics.svg)](https://pypi.python.org/pypi/sdmetrics)\n[![Downloads](https://pepy.tech/badge/sdmetrics)](https://pepy.tech/project/sdmetrics)\n[![Tests](https://github.com/sdv-dev/SDMetrics/workflows/Run%20Tests/badge.svg)](https://github.com/sdv-dev/SDMetrics/actions?query=workflow%3A%22Run+Tests%22+branch%3Amain)\n[![Coverage Status](https://codecov.io/gh/sdv-dev/SDMetrics/branch/main/graph/badge.svg)](https://codecov.io/gh/sdv-dev/SDMetrics)\n[![Slack](https://img.shields.io/badge/Community-Slack-blue?style=plastic\u0026logo=slack)](https://bit.ly/sdv-slack-invite)\n[![Tutorial](https://img.shields.io/badge/Demo-Get%20started-orange?style=plastic\u0026logo=googlecolab)](https://bit.ly/sdmetrics-demo)\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.14279167.svg)](https://doi.org/10.5281/zenodo.14279167)\n\n\u003cdiv align=\"left\"\u003e\n\u003cbr/\u003e\n\u003cp align=\"center\"\u003e\n\u003ca href=\"https://github.com/sdv-dev/SDV\"\u003e\n\u003cimg align=\"center\" width=40% src=\"https://github.com/sdv-dev/SDV/blob/stable/docs/images/SDMetrics-DataCebo.png\"\u003e\u003c/img\u003e\n\u003c/a\u003e\n\u003c/p\u003e\n\u003c/div\u003e\n\n\u003c/div\u003e\n\n# Overview\n\nThe SDMetrics library evaluates synthetic data by comparing it to the real data that you're trying to mimic. It includes a variety of metrics to capture different aspects of the data, for example **quality and privacy**. It also includes reports that you can run to generate insights, visualize data and share with your team.\n\nThe SDMetrics library is **model-agnostic**, meaning you can use any synthetic data. The library does not need to know how you created the data. \n\n\u003cimg align=\"center\" src=\"docs/images/column_comparison.png\"\u003e\u003c/img\u003e\n\n# Install\n\nInstall SDMetrics using pip or conda. 
We recommend using a virtual environment to avoid conflicts with other software on your device.\n\n```bash\npip install sdmetrics\n```\n\n```bash\nconda install -c conda-forge sdmetrics\n```\n\nFor more information about using SDMetrics, visit the [SDMetrics Documentation](https://docs.sdv.dev/sdmetrics).\n\n# Usage\n\nGet started with **SDMetrics Reports** using some demo data:\n\n```python\nfrom sdmetrics import load_demo\nfrom sdmetrics.reports.single_table import QualityReport\n\nreal_data, synthetic_data, metadata = load_demo(modality='single_table')\n\nmy_report = QualityReport()\nmy_report.generate(real_data, synthetic_data, metadata)\n```\n```\nCreating report: 100%|██████████| 4/4 [00:00\u003c00:00,  5.22it/s]\n\nOverall Quality Score: 82.84%\n\nProperties:\nColumn Shapes: 82.78%\nColumn Pair Trends: 82.9%\n```\n\nOnce you generate the report, you can drill down on the details and visualize the results.\n\n```python\nmy_report.get_visualization(property_name='Column Pair Trends')\n```\n\u003cimg align=\"center\" src=\"docs/images/column_pairs.png\"\u003e\u003c/img\u003e\n\nSave the report and share it with your team.\n```python\nmy_report.save(filepath='demo_data_quality_report.pkl')\n\n# load it at any point in the future\nmy_report = QualityReport.load(filepath='demo_data_quality_report.pkl')\n```\n\n**Want more metrics?** You can also manually apply any of the metrics in this library to your data.\n\n```python\n# calculate whether the synthetic data respects the min/max bounds\n# set by the real data\nfrom sdmetrics.single_column import BoundaryAdherence\n\nBoundaryAdherence.compute(\n    real_data['start_date'],\n    synthetic_data['start_date']\n)\n```\n```\n0.8503937007874016\n```\n\n```python\n# calculate whether the synthetic data is new or whether it's an exact copy of the real data\nfrom sdmetrics.single_table import NewRowSynthesis\n\nNewRowSynthesis.compute(\n    real_data,\n    synthetic_data,\n    metadata\n)\n```\n```\n1.0\n```\n\n# What's 
next?\n\nTo learn more about the reports and metrics, visit the [SDMetrics Documentation](https://docs.sdv.dev/sdmetrics).\n\n---\n\n\n\u003cdiv align=\"center\"\u003e\n\u003ca href=\"https://datacebo.com\"\u003e\u003cimg align=\"center\" width=40% src=\"https://github.com/sdv-dev/SDV/blob/stable/docs/images/DataCebo.png\"\u003e\u003c/img\u003e\u003c/a\u003e\n\u003c/div\u003e\n\u003cbr/\u003e\n\u003cbr/\u003e\n\n[The Synthetic Data Vault Project](https://sdv.dev) was first created at MIT's [Data to AI Lab](\nhttps://dai.lids.mit.edu/) in 2016. After 4 years of research and traction with enterprise, we\ncreated [DataCebo](https://datacebo.com) in 2020 with the goal of growing the project.\nToday, DataCebo is the proud developer of SDV, the largest ecosystem for\nsynthetic data generation \u0026 evaluation. It is home to multiple libraries that support synthetic\ndata, including:\n\n* 🔄 Data discovery \u0026 transformation. Reverse the transforms to reproduce realistic data.\n* 🧠 Multiple machine learning models -- ranging from Copulas to Deep Learning -- to create tabular,\n  multi-table and time series data.\n* 📊 Measuring quality and privacy of synthetic data, and comparing different synthetic data\n  generation models.\n\n[Get started using the SDV package](https://sdv.dev/SDV/getting_started/install.html) -- a fully\nintegrated solution and your one-stop shop for synthetic data. Or, use the standalone libraries\nfor specific needs.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsdv-dev%2FSDMetrics","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsdv-dev%2FSDMetrics","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsdv-dev%2FSDMetrics/lists"}