{"id":51261601,"url":"https://github.com/europanite/data-analysis-stability-evaluator","last_synced_at":"2026-06-29T12:32:10.249Z","repository":{"id":363689213,"uuid":"1262877471","full_name":"europanite/data-analysis-stability-evaluator","owner":"europanite","description":"data-analysis-stability-evaluator","archived":false,"fork":false,"pushed_at":"2026-06-09T23:33:53.000Z","size":34,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-10T01:13:14.672Z","etag":null,"topics":["data-analysis","data-analysis-project","data-analysis-reliability","data-analysis-stability","data-analytics","small-data-change"],"latest_commit_sha":null,"homepage":"https://europanite.github.io/data-analysis-stability-evaluator/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/europanite.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-08T12:07:05.000Z","updated_at":"2026-06-09T23:33:57.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/europanite/data-analysis-stability-evaluator","commit_stats":null,"previous_names":["europanite/data-analysis-stability-evaluator"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/europanite/data-analysis-stability-evaluator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/euro
panite%2Fdata-analysis-stability-evaluator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/europanite%2Fdata-analysis-stability-evaluator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/europanite%2Fdata-analysis-stability-evaluator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/europanite%2Fdata-analysis-stability-evaluator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/europanite","download_url":"https://codeload.github.com/europanite/data-analysis-stability-evaluator/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/europanite%2Fdata-analysis-stability-evaluator/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34927675,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-29T02:00:05.398Z","response_time":58,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-analysis","data-analysis-project","data-analysis-reliability","data-analysis-stability","data-analytics","small-data-change"],"created_at":"2026-06-29T12:32:08.541Z","updated_at":"2026-06-29T12:32:10.244Z","avatar_url":"https://github.com/europanite.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 
[data-analysis-stability-evaluator](https://github.com/europanite/data-analysis-stability-evaluator \"data-analysis-stability-evaluator\")\n\n[![CI](https://github.com/europanite/data-analysis-stability-evaluator/actions/workflows/ci.yml/badge.svg)](https://github.com/europanite/data-analysis-stability-evaluator/actions/workflows/ci.yml)\n[![CodeQL Advanced](https://github.com/europanite/data-analysis-stability-evaluator/actions/workflows/codeql.yml/badge.svg)](https://github.com/europanite/data-analysis-stability-evaluator/actions/workflows/codeql.yml)\n[![pages-build-deployment](https://github.com/europanite/data-analysis-stability-evaluator/actions/workflows/pages/pages-build-deployment/badge.svg)](https://github.com/europanite/data-analysis-stability-evaluator/actions/workflows/pages/pages-build-deployment)\n[![Publish Python package](https://github.com/europanite/data-analysis-stability-evaluator/actions/workflows/publish.yml/badge.svg)](https://github.com/europanite/data-analysis-stability-evaluator/actions/workflows/publish.yml)\n\n`data-analysis-stability-evaluator` is a Python package for evaluating whether data analysis results stay stable when the input data changes slightly.\n\nIt is designed for practical data analysis projects where the risk lies not only in model accuracy but also in whether conclusions, aggregate values, rankings, feature summaries, profile reports, and prediction outputs change too much in response to small input changes.\n\n## Why this package exists\n\nSmall changes to the data can unexpectedly alter analysis results.\n\nExamples:\n\n- a few rows are removed\n- missing values slightly increase\n- numeric values contain small noise\n- categorical values shift\n- the input data distribution changes\n- analysis outputs depend too strongly on fragile assumptions\n\nThis package helps detect those risks by comparing baseline data, perturbed data, and the analysis outputs computed from each.\n\nA low risk score does not prove that an analysis is correct. 
It only means that the selected outputs were not highly sensitive to the tested perturbations.\n\n## What it checks\n\nThe package supports four stability layers.\n\n### 1. Data profile stability\n\n- schema changes\n- row-count changes\n- missingness shifts\n- numeric distribution shifts\n- categorical distribution shifts\n\n### 2. Perturbation-based sensitivity\n\n- row removal\n- row duplication\n- missing-value injection\n- numeric noise injection\n- categorical value swaps\n\n### 3. Analysis-output stability\n\n- run the same analysis function on baseline and perturbed data\n- flatten nested outputs into comparable scalar metrics\n- compare numbers, rates, group summaries, rankings, and flags\n\n### 4. Prediction stability\n\n- compare prediction vectors\n- compute the disagreement rate\n- compute numeric prediction drift\n\n## Installation\n\nFrom PyPI:\n\n```bash\npython -m pip install data-analysis-stability-evaluator\n```\n\nFrom TestPyPI:\n\n```bash\npython -m pip install \\\n  --index-url https://test.pypi.org/simple/ \\\n  --extra-index-url https://pypi.org/simple/ \\\n  data-analysis-stability-evaluator\n```\n\n## Quick start\n\nCreate sample CSV files:\n\n```bash\nanalysis-stability sample-data --out data\n```\n\nThis writes:\n\n```text\ndata/baseline.csv\ndata/candidate.csv\n```\n\nCompare the two CSV files:\n\n```bash\nanalysis-stability profile data/baseline.csv data/candidate.csv --out reports/profile_stability.json\n```\n\nCreate a perturbed copy of a CSV file:\n\n```bash\nanalysis-stability perturb data/baseline.csv \\\n  --out data/perturbed.csv \\\n  --row-drop-rate 0.02 \\\n  --missing-rate 0.01 \\\n  --numeric-noise-rate 0.02\n```\n\nRun the built-in example:\n\n```bash\nanalysis-stability example --out reports/example\n```\n\n## Python API example\n\n```python\nimport pandas as pd\n\nfrom data_analysis_stability_evaluator import (\n    DataProfiler,\n    PerturbationConfig,\n    perturb_dataframe,\n)\n\nbaseline = 
pd.read_csv(\"data/baseline.csv\")\n\nperturbed = perturb_dataframe(\n    baseline,\n    PerturbationConfig(\n        row_drop_rate=0.02,\n        missing_rate=0.01,\n        numeric_noise_rate=0.02,\n        random_seed=42,\n    ),\n)\n\nprofile_report = DataProfiler.compare(baseline, perturbed)\nprint(profile_report.risk_score)\nprint(profile_report.to_frame())\n```\n\nYou can also test whether the outputs of your own analysis function are stable:\n\n```python\nimport pandas as pd\n\nfrom data_analysis_stability_evaluator import PerturbationConfig, StabilityEvaluator\n\ndf = pd.read_csv(\"data/baseline.csv\")\n\n\ndef analysis(data: pd.DataFrame) -\u003e dict:\n    return {\n        \"row_count\": len(data),\n        \"mean_revenue\": data[\"revenue\"].mean(),\n        \"conversion_rate\": data[\"converted\"].mean(),\n        \"segment_share\": data[\"segment\"].value_counts(normalize=True).to_dict(),\n    }\n\n\nconfig = PerturbationConfig(\n    row_drop_rate=0.02,\n    missing_rate=0.01,\n    numeric_noise_rate=0.02,\n    random_seed=42,\n)\n\nevaluator = StabilityEvaluator(\n    analysis_fn=analysis,\n    config=config,\n    n_runs=20,\n)\n\nreport = evaluator.evaluate(df)\n\nprint(report.summary)\nprint(report.details.sort_values(\"score\", ascending=False).head())\n```\n\n## Risk score interpretation\n\nThe risk score is a diagnostic score, not a mathematical guarantee.\n\nRecommended initial interpretation:\n\n| Score | Meaning |\n|---:|---|\n| 0.00-0.05 | Stable under tested perturbations |\n| 0.05-0.15 | Some sensitivity; review affected metrics |\n| 0.15+ | Unstable; conclusions may depend on small input changes |\n\nUsers should choose perturbation settings and thresholds that match their own domain risk.\n\n## Repository layout\n\nThe repository root contains project-level files such as GitHub Actions workflows, Docker Compose configuration, documentation, and repository metadata.\n\nThe `service/` directory is the Python package project root. 
Build, test, lint, and package commands are run from `service/`.\n\n## Local development\n\nCreate and activate a virtual environment from the repository root:\n\n```bash\npython3 -m venv env\nsource env/bin/activate\npython -m pip install --upgrade pip\n```\n\nInstall the package in editable mode with development tools:\n\n```bash\ncd service\npython -m pip install -e \".[dev]\"\n```\n\nRun tests and lint:\n\n```bash\npytest\nruff check src tests examples\n```\n\n## Docker Compose workflow\n\nDocker Compose commands are run from the repository root.\n\nRun tests:\n\n```bash\ndocker compose run --rm tests\n```\n\nRun the example:\n\n```bash\ndocker compose run --rm example\n```\n\nOpen a shell:\n\n```bash\ndocker compose run --rm shell\n```\n\nThe Docker build context is `./service`, and the container working directory is `/workspace/service`.\n\n## Requirements management\n\nRuntime dependencies are defined in `service/requirements.in`.\n\nDevelopment and release dependency inputs are:\n\n```text\nservice/requirements-dev.in\nservice/requirements-release.in\n```\n\nPinned requirement files are generated with:\n\n```bash\n./scripts/freeze-requirements.sh\n```\n\nInstall the pinned release environment:\n\n```bash\npython -m pip install -r service/requirements-release.txt\n```\n\n## Build a local distribution\n\nFrom the repository root:\n\n```bash\nsource env/bin/activate\npython -m pip install --upgrade -r service/requirements-release.txt\n\ncd service\npytest\nruff check src tests examples\n\nrm -rf dist build *.egg-info src/*.egg-info\npython -m build\npython -m twine check dist/*\n```\n\nExpected build artifacts:\n\n```text\ndist/data_analysis_stability_evaluator-\u003cversion\u003e.tar.gz\ndist/data_analysis_stability_evaluator-\u003cversion\u003e-py3-none-any.whl\n```\n\nInspect the wheel:\n\n```bash\npython -m zipfile -l dist/*.whl | grep -E 'data_analysis_stability_evaluator|entry_points|METADATA'\n```\n\n## Test the wheel in a clean environment\n\nFrom 
`service/`:\n\n```bash\npython3 -m venv /tmp/dase-wheel-test\nsource /tmp/dase-wheel-test/bin/activate\n\npython -m pip install --upgrade pip\npython -m pip install dist/*.whl\n\nanalysis-stability --help\nanalysis-stability sample-data --out /tmp/dase-data\nanalysis-stability profile /tmp/dase-data/baseline.csv /tmp/dase-data/candidate.csv --out /tmp/dase-report.json\n\npython - \u003c\u003c'PY'\nfrom data_analysis_stability_evaluator import DataProfiler, StabilityEvaluator, PerturbationConfig\nprint(\"import ok\")\nPY\n\ndeactivate\n```\n\nIf these commands succeed, the local wheel installs cleanly, the CLI entry point works, and the public classes can be imported.\n\n## TestPyPI publishing\n\nThis project uses PyPI Trusted Publishing through GitHub Actions.\n\nNo local API token is required when Trusted Publishing is configured correctly.\n\nThe workflow file is:\n\n```text\n.github/workflows/publish.yml\n```\n\nThe Python package project is:\n\n```text\nservice/pyproject.toml\n```\n\nTo publish to TestPyPI, run the workflow manually from the GitHub Actions tab:\n\n```text\nActions\n→ Publish Python package\n→ Run workflow\n→ target: testpypi\n```\n\nAfter publishing to TestPyPI, verify installation:\n\n```bash\npython3 -m venv /tmp/dase-testpypi\nsource /tmp/dase-testpypi/bin/activate\n\npython -m pip install --upgrade pip\npython -m pip install \\\n  --index-url https://test.pypi.org/simple/ \\\n  --extra-index-url https://pypi.org/simple/ \\\n  data-analysis-stability-evaluator\n\nanalysis-stability --help\nanalysis-stability sample-data --out /tmp/dase-testpypi-data\nanalysis-stability profile /tmp/dase-testpypi-data/baseline.csv /tmp/dase-testpypi-data/candidate.csv --out /tmp/dase-testpypi-report.json\n\ndeactivate\n```\n\nTo publish to PyPI, run the same workflow with:\n\n```text\ntarget: pypi\n```\n\nor publish through the release process described in `docs/release.md`.\n\n## Versioning note\n\nA given package version can be uploaded to PyPI or TestPyPI only once, so bump the version before each new upload.\n\n## License\n\nApache License 
2.0.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feuropanite%2Fdata-analysis-stability-evaluator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feuropanite%2Fdata-analysis-stability-evaluator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feuropanite%2Fdata-analysis-stability-evaluator/lists"}