# process-improve

**Multivariate analysis, designed experiments, and process monitoring for Python.**
Built for chemometrics, manufacturing, and pharma data - the methods that scikit-learn skips.

[![PyPI version](https://img.shields.io/pypi/v/process-improve.svg)](https://pypi.org/project/process-improve/)
[![Python versions](https://img.shields.io/python/required-version-toml?tomlFilePath=https%3A%2F%2Fraw.githubusercontent.com%2Fkgdunn%2Fprocess-improve%2Fmain%2Fpyproject.toml&label=python)](https://pypi.org/project/process-improve/)
[![Downloads](https://static.pepy.tech/badge/process-improve)](https://pepy.tech/project/process-improve)
[![Downloads per month](https://static.pepy.tech/badge/process-improve/month)](https://pepy.tech/project/process-improve)
[![CI](https://github.com/kgdunn/process-improve/actions/workflows/run-tests.yml/badge.svg?branch=main&event=push)](https://github.com/kgdunn/process-improve/actions/workflows/run-tests.yml?query=branch%3Amain)
[![codecov](https://codecov.io/gh/kgdunn/process-improve/branch/main/graph/badge.svg)](https://codecov.io/gh/kgdunn/process-improve)
[![Docs](https://img.shields.io/badge/docs-kgdunn.github.io-blue.svg)](https://kgdunn.github.io/process-improve/)
[![License](https://img.shields.io/pypi/l/process-improve.svg)](LICENSE)

---

## What it does

`process-improve` provides production-grade implementations of the methods
practitioners actually use on real plant and lab data:

- **PCA** with SVD and NIPALS, plus native missing-value handling via trimmed
  score regression (TSR)
- **PLS** regression with a fully sklearn-compatible API, VIP scores, and
  cross-validated diagnostics
- **TPLS** - PLS for *T-shaped (multi-block) data structures*
- **Outlier detection** combining Hotelling's T² and SPE with an ESD-based
  (extreme studentized deviate) test
- **Designed experiments** - full-factorial, fractional-factorial, and
  response-surface designs, plus a multi-stage DOE strategy recommender
- **Process monitoring** - Shewhart, CUSUM, and Holt-Winters control charts
- **Batch data analysis** - alignment, feature extraction, and multivariate
  batch monitoring (MBPCA / MBPLS)
- **Interactive Plotly diagnostics** bound directly to every fitted model

Outputs are `pandas`-native: scores, loadings, and predictions keep your row
and column labels.

It is the companion package to the online textbook
[Process Improvement using Data](https://learnche.org/pid), and it provides the
statistical engine behind [factori.al](https://factori.al).

## Why not scikit-learn?

scikit-learn answers *"what fits the data?"* - `process-improve` answers
*"is this batch normal, which variable went off, and how confident am I in the
prediction?"* The two libraries are designed to be used together.
`process-improve` follows sklearn conventions (`fit`, `predict`, `score`, and the
trailing `_` on fitted attributes) and drops into existing pipelines.

| Capability                                       | scikit-learn | process-improve |
| ------------------------------------------------ | :----------: | :-------------: |
| PCA, PLS with sklearn-style API                  |       ✓      |        ✓        |
| Missing-data fitting (NIPALS / TSR)              |       -      |        ✓        |
| Hotelling's T² + SPE outlier limits              |       -      |        ✓        |
| Variable-level score contributions               |       -      |        ✓        |
| Cross-validated coefficient confidence intervals |       -      |        ✓        |
| Multi-block models (TPLS)                        |       -      |        ✓        |
| Designed experiments (DoE)                       |       -      |        ✓        |
| Control charts (Shewhart / CUSUM / Holt-Winters) |       -      |        ✓        |
| Batch process monitoring (MBPCA / MBPLS)         |       -      |        ✓        |
| Plotly diagnostics built in                      |       -      |        ✓        |
| Labeled `DataFrame` outputs                      |    partial   |        ✓        |

## Installation

```bash
pip install process-improve
```

Requires Python 3.10 or newer. Built on `numpy`, `pandas`, `scipy`,
`scikit-learn`, `statsmodels`, `plotly`, and `pyDOE3`.

## Quick start

### PCA - Principal Component Analysis

```python
import pandas as pd
from process_improve.multivariate.methods import PCA, MCUVScaler

X = pd.read_csv("your_data.csv", index_col=0)
X_scaled = MCUVScaler().fit_transform(X)   # mean-center, unit-variance scale

pca = PCA(n_components=3).fit(X_scaled)
print(pca.r2_cumulative_)         # cumulative R² per component
pca.score_plot()                  # interactive Plotly figure

# Flag outliers using combined T² and SPE limits at 95% confidence
outliers = pca.detect_outliers(conf_level=0.95)

# Which variables contribute most to the first observation's scores?
contrib = pca.score_contributions(pca.scores_.iloc[0].values)
```

### PLS - Projection to Latent Structures

```python
from process_improve.multivariate.methods import PLS, MCUVScaler

# X: predictors (N x K DataFrame); Y: responses (N x M DataFrame)
# Scale X and Y separately
scaler_x = MCUVScaler().fit(X)
scaler_y = MCUVScaler().fit(Y)
X_s, Y_s = scaler_x.transform(X), scaler_y.transform(Y)

pls = PLS(n_components=3).fit(X_s, Y_s)
print(pls.beta_coefficients_)     # regression coefficients (K x M)
print(pls.r2_cumulative_)         # cumulative R² for Y
print(pls.vip())                  # VIP scores per X variable

# Predict new observations (X_new has the same columns as X), with diagnostics
result = pls.predict(scaler_x.transform(X_new))
result.y_hat                      # point predictions, in the scaled units of Y_s
result.spe                        # squared prediction error
result.hotellings_t2              # Hotelling's T² for new observations

# Cross-validated component selection
cv_select = PLS.select_n_components(X_s, Y_s, max_components=6)
print(cv_select.n_components)     # recommended number of components
print(cv_select.rmsecv)           # RMSECV per component count

# Leave-one-out cross-validation with beta-coefficient confidence intervals
cv = pls.cross_validate(X_s, Y_s, cv="loo")
print(cv.beta_ci_lower, cv.beta_ci_upper)   # 95% CI for each beta
print(cv.significant)                       # betas significantly != 0
print(cv.q_squared)                         # cross-validated R² (Q²)
```

### DOE - multi-stage experimental strategy

```python
from process_improve.experiments.factor import Factor, Response
from process_improve.experiments.strategy import recommend_strategy

factors = [
    Factor(name="Temperature", low=25, high=40, units="degC"),
    Factor(name="pH", low=5.0, high=7.5),
    Factor(name="Glucose", low=10, high=50, units="g/L"),
]
strategy = recommend_strategy(
    factors=factors,
    responses=[Response(name="Yield", goal="maximize", units="g/L")],
    budget=40,
    domain="fermentation",
)
for s in strategy["stages"]:
    print(s["stage_number"], s["design_type"], s["estimated_runs"])
```

Longer, fully worked versions of each example are in the
[Quickstart guide](https://kgdunn.github.io/process-improve/quickstart.html)
and the `process_improve/notebooks_examples/` folder.

New to designed experiments? The
[**Applied DoE tutorial**](https://kgdunn.github.io/process-improve/applied_doe/index.html)
is an eight-module series of worked solutions.

## API design

PCA and PLS follow scikit-learn conventions: `fit()` returns `self`, fitted
attributes end with a trailing underscore (`scores_`, `loadings_`, `spe_`,
`hotellings_t2_`, `r2_cumulative_`, ...), and `predict()` returns an
`sklearn.utils.Bunch` with named fields (`y_hat`, `spe`, `hotellings_t2`, ...).
Inputs are accepted as `pandas.DataFrame`, and index and column labels are
preserved through `fit` and `transform`.

## Documentation & learning resources

- **API reference & user guide:** <https://kgdunn.github.io/process-improve/>
- **Applied DoE tutorial (8 modules):**
  <https://kgdunn.github.io/process-improve/applied_doe/index.html>
- **Companion textbook:** [Process Improvement using Data](https://learnche.org/pid)
- **Hosted experiment-design tool:** [factori.al](https://factori.al)
- **Local docs build:** `cd docs && make html`

## Citing process-improve

If you use this package in academic work, please cite it:

```bibtex
@software{dunn_process_improve,
  author  = {Dunn, Kevin G.},
  title   = {{process-improve: Multivariate Analysis for Process Improvement}},
  year    = {2026},
  version = {v1.21.4},
  url     = {https://github.com/kgdunn/process-improve}
}
```

A `CITATION.cff` file is included, so GitHub renders a *"Cite this
repository"* button in the sidebar.

## Contributing

Bug reports, feature requests, and pull requests are welcome. See
[CONTRIBUTING.md](CONTRIBUTING.md) for development setup, testing, and code
style, and file bugs and feature requests on the
[issue tracker](https://github.com/kgdunn/process-improve/issues).

## License

MIT - see [LICENSE](LICENSE) for details.
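
## Under the hood: T² and SPE (sketch)

The outlier-detection feature and the fitted `hotellings_t2_` / `spe_` attributes both rest on two per-row statistics. T² measures how far a row lies from the model centre *within* the component space. SPE measures how far the row lies *off* that space.

The sketch below is a minimal illustration under stated assumptions:

- It uses plain NumPy and an SVD-based PCA.
- It assumes no missing data and does not compute confidence limits.
- The helpers `mcuv_scale` and `pca_t2_spe` are hypothetical and are not the package's implementation.

Texts also differ on whether SPE is reported as the sum of squared residuals or as its square root. This sketch uses the sum of squares, and the library's `spe_` may follow the other convention.

```python
import numpy as np


def mcuv_scale(X: np.ndarray) -> np.ndarray:
    """Mean-center each column, then scale it to unit sample variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca_t2_spe(X_scaled: np.ndarray, n_components: int):
    """Hypothetical helper: PCA by SVD, returning scores, loadings, T² and SPE."""
    # Economy-size SVD: X = U diag(S) Vt; the rows of Vt are the loading vectors.
    _, _, Vt = np.linalg.svd(X_scaled, full_matrices=False)
    P = Vt[:n_components].T  # loadings, K x A
    T = X_scaled @ P  # scores, N x A (mean zero because X is centered)

    # Hotelling's T²: squared scores, each divided by that component's variance.
    t2 = np.sum(T**2 / T.var(axis=0, ddof=1), axis=1)

    # SPE (sum-of-squares convention): residual left after reconstructing X as T P^T.
    residuals = X_scaled - T @ P.T
    spe = np.sum(residuals**2, axis=1)
    return T, P, t2, spe


# Usage on random stand-in data: 50 rows, 6 variables, 2 components
rng = np.random.default_rng(0)
X_scaled = mcuv_scale(rng.normal(size=(50, 6)))
T, P, t2, spe = pca_t2_spe(X_scaled, n_components=2)
```

Because the loadings are orthonormal, the total sum of squares of the scaled data splits exactly into the part explained by the scores and the part left in SPE. When every component is kept, SPE drops to zero.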
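
## Under the hood: a two-level full factorial (sketch)

The DOE quick start describes each factor by a `low` and a `high` level. The simplest design built on those levels is the two-level full factorial: every combination of low and high across `k` factors, giving 2^k runs.

The sketch below builds those runs in coded units (-1 for low, +1 for high) using `itertools.product`. It then maps them onto the Temperature, pH, and Glucose ranges from the quick-start example.

The helpers `full_factorial_2level` and `to_real_units` are hypothetical and are not part of `process-improve`. The package lists `pyDOE3` among its dependencies, and its own design generators may order or randomize runs differently.

```python
import itertools

import numpy as np


def full_factorial_2level(k: int) -> np.ndarray:
    """Hypothetical helper: 2^k two-level full factorial in coded units (-1/+1),
    in standard (Yates) order, where the first factor alternates fastest."""
    runs = itertools.product([-1, 1], repeat=k)
    # itertools.product varies the LAST position fastest, so reverse each run.
    return np.array([run[::-1] for run in runs], dtype=float)


def to_real_units(coded: np.ndarray, low, high) -> np.ndarray:
    """Hypothetical helper: map coded -1/+1 levels onto each factor's [low, high]."""
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    center = (high + low) / 2
    half_range = (high - low) / 2
    return center + coded * half_range


# Ranges copied from the DOE quick start: Temperature, pH, Glucose
design = full_factorial_2level(3)
real = to_real_units(design, low=[25, 5.0, 10], high=[40, 7.5, 50])
```

In coded units the columns are orthogonal (`design.T @ design` is 8 times the identity). This property lets each main effect be estimated independently of the others.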