https://github.com/omnipotence-eth/manufacturing-quality-analytics

SQL + Python pipeline for semiconductor NCR analysis — supplier performance, defect Pareto, yield trends
analytics data-analysis etl manufacturing matplotlib pandas postgresql python quality sql

Last synced: 3 months ago

Host: GitHub
URL: https://github.com/omnipotence-eth/manufacturing-quality-analytics
Owner: omnipotence-eth
License: mit
Created: 2026-04-07T23:20:03.000Z (3 months ago)
Default Branch: main
Last Pushed: 2026-04-09T11:08:12.000Z (3 months ago)
Last Synced: 2026-04-09T12:16:01.492Z (3 months ago)
Topics: analytics, data-analysis, etl, manufacturing, matplotlib, pandas, postgresql, python, quality, sql
Language: Jupyter Notebook
Size: 1.06 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Security: SECURITY.md

# Manufacturing Quality Analytics

**SQL + Python pipeline for semiconductor NCR analysis — supplier performance, defect Pareto, yield trends**




[![CI](https://github.com/omnipotence-eth/manufacturing-quality-analytics/actions/workflows/ci.yml/badge.svg)](https://github.com/omnipotence-eth/manufacturing-quality-analytics/actions/workflows/ci.yml)

[![Python](https://img.shields.io/badge/python-3.11+-3776AB?style=flat-square&logo=python&logoColor=white)](https://python.org)

[![PostgreSQL](https://img.shields.io/badge/PostgreSQL-18-336791?style=flat-square&logo=postgresql&logoColor=white)](https://www.postgresql.org)

[![Code style: ruff](https://img.shields.io/badge/code%20style-ruff-D7FF64?style=flat-square&logo=ruff&logoColor=black)](https://github.com/astral-sh/ruff)

[![License: MIT](https://img.shields.io/badge/license-MIT-22C55E?style=flat-square)](LICENSE)




[Architecture](#architecture)  ·  [Key Findings](#key-findings)  ·  [SQL Highlights](#sql-highlights)  ·  [Visualizations](#visualizations)  ·  [Quick Start](#quick-start)  ·  [Contributing](CONTRIBUTING.md)






---

## What is This?

A production-grade data analytics pipeline that ingests 2,500 simulated Non-Conformance Reports (NCRs) from a semiconductor manufacturing operation, loads them into PostgreSQL, and answers 10 business questions with SQL — then renders 7 professional visualizations in a Jupyter notebook.

The dataset is synthetically generated from real manufacturing quality domain knowledge (GM Quality + Shield AI supplier quality experience): realistic supplier defect curves, lot sizes drawn from log-normal distributions, a corrective action improvement trajectory baked into the worst-performing supplier, and shift-level quality variation matching industry norms.
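
The generation approach described above can be sketched roughly as follows. This is a minimal illustration, not the project's `generate_synthetic.py`: it uses the standard-library `random` module rather than NumPy, the log-normal parameters are invented, and `BASE_RATES`, `Lot`, `simulate_lot`, and `simulate` are hypothetical names. Only the two supplier names and their headline defect rates come from this README.

```python
import random
from dataclasses import dataclass

# Hypothetical baseline defect rates. Only these two supplier names and
# their headline rates appear in the README; everything else is illustrative.
BASE_RATES = {
    "FastTrack Supply": 0.069,
    "AeroParts Manufacturing": 0.0085,
}


@dataclass
class Lot:
    supplier_name: str
    quantity_received: int
    quantity_rejected: int

    @property
    def defect_rate(self) -> float:
        return self.quantity_rejected / self.quantity_received


def simulate_lot(supplier: str, rng: random.Random) -> Lot:
    # Log-normal lot size: many small lots with a long tail of large ones.
    # mu=6.0 and sigma=0.8 are assumed values, not the project's parameters.
    received = max(1, round(rng.lognormvariate(6.0, 0.8)))
    # One Bernoulli draw per unit at the supplier's baseline rate.
    p = BASE_RATES[supplier]
    rejected = sum(rng.random() < p for _ in range(received))
    return Lot(supplier, received, rejected)


def simulate(n_per_supplier: int, seed: int = 42) -> list[Lot]:
    # A seeded generator keeps the synthetic dataset reproducible.
    rng = random.Random(seed)
    return [simulate_lot(s, rng) for s in BASE_RATES for _ in range(n_per_supplier)]
```

Calling `simulate(50)` yields 100 lots whose pooled defect rates land near the configured baselines. A corrective-action trajectory could be layered on by making `p` depend on the lot's date.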

> **What this demonstrates**: SQL fluency (CTEs, window functions, Pareto, self-joins), a clean Python data pipeline, manufacturing domain expertise, and production engineering habits: the combination that data analyst and data engineer (DA/DE) interviews test.

---

## Why

Manufacturing quality data lives in spreadsheets and disconnected databases. This project demonstrates how to build a proper analytics pipeline — PostgreSQL for structured storage, Python for ETL and statistical analysis, and SQL queries that answer real questions about defect rates, supplier performance, and process capability. Built from experience in automotive and aerospace quality engineering, it models the kind of NCR (non-conformance report) analysis that quality teams actually need but rarely have automated.

---

## Architecture

```mermaid
graph LR
    A["generate_synthetic.py\n2,500 NCRs"] --> B["data/raw/\nquality_records.csv"]
    B --> C["load_data.py\nSQLAlchemy · psycopg2"]
    C --> D[("PostgreSQL 18\nmanufacturing_qa")]
    D --> E["sql/queries.sql\n10 business queries"]
    D --> F["analysis.ipynb\nSQLAlchemy connection"]
    E -.->|reference| F
    F --> G["7 Visualizations\nMatplotlib · Seaborn"]
    G --> H["visuals/*.png"]
```

### Data model

| Column | Type | Description |
|--------|------|-------------|
| `ncr_number` | `varchar` | Unique NCR ID in `NCR-202301-0001` format |
| `supplier_name` / `supplier_tier` | `varchar` / `int` | Supplier identity and qualification tier (1–3) |
| `production_line` / `shift` | `varchar` | Where and when the defect was found |
| `defect_code` / `defect_type` | `varchar` | 15-category defect taxonomy (D001–D015) |
| `quantity_received` / `quantity_rejected` | `int` | Lot size and rejection volume |
| `defect_rate` | `float` | Rejection rate for this NCR event |
| `opened_date` / `closed_date` | `date` | NCR lifecycle timestamps |
| `days_to_close` | `int` | Disposition cycle time |
| `disposition` | `varchar` | Use As Is / Rework / Return to Supplier / Scrap |
| `first_pass` | `bool` | Whether the lot passed first inspection |
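
The columns above imply a few row-level invariants that a loader could check before inserting. The sketch below is one hedged way to express them. `validate_record` is a hypothetical helper, not part of the repository. It assumes `defect_rate` is stored as a fraction equal to `quantity_rejected / quantity_received`, that `days_to_close` counts calendar days, and that open NCRs carry no `closed_date`.

```python
import re
from datetime import date

# "NCR-" followed by six digits, a dash, and four digits, e.g. NCR-202301-0001
NCR_PATTERN = re.compile(r"^NCR-\d{6}-\d{4}$")
DISPOSITIONS = {"Use As Is", "Rework", "Return to Supplier", "Scrap"}


def validate_record(rec: dict) -> list[str]:
    """Return a list of problems found in one NCR row; empty means it passed."""
    problems = []
    if not NCR_PATTERN.match(rec["ncr_number"]):
        problems.append("ncr_number is not in NCR-<6 digits>-<4 digits> form")
    if rec["supplier_tier"] not in (1, 2, 3):
        problems.append("supplier_tier must be 1, 2, or 3")

    received, rejected = rec["quantity_received"], rec["quantity_rejected"]
    if received <= 0 or not 0 <= rejected <= received:
        problems.append("need received > 0 and 0 <= rejected <= received")
    elif abs(rec["defect_rate"] - rejected / received) > 1e-4:
        # Assumption: defect_rate is a fraction, not a percentage.
        problems.append("defect_rate does not equal rejected / received")

    if rec["disposition"] not in DISPOSITIONS:
        problems.append("unknown disposition")

    opened, closed = rec["opened_date"], rec.get("closed_date")
    if closed is not None:  # assumption: open NCRs have no closed_date yet
        if closed < opened:
            problems.append("closed_date precedes opened_date")
        elif rec["days_to_close"] != (closed - opened).days:
            problems.append("days_to_close disagrees with the two dates")
    return problems


example = {
    "ncr_number": "NCR-202301-0001", "supplier_tier": 1,
    "quantity_received": 500, "quantity_rejected": 10, "defect_rate": 0.02,
    "opened_date": date(2023, 1, 5), "closed_date": date(2023, 1, 19),
    "days_to_close": 14, "disposition": "Rework",
}
print(validate_record(example))
```

Returning a list of messages rather than raising on the first failure lets a loader report every problem in a row at once.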

---

## Key Findings

1. **FastTrack Supply** has the highest defect rate at **6.9%**, 8.1× that of best-in-class AeroParts Manufacturing (0.85%). A corrective action plan implemented in July 2023 drove a measurable improvement through H2 2023.

2. **Dimensional and Surface Finish defects** account for **40% of all rejected units** — Pareto-validated. These two categories are the only ones that warrant dedicated inspection protocols.

3. **Night shift** runs at **1.4× the defect rate** of Day shift across all production lines. The gap is largest in Electronics Integration (LINE\_B), which points to a staffing or training gap on nights.

4. **4 of 7 suppliers** triggered the rolling 30-day repeat-offender threshold (3+ NCRs in 30 days), indicating systemic lot-level problems rather than random variation; the criteria for a mandatory corrective action plan (CAP) are met.

---

## SQL Highlights

### Rolling 30-day repeat offender detection

```sql
-- Suppliers with 3+ NCRs in any rolling 30-day window
WITH windowed AS (
    SELECT
        supplier_name,
        opened_date,
        COUNT(*) OVER (
            PARTITION BY supplier_name
            ORDER BY opened_date
            RANGE BETWEEN INTERVAL '29 days' PRECEDING AND CURRENT ROW
        ) AS ncrs_in_30d_window
    FROM quality_records
)
SELECT DISTINCT
    supplier_name,
    MAX(ncrs_in_30d_window) OVER (PARTITION BY supplier_name) AS max_ncrs_in_any_30d
FROM windowed
WHERE ncrs_in_30d_window >= 3
ORDER BY max_ncrs_in_any_30d DESC;
```

### Composite supplier scorecard (PERCENT_RANK + CTE)

```sql
-- Weighted composite: defect 50%, response time 30%, FPY 20%
-- Excerpt: the supplier_stats CTE and the ranking expression (...) are elided; see sql/queries.sql
-- Higher score = better, so the lowest defect rate and the fastest close time score 100
scored AS (
    SELECT *,
        ROUND((100.0 * (1 - PERCENT_RANK() OVER (ORDER BY defect_rate_pct)))::numeric, 1) AS defect_score,
        ROUND((100.0 * (1 - PERCENT_RANK() OVER (ORDER BY avg_close_days)))::numeric, 1)  AS response_score,
        ROUND((PERCENT_RANK() OVER (ORDER BY fpy_pct) * 100)::numeric, 1)                 AS fpy_score
    FROM supplier_stats
)
SELECT *,
    ROUND(0.50 * defect_score + 0.30 * response_score + 0.20 * fpy_score, 1) AS composite_score,
    RANK() OVER (ORDER BY (...) DESC) AS overall_rank
FROM scored;
```
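
The 50/30/20 weighting can be sketched in plain Python to make the arithmetic concrete. This is an illustration under assumptions, not the notebook's code. It assumes a higher score is better, so the lowest defect rate and the shortest close time should score 100. `percent_rank` follows the SQL definition, (rank − 1) / (n − 1), with ties sharing the lowest rank. The supplier names and numbers are invented.

```python
def percent_rank(values):
    """Ascending PERCENT_RANK: (rank - 1) / (n - 1), ties share the lowest rank."""
    n = len(values)
    if n <= 1:
        return [0.0] * n
    ordered = sorted(values)
    # list.index returns the first position of a value, i.e. its rank minus one.
    return [ordered.index(v) / (n - 1) for v in values]


def composite_scores(stats):
    """stats: {supplier: {"defect_rate_pct", "avg_close_days", "fpy_pct"}}."""
    names = list(stats)
    defect_pr = percent_rank([stats[s]["defect_rate_pct"] for s in names])
    close_pr = percent_rank([stats[s]["avg_close_days"] for s in names])
    fpy_pr = percent_rank([stats[s]["fpy_pct"] for s in names])

    scores = {}
    for i, name in enumerate(names):
        defect_score = 100.0 * (1 - defect_pr[i])   # lower defect rate is better
        response_score = 100.0 * (1 - close_pr[i])  # faster close is better
        fpy_score = 100.0 * fpy_pr[i]               # higher first pass yield is better
        scores[name] = round(
            0.50 * defect_score + 0.30 * response_score + 0.20 * fpy_score, 1
        )
    return scores


# Hypothetical suppliers and metrics, for illustration only.
stats = {
    "A": {"defect_rate_pct": 1.0, "avg_close_days": 5, "fpy_pct": 99.0},
    "B": {"defect_rate_pct": 3.0, "avg_close_days": 10, "fpy_pct": 95.0},
    "C": {"defect_rate_pct": 7.0, "avg_close_days": 20, "fpy_pct": 90.0},
}
print(composite_scores(stats))
```

Percentile ranks make the three metrics comparable regardless of units, at the cost of discarding how far apart the suppliers actually are on each metric.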

### Pareto with running cumulative total

```sql
ROUND(
    100.0 * SUM(total_rejected) OVER (
        ORDER BY total_rejected DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) / SUM(total_rejected) OVER (),
2) AS cumulative_pct
```
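
The running total in that expression amounts to sorting categories by rejected units and accumulating their share of the grand total. A minimal Python sketch of the same calculation follows. The `pareto` function name and the sample counts are hypothetical, and "Contamination" is an invented category used only to fill out the example.

```python
def pareto(rejected_by_type):
    """Return (defect_type, total_rejected, cumulative_pct) rows, largest first."""
    total = sum(rejected_by_type.values())
    if total == 0:
        return []
    rows, running = [], 0
    # Sort descending by rejected units, then accumulate row by row,
    # which is what ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW does.
    ranked = sorted(rejected_by_type.items(), key=lambda kv: kv[1], reverse=True)
    for defect_type, rejected in ranked:
        running += rejected
        rows.append((defect_type, rejected, round(100.0 * running / total, 2)))
    return rows


# Hypothetical counts, for illustration only.
counts = {"Surface Finish": 30, "Dimensional": 50, "Contamination": 20}
for row in pareto(counts):
    print(row)
```

The last row always reaches 100.0, which is a quick sanity check on the query output as well.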

**All 10 queries:** defect rate by supplier, Pareto of defect types, monthly trend with rolling average, yield by production line, average time to disposition, repeat offender window function, defect rate by shift, co-occurring defect self-join, first pass yield by month, composite supplier scorecard.

---

## Visualizations

### Defect Rate by Supplier

![Defect Rate by Supplier](visuals/01_defect_rate_by_supplier.png)

### Pareto of Defect Types

![Pareto of Defect Types](visuals/02_pareto_defect_types.png)

### Monthly NCR Trend with Rolling Average

![Monthly Trend](visuals/03_monthly_trend.png)

### Defect Rate Heatmap — Production Line × Shift

![Heatmap](visuals/04_heatmap_line_shift.png)

### Supplier Quality Scorecard

![Supplier Scorecard](visuals/05_supplier_scorecard.png)

### First Pass Yield by Month

![First Pass Yield](visuals/06_first_pass_yield.png)

### Shift Quality Comparison

![Shift Comparison](visuals/07_shift_comparison.png)

---

## Quick Start

### Prerequisites

- Python 3.11+

- PostgreSQL 18 (local or Docker)

- conda or pip

### Install

```bash
git clone https://github.com/omnipotence-eth/manufacturing-quality-analytics.git
cd manufacturing-quality-analytics
pip install -r requirements.txt
```

### Configure

```bash
cp .env.example .env
# Edit .env:
# DATABASE_URL=postgresql://user:password@localhost:5432/manufacturing_qa
```
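
On the Python side, a script can fail fast when `DATABASE_URL` is missing or malformed instead of surfacing a connection error later. The sketch below is an assumption about how that check might look, not the repository's loader. `get_database_url` is a hypothetical helper, and it expects the variable to already be in the process environment (for example after python-dotenv has loaded `.env`).

```python
import os
from urllib.parse import urlsplit


def get_database_url(env=os.environ):
    """Read DATABASE_URL from a mapping and check it looks like a PostgreSQL URL."""
    url = env.get("DATABASE_URL")
    if not url:
        raise RuntimeError(
            "DATABASE_URL is not set; copy .env.example to .env and fill it in"
        )
    parts = urlsplit(url)
    # Accept both "postgresql://" and driver-qualified forms such as
    # "postgresql+psycopg2://".
    if not parts.scheme.startswith("postgresql"):
        raise RuntimeError(f"expected a postgresql URL, got scheme {parts.scheme!r}")
    if not parts.path.lstrip("/"):
        raise RuntimeError("DATABASE_URL has no database name")
    return url


print(get_database_url(
    {"DATABASE_URL": "postgresql://user:password@localhost:5432/manufacturing_qa"}
))
```

Taking the environment as a parameter keeps the function easy to test without touching real environment variables.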

### Run

```bash
# 1. Generate synthetic dataset (2,500 NCRs → data/raw/quality_records.csv)
python src/generate_synthetic.py

# 2. Create database and load data
#    createdb manufacturing_qa  (if not already created)
python src/load_data.py

# 3. Open the analysis notebook
jupyter lab notebooks/analysis.ipynb
# Run all cells; charts export automatically to visuals/

# 4. Run standalone SQL queries (DATABASE_URL must be set in your shell)
psql "$DATABASE_URL" -f sql/queries.sql
```

### Tests

```bash
pytest -q
```

---

## Tech Stack

| Layer | Technology | Notes |
|-------|-----------|-------|
| **Data generation** | Python, NumPy | Log-normal lot sizes, realistic supplier defect curves, FastTrack CAP improvement trajectory |
| **Data pipeline** | Pandas, SQLAlchemy 2.x | Type enforcement, null guards, batch insert, schema confirmation |
| **Database** | PostgreSQL 18, psycopg2 | `quality_records` table: 20 columns, 2,500 rows |
| **SQL** | PostgreSQL SQL | CTEs, window functions (`RANGE BETWEEN`, `PERCENT_RANK`, `RANK`), self-joins, `PERCENTILE_CONT` |
| **Analysis** | Jupyter Lab, Pandas | All queries run via SQLAlchemy connection, with no CSV re-reads |
| **Visualization** | Matplotlib, Seaborn | `seaborn-v0_8-whitegrid` style, consistent palette, exported PNG at 150 DPI |
| **Config** | python-dotenv | `DATABASE_URL` from `.env`, never hardcoded |
| **Code quality** | Ruff, mypy | Line length 100, `from __future__ import annotations`, typed public signatures |
| **Testing** | pytest | 20 unit tests for synthetic data generator: schema, ranges, domain invariants |
| **CI** | GitHub Actions | lint (ruff check + format) → test (pytest) on every push and PR |

---

## Documentation

| Document | Contents |
|----------|---------|
| [CONTRIBUTING.md](CONTRIBUTING.md) | Branch strategy, ship workflow, audit checklist, commit standards, PR checklist |
| [CHANGELOG.md](CHANGELOG.md) | Version history |
| [SECURITY.md](SECURITY.md) | Security model and vulnerability reporting |
| [sql/queries.sql](sql/queries.sql) | All 10 annotated standalone SQL queries |

---

## License

MIT — see [LICENSE](LICENSE).
