https://github.com/fraware/ai_diffusion_state
China's AI Industrialization Model and the Next Productivity Shock
https://github.com/fraware/ai_diffusion_state
ai-diffusion china econometrics economics-of-ai empirical-economics industrial-policy python
Last synced: 9 days ago
JSON representation
China's AI Industrialization Model and the Next Productivity Shock
- Host: GitHub
- URL: https://github.com/fraware/ai_diffusion_state
- Owner: fraware
- Created: 2026-05-19T14:54:41.000Z (30 days ago)
- Default Branch: main
- Last Pushed: 2026-05-28T18:48:24.000Z (21 days ago)
- Last Synced: 2026-05-28T20:24:10.781Z (21 days ago)
- Topics: ai-diffusion, china, econometrics, economics-of-ai, empirical-economics, industrial-policy, python
- Language: Python
- Homepage:
- Size: 12.4 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
###############################################################################################
# #
# _ ___ ____ _ __ __ _ ____ _ _ #
# / \ |_ _| | _ \(_)/ _|/ _|_ _ ___(_) ___ _ __ / ___|| |_ __ _| |_ ___ #
# / _ \ | | | | | | | |_| |_| | | / __| |/ _ \| '_ \ \___ \| __/ _` | __/ _ \ #
# / ___ \ | | | |_| | | _| _| |_| \__ \ | (_) | | | | ___) | || (_| | || __/ #
# /_/ \_\___| |____/|_|_| |_| \__,_|___/_|\___/|_| |_| |____/ \__\__,_|\__\___| #
# #
# #
###############################################################################################
**China's AI Diffusion State: Pilot Zones, Smart Factories, and Industrial Adoption**
Empirical pipeline for an NBER Economics of AI (China) brief measuring whether China's AI diffusion system—pilot zones, listed smart-factory recognition, hub-centered industrial adoption, and export-relevant manufacturing sectors—forms a coherent institutional chain.
## Core empirical chain
```text
AI pilot zones → listed smart-factory recognition → hub-centered industrial AI diffusion → export-relevant manufacturing sectors
```
## Repository layout
```text
configs/ source registry, keywords, city aliases
data/
seed/ manually verified small tables (committed)
raw/ downloaded sources (gitignored)
interim/ parsed tables (gitignored)
processed/ analysis-ready tables (gitignored, built by scripts)
docs/
DATA_CONTRACTS.md
source_notes/
scripts/ numbered pipeline entry points
src/diffusion_state/
tests/
outputs/
tables/
figures/
```
## Quickstart
```bash
git clone https://github.com/fraware/ai_diffusion_state.git
cd ai_diffusion_state
python -m venv .venv
# Windows PowerShell
.\.venv\Scripts\Activate.ps1
# macOS/Linux
source .venv/bin/activate
make setup
make seed
make test
```
`make setup` installs the package and dependencies (including `beautifulsoup4` for HTML parsing). Run it once per environment before `make build` or `make test`.
On Windows without GNU Make: `pip install -e .[dev]` then `python scripts/00_build_seed_tables.py` and `pytest -q`.
Git Bash path note: use `cd ~/ai_diffusion_state` or `cd /c/Users/mateo/ai_diffusion_state`, not `cd c:\Users\...` (backslashes are stripped).
## Pipeline commands
| Command | Purpose |
|---|---|
| `make setup` | Editable install with dev dependencies |
| `make seed` | Build verified tables (pilot zones) from `data/seed/` |
| `make fetch` | Snapshot public HTML sources (network) |
| `make build` | All analysis-ready tables available without raw downloads |
| `make parse` | Parse smart-factory HTML when `data/raw/` is populated |
| `make baci` | Download CEPII BACI HS17 (if needed) and build export outcomes |
| `make panel` | Merge city-year analysis panel |
| `make analysis` | Baseline regressions and event study |
| `make test` | Schema and validation tests |
Target end state:
```bash
make fetch && make build && make test && make analysis && make paper
```
## Data governance
1. Never edit files under `data/raw/`.
2. Do not commit large proprietary downloads.
3. Every processed row must trace to a `source_url` or raw file.
4. Inferred fields require confidence flags (smart-factory pipeline).
5. Manual fixes go in seed or correction tables, not generated CSVs.
6. Scripts must be idempotent; paper tables come from scripts, not notebooks.
## Current status
| Phase | Status |
|---|---|
| 0 — Repo activation | CI, tests, gitignore, Makefile |
| 1 — Pilot zones | 17-zone seed → `pilot_zones.csv` with canonical schema |
| 2 — Smart factories | 2024/2025 parsed (235+274); clean + city panels; city geo memo |
| 3 — BACI exports | HS17 2017–2024 built from CEPII 202601 (`make baci`) |
| 4 — City controls | Ingestion module ready; requires EPS/NBS export in `data/raw/city_controls/` |
| 5 — Baseline analysis | Overlap tables, adoption models, event-study figure, memo v1 |
| 6 — Paper integration | Outline, appendices, red-team memo, `make paper` manifest |
## Priority order (if time is scarce)
1. Pilot zones
2. 2024 smart-factory list
3. City-year adoption table
4. Descriptive pilot-zone overlap
5. BACI export outcomes
6. City controls
7. Event study
## Documentation
- [Research design](docs/research_design.md)
- [Data dictionary](docs/data_dictionary.md) (legacy columns; see [DATA_CONTRACTS](docs/DATA_CONTRACTS.md))
- [Pilot zone source notes](docs/source_notes/pilot_zones.md)
- [Engineering brief](docs/engineering_brief.md)
- [Reproducibility](docs/REPRODUCIBILITY.md)
- [Paper outline](paper/outline.md)
- [Paper results memo](paper/results_memo.md)
- [Red-team memo](paper/red_team_memo.md)
## License and data
Raw BACI and EPS/NBS city statistics require separate download or license. See `configs/sources.yml` and `.env.example`.
## Atlas IIDS patent evidence (appendix robustness; post-submission workstream)
The PCS measurement paper is **submission-ready**. The IIDS patent layer is real but **appendix-only**. The tiered patent geography layer is frozen at approximately **65.4%** city fill. Exact publication-number applicant-address geocoding remains unavailable. Publication-ready patent F1 and causal patent claims remain blocked. Atlas true-vision work is a **separate post-submission workstream**—not required for the NBER upload.
| Command | Where |
|---------|--------|
| `make atlas-iids-workflow` | Control laptop — phase dashboard |
| `make atlas-iids-preflight` | Control laptop — disk/credential checks |
| `bash scripts/cloud_vm_bootstrap.sh` | Fresh Ubuntu VM — one-shot setup |
| `make atlas-iids-cloud STEP=...` | Cloud VM — download + convert |
| `make atlas-iids-import-copyback ARCHIVE=...` | Control laptop — after `scp` |
| `make atlas-iids-control-evidence-chain` | Control laptop — after geography file |
Playbooks (post-submission):
- [IIDS clean-restart runbook](docs/ATLAS_IIDS_CLEAN_RESTART_RUNBOOK.md)
- [IIDS execution checklist](docs/ATLAS_IIDS_EXECUTION_CHECKLIST.md)
- [Patent raw data README](data/raw/patents/README.md)