{"id":51179109,"url":"https://github.com/duc-v-le/stata-empirical-methods","last_synced_at":"2026-06-27T06:01:45.020Z","repository":{"id":367653841,"uuid":"1281774048","full_name":"duc-v-le/stata-empirical-methods","owner":"duc-v-le","description":"Reproducible applied-econometrics portfolio in Stata: typeset Reference + Guide books and 12 documented .do scripts (DiD, IV, RD, synthetic control, GMM).","archived":false,"fork":false,"pushed_at":"2026-06-27T00:35:44.000Z","size":2831,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-27T00:38:35.736Z","etag":null,"topics":["causal-inference","difference-in-differences","econometrics","panel-data","reproducible-research","stata"],"latest_commit_sha":null,"homepage":"https://duc-v-le.github.io","language":"Stata","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/duc-v-le.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-26T23:02:10.000Z","updated_at":"2026-06-27T00:35:48.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/duc-v-le/stata-empirical-methods","commit_stats":null,"previous_names":["duc-v-le/stata-empirical-methods"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/duc-v-le/stata-empirical-methods","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duc-v-le%2Fstata-empirical-methods","tags_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duc-v-le%2Fstata-empirical-methods/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duc-v-le%2Fstata-empirical-methods/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duc-v-le%2Fstata-empirical-methods/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/duc-v-le","download_url":"https://codeload.github.com/duc-v-le/stata-empirical-methods/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duc-v-le%2Fstata-empirical-methods/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34843147,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-27T02:00:06.362Z","response_time":126,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["causal-inference","difference-in-differences","econometrics","panel-data","reproducible-research","stata"],"created_at":"2026-06-27T06:01:44.494Z","updated_at":"2026-06-27T06:01:45.015Z","avatar_url":"https://github.com/duc-v-le.png","language":"Stata","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Stata skill portfolio — [Duc V. 
Le](https://duc-v-le.github.io)\n\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.20952641.svg)](https://doi.org/10.5281/zenodo.20952641)\n\nA small, **fully runnable** Stata project that demonstrates the empirical workflow\napplied economists use end to end: load → clean → model → diagnose → export. It pairs 12\ndocumented `.do` scripts with two typeset reference books, spanning data management,\nregression, panel/DiD and event studies, time series, IV/2SLS, limited-dependent-variable\nmodels, staggered DiD, regression discontinuity, synthetic control, and dynamic-panel GMM.\n\nEvery script was executed on **Stata 19.5 SE** and runs clean; the figures and tables in\n`output/` are the real results.\n\n**The two typeset books are in [`books/`](books/):** the [Reference](books/Stata-Reference.pdf)\n(method, code, and results for all 12 scripts) and the [Guide](books/Stata-Guide.pdf)\n(a line-by-line walkthrough). Each book is archived with a citable DOI —\nReference: [10.5281/zenodo.20950265](https://doi.org/10.5281/zenodo.20950265) ·\nGuide: [10.5281/zenodo.20951436](https://doi.org/10.5281/zenodo.20951436).\n\n## How to run\nFrom inside this folder, in Stata:\n```stata\ndo 00_run_all.do          // runs the whole portfolio\n```\nOr in batch from a shell:\n```bash\nSTATA=\"/Applications/StataNow19SE/StataSE19.app/Contents/MacOS/stata-se\"\n\"$STATA\" -b do 00_run_all.do\n```\nEach script writes a log to `logs/\u003cname\u003e.log` and any tables/figures to `output/`. (Batch mode\nalso drops a session log named after the do-file in this folder — the canonical per-script logs\nlive in `logs/`; the top-level `*.log` can be deleted after a run.)\n\n### One-time package setup\nThe panel/table/plot scripts use five community (SSC) packages. 
Install them all with:\n```stata\ndo packages/install_packages.do\n```\nSee [`packages/PACKAGES.md`](packages/PACKAGES.md) for what each package does and why it's\nneeded; `packages/installed_manifest.txt` is the provenance snapshot. Everything else uses\nbase Stata (no install needed).\n\n## File map (run order)\n| File | What it teaches |\n|---|---|\n| `00_run_all.do` | Master script — runs everything in order |\n| `01_basics.do` | Load (`sysuse`), `describe`/`summarize`/`tabulate`, `generate`/`egen`, labels, `if`, graphs, save `.dta` |\n| `02_data_management.do` | `merge`, `reshape` (wide↔long), `collapse`, dates, string functions, `duplicates` |\n| `03_regression.do` | OLS, robust SE, factor variables (`i.`/`c.`/`##`), `margins`, postestimation (`test`, `estat hettest`, `vif`), `esttab` → LaTeX + CSV |\n| `04_panel_did_eventstudy.do` | `xtset`, fixed effects (`xtreg`, `reghdfe`), difference-in-differences, dynamic event study + `coefplot` |\n| `05_timeseries.do` | `tsset`, lag/diff operators (`L.`/`D.`), `ac`, Newey–West HAC SE, level-break test |\n| `06_import_public_data.do` | `import`/`export delimited`, importing a **real FRED series** live via `import fred` (CSV fallback) |\n| `07_iv_2sls.do` | Instrumental variables / 2SLS (`ivregress`), weak-IV / endogeneity / overid diagnostics, IV with FE (`ivreghdfe`) |\n| `08_logit_probit.do` | Binary outcomes: LPM vs logit vs probit, odds ratios, **average marginal effects** (`margins`), predicted-probability plot, classification + ROC |\n| `09_staggered_did_csdid.do` | Staggered-adoption DiD (Callaway–Sant'Anna `csdid`): group-time ATTs, event-study/simple/group aggregations, vs. 
the biased naive two-way FE |\n| `10_rdd_regression_discontinuity.do` | Sharp regression discontinuity (`rdrobust`): local-polynomial estimate + robust CI, bandwidth selection, `rdplot`, manipulation test (`rddensity`) |\n| `11_synthetic_control.do` | Synthetic control (`synth`): donor-weighted counterfactual, treated-vs-synthetic path, placebo-based inference (`synth_runner`) |\n| `12_dynamic_panel_gmm.do` | Dynamic panel GMM: Arellano–Bond (`xtabond`) \u0026 Blundell–Bond system GMM (`xtdpdsys`); pooled OLS/FE bracket the truth, AR(2) validity test |\n\nSupporting folders: `books/` (the two typeset PDFs — the Reference and the Guide), `packages/`\n(dependency installer + guide + manifest), `output/` (figures, tables,\ndatasets), `data/` (downloaded public data). Each script also writes a run log to a local\n`logs/` folder (created automatically at run time).\n\n## Outputs (`output/`)\n- `02_collapse_table.tex` — collapse summary (mean/SD price \u0026 mpg by car origin)\n- `03_regression_table.tex`, `.csv` — publication regression table (esttab)\n- `04_did_table.tex` — DiD estimates across three specifications\n- `04_event_study.png` — dynamic treatment-effect path (flat pre-trend, rising lags)\n- `07_iv_table.tex` — OLS vs 2SLS vs IV+FE (IV recovers the true slope; OLS biased)\n- `07_iv_diag_table.tex` — IV diagnostics (first-stage F, Durbin–Wu–Hausman, overid)\n- `08_logit_pr.png` — predicted P(foreign) across mpg\n- `09_csdid_event.png` — Callaway–Sant'Anna event study (pre vs post)\n- `09_csdid_table.tex` — overall ATT (Callaway–Sant'Anna) vs naive two-way FE\n- `10_rdplot.png` — annotated sharp RD plot (binned means + local fit each side; grey double-arrow\n  just right of the cutoff = the `rdrobust` jump τ̂=1.938 with limit dots; dashed control-fit\n  counterfactual; cutoff marked)\n- `10_rd_table.tex` — RD estimate + robust CI + manipulation-test p\n- `11_synth_path.png`, `11_synth_placebo.png` — synthetic control: treated vs synthetic, and placebo gaps\n- 
`12_dynpanel_table.tex` — dynamic panel: persistence ρ across pooled OLS, within FE, difference GMM, system GMM\n- `01_*`, `03_rvfplot`, `05_tsline`, `05_acf`, `06_unrate` — figures\n- `*.dta` — saved datasets (incl. `fred_unrate.dta`)\n\n## Data provenance\n- **U.S. unemployment rate (FRED: `UNRATE`)** — `06_import_public_data.do` pulls this **live** from\n  FRED's API using Stata's native `import fred` (941 monthly observations, 1948-01 through 2026-05;\n  public domain). The importer needs a free key, set once with `set fredkey YOUR_KEY, permanently` — the\n  key lives in the user's Stata config, **NOT in this repo**. With no key configured the script falls back to\n  `data/UNRATE.csv` (the same series, downloaded from FRED via `curl` on 2026-06-23).\n- All other data are either Stata's bundled `auto` dataset (`sysuse auto`) or simulated\n  in-script with a fixed `set seed` (so results are fully reproducible).\n\n## Python/R → Stata cheat sheet\n| Task | pandas / R | Stata |\n|---|---|---|\n| Load CSV | `pd.read_csv` / `read_csv` | `import delimited \"f.csv\", varnames(1) clear` |\n| Inspect | `df.head()` / `head()` | `list in 1/5`, `describe`, `codebook` |\n| New column | `df[\"x\"]=...` / `mutate` | `generate x = ...` (`replace` to overwrite) |\n| Group aggregate | `groupby().agg` | `collapse (mean)... , by(g)` or `egen ..., by(g)` |\n| Reshape | `pivot`/`melt` / `pivot_longer` | `reshape wide`/`reshape long` |\n| Join | `merge` | `merge 1:1 key using \"f.dta\"` |\n| OLS robust | `smf.ols(...).fit(cov_type=\"HC1\")` / `feols` | `regress y x, vce(robust)` |\n| Fixed effects | `PanelOLS` / `feols(... 
| fe)` | `xtreg ..., fe` or `reghdfe ..., absorb()` |\n| Marginal effects | `.get_margeff()` / `margins` | `margins, dydx(x)` |\n| Lag | `.shift()` / `lag()` | `L.x` (after `tsset`) |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fduc-v-le%2Fstata-empirical-methods","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fduc-v-le%2Fstata-empirical-methods","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fduc-v-le%2Fstata-empirical-methods/lists"}