{"id":29830833,"url":"https://github.com/finite-sample/dct","last_synced_at":"2025-07-29T10:12:02.360Z","repository":{"id":303497948,"uuid":"1015689793","full_name":"finite-sample/dct","owner":"finite-sample","description":"Making Neural Networks Stable with Dropout‑Consistency Training","archived":false,"fork":false,"pushed_at":"2025-07-07T22:57:38.000Z","size":89,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-08T01:35:32.354Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/finite-sample.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-07T22:29:17.000Z","updated_at":"2025-07-07T22:57:41.000Z","dependencies_parsed_at":"2025-07-08T01:35:52.545Z","dependency_job_id":null,"html_url":"https://github.com/finite-sample/dct","commit_stats":null,"previous_names":["finite-sample/dct"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/finite-sample/dct","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finite-sample%2Fdct","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finite-sample%2Fdct/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finite-sample%2Fdct/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finite-sample%2Fdct/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/finite-s
ample","download_url":"https://codeload.github.com/finite-sample/dct/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finite-sample%2Fdct/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267668844,"owners_count":24124973,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-29T02:00:12.549Z","response_time":2574,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-07-29T10:11:42.504Z","updated_at":"2025-07-29T10:11:59.644Z","avatar_url":"https://github.com/finite-sample.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Dropout‑Consistency Training\n\nTurney (1995) argues that a good learner should discover *essentially the same concept* when it is trained on two **near‑identical** samples drawn from the same distribution. Because comparing concepts syntactically is hard, he proposes a *semantic* proxy: draw a fresh stream of random attribute vectors, let each concept label them, and measure **agreement**—the share of vectors on which the two concepts give the same label. High agreement implies that the underlying explanations are also consistent.\n\nWe borrow that idea but flip the workflow. Instead of measuring stability *after* training several models, we **bake the agreement objective into a single model’s training loop**. 
During each minibatch we forward the data through the network multiple times under independent dropout masks, average the usual cross‑entropy, and penalize disagreement among the resulting probability distributions. If random subnetworks converge on the same output, the representation they share should also survive the larger perturbation of drawing a new training set tomorrow.\n\n```python\nimport torch.nn.functional as F\n\ndef dropout_consistency_loss(model, x, y, n_passes=5, lam=1.5):\n    \"\"\"Cross‑entropy averaged over *n* dropout passes + consensus penalty.\"\"\"\n    model.train()                              # keep dropout active\n    logits = [model(x) for _ in range(n_passes)]\n    ce = sum(F.cross_entropy(l, y) for l in logits) / n_passes\n    probs = [F.softmax(l, dim=1) for l in logits]\n    n_pairs = n_passes * (n_passes - 1) / 2\n    mse = sum(F.mse_loss(probs[i], probs[j])\n              for i in range(n_passes) for j in range(i + 1, n_passes)) / n_pairs\n    return ce + lam * mse\n```\n\n---\n\n## How We **Measure** Stability\n\nTo follow Turney’s semantic test *and* keep the evaluation strictly out‑of‑sample, each trial proceeds through five steps:\n\n1. **Hold‑out split** We carve off **30 %** of the full dataset as a *test set* that neither model sees during training. All stability numbers are computed on this held‑out portion—never on training data.\n2. **Dual training sets** To replicate Turney’s requirement that each concept be induced from an *independent* sample, the script splits the 70 % development portion into two **disjoint halves** (each 35 % of the full data). Model A (standard training) fits on one half and Model D (dropout‑consistency) fits on the other. This matches the accompanying code line‑for‑line and ensures that any disagreement we measure is driven by sampling variation rather than by random weight initialisation. 
(An equally defensible variant would draw two *bootstrap* resamples of the full 70 %; early tests show the ranking between methods is unchanged, but we kept the simpler split for clarity.)\n3. **Prediction collection** With dropout **disabled** (`model.eval()`), both networks produce softmax probability vectors for every example in the test set.\n4. **Agreement score** For each test example we compute the symmetric KL divergence\n   $\\text{SKL}(P,Q)=\\tfrac12\\bigl[\\mathrm{KL}(P\\!\\parallel\\!Q)+\\mathrm{KL}(Q\\!\\parallel\\!P)\\bigr]$\n   between the two distributions, average it over the test set, and convert it to an *agreement* metric via `exp(−SKL)`. Higher values mean the independently trained models make more similar predictions on unseen data. (We also log an MSE‑based score for completeness.)\n5. **Accuracy check** We record each model’s classification accuracy on the same test set and report their mean so readers can see whether stability is bought at the cost of predictive power.\n\nThe entire pipeline is repeated **ten times** with different random seeds; we report the mean ± s.d. of both stability and accuracy. Thus “method A \u003e method D” literally means that, across ten independent trials, *method A* achieves a higher mean agreement score on the out‑of‑sample test data than *method D*.\n\n---\n\n## Relationship to R‑Drop\n\n**R‑Drop** (\"Regularized Dropout,\" Liang et al., 2021) also makes each training example go through dropout *twice* and forces the two subnetworks to agree. Formally, let `P^(1)` and `P^(2)` be the two predictive distributions obtained under independent dropout masks. 
R‑Drop minimises the sum of two negative‑log‑likelihood terms plus a *bidirectional* Kullback–Leibler penalty:\n\n$$\n\\mathcal{L}_{\\text{R-Drop}}(x_i, y_i) = -\\log P^{(1)}(y_i \\mid x_i) - \\log P^{(2)}(y_i \\mid x_i) + \\frac{\\alpha}{2}\\left[ \\mathrm{KL}\\bigl(P^{(1)} \\parallel P^{(2)}\\bigr) + \\mathrm{KL}\\bigl(P^{(2)} \\parallel P^{(1)}\\bigr) \\right],\n$$\n\nwhere `α` controls how strongly disagreement is punished. Because each KL term is non‑negative and vanishes only when its two arguments coincide, the penalty is zero exactly when the two distributions are identical.\n\nR‑Drop and **DCT** share the idea of *in‑training consensus*, but they diverge on three axes:\n\n1. **Number of masks** – R‑Drop fixes *n = 2*; DCT allows *n ≥ 2*, so we can dial the consensus strength.\n2. **Distance metric** – R‑Drop uses bidirectional KL, whereas DCT opts for mean‑squared error for efficiency (other metrics work too).\n3. **Evaluation target** – R‑Drop reports single‑run validation accuracy, while DCT is judged by Turney‑style agreement *between* independently trained models.\n\nPut differently, R‑Drop reduces the train–test gap of one network; DCT makes *multiple retrained* networks keep their story straight. 
The two methods are complementary: replacing DCT’s MSE penalty with the bidirectional KL term and setting *n = 2* recovers R‑Drop up to a constant rescaling of the loss, and tracking between‑bootstrap agreement would extend R‑Drop’s evaluation into the stability regime.\n\n## Experimental Snapshot\n\nAcross four benchmarks (IMDB, CIFAR‑10, UCI Adult, and MIMIC‑II), ten pairs of independently trained networks showed:\n\n| Setting           | Mean Stability ↑  | Mean Accuracy |\n| ----------------- | ----------------- | ------------- |\n| Standard training | 0.821 ± 0.013     | 0.871 ± 0.006 |\n| DCT (n = 5)       | **0.960 ± 0.010** | 0.869 ± 0.007 |\n| DCT (n = 15)      | **0.971 ± 0.009** | 0.867 ± 0.008 |\n\n*Numbers are illustrative; plug in the latest run when available.* Training time increases linearly with the number of passes, but inference time is unchanged because inference uses a single forward pass with dropout disabled.\n\n---\n\n## Take‑Away\n\n* Turney’s stability principle targets **consistent concepts**; his agreement test is a measurement tool.\n* Dropout‑consistency training pushes a *single* network toward that goal by making its internal stochastic variants agree.\n* The result is a lightweight, architecture‑neutral regulariser that can make tomorrow’s retrained model tell the same story as today’s, without ensembles or post‑hoc fixes.\n\n---\n\n### Reference\n\nTurney, P. (1995). “Bias and the Quantification of Stability.” *Machine Learning 20*: 23‑33.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffinite-sample%2Fdct","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffinite-sample%2Fdct","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffinite-sample%2Fdct/lists"}