{"id":50541691,"url":"https://github.com/bringhurst/stt","last_synced_at":"2026-06-03T20:30:44.198Z","repository":{"id":359029952,"uuid":"1243110992","full_name":"bringhurst/stt","owner":"bringhurst","description":null,"archived":false,"fork":false,"pushed_at":"2026-05-29T04:51:22.000Z","size":811,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-05-29T06:23:06.726Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bringhurst.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-19T03:59:44.000Z","updated_at":"2026-05-29T04:51:25.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/bringhurst/stt","commit_stats":null,"previous_names":["bringhurst/stt"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/bringhurst/stt","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bringhurst%2Fstt","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bringhurst%2Fstt/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bringhurst%2Fstt/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bringhurst%2Fstt/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bringhurst","download_url":"https://codeload.github.com/bringhurst/stt/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bringhurst%2Fstt/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33878990,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-03T02:00:06.370Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-03T20:30:43.485Z","updated_at":"2026-06-03T20:30:44.190Z","avatar_url":"https://github.com/bringhurst.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Surface Tension Transformers\n\nMinimal, CPU-friendly experiments for testing whether small geometric constraints improve Transformer representation geometry.\n\nThis repo intentionally starts tiny: a synthetic next-token task, a small Transformer encoder, and measurable regularizers for attention head diversity, representation repulsion, and sparse activations.\n\nCurrent finding: LoRA fine-tuning `Qwen/Qwen2.5-0.5B` with representation repulsion improves held-out representation geometry on WikiText-2. In the confirmed run, `repulsion=2.0` improved effective rank by about `+22.8%` and isotropy by about `-36.7%` with about `+3.1%` eval loss. Sampled gossip self-stabilization at `tau=0.5`, `gossip=5` now matches most of the short-run geometry and conflict-task stability gain with lower language-model and task-B penalties.\n\n## Setup\n\n```bash\npoetry install\n```\n\n## Run Tests\n\n```bash\npoetry run pytest\n```\n\n## Code Quality\n\n```bash\npoetry run ruff check .\npoetry run ty check\n```\n\n## Run Experiments\n\n```bash\npoetry run stt-experiment --steps 80 --variants baseline diversity repulsion sparse combined\n```\n\n## Run LoRA Experiments\n\nUse the tiny GPT-2 checkpoint for a quick wiring smoke test:\n\n```bash\npoetry run stt-lora --model sshleifer/tiny-gpt2 --steps 5 --variants baseline combined\n```\n\nUse Qwen 0.5B for the first meaningful small-model run on Apple Silicon:\n\n```bash\npoetry run stt-lora \\\n  --model Qwen/Qwen2.5-0.5B \\\n  --device auto \\\n  --steps 100 \\\n  --max-length 128 \\\n  --batch-size 1 \\\n  --grad-accum 8 \\\n  --variants baseline diversity combined\n```\n\nFor dose-response checks, override regularizer weights explicitly:\n\n```bash\npoetry run stt-lora \\\n  --model Qwen/Qwen2.5-0.5B \\\n  --steps 20 \\\n  --variants repulsion \\\n  --repulsion-weight 1.0\n```\n\nRun gossip self-stabilization with sampled thresholded anti-consensus pressure. The current useful setting is lower-threshold and stronger-weighted than the default because `tau=0.85` produced a very small raw loss on Qwen hidden states.\n\n```bash\npoetry run stt-lora \\\n  --model Qwen/Qwen2.5-0.5B \\\n  --device auto \\\n  --steps 100 \\\n  --max-length 96 \\\n  --batch-size 1 \\\n  --eval-batches 12 \\\n  --grad-accum 4 \\\n  --learning-rate 2e-4 \\\n  --variants baseline repulsion gossip \\\n  --sweep gossip=5.0 \\\n  --repulsion-weight 2.0 \\\n  --gossip-tau 0.5 \\\n  --gossip-k 8 \\\n  --max-gossip-vectors 192 \\\n  --seeds 0 1 2 \\\n  --text-file data/wikitext2_corpus.txt \\\n  --output-dir runs\n```\n\nThe current sweep parser supports one swept parameter at a time. Use fixed overrides like `--gossip-tau`, `--gossip-k`, and `--max-gossip-vectors` for the other gossip settings.\n\nIn the 100-step, 3-seed WikiText check, `gossip tau=0.5 weight=5` improved effective rank by `+4.67%` and isotropy by `-11.21%` with `+1.73%` eval loss. Fixed `repulsion=2.0` improved effective rank by `+5.96%` and isotropy by `-12.84%` with `+3.45%` eval loss.\n\nRun multi-seed sweeps and persist results:\n\n```bash\npoetry run stt-lora \\\n  --model Qwen/Qwen2.5-0.5B \\\n  --steps 100 \\\n  --eval-batches 16 \\\n  --seeds 0 1 2 \\\n  --variants baseline repulsion \\\n  --sweep repulsion=0,0.1,0.3,1.0 \\\n  --output-dir runs\n```\n\nAnalyze a saved run:\n\n```bash\npoetry run stt-analyze runs/\u003ctimestamp\u003e/results.json\n```\n\nAnalyze multiple continual runs as one paired seed table:\n\n```bash\npoetry run stt-analyze \\\n  runs/\u003ctimestamp-a\u003e/results.json \\\n  runs/\u003ctimestamp-b\u003e/results.json\n```\n\nConfirmed WikiText geometry command:\n\n```bash\nHF_HUB_OFFLINE=1 TRANSFORMERS_OFFLINE=1 poetry run stt-lora \\\n  --model Qwen/Qwen2.5-0.5B \\\n  --device auto \\\n  --steps 300 \\\n  --max-length 128 \\\n  --batch-size 1 \\\n  --eval-batches 32 \\\n  --grad-accum 4 \\\n  --learning-rate 2e-4 \\\n  --variants baseline repulsion \\\n  --sweep repulsion=1.5,2.0,2.5 \\\n  --seeds 0 1 2 3 4 \\\n  --text-file data/wikitext2_corpus.txt \\\n  --output-dir runs\n```\n\n## Run Continual-Learning Experiments\n\nTrain one LoRA adapter on task A, then continue training on task B and measure task-A backward transfer:\n\n```bash\npoetry run stt-continual \\\n  --model Qwen/Qwen2.5-0.5B \\\n  --device auto \\\n  --phase-steps 150 \\\n  --max-length 128 \\\n  --batch-size 1 \\\n  --eval-batches 16 \\\n  --grad-accum 4 \\\n  --variants baseline repulsion \\\n  --sweep repulsion=1.5,2.0 \\\n  --seeds 0 1 2 \\\n  --task-a-file data/wikitext2_task_a.txt \\\n  --task-b-file data/wikitext2_task_b.txt \\\n  --output-dir runs\n```\n\nFor a stronger interference test, use the synthetic conflicting-facts pair:\n\n```bash\npoetry run stt-continual \\\n  --model Qwen/Qwen2.5-0.5B \\\n  --device auto \\\n  --phase-steps 150 \\\n  --max-length 128 \\\n  --batch-size 1 \\\n  --eval-batches 16 \\\n  --grad-accum 4 \\\n  --variants baseline repulsion \\\n  --sweep repulsion=1.0,1.5,2.0 \\\n  --seeds 0 1 2 \\\n  --task-a-file data/conflict_task_a.txt \\\n  --task-b-file data/conflict_task_b.txt \\\n  --output-dir runs\n```\n\nCurrent conflict-task result: `gossip tau=0.5 weight=5` improved `backward_transfer_a` more reliably than fixed `repulsion=2.0` while preserving B-task learning. Over seeds `0 1 2 3 4 5` with `phase_steps=150`, gossip improved `backward_transfer_a` by about `-12.25%`, changed `learning_b` by about `-0.06%`, and changed `eval_b_after_b` by about `+0.50%`; repulsion changed `backward_transfer_a` by about `+2.95%`, `learning_b` by about `-0.67%`, and `eval_b_after_b` by about `+5.86%`.\n\nSmall neighborhood checks around the current gossip setting found weaker results for `tau=0.4` and `gossip_weight=7`, so `tau=0.5`, `gossip_weight=5` remains the current best setting.\n\nSee `docs/continual-tasks.md` for task-pair details.\n\nThe repo also includes a second conflict family, `data/conflict2_task_a.txt` and `data/conflict2_task_b.txt`, that conflicts on numeric quotas, routes, permissions, windows, and cause/action rules instead of profile attributes. Use it to test whether the current gossip result transfers beyond the first synthetic template.\n\nCombined 6-seed conflict2 results show partial transfer: gossip improved `backward_transfer_a` by `-6.74%` with `learning_b +0.04%`, while fixed repulsion improved `backward_transfer_a` by `-5.40%` with `learning_b +0.05%`. This is positive for transfer, but not a clean gossip-over-repulsion win because fixed repulsion had slightly better `eval_b_after_b` and `retention_ratio`.\n\n## Run Accretion Experiments\n\nGenerate A/B/C accretion task files:\n\n```bash\npoetry run python -m stt.accretion_data --output-dir data --num-entities 256 --seed 0\n```\n\nRun the A→B_related→C_conflict scaffold. The generator also writes\n`data/accretion_task_b_rehearsal.txt` as a positive-control B condition with exact\nA fact rehearsal plus related context.\n\n```bash\npoetry run stt-accretion \\\n  --model sshleifer/tiny-gpt2 \\\n  --device cpu \\\n  --phase-steps 2 \\\n  --max-length 64 \\\n  --batch-size 1 \\\n  --eval-batches 2 \\\n  --grad-accum 1 \\\n  --variants baseline gossip \\\n  --sweep gossip=1.0 \\\n  --gossip-tau 0.5 \\\n  --task-a-file data/accretion_task_a.txt \\\n  --task-b-file data/accretion_task_b_related.txt \\\n  --task-c-file data/accretion_task_c_conflict.txt \\\n  --output-dir runs\n```\n\nSee `docs/accretion.md` for metric interpretation and the Qwen command. See\n`docs/routed-accretion.md` for the fixed routed-update experiment that evaluates\nthe predeclared `A + 0.9B + 0.25C` composition against blind sequential training.\nSee `docs/oracle-composition.md` for oracle scalar composition and the\n`stt-oracle-route` group-routing kill test.\n\nCurrent Qwen accretion status: `B_related` is near-neutral and shows gossip preserving A better than baseline, while `B_rehearsal` is the positive-control condition and produces positive baseline accretion. In the 6-seed rehearsal run, baseline `accretion_a_after_b=+0.1514`, gossip `+0.1576`, and repulsion `+0.1437`.\n\nCurrent routed-update status: fixed `A + 0.9B + 0.25C` is a strong A/B retention baseline, not a Pareto win. With corrected phase-local C-learning, it wins A/B interference on `6/6` seeds across `B_related`, `B_related_strong`, and `B_rehearsal`; it wins accretion on `6/6`, `6/6`, and `2/6` respectively; C-learning preservation is `0/6` in all three conditions because the published adapter only applies `0.25C`. See `docs/routed-accretion.md` for run paths and metrics.\n\nThe corrected local route sweep over `B=0.85..1.00` and `C=0.20..0.30` moves best frontier routes to `C=0.30` for all three conditions, but still has `0/6` C-learning preservation. The next route calibration should test larger C scales.\n\nThe first larger-C `B_related` calibration shows the best scalar frontier route near `0.90B+0.40C`, while C preservation only appears near `C=0.90..1.00`, where the route collapses back toward blind sequential and loses most A/B-retention benefit. Splitting C by LoRA tensor family improves frontier only slightly (`0.90B+0.60C_A+0.40C_B`, frontier `+0.5731`) and layer-band C routing improves it slightly again (`0.90B+0.25C_E+0.40C_M+0.60C_L`, frontier `+0.5796`), but both still have `0/6` C preservation. The fixed-route family currently shows a hard A/B-retention versus C-learning tradeoff rather than a Pareto solution.\n\nThe oracle group-route kill tests are also negative for C preservation. Layer/block oracle routing on `B_related` (`runs/20260522T171513300150Z/results.json`) wins A/B interference on `3/3` seeds but preserves C learning on `0/3`; module routing on seed `0` (`runs/20260522T175056822104Z/results.json`) improves frontier over layer routing but still misses C preservation. A cheaper tensor-level seed-0 kill test (`runs/20260522T195930408676Z/results.json`, `eval_batches=4`, binary C scales after the full tensor run timed out) also wins A/B retention but misses C preservation. Post-hoc adapter routing should be considered exhausted for this setup; the next productive direction is training-time constraints, compatibility regularization, or replay-lite.\n\n`forgetting_a` is still emitted as a compatibility alias for `backward_transfer_a`.\n\nThe command prints final task loss plus representation metrics:\n\n- `head_similarity`: lower means attention heads are less redundant.\n- `effective_rank`: higher means hidden states use more dimensions.\n- `isotropy`: lower means less directional collapse.\n- `active_fraction`: lower means sparser activations.\n- `eval_diversity_loss`, `eval_repulsion_loss`, and `eval_sparse_loss`: raw unweighted STT components for checking whether a regularizer has a useful scale.\n- `eval_gossip_loss`: raw sampled thresholded gossip loss.\n\n## Design\n\nThe code is split into small modules:\n\n- `stt.model`: tiny Transformer with returned attention maps and hidden states.\n- `stt.losses`: STT regularizers.\n- `stt.metrics`: geometry metrics.\n- `stt.data`: deterministic synthetic sequence task.\n- `stt.experiment`: training loop and CLI.\n- `stt.lora_experiment`: LoRA fine-tuning CLI for pretrained causal LMs.\n- `stt.analyze`: baseline-relative summaries for persisted LoRA experiment records.\n- `stt.continual`: sequential A-then-B LoRA continual-learning experiments.\n- `stt.accretion`: sequential A-then-B-then-C compatibility experiments.\n- `stt.routed_accretion`: fixed routed-update A-then-B-then-C experiments.\n\nSee `docs/experiment-design.md` for the current research framing, metric interpretation, and Apple Silicon notes.\n\nThis is not intended to prove the full STT thesis. It is a first measurable scaffold: dream big, measure tiny.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbringhurst%2Fstt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbringhurst%2Fstt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbringhurst%2Fstt/lists"}