# DigiMuh

[![Tests](https://github.com/zerotonin/digimuh/actions/workflows/tests.yml/badge.svg)](https://github.com/zerotonin/digimuh/actions/workflows/tests.yml)
[![Docs](https://github.com/zerotonin/digimuh/actions/workflows/docs.yml/badge.svg)](https://zerotonin.github.io/digimuh/)
[![Release](https://github.com/zerotonin/digimuh/actions/workflows/release.yml/badge.svg)](https://github.com/zerotonin/digimuh/releases)
[![Python](https://img.shields.io/badge/python-3.10%20%7C%203.11%20%7C%203.12%20%7C%203.13-blue.svg)](https://www.python.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.20389795.svg)](https://doi.org/10.5281/zenodo.20389795)
[![Uses reRandomStats](https://img.shields.io/badge/uses-reRandomStats%20v0.2.0-009E73.svg)](https://doi.org/10.5281/zenodo.20387255)

> **🚧 Repository under construction** — core ingestion pipeline is functional;
> analysis modules, views, and query helpers are coming next.

**DigiMuh** consolidates ~8.9 GB of heterogeneous dairy-cow CSV sensor data into
a single normalised SQLite database.  The data spans 3.5 years (April 2021 –
September 2024) of continuous monitoring from multiple on-farm systems:

| System | What it measures |
|---|---|
| **smaXtec** bolus | Rumen temperature, pH, activity, motility, rumination, water intake, estrus/calving indices |
| **smaXtec** barn sensors | Barn temperature, humidity, THI |
| **HerdePlus** | Milking events, MLP test-day results, calving/lactation records |
| **HerdePlus** diseases | Health events and diagnoses |
| **Gouna** | Respiration frequency |
| **BCS** | Body condition scores |
| **LoRaWAN** | Environmental sensor battery/current |
| **HOBO** | Weather station (temperature, humidity, solar radiation, wind, wetness) |
| **DWD** | German Weather Service THI and enthalpy |

The database uses a **star schema**: four dimension tables (`animals`, `sensors`,
`barns`, `source_files`) and twelve fact tables, connected by integer foreign
keys.  Every row carries a `file_id`, so it can be traced back to the original
CSV file it was loaded from.

See [`docs/database_structure.md`](docs/database_structure.md) for the full
schema and [`docs/column_dictionary.md`](docs/column_dictionary.md) for a
description of every column.


## Installation

```bash
# Clone the repository
git clone https://github.com/zerotonin/digimuh.git
cd digimuh

# Option A: conda (recommended)
conda env create -f environment.yml
conda activate digimuh

# Option B: pip
# reRandomStats is not on PyPI: install it from the v0.2.0 git tag first,
# so the editable install below finds the dependency already satisfied.
pip install "git+https://github.com/zerotonin/reRandomStats.git@v0.2.0"
pip install -e ".[dev]"
```


## Quick start

```bash
# 1. Smoke test with 5 files per folder (~1 min)
digimuh-ingest /path/to/DigiMuh-Export --db cow_test.db --test-n 5

# 2. Full ingestion (~2–3 hours)
rm cow_test.db
digimuh-ingest /path/to/DigiMuh-Export --db cow.db

# 3. Query the database
python -c "
import sqlite3
con = sqlite3.connect('cow.db')
cur = con.execute('SELECT COUNT(*) FROM smaxtec_derived')
print(f'smaxtec_derived rows: {cur.fetchone()[0]:,}')
"
```


## Expected input layout

The ingestion script expects the DigiMuh CSV export directory to have this
structure:

```
DigiMuh-Export_2021-04-01_2024-09-30/
├── output_allocations/
│   └── allocations.csv
├── outputs_bcs/
│   └── {animal_id}_bcs_{date_range}.csv  ×715
├── outputs_gouna/
│   └── {animal_id}_gouna_{date_range}.csv  ×91
├── outputs_herdeplus_mlp_gemelk_kalbung/
│   └── {animal_id}_herdeplus_{date_range}.csv  ×965
├── outputs_hobo/
│   └── hobo_exports_{date_range}.csv
├── outputs_lorawan/
│   └── {sensor_name}_LoRaWAN_raw_{date_range}.csv  ×22
├── outputs_smaxtec_barns/
│   └── {barn_name}_smaxtec_raw_{date_range}.csv  ×4
├── outputs_smaxtec_derived/
│   └── {animal_id}_smaxtec_derived_{date_range}.csv  ×837
├── outputs_smaxtec_events/
│   └── {animal_id}_events.csv  ×837
├── outputs_smaxtec_water_intake/
│   └── {animal_id}_smaxtec_derived_{date_range}.csv  ×837
├── herdeplus_diseases.csv
└── outputs_dwd.csv
```

Animal IDs are 15-digit EU ear tag numbers.  The entity identifier (animal ID,
sensor name, or barn name) is always the first underscore-delimited segment of
each filename.


## CLI reference

```
digimuh-ingest [-h] [--db DB] [--chunk-size N] [--verbose] [--test-n N] root_dir
```

| Argument | Description |
|---|---|
| `root_dir` | Root directory containing all CSV folders |
| `--db` | Output SQLite path (default: `cow.db`) |
| `--chunk-size` | Rows per INSERT batch (default: 50 000) |
| `--test-n N` | Only ingest first N files per folder |
| `--verbose`, `-v` | Print CREATE TABLE SQL and debug info |


## Running tests

```bash
python -m pytest
```


## Analysis pipeline

After ingestion, six analysis scripts are available as CLI commands.  Each
creates its analysis views on first run, queries the database, and writes results
(CSV data + figures) to an output directory.

```bash
# Install with analysis dependencies
pip install -e ".[analysis]"

# 0. Individual heat stress thresholds (broken-stick regression)
digimuh-broken-stick --db cow.db --tierauswahl Tierauswahl.xlsx --out results/broken_stick

# 1. Subclinical ketosis risk — FPR × rumination × milk yield
digimuh-ketosis --db cow.db --out results/ketosis

# 2. Heat stress — rumen temp × THI × water × respiration
digimuh-heat --db cow.db --out results/heat

# 3. Digestive efficiency — motility × pH → milk composition (time-lagged)
digimuh-digestive --db cow.db --out results/digestive

# 4. Circadian disruption — 24h Fourier decomposition as welfare marker
digimuh-circadian --db cow.db --out results/circadian

# 5. Motility entropy — rumen HRV analogue via information theory
digimuh-entropy --db cow.db --out results/entropy
```

Each script writes:
- A CSV of the extracted features (for further analysis in R, Python, etc.)
- Publication-ready SVG + PNG figures
- A JSON summary of key results (where applicable)

See [`docs/database_structure.md`](docs/database_structure.md) for the SQL view
definitions that power these analyses.


## Roadmap

- [x] CSV → SQLite ingestion with star schema
- [x] SQL views for analysis (daily summaries + cross-table joins)
- [x] Analysis: individual heat stress thresholds (broken-stick regression)
- [x] Analysis: subclinical ketosis detection (FPR + RF classifier)
- [x] Analysis: heat stress multi-sensor fusion
- [x] Analysis: digestive efficiency (motility–pH coupling)
- [x] Analysis: circadian rhythm disruption index
- [x] Analysis: motility pattern entropy (novel)
- [ ] Data validation and quality-check reports
- [ ] Parallelised entropy computation for full dataset
- [ ] Sphinx documentation on GitHub Pages


## Authors

**Bart R. H. Geurten** — Department of Zoology, University of Otago

## Citation

If you use DigiMuh in your research, please cite both DigiMuh and the
`reRandomStats` statistics toolkit. DigiMuh uses reRandomStats for breakpoint
detection (broken-stick regression, Davies / Pseudo-Score tests, 4-parameter
Hill fit) and FDR correction.  The version you used should match the
DOI you cite. Full metadata for DigiMuh is in [`CITATION.cff`](CITATION.cff)
and behind the *Cite this repository* button on the GitHub repo page.

> Geurten, B. R. H. (2026). *DigiMuh: Dairy-cow sensor data ingestion
> and heat-stress analysis pipeline* (Version 1.0.0) [Software].
> Zenodo. https://doi.org/10.5281/zenodo.20389795

```bibtex
@software{geurten_digimuh_v100,
  author  = {Geurten, Bart R. H.},
  title   = {{DigiMuh}: Dairy-cow sensor data ingestion and heat-stress
             analysis pipeline},
  year    = {2026},
  version = {1.0.0},
  doi     = {10.5281/zenodo.20389795},
  url     = {https://github.com/zerotonin/digimuh},
  license = {MIT},
}

@software{geurten_rerandomstats_v020,
  author  = {Geurten, Bart R. H.},
  title   = {{reRandomStats}: Re-randomisation Statistics Toolkit},
  year    = {2026},
  version = {0.2.0},
  doi     = {10.5281/zenodo.20387255},
  url     = {https://github.com/zerotonin/reRandomStats},
  license = {MIT},
}
```

> **Note for Elsevier submissions:** Elsevier Editorial Manager does not
> parse `@software`; convert to `@misc` at submission time per the lab
> BibTeX convention.

## License

MIT