https://github.com/zerotonin/digimuh
Heat-stress repeatability pipeline for dairy cows: broken-stick breakpoints, profile-RSS CIs, and ICC(1,1) with stackable corrections. No cows were harmed, though several were significantly digitised 🐾.
- Host: GitHub
- URL: https://github.com/zerotonin/digimuh
- Owner: zerotonin
- Created: 2026-03-30T02:30:49.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-05-26T00:34:42.000Z (21 days ago)
- Last Synced: 2026-05-26T02:26:35.197Z (21 days ago)
- Topics: animal-behaviour, broken-stick-regression, dairy-cattle, heat-stress, intraclass-correlation, open-science, precision-livestock-farming, python, repeatability, reproducible-research, time-series-analysis
- Language: Python
- Homepage: https://zerotonin.github.io/digimuh/
- Size: 3.94 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Citation: CITATION.cff
# DigiMuh
> **🚧 Repository under construction:** the core ingestion pipeline is functional;
> analysis modules, views, and query helpers are coming next.
**DigiMuh** consolidates ~8.9 GB of heterogeneous dairy-cow CSV sensor data into
a single normalised SQLite database. The data spans 3.5 years (April 2021 to
September 2024) of continuous monitoring from multiple on-farm systems:
| System | What it measures |
|---|---|
| **smaXtec** bolus | Rumen temperature, pH, activity, motility, rumination, water intake, estrus/calving indices |
| **smaXtec** barn sensors | Barn temperature, humidity, THI |
| **HerdePlus** | Milking events, MLP test-day results, calving/lactation records |
| **HerdePlus** diseases | Health events and diagnoses |
| **Gouna** | Respiration frequency |
| **BCS** | Body condition scores |
| **LoRaWAN** | Environmental sensor battery/current |
| **HOBO** | Weather station (temperature, humidity, solar radiation, wind, wetness) |
| **DWD** | German Weather Service THI and enthalpy |
The database uses a **star schema**: four dimension tables (`animals`, `sensors`,
`barns`, `source_files`) and twelve fact tables, connected by integer foreign
keys. Every row carries a `file_id` for full provenance tracing back to the
original CSV.
See [`docs/database_structure.md`](docs/database_structure.md) for the full
schema and [`docs/column_dictionary.md`](docs/column_dictionary.md) for a
description of every column.
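
To make the star-schema idea concrete, here is a minimal sketch on an in-memory database. Only the table names `animals`, `source_files`, and `smaxtec_derived` and the `file_id` column come from this README. Every other column name (`animal_id`, `ear_tag`, `path`, `ts`, `temp`) and all inserted values are placeholders invented for illustration, so treat the linked schema document as the authority.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Dimension tables (column names are hypothetical placeholders)
    CREATE TABLE animals      (animal_id INTEGER PRIMARY KEY, ear_tag TEXT UNIQUE);
    CREATE TABLE source_files (file_id   INTEGER PRIMARY KEY, path TEXT);
    -- Fact table: integer foreign keys point at the dimensions
    CREATE TABLE smaxtec_derived (
        animal_id INTEGER REFERENCES animals(animal_id),
        file_id   INTEGER REFERENCES source_files(file_id),
        ts        TEXT,
        temp      REAL
    );
""")

# Invented example rows, not real data.
con.execute("INSERT INTO animals VALUES (1, '276000123456789')")
con.execute("INSERT INTO source_files VALUES (1, 'outputs_smaxtec_derived/example.csv')")
con.execute("INSERT INTO smaxtec_derived VALUES (1, 1, '2021-04-01T00:00:00', 39.1)")

# Trace one measurement back to its animal and its original CSV file.
row = con.execute("""
    SELECT a.ear_tag, f.path, d.temp
    FROM smaxtec_derived AS d
    JOIN animals      AS a ON a.animal_id = d.animal_id
    JOIN source_files AS f ON f.file_id   = d.file_id
""").fetchone()
print(row)
con.close()
```

The join through `file_id` is what provides the provenance described above: any fact row can be matched to the file it was read from.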
## Installation
```bash
# Clone the repository
git clone https://github.com/zerotonin/digimuh.git
cd digimuh
# Option A: conda (recommended)
conda env create -f environment.yml
conda activate digimuh
# Option B: pip
# reRandomStats is not on PyPI; install it from the v0.2.0 git tag first,
# then the editable install picks it up locally to satisfy the dependency.
pip install "git+https://github.com/zerotonin/reRandomStats.git@v0.2.0"
pip install -e ".[dev]"
```
## Quick start
```bash
# 1. Smoke test with 5 files per folder (~1 min)
digimuh-ingest /path/to/DigiMuh-Export --db cow_test.db --test-n 5
# 2. Full ingestion (~2-3 hours)
rm cow_test.db
digimuh-ingest /path/to/DigiMuh-Export --db cow.db
# 3. Query the database
python -c "
import sqlite3
con = sqlite3.connect('cow.db')
cur = con.execute('SELECT COUNT(*) FROM smaxtec_derived')
print(f'smaxtec_derived rows: {cur.fetchone()[0]:,}')
"
```
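
To check which tables ingestion actually produced, SQLite's built-in `sqlite_master` catalogue can be queried directly. The sketch below wraps that query in a helper. `list_tables` is a name invented here, not part of DigiMuh, and the demo runs on a throwaway in-memory database with two stand-in tables so that it works without `cow.db`.

```python
import sqlite3


def list_tables(con: sqlite3.Connection) -> list[str]:
    """Return the names of all user tables, sorted alphabetically."""
    rows = con.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    )
    return [name for (name,) in rows]


# Demo on a throwaway database; pass sqlite3.connect('cow.db') for the real one.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE animals (animal_id INTEGER PRIMARY KEY)")
demo.execute("CREATE TABLE smaxtec_derived (animal_id INTEGER, file_id INTEGER)")
tables = list_tables(demo)
print(tables)
demo.close()
```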
## Expected input layout
The ingestion script expects the DigiMuh CSV export directory to have this
structure:
```
DigiMuh-Export_2021-04-01_2024-09-30/
├── output_allocations/
│   └── allocations.csv
├── outputs_bcs/
│   └── {animal_id}_bcs_{date_range}.csv                ×715
├── outputs_gouna/
│   └── {animal_id}_gouna_{date_range}.csv              ×91
├── outputs_herdeplus_mlp_gemelk_kalbung/
│   └── {animal_id}_herdeplus_{date_range}.csv          ×965
├── outputs_hobo/
│   └── hobo_exports_{date_range}.csv
├── outputs_lorawan/
│   └── {sensor_name}_LoRaWAN_raw_{date_range}.csv      ×22
├── outputs_smaxtec_barns/
│   └── {barn_name}_smaxtec_raw_{date_range}.csv        ×4
├── outputs_smaxtec_derived/
│   └── {animal_id}_smaxtec_derived_{date_range}.csv    ×837
├── outputs_smaxtec_events/
│   └── {animal_id}_events.csv                          ×837
├── outputs_smaxtec_water_intake/
│   └── {animal_id}_smaxtec_derived_{date_range}.csv    ×837
├── herdeplus_diseases.csv
└── outputs_dwd.csv
```
Animal IDs are 15-digit EU ear tag numbers. The entity identifier is always
the first underscore-delimited segment of each filename.
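
The naming rule above can be sketched in a few lines. The helper names `entity_id` and `is_ear_tag`, the example file names, and the shape of the `{date_range}` part are assumptions made for illustration. This is not the ingestion script's own parsing code.

```python
import re
from pathlib import Path

EAR_TAG = re.compile(r"\d{15}")  # 15-digit EU ear tag, as stated in the README


def entity_id(path: str) -> str:
    """Return the first underscore-delimited segment of the file name."""
    return Path(path).name.split("_", 1)[0]


def is_ear_tag(segment: str) -> bool:
    """True if the segment looks like a 15-digit animal ID."""
    return EAR_TAG.fullmatch(segment) is not None


# Invented file names that follow the documented patterns.
animal_file = "outputs_bcs/276000123456789_bcs_2021-04-01_2024-09-30.csv"
sensor_file = "outputs_lorawan/sensor01_LoRaWAN_raw_2021-04-01_2024-09-30.csv"

print(entity_id(animal_file), is_ear_tag(entity_id(animal_file)))
print(entity_id(sensor_file), is_ear_tag(entity_id(sensor_file)))
```

Splitting on the first underscore only works if sensor and barn names contain no underscores themselves, which the layout above implies but does not state.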
## CLI reference
```
digimuh-ingest [-h] [--db DB] [--chunk-size N] [--verbose] [--test-n N] root_dir
```
| Argument | Description |
|---|---|
| `root_dir` | Root directory containing all CSV folders |
| `--db` | Output SQLite path (default: `cow.db`) |
| `--chunk-size` | Rows per INSERT batch (default: 50 000) |
| `--test-n N` | Only ingest first N files per folder |
| `--verbose`, `-v` | Print CREATE TABLE SQL and debug info |
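
For readers who want to wrap the ingestion step in their own scripts, the documented interface could be mirrored with `argparse` roughly as follows. This is a sketch reconstructed from the table above, not the package's actual parser. The `build_parser` name, the `int` types, and the `None` default for `--test-n` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented digimuh-ingest options."""
    parser = argparse.ArgumentParser(prog="digimuh-ingest")
    parser.add_argument("root_dir", help="Root directory containing all CSV folders")
    parser.add_argument("--db", default="cow.db", help="Output SQLite path")
    parser.add_argument("--chunk-size", type=int, default=50_000, metavar="N",
                        help="Rows per INSERT batch")
    parser.add_argument("--test-n", type=int, default=None, metavar="N",
                        help="Only ingest first N files per folder")
    parser.add_argument("--verbose", "-v", action="store_true",
                        help="Print CREATE TABLE SQL and debug info")
    return parser


# Parse the smoke-test invocation from the quick start.
args = build_parser().parse_args(["/path/to/DigiMuh-Export", "--test-n", "5"])
print(args.db, args.chunk_size, args.test_n, args.verbose)
```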
## Running tests
```bash
python -m pytest
```
## Analysis pipeline
After ingestion, six analysis scripts are available as CLI commands (numbered
0 to 5 below). Each creates analysis views on first run, queries the database,
and writes results (CSV data + figures) to an output directory.
```bash
# Install with analysis dependencies
pip install -e ".[analysis]"
# 0. Individual heat stress thresholds (broken-stick regression)
digimuh-broken-stick --db cow.db --tierauswahl Tierauswahl.xlsx --out results/broken_stick
# 1. Subclinical ketosis risk: FPR × rumination × milk yield
digimuh-ketosis --db cow.db --out results/ketosis
# 2. Heat stress: rumen temp × THI × water × respiration
digimuh-heat --db cow.db --out results/heat
# 3. Digestive efficiency: motility × pH → milk composition (time-lagged)
digimuh-digestive --db cow.db --out results/digestive
# 4. Circadian disruption: 24h Fourier decomposition as welfare marker
digimuh-circadian --db cow.db --out results/circadian
# 5. Motility entropy: rumen HRV analogue via information theory
digimuh-entropy --db cow.db --out results/entropy
```
Each script writes:
- A CSV of the extracted features (for further analysis in R, Python, etc.)
- Publication-ready SVG + PNG figures
- A JSON summary of key results (where applicable)
See [`docs/database_structure.md`](docs/database_structure.md) for the SQL view
definitions that power these analyses.
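
As a rough illustration of what a broken-stick threshold means, the sketch below fits a simplified hinge model, `y = a + b * max(0, x - c)`, by grid search over the breakpoint `c`, keeping the candidate with the smallest residual sum of squares. It is a teaching example in pure Python on synthetic, noise-free data. It is not the `reRandomStats` implementation that DigiMuh uses, and the flat-baseline model, function names, and all numbers are assumptions.

```python
def fit_line(h, y):
    """Ordinary least squares for y = a + b*h; returns (a, b, rss)."""
    n = len(h)
    mh, my = sum(h) / n, sum(y) / n
    sxx = sum((hi - mh) ** 2 for hi in h)
    sxy = sum((hi - mh) * (yi - my) for hi, yi in zip(h, y))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mh
    rss = sum((yi - (a + b * hi)) ** 2 for hi, yi in zip(h, y))
    return a, b, rss


def broken_stick(x, y, candidates):
    """Grid-search the breakpoint c in y = a + b*max(0, x - c)."""
    best = None
    for c in candidates:
        hinge = [max(0.0, xi - c) for xi in x]
        a, b, rss = fit_line(hinge, y)
        if best is None or rss < best[3]:
            best = (c, a, b, rss)
    return best  # (breakpoint, baseline, slope, rss)


# Synthetic data: flat at 39.0 up to THI 68, then +0.05 per THI unit.
thi = [float(t) for t in range(55, 81)]
temp = [39.0 + 0.05 * max(0.0, t - 68.0) for t in thi]
c, a, b, rss = broken_stick(thi, temp, candidates=[float(t) for t in range(60, 76)])
print(c, round(a, 3), round(b, 3))
```

On real sensor data the grid would be finer and the fit would be noisy, which is where confidence intervals and significance tests for the breakpoint become necessary.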
## Roadmap
- [x] CSV → SQLite ingestion with star schema
- [x] SQL views for analysis (daily summaries + cross-table joins)
- [x] Analysis: individual heat stress thresholds (broken-stick regression)
- [x] Analysis: subclinical ketosis detection (FPR + RF classifier)
- [x] Analysis: heat stress multi-sensor fusion
- [x] Analysis: digestive efficiency (motility-pH coupling)
- [x] Analysis: circadian rhythm disruption index
- [x] Analysis: motility pattern entropy (novel)
- [ ] Data validation and quality-check reports
- [ ] Parallelised entropy computation for full dataset
- [ ] Sphinx documentation on GitHub Pages
## Authors
**Bart R. H. Geurten**, Department of Zoology, University of Otago
## Citation
If you use DigiMuh in your research, please cite both DigiMuh and the
`reRandomStats` statistics toolkit it consumes for breakpoint detection
(broken-stick regression, Davies / Pseudo-Score tests, 4-parameter
Hill fit) and FDR correction. The version you used should match the
DOI you cite; full metadata for DigiMuh is in [`CITATION.cff`](CITATION.cff)
and on the GitHub repo's *Cite this repository* button.
> Geurten, B. R. H. (2026). *DigiMuh: Dairy-cow sensor data ingestion
> and heat-stress analysis pipeline* (Version 1.0.0) [Software].
> Zenodo. https://doi.org/10.5281/zenodo.20389795
```bibtex
@software{geurten_digimuh_v100,
  author  = {Geurten, Bart R. H.},
  title   = {{DigiMuh}: Dairy-cow sensor data ingestion and heat-stress
             analysis pipeline},
  year    = {2026},
  version = {1.0.0},
  doi     = {10.5281/zenodo.20389795},
  url     = {https://github.com/zerotonin/digimuh},
  license = {MIT},
}
@software{geurten_rerandomstats_v020,
  author  = {Geurten, Bart R. H.},
  title   = {{reRandomStats}: Re-randomisation Statistics Toolkit},
  year    = {2026},
  version = {0.2.0},
  doi     = {10.5281/zenodo.20387255},
  url     = {https://github.com/zerotonin/reRandomStats},
  license = {MIT},
}
```
> **Note for Elsevier submissions:** Elsevier Editorial Manager does not
> parse `@software`; convert to `@misc` at submission time per the lab
> BibTeX convention.
## License
MIT