
https://github.com/zerotonin/digimuh

Heat-stress repeatability pipeline for dairy cows: broken-stick breakpoints, profile-RSS CIs, and ICC(1,1) with stackable corrections; no cows 🐄 were harmed, though several were significantly digitised 💾.

animal-behaviour broken-stick-regression dairy-cattle heat-stress intraclass-correlation open-science precision-livestock-farming python repeatability reproducible-research time-series-analysis


# DigiMuh

[![Tests](https://github.com/zerotonin/digimuh/actions/workflows/tests.yml/badge.svg)](https://github.com/zerotonin/digimuh/actions/workflows/tests.yml)
[![Docs](https://github.com/zerotonin/digimuh/actions/workflows/docs.yml/badge.svg)](https://zerotonin.github.io/digimuh/)
[![Release](https://github.com/zerotonin/digimuh/actions/workflows/release.yml/badge.svg)](https://github.com/zerotonin/digimuh/releases)
[![Python](https://img.shields.io/badge/python-3.10%20%7C%203.11%20%7C%203.12%20%7C%203.13-blue.svg)](https://www.python.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.20389795.svg)](https://doi.org/10.5281/zenodo.20389795)
[![Uses reRandomStats](https://img.shields.io/badge/uses-reRandomStats%20v0.2.0-009E73.svg)](https://doi.org/10.5281/zenodo.20387255)

> **🚧 Repository under construction:** core ingestion pipeline is functional;
> analysis modules, views, and query helpers are coming next.

**DigiMuh** consolidates ~8.9 GB of heterogeneous dairy-cow CSV sensor data into
a single normalised SQLite database. The data spans 3.5 years (April 2021 –
September 2024) of continuous monitoring from multiple on-farm systems:

| System | What it measures |
|---|---|
| **smaXtec** bolus | Rumen temperature, pH, activity, motility, rumination, water intake, estrus/calving indices |
| **smaXtec** barn sensors | Barn temperature, humidity, THI |
| **HerdePlus** | Milking events, MLP test-day results, calving/lactation records |
| **HerdePlus** diseases | Health events and diagnoses |
| **Gouna** | Respiration frequency |
| **BCS** | Body condition scores |
| **LoRaWAN** | Environmental sensor battery/current |
| **HOBO** | Weather station (temperature, humidity, solar radiation, wind, wetness) |
| **DWD** | German Weather Service THI and enthalpy |

The database uses a **star schema**: four dimension tables (`animals`, `sensors`,
`barns`, `source_files`) and twelve fact tables, connected by integer foreign
keys. Every row carries a `file_id` for full provenance tracing back to the
original CSV.

See [`docs/database_structure.md`](docs/database_structure.md) for the full
schema and [`docs/column_dictionary.md`](docs/column_dictionary.md) for a
description of every column.
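
As a rough illustration of the provenance link described above, the following sketch builds a toy in-memory database and joins a fact row back to its source file. Only the table names `source_files` and `smaxtec_derived` and the `file_id` column come from this README; every other column name (`file_path`, `animal_id`, `timestamp`, `temp`) and all values are placeholders, so consult `docs/database_structure.md` for the real schema.

```python
import sqlite3

# Toy stand-in for the real schema; column names other than file_id are assumed.
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE source_files (
        file_id   INTEGER PRIMARY KEY,
        file_path TEXT NOT NULL              -- hypothetical column
    );
    CREATE TABLE smaxtec_derived (
        animal_id INTEGER,                   -- hypothetical column
        timestamp TEXT,                      -- hypothetical column
        temp      REAL,                      -- hypothetical column
        file_id   INTEGER REFERENCES source_files(file_id)
    );
    INSERT INTO source_files VALUES (1, 'outputs_smaxtec_derived/example.csv');
    INSERT INTO smaxtec_derived VALUES (42, '2021-04-01T00:00:00', 39.1, 1);
    """
)

# Trace each fact row back to the CSV it was ingested from.
rows = con.execute(
    """
    SELECT d.animal_id, d.timestamp, d.temp, s.file_path
    FROM smaxtec_derived AS d
    JOIN source_files AS s ON s.file_id = d.file_id
    """
).fetchall()
con.close()

for row in rows:
    print(row)
```

The same join pattern should apply to any fact table, since every row carries a `file_id`.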

## Installation

```bash
# Clone the repository
git clone https://github.com/zerotonin/digimuh.git
cd digimuh

# Option A: conda (recommended)
conda env create -f environment.yml
conda activate digimuh

# Option B: pip
# reRandomStats is not on PyPI; install it from the v0.2.0 git tag first,
# then the editable install picks it up locally to satisfy the dependency.
pip install "git+https://github.com/zerotonin/reRandomStats.git@v0.2.0"
pip install -e ".[dev]"
```

## Quick start

```bash
# 1. Smoke test with 5 files per folder (~1 min)
digimuh-ingest /path/to/DigiMuh-Export --db cow_test.db --test-n 5

# 2. Full ingestion (~2–3 hours)
rm cow_test.db
digimuh-ingest /path/to/DigiMuh-Export --db cow.db

# 3. Query the database
python -c "
import sqlite3
con = sqlite3.connect('cow.db')
cur = con.execute('SELECT COUNT(*) FROM smaxtec_derived')
print(f'smaxtec_derived rows: {cur.fetchone()[0]:,}')
"
```

## Expected input layout

The ingestion script expects the DigiMuh CSV export directory to have this
structure:

```
DigiMuh-Export_2021-04-01_2024-09-30/
├── output_allocations/
│   └── allocations.csv
├── outputs_bcs/
│   └── {animal_id}_bcs_{date_range}.csv                  ×715
├── outputs_gouna/
│   └── {animal_id}_gouna_{date_range}.csv                ×91
├── outputs_herdeplus_mlp_gemelk_kalbung/
│   └── {animal_id}_herdeplus_{date_range}.csv            ×965
├── outputs_hobo/
│   └── hobo_exports_{date_range}.csv
├── outputs_lorawan/
│   └── {sensor_name}_LoRaWAN_raw_{date_range}.csv        ×22
├── outputs_smaxtec_barns/
│   └── {barn_name}_smaxtec_raw_{date_range}.csv          ×4
├── outputs_smaxtec_derived/
│   └── {animal_id}_smaxtec_derived_{date_range}.csv      ×837
├── outputs_smaxtec_events/
│   └── {animal_id}_events.csv                            ×837
├── outputs_smaxtec_water_intake/
│   └── {animal_id}_smaxtec_derived_{date_range}.csv      ×837
├── herdeplus_diseases.csv
└── outputs_dwd.csv
```

Animal IDs are 15-digit EU ear tag numbers. The entity identifier is always
the first underscore-delimited segment of each filename.
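
The naming rule above can be sketched in a few lines. This is only an illustration of the stated convention, not the ingestion script's actual code; the helper names `entity_id` and `looks_like_ear_tag` and both example file names are made up for the demo.

```python
from pathlib import Path


def entity_id(filename: str) -> str:
    """Return the first underscore-delimited segment of a file name."""
    return Path(filename).name.split("_", 1)[0]


def looks_like_ear_tag(segment: str) -> bool:
    """True if the segment is exactly 15 ASCII digits."""
    return len(segment) == 15 and segment.isascii() and segment.isdigit()


# Made-up file names that follow the layout above.
animal_file = "276000123456789_bcs_2021-04-01_2024-09-30.csv"
sensor_file = "sensor01_LoRaWAN_raw_2021-04-01_2024-09-30.csv"

print(entity_id(animal_file), looks_like_ear_tag(entity_id(animal_file)))
print(entity_id(sensor_file), looks_like_ear_tag(entity_id(sensor_file)))
```

The digit check is one way to tell animal files apart from sensor or barn files, whose leading segment is a name rather than an ear tag number.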

## CLI reference

```
digimuh-ingest [-h] [--db DB] [--chunk-size N] [--verbose] [--test-n N] root_dir
```

| Argument | Description |
|---|---|
| `root_dir` | Root directory containing all CSV folders |
| `--db` | Output SQLite path (default: `cow.db`) |
| `--chunk-size` | Rows per INSERT batch (default: 50 000) |
| `--test-n N` | Only ingest first N files per folder |
| `--verbose`, `-v` | Print CREATE TABLE SQL and debug info |
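
To make the `--chunk-size` option more concrete, here is a minimal sketch of batched inserts with the standard-library `sqlite3` module. It is an assumption about the general technique, not the code `digimuh-ingest` actually runs; the function `insert_in_chunks`, the `demo` table, and the CSV payload are all hypothetical.

```python
import csv
import io
import sqlite3
from itertools import islice


def insert_in_chunks(con, table, columns, rows, chunk_size=50_000):
    """Insert an iterable of rows in batches of at most chunk_size."""
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    rows = iter(rows)
    total = 0
    while True:
        batch = list(islice(rows, chunk_size))
        if not batch:
            break
        with con:  # commit one transaction per batch
            con.executemany(sql, batch)
        total += len(batch)
    return total


# Tiny in-memory demo with a hypothetical table and CSV payload.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE demo (animal_id TEXT, value REAL)")
reader = csv.reader(io.StringIO("a,1.0\nb,2.0\nc,3.0\n"))
inserted = insert_in_chunks(con, "demo", ["animal_id", "value"], reader, chunk_size=2)
count = con.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
print(inserted, count)
con.close()
```

Larger batches generally mean fewer commits and faster ingestion at the cost of more memory per batch, which is the usual trade-off behind a setting like this.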

## Running tests

```bash
python -m pytest
```

## Analysis pipeline

After ingestion, six analysis scripts (numbered 0 to 5 below) are available as
CLI commands. Each creates analysis views on first run, queries the database,
and writes results (CSV data + figures) to an output directory.

```bash
# Install with analysis dependencies
pip install -e ".[analysis]"

# 0. Individual heat stress thresholds (broken-stick regression)
digimuh-broken-stick --db cow.db --tierauswahl Tierauswahl.xlsx --out results/broken_stick

# 1. Subclinical ketosis risk: FPR × rumination × milk yield
digimuh-ketosis --db cow.db --out results/ketosis

# 2. Heat stress: rumen temp × THI × water × respiration
digimuh-heat --db cow.db --out results/heat

# 3. Digestive efficiency: motility × pH → milk composition (time-lagged)
digimuh-digestive --db cow.db --out results/digestive

# 4. Circadian disruption: 24h Fourier decomposition as welfare marker
digimuh-circadian --db cow.db --out results/circadian

# 5. Motility entropy: rumen HRV analogue via information theory
digimuh-entropy --db cow.db --out results/entropy
```

Each script writes:
- A CSV of the extracted features (for further analysis in R, Python, etc.)
- Publication-ready SVG + PNG figures
- A JSON summary of key results (where applicable)

See [`docs/database_structure.md`](docs/database_structure.md) for the SQL view
definitions that power these analyses.
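
For readers unfamiliar with the broken-stick regression behind command 0, the sketch below fits a continuous two-segment line by grid search over candidate breakpoints. It is a generic textbook-style illustration on synthetic data, not the `reRandomStats` implementation that DigiMuh uses; the function name `fit_broken_stick`, the candidate grid, and all numbers are assumptions chosen for the demo.

```python
import numpy as np


def fit_broken_stick(x, y, n_candidates=200):
    """Grid-search a continuous two-segment linear fit.

    Model: y = b0 + b1 * x + b2 * max(x - bp, 0).
    Returns (breakpoint, coefficients, residual sum of squares).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Keep candidates away from the edges so both segments contain data.
    lo, hi = np.quantile(x, [0.1, 0.9])
    best = None
    for bp in np.linspace(lo, hi, n_candidates):
        design = np.column_stack([np.ones_like(x), x, np.maximum(x - bp, 0.0)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = float(np.sum((y - design @ coef) ** 2))
        if best is None or rss < best[2]:
            best = (float(bp), coef, rss)
    return best


# Synthetic data: flat at 39.0 up to THI 68, then rising 0.1 per THI unit.
rng = np.random.default_rng(0)
thi = rng.uniform(50, 85, 500)
temp = 39.0 + 0.1 * np.maximum(thi - 68.0, 0.0) + rng.normal(0, 0.05, 500)

bp, coef, rss = fit_broken_stick(thi, temp)
print(f"estimated breakpoint: {bp:.1f}")
```

The breakpoint with the lowest residual sum of squares is taken as the estimate. Confidence intervals and significance tests, such as the profile-RSS CIs and Davies test mentioned elsewhere in this README, are outside the scope of this sketch.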

## Roadmap

- [x] CSV → SQLite ingestion with star schema
- [x] SQL views for analysis (daily summaries + cross-table joins)
- [x] Analysis: individual heat stress thresholds (broken-stick regression)
- [x] Analysis: subclinical ketosis detection (FPR + RF classifier)
- [x] Analysis: heat stress multi-sensor fusion
- [x] Analysis: digestive efficiency (motility–pH coupling)
- [x] Analysis: circadian rhythm disruption index
- [x] Analysis: motility pattern entropy (novel)
- [ ] Data validation and quality-check reports
- [ ] Parallelised entropy computation for full dataset
- [ ] Sphinx documentation on GitHub Pages

## Authors

**Bart R. H. Geurten**, Department of Zoology, University of Otago

## Citation

If you use DigiMuh in your research, please cite both DigiMuh and the
`reRandomStats` statistics toolkit it consumes for breakpoint detection
(broken-stick regression, Davies / Pseudo-Score tests, 4-parameter
Hill fit) and FDR correction. The version you used should match the
DOI you cite; full metadata for DigiMuh is in [`CITATION.cff`](CITATION.cff)
and on the GitHub repo's *Cite this repository* button.

> Geurten, B. R. H. (2026). *DigiMuh: Dairy-cow sensor data ingestion
> and heat-stress analysis pipeline* (Version 1.0.0) [Software].
> Zenodo. https://doi.org/10.5281/zenodo.20389795

```bibtex
@software{geurten_digimuh_v100,
author = {Geurten, Bart R. H.},
title = {{DigiMuh}: Dairy-cow sensor data ingestion and heat-stress
analysis pipeline},
year = {2026},
version = {1.0.0},
doi = {10.5281/zenodo.20389795},
url = {https://github.com/zerotonin/digimuh},
license = {MIT},
}

@software{geurten_rerandomstats_v020,
author = {Geurten, Bart R. H.},
title = {{reRandomStats}: Re-randomisation Statistics Toolkit},
year = {2026},
version = {0.2.0},
doi = {10.5281/zenodo.20387255},
url = {https://github.com/zerotonin/reRandomStats},
license = {MIT},
}
```

> **Note for Elsevier submissions:** Elsevier Editorial Manager does not
> parse `@software`; convert to `@misc` at submission time per the lab
> BibTeX convention.

## License

MIT