https://github.com/worldbank/trend-narrative
Python package for piecewise-linear trend detection and plain-English narrative generation for time series data.
data-science natural-language-generation open-source piecewise-regression time-series trend-detection
- Host: GitHub
- URL: https://github.com/worldbank/trend-narrative
- Owner: worldbank
- License: mit
- Created: 2026-02-24T16:57:51.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2026-04-07T17:42:49.000Z (3 months ago)
- Last Synced: 2026-04-07T19:12:58.009Z (3 months ago)
- Topics: data-science, natural-language-generation, open-source, piecewise-regression, time-series, trend-detection
- Language: Python
- Homepage: https://pypi.org/project/trend-narrative/
- Size: 211 KB
- Stars: 6
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Citation: CITATION.cff
# trend-narrative
## Overview
The **trend-narrative** package is a standalone Python library that combines **piecewise-linear trend detection**, **relationship analysis**, and **multilingual narrative generation** for time-series data.
Given a time series — such as annual health spending or GDP figures — this package automatically identifies meaningful trends (e.g., "rising from 2010 to 2015, then declining") and produces a ready-to-use sentence describing them. It can also compare two time series and explain how they move together or apart over time.
Narratives can be generated in **English** and **French**, with an extensible architecture for adding more languages.
This is useful for analysts, researchers, and developers who need to turn numeric data into human-readable summaries without writing custom text logic each time.
## Documentation
Full documentation is available at **[https://worldbank.github.io/trend-narrative/](https://worldbank.github.io/trend-narrative/)**.
## Getting Started
### Prerequisites
- Python >= 3.9
- [uv](https://docs.astral.sh/uv/) (recommended) or pip
### Installation
```bash
uv add trend-narrative
```
**For development** (editable install with test dependencies):
```bash
git clone https://github.com/worldbank/trend-narrative.git
cd trend-narrative
uv sync --extra dev
```
Dependencies: `numpy`, `scipy`, `pwlf`
### Quick Example
```python
import numpy as np
from trend_narrative import InsightExtractor, TrendDetector, get_segment_narrative
x = np.arange(2010, 2022, dtype=float)
y = np.array([100, 110, 120, 130, 140, 150, 140, 130, 120, 110, 100, 90], dtype=float)
extractor = InsightExtractor(x, y, detector=TrendDetector(max_segments=2))
narrative = get_segment_narrative(extractor=extractor, metric="health spending")
print(narrative)
# → "From 2010 to 2015, the health spending showed an upward trend.
# Trend then shifted, reaching a peak in 2015 before reversing into a decline."
```
---
## Usage
### Trend Narratives
#### Path 1 — from raw data
Create an `InsightExtractor` with your chosen detector, then pass it to the
narrative function. Keeping the two steps separate means you can swap in any
custom detector without touching the narrative layer:
```python
import numpy as np
from trend_narrative import InsightExtractor, TrendDetector, get_segment_narrative
x = np.arange(2010, 2022, dtype=float)
y = np.array([100, 110, 120, 130, 140, 150, 140, 130, 120, 110, 100, 90], dtype=float)
extractor = InsightExtractor(x, y, detector=TrendDetector(max_segments=2))
narrative = get_segment_narrative(extractor=extractor, metric="health spending")
```
You can also call the extraction step separately if you need the raw numbers:
```python
suite = extractor.extract_full_suite()
# {"cv_value": 14.2, "segments": [...], "n_points": 12}
```
#### Path 2 — from precomputed data
If you already have segments and a CV value stored (e.g. from a database or
a previous extraction run), pass them directly — no re-fitting required:
```python
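from trend_narrative import get_segment_narrative

# `row` is a previously stored record (e.g. a database row) that holds
# the segments and CV value from an earlier extraction run.
narrative = get_segment_narrative(
    segments=row["segments"],
    cv_value=row["cv_value"],
    metric="health spending",
)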
```
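One way to produce the stored `row` used above is to serialise the extraction output as JSON. The sketch below is an assumption about storage, not part of the package: it uses only the standard library, and the dict is an illustrative placeholder shaped like the `{cv_value, segments, n_points}` output of `extract_full_suite()`.

```python
import json

# Illustrative placeholder shaped like the output of extract_full_suite().
# The numbers are made up for this example, not produced by the package.
suite = {
    "cv_value": 14.2,
    "segments": [
        {
            "start_year": 2010, "end_year": 2015,
            "start_value": 100.0, "end_value": 150.0,
            "slope": 10.0, "p_value": 0.001,
        },
    ],
    "n_points": 12,
}

# Serialise to a JSON string, e.g. for a text column in a database.
stored = json.dumps(suite)

# Later: load the record back. row["segments"] and row["cv_value"] can then
# be passed to get_segment_narrative without re-fitting.
row = json.loads(stored)
```

If the real output contains NumPy scalars, they would need converting to plain Python numbers first, because the `json` module cannot serialise NumPy types.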
### Multilingual Support
All narrative functions accept a `lang` parameter. The default is `"en"` (English), so existing code works unchanged.
```python
# English — plain strings work for any metric
narrative = get_segment_narrative(extractor=extractor, metric="health spending")
# French — see "Grammatical agreement" below for non-trivial metrics
narrative = get_segment_narrative(
    extractor=extractor,
    metric={"name": "les dépenses de santé", "plural": True, "feminine": True},
    lang="fr",
)
# → "De 2010 à 2015, les dépenses de santé ont affiché une tendance à la hausse.
# La tendance s'est ensuite inversée, atteignant un pic en 2015 avant de s'inverser en déclin."
```
Currently supported: `"en"` (English), `"fr"` (French).
#### Grammatical agreement (French)
French verbs and adjectives must agree with the metric's grammatical **number** (singular/plural) and **gender** (masculine/feminine). When the metric isn't singular masculine, pass it as a dict:
```python
{"name": "les dépenses", "plural": True, "feminine": True} # plural feminine
{"name": "les taux", "plural": True, "feminine": False} # plural masculine
{"name": "la production", "plural": False, "feminine": True} # singular feminine
{"name": "le taux", "plural": False, "feminine": False} # singular masculine
```
The `plural` / `feminine` keys default to `False`. A plain string (e.g. `metric="les dépenses"`) is accepted but defaults to singular masculine, silently producing wrong agreement such as `dépenses a augmenté` instead of `dépenses ont augmenté`. **The dict form is strongly recommended for any French metric that isn't singular masculine.**
The same applies to `reference_name` and `comparison_name` in `get_relationship_narrative`:
```python
import numpy as np
from trend_narrative import get_relationship_narrative
years = np.array([2010, 2012, 2014, 2016, 2018, 2020], dtype=float)
spending = np.array([100, 120, 140, 160, 180, 200], dtype=float)
inflation = np.array([2.0, 2.3, 2.7, 3.0, 3.4, 3.8], dtype=float)
result = get_relationship_narrative(
    reference_years=years, reference_values=spending,
    comparison_years=years, comparison_values=inflation,
    reference_name={"name": "les dépenses", "plural": True, "feminine": True},
    comparison_name={"name": "le taux d'inflation"},  # singular masculine defaults
    lang="fr",
)
```
Grammar flags on the dict are silently ignored for languages that don't need them (e.g. English), so the same call shape works across languages.
To add support for another language, see [Adding a New Language](#adding-a-new-language) below.
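The str-or-dict convention described in this section can be sketched as follows. This is a hypothetical illustration, not the package's internal `_unpack_metric`; the helper names `unpack_metric_sketch` and `french_auxiliary` are invented for the example.

```python
from typing import Union

MetricSpec = Union[str, dict]


def unpack_metric_sketch(metric: MetricSpec) -> tuple[str, bool, bool]:
    """Return (name, plural, feminine) for a plain string or a metric dict."""
    if isinstance(metric, str):
        # A plain string carries no grammar info: singular masculine defaults.
        return metric, False, False
    return (
        metric["name"],
        bool(metric.get("plural", False)),
        bool(metric.get("feminine", False)),
    )


def french_auxiliary(plural: bool) -> str:
    """Pick the auxiliary in 'a augmenté' (singular) / 'ont augmenté' (plural)."""
    return "ont" if plural else "a"


name, plural, feminine = unpack_metric_sketch(
    {"name": "les dépenses", "plural": True, "feminine": True}
)
sentence = f"{name} {french_auxiliary(plural)} augmenté"
```

Passing the same metric as a plain string would fall back to the singular auxiliary, which is the agreement error the dict form avoids.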
### Relationship Narratives
Analyze the relationship between two time series (e.g., spending vs outcomes).
#### Path 1 — from raw data
```python
import numpy as np
from trend_narrative import get_relationship_narrative
result = get_relationship_narrative(
    reference_years=np.array([2010, 2012, 2014, 2016, 2018]),
    reference_values=np.array([100, 120, 140, 160, 180]),
    comparison_years=np.array([2010, 2012, 2014, 2016, 2018]),
    comparison_values=np.array([50, 55, 62, 70, 78]),
    reference_name="spending",
    comparison_name="outcome",
)
print(result["narrative"])
# → "When spending increases, outcome tends to increase in the same year..."
print(result["method"]) # "lagged_correlation", "comovement", or "insufficient_data"
```
#### Path 2 — from precomputed insights
```python
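from trend_narrative import get_relationship_narrative

# `row` is a previously stored record holding the insights dict.
# The return value is a dict, so it is named `result` as in Path 1.
result = get_relationship_narrative(
    insights=row["relationship_insights"],
    reference_name="spending",
    comparison_name="outcome",
)
print(result["narrative"])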
```
#### Separate analysis and narrative generation
Use `analyze_relationship()` when you want to inspect or store the analysis
results separately from narrative generation:
```python
import numpy as np
from trend_narrative import analyze_relationship, get_relationship_narrative
years = np.array([2010, 2012, 2014, 2016, 2018, 2020], dtype=float)
spending = np.array([100, 120, 140, 160, 180, 200], dtype=float)
outcome = np.array([50, 55, 62, 70, 78, 85], dtype=float)
insights = analyze_relationship(
    reference_years=years,
    reference_values=spending,
    comparison_years=years,
    comparison_values=outcome,
)
# Store insights in a database, inspect programmatically, etc.
print(insights["method"]) # → "lagged_correlation"
print(insights["best_lag"]) # → {"lag": 1, "correlation": 0.85, "p_value": 0.15, "n_pairs": 4}
print(insights["n_points"]) # → 6
# Generate narrative later from stored insights — no re-analysis needed
result = get_relationship_narrative(
    insights=insights,
    reference_name="spending",
    comparison_name="outcome",
)
print(result["narrative"])
```
`analyze_relationship()` chooses the analysis method automatically from the number of points in the sparser series:
- **Lagged correlation**: at least 5 points (the default `correlation_threshold`); tests correlations at various lags
- **Comovement**: 3 to 4 points; describes directional movement within segments
- **Insufficient data**: fewer than 3 points
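The selection rule in the list above can be sketched as a small function. This is a hypothetical illustration of the documented thresholds, not the package's code; `choose_method_sketch` and its `min_points` parameter are invented names.

```python
def choose_method_sketch(
    n_points: int,
    correlation_threshold: int = 5,
    min_points: int = 3,
) -> str:
    """Map the point count of the sparser series to an analysis method."""
    if n_points >= correlation_threshold:
        return "lagged_correlation"
    if n_points >= min_points:
        return "comovement"
    return "insufficient_data"


# One label per point count from 2 to 6, using the default thresholds.
methods = [choose_method_sketch(n) for n in range(2, 7)]
```

With the defaults, 2 points are insufficient, 3 and 4 points give comovement, and 5 or more give lagged correlation.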
---
## API Reference
### `get_segment_narrative(segments, cv_value, metric="expenditure", lang="en")`
### `get_segment_narrative(extractor, metric="expenditure", lang="en")`
Generates a narrative for a single time series. Accepts either
precomputed data or an `InsightExtractor` instance. Set `lang="fr"` for French.
`metric` accepts either a plain string or a dict with grammatical
properties: `{"name": str, "plural": bool, "feminine": bool}`. For French,
use the dict form when the metric isn't singular masculine — see
[Grammatical agreement](#grammatical-agreement-french).
- No segments + low CV → *"remained highly stable"*
- No segments + high CV → *"exhibited significant volatility"*
- Single segment → direction + % change sentence
- Multi-segment → transition phrases (peak / trough / continuation)
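The four cases above amount to a small decision tree. The sketch below is a hypothetical illustration: the function names are invented, and because the README does not state where "low" CV ends, `cv_threshold` is a required argument rather than a guessed default. The peak / trough / continuation rule is likewise an assumption based on slope signs.

```python
def pick_branch_sketch(segments: list, cv_value: float, cv_threshold: float) -> str:
    """Choose which kind of sentence to write for one series."""
    if not segments:
        # No detected trend: describe stability or volatility instead.
        return "stable" if cv_value < cv_threshold else "volatile"
    if len(segments) == 1:
        return "single_segment"
    return "multi_segment"


def transition_kind_sketch(prev_slope: float, next_slope: float) -> str:
    """Classify the turning point between two consecutive segments."""
    if prev_slope > 0 and next_slope < 0:
        return "peak"    # rise followed by decline
    if prev_slope < 0 and next_slope > 0:
        return "trough"  # decline followed by rise
    return "continuation"


branch = pick_branch_sketch([{"slope": 10.0}, {"slope": -10.0}], 14.2, cv_threshold=10.0)
kind = transition_kind_sketch(10.0, -10.0)
```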
### `analyze_relationship(...)`
Analyzes the relationship between two time series and returns structured
insights without generating narrative text.
```python
analyze_relationship(
    reference_years,           # array-like, the "driver" series years
    reference_values,          # array-like, the "driver" series values
    comparison_years,          # array-like, the "outcome" series years
    comparison_values,         # array-like, the "outcome" series values
    reference_segments=None,   # optional pre-computed segments
    correlation_threshold=5,   # min points for correlation analysis
    max_lag_cap=5,             # max lag to test in years
)
```
Returns a dict with:
- `method`: "lagged_correlation", "comovement", or "insufficient_data"
- `n_points`: int, number of points in sparser series
- `segment_details`: list[dict], per-segment analysis (comovement only)
- `best_lag`: dict with lag, correlation, p_value, n_pairs (correlation only)
- `all_lags`: list of all tested lags (correlation only)
- `max_lag_tested`: int, maximum lag tested (correlation only)
- `reference_leads`: bool, whether reference series leads comparison
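To make the `best_lag` / `all_lags` fields concrete, here is a minimal pure-Python sketch of a lagged-correlation scan. It is not the package's implementation: it assumes two equal-length, evenly spaced series, omits the `p_value`, and the function names and `min_pairs` parameter are invented for the example.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (NaN if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / sqrt(var_a * var_b)


def lagged_correlations_sketch(reference, comparison, max_lag, min_pairs=3):
    """Correlate reference[t] with comparison[t + lag] for lag = 0..max_lag."""
    results = []
    for lag in range(max_lag + 1):
        ref = reference[: len(reference) - lag]
        comp = comparison[lag:]
        if len(ref) < min_pairs:
            break  # too few overlapping pairs to say anything
        results.append(
            {"lag": lag, "correlation": pearson(ref, comp), "n_pairs": len(ref)}
        )
    return results


# The comparison series follows the reference one step later, scaled by 2.
reference = [1, 3, 2, 5, 4, 6, 3, 7]
comparison = [0] + [2 * v for v in reference[:-1]]

all_lags = lagged_correlations_sketch(reference, comparison, max_lag=3)
best_lag = max(all_lags, key=lambda r: abs(r["correlation"]))
```

Here the strongest correlation appears at lag 1, where the shifted series line up exactly.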
### `get_relationship_narrative(...)`
Generates a narrative from relationship analysis. Accepts either precomputed
insights or raw data arrays.
```python
get_relationship_narrative(
    # Raw data (optional if insights provided)
    reference_years=None,      # array-like, the "driver" series years
    reference_values=None,     # array-like, the "driver" series values
    comparison_years=None,     # array-like, the "outcome" series years
    comparison_values=None,    # array-like, the "outcome" series values
    # Required for narrative: str or dict (see "Grammatical agreement")
    reference_name="",         # str | dict, display name for reference
    comparison_name="",        # str | dict, display name for comparison
    # Optional parameters
    reference_segments=None,   # optional pre-computed segments
    correlation_threshold=5,   # min points for correlation analysis
    max_lag_cap=5,             # max lag to test in years
    reference_format=".2f",    # format spec or callable for reference values
    comparison_format=".2f",   # format spec or callable for comparison values
    time_unit="year",          # "year", "month", "quarter" for narratives
    reference_leads=None,      # True/False to override, None to infer
    # Precomputed insights
    insights=None,             # dict from analyze_relationship()
    # Language
    lang="en",                 # "en" or "fr"
)
```
Returns a dict with:
- `narrative`: str, human-readable description
- `method`: "lagged_correlation", "comovement", or "insufficient_data"
- `n_points`: int, number of points in sparser series
- `segment_details`: list[dict], per-segment analysis (comovement only)
- `best_lag`: dict with lag, correlation, p_value, n_pairs (correlation only)
- `all_lags`: list of all tested lags (correlation only)
- `max_lag_tested`: int, maximum lag tested (correlation only)
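The `reference_format` and `comparison_format` parameters accept either a format spec or a callable. A minimal sketch of how such a parameter can be applied is shown below; `format_value_sketch` is a hypothetical helper, not a function exported by the package.

```python
def format_value_sketch(value, fmt=".2f"):
    """Format a number with a format-spec string or a user-supplied callable."""
    if callable(fmt):
        return fmt(value)
    # Strings are treated as standard Python format specs.
    return format(value, fmt)


plain = format_value_sketch(3.14159)                          # default ".2f"
percent = format_value_sketch(0.256, ".1%")                   # percentage spec
dollars = format_value_sketch(1500, lambda v: f"${v:,.0f}")   # custom callable
```

A callable is the more flexible option when values need a currency symbol or unit that a format spec alone cannot express.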
### `TrendDetector(max_segments=3, threshold=0.05)`
Fits a piecewise-linear model using BIC-optimised segment count, snapping
breakpoints to integer years and local extrema.
| Method | Returns | Description |
|---|---|---|
| `extract_trend(x, y)` | `list[dict]` | Fit model; return per-segment stats |
| `fit_best_model(x, y)` | `pwlf model \| None` | Run both fitting passes |
| `calculate_bic(ssr, n, k)` | `float` | Static BIC helper |
Each segment dict contains: `start_year`, `end_year`, `start_value`,
`end_value`, `slope`, `p_value`.
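For intuition about BIC-based segment selection, the sketch below uses the standard least-squares form of the Bayesian Information Criterion, `n * ln(ssr / n) + k * ln(n)`. This is the textbook formula, assumed here for illustration; the README does not state the exact expression used by `calculate_bic` or how `k` is counted, and the numbers are made up.

```python
import math


def bic_sketch(ssr: float, n: int, k: int) -> float:
    """Textbook least-squares BIC; lower is better. Requires ssr > 0."""
    return n * math.log(ssr / n) + k * math.log(n)


n = 12  # number of observations

# A one-segment fit (k=2 parameters) that leaves a large residual ...
bic_one = bic_sketch(ssr=120.0, n=n, k=2)
# ... versus a two-segment fit (k=4) that explains the data much better.
bic_two = bic_sketch(ssr=10.0, n=n, k=4)
# A two-segment fit that barely reduces the residual is penalised instead.
bic_two_weak = bic_sketch(ssr=118.0, n=n, k=4)
```

The extra segment is only preferred when the drop in residual error outweighs the `k * ln(n)` penalty.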
### `InsightExtractor(x, y, detector=None)`
Combines volatility measurement with trend detection. Pass a custom detector
to control the fitting logic.
| Method | Returns | Description |
|---|---|---|
| `get_volatility()` | `float` | Coefficient of Variation (%) |
| `get_structural_segments()` | `list[dict]` | Delegates to the detector |
| `extract_full_suite()` | `dict` | `{cv_value, segments, n_points}` |
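The Coefficient of Variation is the standard deviation expressed as a percentage of the mean. The sketch below shows the calculation with the standard library only. Whether the package uses the population or the sample standard deviation is not stated here, so the helper (a hypothetical name) exposes both.

```python
import statistics


def coefficient_of_variation_pct(values, sample: bool = False) -> float:
    """Return std / |mean| * 100 for a sequence of numbers."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    std = statistics.stdev(values) if sample else statistics.pstdev(values)
    return std / abs(mean) * 100.0


values = [10, 12, 14, 16, 18]
cv_population = coefficient_of_variation_pct(values)            # about 20.2
cv_sample = coefficient_of_variation_pct(values, sample=True)   # about 22.6
```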
### `consolidate_segments(segments)`
Merges consecutive segments that share the same slope direction. Applied
automatically inside `get_segment_narrative`.
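A minimal sketch of this kind of merge is shown below, assuming segments are ordered, contiguous, and each spans more than zero years. It is not the package's implementation: the function name is hypothetical, the merged slope is recomputed from the endpoints, and `p_value` is dropped on merge because the README does not say how it is combined.

```python
def _direction(slope: float) -> int:
    """Return 1 for rising, -1 for falling, 0 for flat."""
    return (slope > 0) - (slope < 0)


def consolidate_sketch(segments: list) -> list:
    """Merge consecutive segments whose slopes have the same sign."""
    merged = []
    for seg in segments:
        if merged and _direction(merged[-1]["slope"]) == _direction(seg["slope"]):
            prev = merged[-1]
            # Extend the previous segment to the end of the current one.
            prev["end_year"] = seg["end_year"]
            prev["end_value"] = seg["end_value"]
            span = prev["end_year"] - prev["start_year"]
            prev["slope"] = (prev["end_value"] - prev["start_value"]) / span
            prev.pop("p_value", None)  # no longer valid for the merged span
        else:
            merged.append(dict(seg))  # copy so the input is not mutated
    return merged


segments = [
    {"start_year": 2010, "end_year": 2013, "start_value": 100.0, "end_value": 130.0, "slope": 10.0},
    {"start_year": 2013, "end_year": 2015, "start_value": 130.0, "end_value": 140.0, "slope": 5.0},
    {"start_year": 2015, "end_year": 2021, "start_value": 140.0, "end_value": 80.0, "slope": -10.0},
]
merged = consolidate_sketch(segments)
```

The two rising segments collapse into one segment from 2010 to 2015, followed by the unchanged decline.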
### `millify(n, lang="en")`
Formats large numbers with a human-readable suffix. The decimal separator
and magnitude suffixes come from the language catalog:
- `millify(1_500_000)` → `"1.50 M"`
- `millify(1_500_000, lang="fr")` → `"1,50 M"`
- `millify(3_000_000_000, lang="fr")` → `"3,00 Md"` (*milliard*; not `"B"`, because
  French *billion* means 10¹², a false friend of the English word)
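The catalog-driven formatting described above can be sketched as follows. This is a hypothetical re-implementation covering only millions and billions, not the package's `millify`: the catalog layout is invented, and the English `"B"` suffix is an assumption inferred from the note above.

```python
# Hypothetical mini-catalog: decimal separator plus suffix per power of ten.
CATALOG_SKETCH = {
    "en": {"decimal_sep": ".", "suffixes": {6: "M", 9: "B"}},
    "fr": {"decimal_sep": ",", "suffixes": {6: "M", 9: "Md"}},
}


def millify_sketch(n: float, lang: str = "en") -> str:
    """Format n with two decimals and a language-specific magnitude suffix."""
    catalog = CATALOG_SKETCH[lang]
    # Try the largest magnitude first so 3e9 becomes billions, not millions.
    for exponent in sorted(catalog["suffixes"], reverse=True):
        if abs(n) >= 10 ** exponent:
            scaled = f"{n / 10 ** exponent:.2f}"
            number = scaled.replace(".", catalog["decimal_sep"])
            return f"{number} {catalog['suffixes'][exponent]}"
    # Below one million: no suffix in this sketch.
    return f"{n:.2f}".replace(".", catalog["decimal_sep"])


english = millify_sketch(1_500_000)
french = millify_sketch(1_500_000, lang="fr")
french_billions = millify_sketch(3_000_000_000, lang="fr")
```

Keeping separators and suffixes in per-language data means a new language needs only a new catalog entry, not new formatting code.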
---
## Running Tests
```bash
uv run pytest
# or with coverage:
uv run pytest --cov=trend_narrative --cov-report=term-missing
```
---
## Project Structure
```
trend-narrative/
├── trend_narrative/
│ ├── __init__.py # Public API
│ ├── detector.py # TrendDetector – piecewise-linear fitting
│ ├── extractor.py # InsightExtractor – volatility + trend facade
│ ├── narrative.py # Segment narrative composition
│ ├── relationship_analysis.py # Relationship analysis (language-neutral)
│ ├── relationship_narrative.py # Relationship narrative composition
│ └── translations/ # All localization lives here
│ ├── __init__.py # Catalog access, ICU engine, _unpack_metric,
│ │ # millify, _format_percent, _genitive,
│ │ # _resolve_time_unit, _time_unit_comparison
│ ├── en.py # English catalog (data only)
│ └── fr.py # French catalog (data only)
├── tests/
│ ├── test_detector.py
│ ├── test_extractor.py
│ ├── test_narrative.py
│ ├── test_relationship_analysis.py
│ ├── test_relationship_narrative.py
│ └── test_translations.py # i18n primitives + catalog + integration
├── pyproject.toml
└── README.md
```
---
## Adding a New Language
To add a new language (e.g. Spanish):
1. **Copy** `trend_narrative/translations/en.py` to `trend_narrative/translations/es.py`.
2. **Translate** every value in the `STRINGS` dict. Take care with nested keys:
   - `number_format` (`decimal_sep`, `percent_template`, `suffixes`): Spanish uses `,` for decimals, like French.
   - `time_units` and `time_unit_genders`: singular/plural pairs and grammatical genders for each unit.
   - `time_unit_fallback_plural_suffix`: what to append to unknown units when `count > 1` (English uses `"s"`; languages without a one-size-fits-all rule should use `""`).
3. **Register** the new module in `trend_narrative/translations/__init__.py`:
```python
from . import en, fr, es  # add the new import

_REGISTRY: dict[str, dict[str, object]] = {
    "en": en.STRINGS,
    "fr": fr.STRINGS,
    "es": es.STRINGS,  # add the new entry
}
```
4. **Implement a `_genitive_<lang>` helper** (e.g. `_genitive_es`) in `translations/__init__.py` if your language has article contractions or elision (Spanish: `de + el → del`, while `de + la` stays as is). Dispatch to it in `_genitive`:
```python
if lang == "es":
    return _genitive_es(name)
```
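As a rough idea of what such a helper could look like for Spanish, the sketch below handles only the `de + el → del` contraction mentioned in step 4. The function name and signature are assumptions for illustration; the real `_genitive_es` may need to cover more cases.

```python
def genitive_es_sketch(name: str) -> str:
    """Prefix a noun phrase with 'de', contracting 'de el' to 'del'."""
    if name.startswith("el "):
        # Lowercase article only, so proper names like "El Salvador" are untouched.
        return "del " + name[len("el "):]
    return "de " + name


masculine = genitive_es_sketch("el gasto")
feminine = genitive_es_sketch("la producción")
```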
`SUPPORTED_LANGUAGES` updates automatically. A catalog-parity check (`_assert_catalog_parity`) runs at import time and raises `ImportError` if your catalog is missing any top-level key present in the English catalog, so half-finished catalogs fail loudly rather than producing wrong output in production.
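The idea behind a parity check can be sketched in a few lines. The function below is a hypothetical illustration, not the package's `_assert_catalog_parity`; the catalogs are placeholders that reuse the key names mentioned in step 2.

```python
def assert_catalog_parity_sketch(reference: dict, candidate: dict, lang: str) -> None:
    """Raise ImportError if `candidate` lacks any top-level key of `reference`."""
    missing = sorted(set(reference) - set(candidate))
    if missing:
        raise ImportError(f"Catalog '{lang}' is missing keys: {', '.join(missing)}")


# Placeholder catalogs: only the top-level keys matter for this check.
EN_SKETCH = {
    "number_format": {},
    "time_units": {},
    "time_unit_genders": {},
    "time_unit_fallback_plural_suffix": "s",
}
ES_PARTIAL = {"number_format": {}, "time_units": {}}

try:
    assert_catalog_parity_sketch(EN_SKETCH, ES_PARTIAL, "es")
    error_message = None
except ImportError as exc:
    error_message = str(exc)
```

Running the check at import time means an incomplete translation is caught as soon as the package is loaded, before any narrative is generated.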
---
## Contributing
See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines on how to contribute to this project.
## Code of Conduct
This project follows the [Contributor Covenant Code of Conduct](CODE_OF_CONDUCT.md).
## Contact
For questions, feedback, or collaboration enquiries, please reach out to [ysuzuki2@worldbank.org](mailto:ysuzuki2@worldbank.org) and [wlu4@worldbank.org](mailto:wlu4@worldbank.org).
## License
This project is licensed under the MIT License together with the [World Bank IGO Rider](WB-IGO-RIDER.md). The Rider is purely procedural: it reserves all privileges and immunities enjoyed by the World Bank, without adding restrictions to the MIT permissions. Please review both files before using, distributing or contributing.