{"id":50501949,"url":"https://github.com/worldbank/trend-narrative","last_synced_at":"2026-06-02T12:30:48.004Z","repository":{"id":340600310,"uuid":"1165895990","full_name":"worldbank/trend-narrative","owner":"worldbank","description":"Python package for piecewise-linear trend detection and plain-English narrative generation for time series data.","archived":false,"fork":false,"pushed_at":"2026-04-07T17:42:49.000Z","size":216,"stargazers_count":6,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-04-07T19:12:58.009Z","etag":null,"topics":["data-science","natural-language-generation","open-source","piecewise-regression","time-series","trend-detection"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/trend-narrative/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/worldbank.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-24T16:57:51.000Z","updated_at":"2026-04-01T16:17:22.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/worldbank/trend-narrative","commit_stats":null,"previous_names":["yukinko-iwasaki/trend-narrative"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/worldbank/trend-narrative","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/worldbank%2Ftrend-narrative","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/worldbank%2Ftrend-narrative/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/worldbank%2Ftrend-narrative/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/worldbank%2Ftrend-narrative/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/worldbank","download_url":"https://codeload.github.com/worldbank/trend-narrative/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/worldbank%2Ftrend-narrative/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33822812,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-02T02:00:07.132Z","response_time":109,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","natural-language-generation","open-source","piecewise-regression","time-series","trend-detection"],"created_at":"2026-06-02T12:30:47.123Z","updated_at":"2026-06-02T12:30:47.997Z","avatar_url":"https://github.com/worldbank.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# trend-narrative\n\n[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)\n[![GitHub Pages](https://img.shields.io/badge/docs-GitHub%20Pages-blue)](https://worldbank.github.io/trend-narrative/)\n[![Python 3.9+](https://img.shields.io/badge/python-3.9%2B-blue.svg)](https://www.python.org/)\n\n## Overview\n\nThe **trend-narrative** package is a standalone Python library that combines **piecewise-linear trend detection**, **relationship analysis**, and **multilingual narrative generation** for time-series data.\n\nGiven a time series — such as annual health spending or GDP figures — this package automatically identifies meaningful trends (e.g., \"rising from 2010 to 2015, then declining\") and produces a ready-to-use sentence describing them. It can also compare two time series and explain how they move together or apart over time.\n\nNarratives can be generated in **English** and **French**, with an extensible architecture for adding more languages.\n\nThis is useful for analysts, researchers, and developers who need to turn numeric data into human-readable summaries without writing custom text logic each time.\n\n## Documentation\n\nFull documentation is available at **[https://worldbank.github.io/trend-narrative/](https://worldbank.github.io/trend-narrative/)**.\n\n## Getting Started\n\n### Prerequisites\n\n- Python \u003e= 3.9\n- [uv](https://docs.astral.sh/uv/) (recommended) or pip\n\n### Installation\n\n```bash\nuv add trend-narrative\n```\n\n**For development** (editable install with test dependencies):\n\n```bash\ngit clone https://github.com/worldbank/trend-narrative.git\ncd trend-narrative\nuv sync --extra dev\n```\n\nDependencies: `numpy`, `scipy`, `pwlf`\n\n### Quick Example\n\n```python\nimport numpy as np\nfrom trend_narrative import InsightExtractor, TrendDetector, get_segment_narrative\n\nx = np.arange(2010, 2022, dtype=float)\ny = np.array([100, 110, 120, 130, 140, 150, 140, 130, 120, 110, 100, 90], dtype=float)\n\nextractor = InsightExtractor(x, y, detector=TrendDetector(max_segments=2))\nnarrative = get_segment_narrative(extractor=extractor, metric=\"health spending\")\nprint(narrative)\n# → \"From 2010 to 2015, the health spending showed an upward trend.\n#    Trend then shifted, reaching a peak in 2015 before reversing into a decline.\"\n```\n\n---\n\n## Usage\n\n### Trend Narratives\n\n#### Path 1 — from raw data\n\nCreate an `InsightExtractor` with your chosen detector, then pass it to the\nnarrative function. Keeping the two steps separate means you can swap in any\ncustom detector without touching the narrative layer:\n\n```python\nimport numpy as np\nfrom trend_narrative import InsightExtractor, TrendDetector, get_segment_narrative\n\nx = np.arange(2010, 2022, dtype=float)\ny = np.array([100, 110, 120, 130, 140, 150, 140, 130, 120, 110, 100, 90], dtype=float)\n\nextractor = InsightExtractor(x, y, detector=TrendDetector(max_segments=2))\nnarrative = get_segment_narrative(extractor=extractor, metric=\"health spending\")\n```\n\nYou can also call the extraction step separately if you need the raw numbers:\n\n```python\nsuite = extractor.extract_full_suite()\n# {\"cv_value\": 14.2, \"segments\": [...], \"n_points\": 12}\n```\n\n#### Path 2 — from precomputed data\n\nIf you already have segments and a CV value stored (e.g. from a database or\na previous extraction run), pass them directly — no re-fitting required:\n\n```python\nfrom trend_narrative import get_segment_narrative\n\nnarrative = get_segment_narrative(\n    segments=row[\"segments\"],\n    cv_value=row[\"cv_value\"],\n    metric=\"health spending\",\n)\n```\n\n### Multilingual Support\n\nAll narrative functions accept a `lang` parameter. The default is `\"en\"` (English), so existing code works unchanged.\n\n```python\n# English — plain strings work for any metric\nnarrative = get_segment_narrative(extractor=extractor, metric=\"health spending\")\n\n# French — see \"Grammatical agreement\" below for non-trivial metrics\nnarrative = get_segment_narrative(\n    extractor=extractor,\n    metric={\"name\": \"les dépenses de santé\", \"plural\": True, \"feminine\": True},\n    lang=\"fr\",\n)\n# → \"De 2010 à 2015, les dépenses de santé ont affiché une tendance à la hausse.\n#    La tendance s'est ensuite inversée, atteignant un pic en 2015 avant de s'inverser en déclin.\"\n```\n\nCurrently supported: `\"en\"` (English), `\"fr\"` (French).\n\n#### Grammatical agreement (French)\n\nFrench verbs and adjectives must agree with the metric's grammatical **number** (singular/plural) and **gender** (masculine/feminine). When the metric isn't singular masculine, pass it as a dict:\n\n```python\n{\"name\": \"les dépenses\",   \"plural\": True,  \"feminine\": True}   # plural feminine\n{\"name\": \"les taux\",       \"plural\": True,  \"feminine\": False}  # plural masculine\n{\"name\": \"la production\",  \"plural\": False, \"feminine\": True}   # singular feminine\n{\"name\": \"le taux\",        \"plural\": False, \"feminine\": False}  # singular masculine\n```\n\nThe `plural` / `feminine` keys default to `False`. A plain string (e.g. `metric=\"les dépenses\"`) is accepted, but defaults to singular masculine — silently producing wrong agreement like `dépenses **a augmenté**` instead of `dépenses **ont augmenté**`. **The dict form is strongly recommended for any French metric that isn't singular masculine.**\n\nThe same applies to `reference_name` and `comparison_name` in `get_relationship_narrative`:\n\n```python\nimport numpy as np\nfrom trend_narrative import get_relationship_narrative\n\nyears = np.array([2010, 2012, 2014, 2016, 2018, 2020], dtype=float)\nspending = np.array([100, 120, 140, 160, 180, 200], dtype=float)\ninflation = np.array([2.0, 2.3, 2.7, 3.0, 3.4, 3.8], dtype=float)\n\nresult = get_relationship_narrative(\n    reference_years=years, reference_values=spending,\n    comparison_years=years, comparison_values=inflation,\n    reference_name={\"name\": \"les dépenses\", \"plural\": True, \"feminine\": True},\n    comparison_name={\"name\": \"le taux d'inflation\"},   # singular masculine defaults\n    lang=\"fr\",\n)\n```\n\nGrammar flags on the dict are silently ignored for languages that don't need them (e.g. English), so the same call shape works across languages.\n\nSee [Adding a new language](#adding-a-new-language) below.\n\n### Relationship Narratives\n\nAnalyze the relationship between two time series (e.g., spending vs outcomes).\n\n#### Path 1 — from raw data\n\n```python\nimport numpy as np\nfrom trend_narrative import get_relationship_narrative\n\nresult = get_relationship_narrative(\n    reference_years=np.array([2010, 2012, 2014, 2016, 2018]),\n    reference_values=np.array([100, 120, 140, 160, 180]),\n    comparison_years=np.array([2010, 2012, 2014, 2016, 2018]),\n    comparison_values=np.array([50, 55, 62, 70, 78]),\n    reference_name=\"spending\",\n    comparison_name=\"outcome\",\n)\nprint(result[\"narrative\"])\n# → \"When spending increases, outcome tends to increase in the same year...\"\nprint(result[\"method\"])  # \"lagged_correlation\", \"comovement\", or \"insufficient_data\"\n```\n\n#### Path 2 — from precomputed insights\n\n```python\nfrom trend_narrative import get_relationship_narrative\n\nnarrative = get_relationship_narrative(\n    insights=row[\"relationship_insights\"],\n    reference_name=\"spending\",\n    comparison_name=\"outcome\",\n)\nprint(narrative[\"narrative\"])\n```\n\n#### Separate analysis and narrative generation\n\nUse `analyze_relationship()` when you want to inspect or store the analysis\nresults separately from narrative generation:\n\n```python\nimport numpy as np\nfrom trend_narrative import analyze_relationship, get_relationship_narrative\n\nyears = np.array([2010, 2012, 2014, 2016, 2018, 2020], dtype=float)\nspending = np.array([100, 120, 140, 160, 180, 200], dtype=float)\noutcome = np.array([50, 55, 62, 70, 78, 85], dtype=float)\n\ninsights = analyze_relationship(\n    reference_years=years,\n    reference_values=spending,\n    comparison_years=years,\n    comparison_values=outcome,\n)\n# Store insights in a database, inspect programmatically, etc.\nprint(insights[\"method\"])    # → \"lagged_correlation\"\nprint(insights[\"best_lag\"])  # → {\"lag\": 1, \"correlation\": 0.85, \"p_value\": 0.15, \"n_pairs\": 4}\nprint(insights[\"n_points\"])  # → 6\n\n# Generate narrative later from stored insights — no re-analysis needed\nresult = get_relationship_narrative(\n    insights=insights,\n    reference_name=\"spending\",\n    comparison_name=\"outcome\",\n)\nprint(result[\"narrative\"])\n```\n\nThe function automatically chooses the analysis method based on data availability:\n- **Lagged correlation**: \u003e= 5 points, tests correlations at various lags\n- **Comovement**: 3-4 points, describes directional movement within segments\n- **Insufficient data**: \u003c 3 points\n\n---\n\n## API Reference\n\n### `get_segment_narrative(segments, cv_value, metric=\"expenditure\", lang=\"en\")`\n### `get_segment_narrative(extractor, metric=\"expenditure\", lang=\"en\")`\n\nGenerates a narrative for a single time series. Accepts either\nprecomputed data or an `InsightExtractor` instance. Set `lang=\"fr\"` for French.\n\n`metric` accepts either a plain string or a dict with grammatical\nproperties: `{\"name\": str, \"plural\": bool, \"feminine\": bool}`. For French,\nuse the dict form when the metric isn't singular masculine — see\n[Grammatical agreement](#grammatical-agreement-french).\n\n- No segments + low CV → *\"remained highly stable\"*\n- No segments + high CV → *\"exhibited significant volatility\"*\n- Single segment → direction + % change sentence\n- Multi-segment → transition phrases (peak / trough / continuation)\n\n### `analyze_relationship(...)`\n\nAnalyzes the relationship between two time series and returns structured\ninsights without generating narrative text.\n\n```python\nanalyze_relationship(\n    reference_years,           # array-like, the \"driver\" series years\n    reference_values,          # array-like, the \"driver\" series values\n    comparison_years,          # array-like, the \"outcome\" series years\n    comparison_values,         # array-like, the \"outcome\" series values\n    reference_segments=None,   # optional pre-computed segments\n    correlation_threshold=5,   # min points for correlation analysis\n    max_lag_cap=5,             # max lag to test in years\n)\n```\n\nReturns a dict with:\n- `method`: \"lagged_correlation\", \"comovement\", or \"insufficient_data\"\n- `n_points`: int, number of points in sparser series\n- `segment_details`: list[dict], per-segment analysis (comovement only)\n- `best_lag`: dict with lag, correlation, p_value, n_pairs (correlation only)\n- `all_lags`: list of all tested lags (correlation only)\n- `max_lag_tested`: int, maximum lag tested (correlation only)\n- `reference_leads`: bool, whether reference series leads comparison\n\n### `get_relationship_narrative(...)`\n\nGenerates a narrative from relationship analysis. Accepts either precomputed\ninsights or raw data arrays.\n\n```python\nget_relationship_narrative(\n    # Raw data (optional if insights provided)\n    reference_years=None,      # array-like, the \"driver\" series years\n    reference_values=None,     # array-like, the \"driver\" series values\n    comparison_years=None,     # array-like, the \"outcome\" series years\n    comparison_values=None,    # array-like, the \"outcome\" series values\n    # Required for narrative — str or dict (see \"Grammatical agreement\")\n    reference_name=\"\",         # str | dict, display name for reference\n    comparison_name=\"\",        # str | dict, display name for comparison\n    # Optional parameters\n    reference_segments=None,   # optional pre-computed segments\n    correlation_threshold=5,   # min points for correlation analysis\n    max_lag_cap=5,             # max lag to test in years\n    reference_format=\".2f\",    # format spec or callable for reference values\n    comparison_format=\".2f\",   # format spec or callable for comparison values\n    time_unit=\"year\",          # \"year\", \"month\", \"quarter\" for narratives\n    reference_leads=None,      # True/False to override, None to infer\n    # Precomputed insights\n    insights=None,             # dict from analyze_relationship()\n    # Language\n    lang=\"en\",                 # \"en\" or \"fr\"\n)\n```\n\nReturns a dict with:\n- `narrative`: str, human-readable description\n- `method`: \"lagged_correlation\", \"comovement\", or \"insufficient_data\"\n- `n_points`: int, number of points in sparser series\n- `segment_details`: list[dict], per-segment analysis (comovement only)\n- `best_lag`: dict with lag details (correlation path only)\n- `all_lags`: list of all tested lags (correlation path only)\n- `max_lag_tested`: int, maximum lag tested (correlation only)\n\n### `TrendDetector(max_segments=3, threshold=0.05)`\n\nFits a piecewise-linear model using BIC-optimised segment count, snapping\nbreakpoints to integer years and local extrema.\n\n| Method | Returns | Description |\n|---|---|---|\n| `extract_trend(x, y)` | `list[dict]` | Fit model; return per-segment stats |\n| `fit_best_model(x, y)` | `pwlf model \\| None` | Run both fitting passes |\n| `calculate_bic(ssr, n, k)` | `float` | Static BIC helper |\n\nEach segment dict contains: `start_year`, `end_year`, `start_value`,\n`end_value`, `slope`, `p_value`.\n\n### `InsightExtractor(x, y, detector=None)`\n\nCombines volatility measurement with trend detection. Pass a custom detector\nto control the fitting logic.\n\n| Method | Returns | Description |\n|---|---|---|\n| `get_volatility()` | `float` | Coefficient of Variation (%) |\n| `get_structural_segments()` | `list[dict]` | Delegates to the detector |\n| `extract_full_suite()` | `dict` | `{cv_value, segments, n_points}` |\n\n### `consolidate_segments(segments)`\n\nMerges consecutive segments that share the same slope direction. Applied\nautomatically inside `get_segment_narrative`.\n\n### `millify(n, lang=\"en\")`\n\nFormats large numbers with a human-readable suffix. The decimal separator\nand magnitude suffixes come from the language catalog:\n\n- `millify(1_500_000)` → `\"1.50 M\"`\n- `millify(1_500_000, lang=\"fr\")` → `\"1,50 M\"`\n- `millify(3_000_000_000, lang=\"fr\")` → `\"3,00 Md\"` (milliard — NOT `\"B\"`,\n  which in French means 10¹², a false friend with English)\n\n---\n\n## Running Tests\n\n```bash\nuv run pytest\n# or with coverage:\nuv run pytest --cov=trend_narrative --cov-report=term-missing\n```\n\n---\n\n## Project Structure\n\n```\ntrend-narrative/\n├── trend_narrative/\n│   ├── __init__.py              # Public API\n│   ├── detector.py              # TrendDetector – piecewise-linear fitting\n│   ├── extractor.py             # InsightExtractor – volatility + trend facade\n│   ├── narrative.py             # Segment narrative composition\n│   ├── relationship_analysis.py # Relationship analysis (language-neutral)\n│   ├── relationship_narrative.py # Relationship narrative composition\n│   └── translations/            # All localization lives here\n│       ├── __init__.py          # Catalog access, ICU engine, _unpack_metric,\n│       │                        #   millify, _format_percent, _genitive,\n│       │                        #   _resolve_time_unit, _time_unit_comparison\n│       ├── en.py                # English catalog (data only)\n│       └── fr.py                # French catalog (data only)\n├── tests/\n│   ├── test_detector.py\n│   ├── test_extractor.py\n│   ├── test_narrative.py\n│   ├── test_relationship_analysis.py\n│   ├── test_relationship_narrative.py\n│   └── test_translations.py     # i18n primitives + catalog + integration\n├── pyproject.toml\n└── README.md\n```\n\n---\n\n## Adding a New Language\n\nTo add a new language (e.g. Spanish):\n\n1. **Copy** `trend_narrative/translations/en.py` to `trend_narrative/translations/es.py`.\n2. **Translate** every value in the `STRINGS` dict. Take care with nested keys:\n   - `number_format` (`decimal_sep`, `percent_template`, `suffixes`) — Spanish uses `,` for decimals, like French.\n   - `time_units` and `time_unit_genders` — singular/plural pairs and grammatical genders for each unit.\n   - `time_unit_fallback_plural_suffix` — what to append to unknown units when `count \u003e 1` (English uses `\"s\"`; languages without a one-size-fits-all rule should use `\"\"`).\n3. **Register** the new module in `trend_narrative/translations/__init__.py`:\n\n   ```python\n   from . import en, fr, es  # add the new import\n\n   _REGISTRY: dict[str, dict[str, object]] = {\n       \"en\": en.STRINGS,\n       \"fr\": fr.STRINGS,\n       \"es\": es.STRINGS,  # add the new entry\n   }\n   ```\n\n4. **Implement `_genitive_\u003clang\u003e`** in `translations/__init__.py` if your language has article contractions or elision (Spanish: `de + el → del`, `de + la` stays). Dispatch in `_genitive`:\n\n   ```python\n   if lang == \"es\":\n       return _genitive_es(name)\n   ```\n\n`SUPPORTED_LANGUAGES` updates automatically. A catalog-parity check fires at import time (`_assert_catalog_parity`) and raises `ImportError` if your catalog is missing any top-level keys present in English — so half-finished catalogs fail loud rather than producing wrong output in production.\n\n---\n\n## Contributing\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines on how to contribute to this project.\n\n## Code of Conduct\n\nThis project follows the [Contributor Covenant Code of Conduct](CODE_OF_CONDUCT.md).\n\n## Contact\n\nFor questions, feedback, or collaboration enquiries, please reach out to [ysuzuki2@worldbank.org](mailto:ysuzuki2@worldbank.org) and [wlu4@worldbank.org](mailto:wlu4@worldbank.org).\n\n## License\n\nThis project is licensed under the MIT License together with the [World Bank IGO Rider](WB-IGO-RIDER.md). The Rider is purely procedural: it reserves all privileges and immunities enjoyed by the World Bank, without adding restrictions to the MIT permissions. Please review both files before using, distributing or contributing.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fworldbank%2Ftrend-narrative","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fworldbank%2Ftrend-narrative","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fworldbank%2Ftrend-narrative/lists"}