{"id":49056233,"url":"https://github.com/ondata/opensdmx","last_synced_at":"2026-05-01T19:00:43.551Z","repository":{"id":348880501,"uuid":"1167389370","full_name":"ondata/opensdmx","owner":"ondata","description":"Python CLI and library for any SDMX 2.1 REST API — Eurostat, ISTAT, OECD, ECB, World Bank and more. AI-ready.","archived":false,"fork":false,"pushed_at":"2026-04-28T21:40:43.000Z","size":2458,"stargazers_count":2,"open_issues_count":5,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-28T23:20:46.070Z","etag":null,"topics":["cli","data","eurostat","istat","oecd","open-data","python","rest-api","sdmx","statistics"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ondata.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-26T08:40:20.000Z","updated_at":"2026-04-28T21:40:47.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/ondata/opensdmx","commit_stats":null,"previous_names":["aborruso/opensdmx","ondata/opensdmx"],"tags_count":52,"template":false,"template_full_name":null,"purl":"pkg:github/ondata/opensdmx","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ondata%2Fopensdmx","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ondata%2Fopensdmx/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ondata%2Fopensdmx/
releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ondata%2Fopensdmx/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ondata","download_url":"https://codeload.github.com/ondata/opensdmx/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ondata%2Fopensdmx/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32508912,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-30T13:12:12.517Z","status":"online","status_checked_at":"2026-05-01T02:00:05.856Z","response_time":64,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","data","eurostat","istat","oecd","open-data","python","rest-api","sdmx","statistics"],"created_at":"2026-04-19T23:10:53.823Z","updated_at":"2026-05-01T19:00:43.540Z","avatar_url":"https://github.com/ondata.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![PyPI version](https://img.shields.io/pypi/v/opensdmx)](https://pypi.org/project/opensdmx/)\n[![GitHub](https://img.shields.io/badge/github-ondata%2Fopensdmx-blue?logo=github)](https://github.com/ondata/opensdmx)\n[![deepwiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/ondata/opensdmx)\n[![License: 
MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Newsletter](https://img.shields.io/badge/newsletter-ondata-FF6719?logo=substack)](https://ondata.substack.com/)\n\n# opensdmx\n\n\u003e **Note:** this is an experimental tool — help us test it by [opening issues](https://github.com/ondata/opensdmx/issues) or sharing feedback.\n\nSimple Python CLI and library for any SDMX 2.1 REST API. Default provider: **Eurostat**. Built-in support for ISTAT, OECD, ECB, World Bank, and more.\n\n**The right way to get official statistics with AI.** Large language models are good at understanding questions, but they fabricate numerical data — research shows GenAI returns inaccurate statistics up to two-thirds of the time (IMF, *StatGPT: AI for Official Statistics*, 2026). The correct pattern is to use AI to *generate structured API queries*, not to generate the numbers. opensdmx is the execution layer for that pattern: the AI decides what to fetch, opensdmx fetches the exact published figure.\n\n\u003e **Best used with AI.** opensdmx works well on its own, but **it shines when driven by an AI agent**: the CLI is designed to be composed, queried, and orchestrated step by step. For a guided, interactive experience — dataset discovery, schema exploration, filter selection, and data retrieval — pair it with the [`sdmx-explorer`](https://github.com/ondata/opensdmx/blob/main/skills/sdmx-explorer/SKILL.md) Agent Skill included in this repo. 
See the [**installation guide**](https://github.com/ondata/opensdmx/blob/main/docs/skill/README.md) for step-by-step instructions.\n\n## Installation\n\n**As a CLI tool** (recommended — available system-wide):\n\n```bash\nuv tool install opensdmx\n```\n\n\u003e **Install uv** — Linux/macOS: `curl -LsSf https://astral.sh/uv/install.sh | sh` · Windows: `winget install astral-sh.uv`\n\n**As a library** (for use in Python projects):\n\n```bash\nuv add opensdmx\n# or\npip install opensdmx\n```\n\n**Update to the latest version:**\n\n```bash\nuv tool upgrade opensdmx            # if installed as CLI with `uv tool`\nuv lock --upgrade-package opensdmx  # if used as a library dependency\npip install --upgrade opensdmx      # if installed with pip\n```\n\n## CLI quick start\n\n```bash\nopensdmx search \"unemployment\"\nopensdmx info UNE_RT_M\nopensdmx constraints UNE_RT_M geo\nopensdmx get UNE_RT_M --freq M --geo IT --sex T --out data.csv\n\n# Save a query for later reuse\nopensdmx get TIPSUN20 --sex T --age Y15-74 --start-period 2020 --query-file unemployment.yaml\n\n# Re-run from the saved query\nopensdmx run unemployment.yaml --out results.csv\n```\n\n## Python quick start\n\n```python\nimport opensdmx\n\n# Default provider: Eurostat\ndatasets = opensdmx.all_available()\nprint(datasets.head())\n\n# Search by keyword\nresults = opensdmx.search_dataset(\"unemployment\")\n\n# One-liner retrieval (Eurostat default)\ndata = opensdmx.fetch(\"UNE_RT_M\", freq=\"M\", geo=\"IT\", sex=\"T\", age=\"TOTAL\")\n\n# Switch provider\nopensdmx.set_provider(\"istat\")\nopensdmx.set_provider(\"oecd\")\nopensdmx.set_provider(\"ecb\")\n```\n\n## Providers\n\n```python\nimport opensdmx\n\n# Built-in presets\nopensdmx.set_provider(\"eurostat\")   # default\nopensdmx.set_provider(\"istat\")\nopensdmx.set_provider(\"oecd\")\nopensdmx.set_provider(\"ecb\")\nopensdmx.set_provider(\"worldbank\")\n\n# Custom provider (agency_id 
optional)\nopensdmx.set_provider(\"https://mysdmx.org/rest\")\nopensdmx.set_provider(\"https://mysdmx.org/rest\", agency_id=\"XYZ\", rate_limit=1.0)\n\n# Check active provider\nopensdmx.get_provider()  # returns dict with base_url, agency_id, rate_limit, language\n```\n\n\u003e **Note on output columns:** Eurostat uses the compact `SDMX-CSV` format (dimensions + `TIME_PERIOD` + `OBS_VALUE`). Other providers (ECB, OECD, etc.) return the generic `text/csv` format, which includes additional series metadata columns (`TITLE`, `UNIT`, `DECIMALS`, etc.). This is expected behavior — filter columns with standard tools if needed.\n\n### Provider via CLI and environment variables\n\nUse `--provider` (or `-p`) on any command, or set `OPENSDMX_PROVIDER` once for the whole session:\n\n```bash\n# Per-command\nopensdmx search \"inflation\" --provider ecb\nopensdmx get EXR --provider https://data-api.ecb.europa.eu/service --FREQ D\n\n# Session-wide via env var\nexport OPENSDMX_PROVIDER=ecb\nopensdmx search \"inflation\"\nopensdmx get EXR --FREQ D --CURRENCY USD\n\n# Custom URL with agency\nexport OPENSDMX_PROVIDER=https://mysdmx.org/rest\nexport OPENSDMX_AGENCY=XYZ\nopensdmx get MYDATASET\n```\n\n## Python API\n\n| Function | Description |\n|---|---|\n| `set_provider(name_or_url, ...)` | Set active provider (`'eurostat'`, `'istat'`, or custom URL) |\n| `get_provider()` | Return active provider config dict |\n| `all_available()` | List all datasets → Polars DataFrame |\n| `search_dataset(keyword)` | Search by keyword in description |\n| `load_dataset(id)` | Create a dataset object (dict) |\n| `print_dataset(ds)` | Print dataset summary |\n| `dimensions_info(ds)` | Dimension metadata → Polars DataFrame |\n| `get_dimension_values(ds, dim)` | Codelist values for a dimension |\n| `get_available_values(ds)` | Values actually present in the data (via `availableconstraint`) |\n| `set_filters(ds, **kwargs)` | Set dimension filters |\n| `reset_filters(ds)` | Reset all filters to `\".\"` 
(all) |\n| `get_data(ds, ...)` | Retrieve data → Polars DataFrame |\n| `fetch(id, ..., **filters)` | One-liner: load dataset + set filters + get data |\n| `run_query(query_file)` | Run a query from a YAML file saved with `--query-file` → Polars DataFrame |\n| `semantic_search(query, n)` | Semantic search via Ollama embeddings → Polars DataFrame (requires `build_embeddings` first) |\n| `build_embeddings(progress)` | Build and cache Ollama embeddings for all datasets (requires Ollama + `nomic-embed-text-v2-moe`) |\n| `set_timeout(seconds)` | Get/set API timeout (default: 300 s) |\n| `parse_time_period(series)` | Convert SDMX time strings to dates |\n\n### `get_data` and `fetch` parameters\n\n| Parameter | Type | Description |\n|---|---|---|\n| `start_period` | `str` | Start date: `\"2020\"`, `\"2020-Q1\"`, `\"2020-01\"` |\n| `end_period` | `str` | End date (same formats) |\n| `last_n_observations` | `int` | Return only last N observations per series |\n| `first_n_observations` | `int` | Return only first N observations per series |\n\n## Example: EU Unemployment Rate\n\n```python\nimport opensdmx\nfrom plotnine import ggplot, aes, geom_line, geom_point, labs, theme_minimal, scale_x_date\n\n# Eurostat monthly unemployment by sex and age\nds = opensdmx.load_dataset(\"UNE_RT_M\")\nds = opensdmx.set_filters(ds, freq=\"M\", geo=\"IT\", sex=\"T\", age=\"TOTAL\", s_adj=\"SA\", unit=\"PC_ACT\")\ndata = opensdmx.get_data(ds, start_period=\"2015\", last_n_observations=60)\n\nimport polars as pl\ndata = data.with_columns(pl.col(\"OBS_VALUE\").cast(pl.Float64))\n\nplot = (\n    ggplot(data.to_pandas(), aes(x=\"TIME_PERIOD\", y=\"OBS_VALUE\"))\n    + geom_line(color=\"#1f77b4\", size=1)\n    + geom_point(color=\"#1f77b4\", size=0.8)\n    + labs(title=\"Italy Unemployment Rate (Monthly)\", x=\"Year\", y=\"Rate (%)\")\n    + scale_x_date(date_breaks=\"2 years\", date_labels=\"%Y\")\n    + theme_minimal()\n)\nplot.save(\"unemployment.png\", dpi=150, width=10, height=5)\n```\n\n## 
CLI\n\n### Commands\n\nAll commands accept `--provider` (`-p`) to select the provider.\n\n| Command | Description |\n|---|---|\n| `opensdmx search \u003ckeyword\u003e [--n N] [-p provider]` | Keyword search in dataset descriptions (default: 20 results) |\n| `opensdmx search --semantic \u003cquery\u003e [--n N]` | Semantic search (requires `opensdmx embed`) |\n| `opensdmx embed [-p provider]` | Build semantic embeddings cache via Ollama |\n| `opensdmx info \u003cid\u003e [-p provider]` | Show dataset metadata and dimensions |\n| `opensdmx values \u003cid\u003e \u003cdim\u003e [--grep pattern] [-p provider]` | Show codelist values for a dimension (case-insensitive); optionally filter by regex |\n| `opensdmx constraints \u003cid\u003e [dim] [--grep pattern] [-p provider]` | Show values actually present in the dataflow (via `availableconstraint`); optionally filter by regex |\n| `opensdmx tree [--scheme ID] [--category CAT] [--depth N] [-p provider]` | Browse the thematic tree (SDMX `categoryscheme` + `categorisation`); use `--category` to zoom into a subtree; ASCII tree in table mode, flat rows in JSON/CSV |\n| `opensdmx siblings \u003cid\u003e [-p provider]` | Show dataflow siblings in each category — discover related variants that text search misses |\n| `opensdmx search \u003ckeyword\u003e --category \u003cCAT\u003e [-p provider]` | Restrict search to a category (leaf id or dotted path); cuts false positives vs pure token match |\n| `opensdmx get \u003cid\u003e [--DIM VALUE] [--start-period P] [--end-period P] [--last-n N] [--first-n N] [--out file] [--query-file file.yaml] [-p provider]` | Download data; optionally save the query as YAML |\n| `opensdmx run \u003cquery.yaml\u003e [--out file] [-p provider]` | Re-run a query saved with `--query-file` |\n| `opensdmx plot \u003cid\\|file.csv\u003e [--DIM VALUE] [--geom line\\|bar\\|barh\\|point\\|scatter] [--out file] [-p provider]` | Plot data as chart |\n| `opensdmx blacklist [-p provider]` | List and remove 
datasets from the unavailability blacklist |\n\n### Examples\n\n```bash\n# Eurostat (default)\nopensdmx search \"unemployment\"\nopensdmx search \"unemployment\" --n 5\nopensdmx info UNE_RT_M\nopensdmx values UNE_RT_M FREQ          # case-insensitive: freq works too\nopensdmx constraints UNE_RT_M\nopensdmx constraints UNE_RT_M geo\nopensdmx get UNE_RT_M --freq M --geo IT --out data.csv\nopensdmx get UNE_RT_M --freq M --geo IT --out data.parquet\nopensdmx plot UNE_RT_M --freq M --geo IT --geom line\nopensdmx plot data.csv --geom scatter --x TIME_PERIOD --y OBS_VALUE\n\n# Other providers\nopensdmx search \"disoccupazione\" --provider istat\nopensdmx get 151_929 --provider istat --FREQ A --REF_AREA IT --out data.csv\nopensdmx search \"GDP\" --provider oecd\nopensdmx search \"inflation\" --provider ecb\n\n# Thematic tree (categoryscheme + categorisation)\nopensdmx tree --provider istat                                            # list thematic schemes\nopensdmx tree --scheme Z1000AGR --provider istat                          # browse ISTAT Agricoltura\nopensdmx tree --scheme Z0400PRI --category PRI_HARCONEU --provider istat  # zoom into IPCA subtree\nopensdmx tree --scheme t_economy --category t_prc                         # zoom into Prices subtree\nopensdmx search \"prezzi\" --category DCSP_PREZZIAGR --provider istat\nopensdmx siblings NAMA_10_GDP                        # 27 Eurostat GDP-related dataflows\nopensdmx siblings 104_466_DF_DCSP_FERTILIZZANTI_2 --provider istat  # all 7 fertilizer variants\n```\n\nNot every provider exposes the thematic tree. Run `opensdmx providers` and check\nthe `categories` column (✓/✗). Currently supported: `eurostat`, `istat`, `ecb`,\n`oecd`, `insee`, `abs`, `bis`.\n\n### Query files\n\nSave any `get` command as a YAML file with `--query-file`. 
The file captures the full query — provider, dataset, filters with human-readable descriptions, and time range — so it can be re-run, shared, or version-controlled.\n\n```bash\n# Save query\nopensdmx get TIPSUN20 \\\n  --sex T --age Y15-74 --unit PC_ACT \\\n  --geo \"AT+BE+DE+ES+FR+IT\" \\\n  --start-period 2020 --end-period 2024 \\\n  --query-file unemployment_eu.yaml\n```\n\nThe generated YAML:\n\n```yaml\nprovider: eurostat\nprovider_url: https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1\nagency_id: ESTAT\ndataset: TIPSUN20\ndescription: Unemployment rate - annual data\nfilters:\n  sex:\n    value: T\n    description: Total\n  age:\n    value: Y15-74\n    description: From 15 to 74 years\n  unit:\n    value: PC_ACT\n    description: Percentage of population in the labour force\n  geo:\n    value: AT+BE+DE+ES+FR+IT\n    description: ''\nstart_period: '2020'\nend_period: '2024'\nlast_n: null\nfirst_n: null\n```\n\nRe-run with `run` — output goes to stdout by default, or to a file with `--out`:\n\n```bash\nopensdmx run unemployment_eu.yaml\nopensdmx run unemployment_eu.yaml --out results.csv\nopensdmx run unemployment_eu.yaml --out results.parquet\n```\n\nProvider resolution order: `--provider` CLI flag → alias in YAML → `provider_url` + `agency_id` in YAML → environment variable. This means query files work with any provider, including custom URLs.\n\n### Thematic tree\n\nSDMX providers organise their datasets into a hierarchical **category tree** (schemes → categories → subcategories → datasets). 
`opensdmx tree` lets you browse this tree instead of guessing keywords.\n\n**Step 1 — list available schemes**\n\n```bash\nopensdmx tree\n```\n\n```\n┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━┓\n┃ scheme_id  ┃ scheme_name                          ┃ n_df ┃\n┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━┩\n│ economy    │ Economy and finance                  │  418 │\n│ popul      │ Population and social conditions     │ 3747 │\n│ t_economy  │ Economy and finance                  │  130 │\n│ …          │ …                                    │  …   │\n└────────────┴──────────────────────────────────────┴──────┘\n```\n\n**Step 2 — browse a scheme**\n\n```bash\nopensdmx tree --scheme t_economy\n```\n\n```\nEconomy and finance (t_economy)\n├── Exchange rates  (3 df)\n├── Government statistics\n│   └── Government finance statistics (EDP and ESA 2010)\n│       ├── Annual government finance statistics  (2 df)\n│       ├── Government deficit and debt  (3 df)\n│       └── Quarterly government finance statistics  (2 df)\n├── National accounts (including GDP)\n│   ├── Annual national accounts\n│   │   ├── Main GDP aggregates  (10 df)\n│   │   └── …\n│   └── …\n└── Prices\n    ├── Harmonised index of consumer prices (HICP)  (24 df)\n    └── …\n```\n\nThe ASCII tree does not show category IDs. To retrieve them (needed for `--category` filtering), use CSV output:\n\n```bash\nopensdmx --output csv tree --scheme t_economy | grep -i hicp\n# t_economy,Economy and finance,t_prc_hicp,…,Harmonised index of consumer prices (HICP),…\n```\n\n**Step 3 — restrict search to a category**\n\nWithout a category filter, `search \"annual\"` returns 502 datasets from across all themes. 
With `--category`, it narrows to the exact subcategory:\n\n```bash\nopensdmx search \"annual\"                          # 502 results across all themes\nopensdmx search \"annual\" --category t_prc_hicp    # 1 result: HICP - all items - annual average indices\n```\n\n**Step 4 — zoom into a subtree with `--category`**\n\nTo navigate deeper without scrolling through the entire scheme, pass a category ID to `--category`:\n\n```bash\nopensdmx tree --scheme t_economy --category t_prc\n```\n\n```\nPrices (t_prc)\n├── Harmonised index of consumer prices (HICP)  (24 df)\n├── Housing price statistics  (1 df)\n└── Purchasing power parities  (3 df)\n```\n\nTo list all dataflows inside a category branch, combine `tree --category` (for the hierarchy) with `search \"\" --category` (for the actual datasets):\n\n```bash\nopensdmx tree --scheme t_economy --category t_prc       # see the subcategory structure\nopensdmx search \"\" --category t_prc                     # list all dataflows in that branch\n```\n\nIf you accidentally pass a category ID to `--scheme`, the CLI detects it and suggests the correct command:\n\n```bash\nopensdmx tree --scheme t_prc\n# → 't_prc' is a category, not a scheme.\n# → Use: opensdmx tree --scheme t_economy --category t_prc\n```\n\nUse `--depth` to limit the tree depth when a scheme is very large:\n\n```bash\nopensdmx tree --scheme t_economy --depth 1\n```\n\n### Semantic search\n\n`opensdmx search` has two modes:\n\n| Mode | How it works | Best for |\n|---|---|---|\n| Keyword (default) | Exact substring match on dataset title | When you know the right technical term |\n| `--semantic` | Embedding similarity via Ollama | When you don't know the exact wording, or want conceptually related datasets |\n\n#### Setup\n\nRequires [Ollama](https://ollama.com) with the `nomic-embed-text-v2-moe` model:\n\n```bash\nollama pull nomic-embed-text-v2-moe\nopensdmx embed              # build embeddings for default provider (eurostat)\nopensdmx embed -p istat     # 
build embeddings for ISTAT\n```\n\nFor providers that expose a thematic catalog (Eurostat, ISTAT, ECB, OECD, INSEE,\nABS, BIS), running `opensdmx tree` once before `opensdmx embed` enriches each\nembedding with the names of the categories the dataflow belongs to. This\nmaterially improves recall on short or generic queries — a dataflow whose only\ndescription is `Prezzi al consumo` becomes embeddable as `CPI Prezzi al consumo\nSDDS Plus Indicators Real Sector`, giving the model far more signal to work with.\n\n#### Tips for better results\n\n- **Use multi-word, descriptive queries.** A single word (`\"inflation\"`,\n  `\"unemployment\"`) provides little context and produces noisy rankings, in any\n  language. Phrases like `\"consumer price inflation\"` or `\"unemployment rate by\n  age and sex\"` work much better.\n- **English queries on non-English catalogs work well.** The model is\n  multilingual: an English query against an Italian catalog typically returns\n  good cross-lingual matches.\n\n#### Why semantic search matters\n\nThe SDMX catalog uses technical terminology. The same concept can appear under many different labels, or under none of the words you'd naturally use. 
Semantic search bridges that gap.\n\n**Example 1 — colloquial phrase, zero word overlap with catalog**\n\n```bash\nopensdmx search \"people struggling to make ends meet\"            # 0 results\nopensdmx search --semantic \"people struggling to make ends meet\" # finds ILC_MDES09 with score 0.844\n```\n\n| df_id | df_description | score |\n|---|---|---|\n| ILC_MDES09 | Inability to make ends meet | 0.844 |\n| ILC_DI10 | Mean and median income by ability to make ends meet | 0.586 |\n| ILC_IGTP02 | Transition of ability to make ends meet from childhood to current situation | 0.557 |\n| HLTH_DM060 | Ability to make ends meet by level of disability | 0.521 |\n| … | … | … |\n\nThe query shares no words with the results — the model matches the concept, not the text.\n\n**Example 2 — informal phrasing for a technical concept**\n\n```bash\nopensdmx search \"people without a job\"            # 0 results\nopensdmx search --semantic \"people without a job\" # finds unemployed/jobless datasets\n```\n\n| df_id | df_description | score |\n|---|---|---|\n| MED_PS423 | Proportion of persons living in jobless households | 0.609 |\n| LFSA_UGATES | Unemployed persons by type of employment sought | 0.592 |\n| LFSA_UGAN | Unemployed persons by citizenship | 0.581 |\n| LFSA_UGPIS | Unemployed persons by previous occupation | 0.578 |\n| … | … | … |\n\n**Example 3 — demographic concept expressed differently**\n\n```bash\nopensdmx search \"aging population senior citizens\"            # 0 results\nopensdmx search --semantic \"aging population senior citizens\" # finds population 65+ datasets\n```\n\n| df_id | df_description | score |\n|---|---|---|\n| TPS00028 | Proportion of population aged 65 and over | 0.646 |\n| TPS00010 | Population by age group | 0.550 |\n| ILC_LVPS30 | Distribution of population aged 65 and over by type of household | 0.544 |\n| … | … | … |\n\n**When keyword search is enough**\n\nWhen you already know the technical term, keyword search is faster and returns all 
matching datasets (not capped at 10). `search \"unemployment\"` returns 114 results; `search --semantic \"unemployment\"` returns the 10 most similar by score — useful to surface the most relevant ones quickly.\n\n**Rule of thumb:** start with a keyword search. If results are empty or off-target, switch to `--semantic`.\n\n#### How the score works\n\nThe `score` column is the **[cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity)** between the query vector and each dataset description vector. Both vectors are produced by [`nomic-embed-text-v2-moe`](https://huggingface.co/nomic-ai/nomic-embed-text-v2-moe), and the score is computed directly from them without rescaling. Cosine similarity can in principle range from -1 to 1; the scores reported here fall between 0 and 1 (higher = more similar), and in practice most relevant results fall between 0.5 and 0.7.\n\nThe model converts text into high-dimensional vectors such that semantically related phrases point in similar directions, regardless of the exact words used. Cosine similarity is the cosine of the angle between two such vectors: a score of 1 means identical direction, 0 means orthogonal (unrelated).\n\nThe ranking therefore depends entirely on the model: a different model would produce different vectors and a different ordering. 
The model is fixed — if you rebuild embeddings with `opensdmx embed`, the same model is used.\n\n### Caching\n\nCache is namespaced per provider under `~/.cache/opensdmx/{AGENCY_ID}/`.\n\n| File | Content | Default TTL |\n|---|---|---|\n| `dataflows.parquet` | Dataset catalog | 7 days |\n| `cache.db` — structures + codelists | Dimensions, codelist descriptions and values | 30 days |\n| `cache.db` — constraints | Available constraint values per dataflow | 7 days |\n\nEnvironment variables:\n\n| Variable | Description |\n|---|---|\n| `OPENSDMX_PROVIDER` | Provider name or custom base URL (session-wide default) |\n| `OPENSDMX_AGENCY` | Agency ID for custom URL providers |\n| `OPENSDMX_DATAFLOWS_CACHE_TTL` | Dataset catalog TTL in seconds (default: `604800` — 7 days) |\n| `OPENSDMX_METADATA_CACHE_TTL` | Structure/codelist TTL in seconds (default: `2592000` — 30 days) |\n| `OPENSDMX_CONSTRAINTS_CACHE_TTL` | Constraints TTL in seconds (default: `604800` — 7 days) |\n\nSee `.env.example` for a ready-to-use template.\n\n## Timeout\n\n```python\nopensdmx.set_timeout()      # get current timeout (default: 300s)\nopensdmx.set_timeout(600)   # set to 10 minutes\n```\n\n## Validation\n\nopensdmx was tested against the benchmark scenario described in the IMF [*StatGPT: AI for Official Statistics*](https://www.imf.org/en/publications/departmental-papers-policy-papers/issues/2026/03/10/statgpt-ai-for-official-statistics-573514) paper (2026).\nThree independent AI agents received the same natural language question about G7 GDP growth,\nworked through the full skill loop autonomously, and produced **42/42 identical observations** —\nzero divergence across agents, zero variance on repeated calls.\n\nSee [docs/validation-statgpt.md](docs/validation-statgpt.md) for the full test and results.\n\n## Acknowledgements\n\nInspired by [istatR](https://github.com/jfulponi/istatR) by [@jfulponi](https://github.com/jfulponi) and [istatapi](https://github.com/Attol8/istatapi) by 
[@Attol8](https://github.com/Attol8).\n\n## Eurostat release calendar RSS feed\n\nA Cloudflare Worker that converts the Eurostat data release calendar into an RSS feed, filtered to data releases only.\n\n```\nhttps://eurostat-rss.andy-pr.workers.dev/\n```\n\nFilter by theme (`economy`, `agriculture`, `transport`, `environment`, `industry`, `population`, `international`, `science`):\n\n```\nhttps://eurostat-rss.andy-pr.workers.dev/?theme=economy\n```\n\nSource: [`scripts/eurostat-rss/`](scripts/eurostat-rss/).\n\n## License\n\nMIT License — Copyright (c) 2026 Andrea Borruso\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fondata%2Fopensdmx","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fondata%2Fopensdmx","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fondata%2Fopensdmx/lists"}