{"id":19950887,"url":"https://github.com/franckalbinet/trufl","last_synced_at":"2026-03-06T21:34:06.776Z","repository":{"id":242198555,"uuid":"801191452","full_name":"franckalbinet/trufl","owner":"franckalbinet","description":"Toolkit allowing to optimise adaptive spatial sampling using Linear Programming and RL.","archived":false,"fork":false,"pushed_at":"2024-07-18T12:07:20.000Z","size":16188,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-12-03T12:51:05.007Z","etag":null,"topics":["environmental-monitoring","linear-programming","multiple-decision-criteria"],"latest_commit_sha":null,"homepage":"https://fr.anckalbi.net/trufl","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/franckalbinet.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-15T19:10:09.000Z","updated_at":"2024-09-18T15:33:52.000Z","dependencies_parsed_at":"2024-06-01T10:01:19.745Z","dependency_job_id":"0ba57e02-9453-434a-bc71-e85e59814c77","html_url":"https://github.com/franckalbinet/trufl","commit_stats":null,"previous_names":["franckalbinet/trufl"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/franckalbinet/trufl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/franckalbinet%2Ftrufl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/franckalbinet%2Ftrufl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/franckalbinet%2Ftrufl/releases","
manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/franckalbinet%2Ftrufl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/franckalbinet","download_url":"https://codeload.github.com/franckalbinet/trufl/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/franckalbinet%2Ftrufl/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30198670,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-06T19:07:06.838Z","status":"ssl_error","status_checked_at":"2026-03-06T18:57:34.882Z","response_time":250,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["environmental-monitoring","linear-programming","multiple-decision-criteria"],"created_at":"2024-11-13T01:05:55.059Z","updated_at":"2026-03-06T21:34:06.737Z","avatar_url":"https://github.com/franckalbinet.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Trufl\n\n\n\u003c!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! 
--\u003e\n\n**Trufl** was initiated in the context of the [IAEA (International\nAtomic Energy Agency)](https://www.iaea.org) Coordinated Research\nProject (CRP) titled [“Monitoring and Predicting Radionuclide Uptake and\nDynamics for Optimizing Remediation of Radioactive Contamination in\nAgriculture”](https://www.iaea.org/newscenter/news/new-crp-monitoring-and-predicting-radionuclide-uptake-and-dynamics-for-optimizing-remediation-of-radioactive-contamination-in-agriculture-crp-d15019).\n\nWhile **Trufl** was originally developed to address the remediation of\nfarmland affected by nuclear accidents, its approach and algorithms are\n**applicable to many other domains**. These include managing\n**legacy contaminants** and monitoring any phenomenon that requires\nconsidering multiple decision criteria over time, across a wide range\nof factors and contexts.\n\nThis package builds on the work of [Floris\nAbrams](https://www.linkedin.com/in/floris-abrams-59080a15a), carried out\nduring his PhD in collaboration between [SCK\nCEN](https://www.sckcen.be) and [KU Leuven](https://www.kuleuven.be), and\nof [Franck Albinet](https://www.linkedin.com/in/franckalbinet),\nInternational Consultant in Geospatial Data Science and currently PhD\nresearcher in AI applied to nuclear remediation at [KU\nLeuven](https://www.kuleuven.be).\n\n## Install\n\n`pip install trufl`\n\n## Getting started\n\nIn highly sensitive and high-stakes situations, it is **essential that\ndecision making is informed, transparent, and accountable**: decisions\nshould rest on a thorough and objective analysis of the available data\nand take into account the needs and concerns of affected communities.\n\nGiven the time constraints and limited budgets often associated with\ndata surveys (in particular, surveys meant to inform highly sensitive\nsituations), it is **crucial to make informed decisions about\nhow to allocate resources**. 
This is even more important when\nconsidering the many variables that can be taken into account, such as\nprior knowledge of the area, health and economic impacts, land use,\nwhether remediation has already taken place, population density, and\nmore. Our approach uses **multiple-criteria decision-making (MCDM)**\nmethods to optimize the data survey workflow.\n\nIn this demo, we will walk you through a **typical workflow** using the\n`Trufl` package. To help illustrate the process, we will use a “toy”\ndataset that represents a typical spatial pattern of soil contaminants.\n\n1.  We **assume that we have access to the ground truth**, which is a\n    raster file that shows the spatial distribution of a soil\n    contaminant;\n2.  We will decide how to optimally sample the\n    **administrative units (polygons)**, which in this case are\n    **simulated as a grid** (using the\n    [`gridder`](https://franckalbinet.github.io/trufl/utils.html#gridder)\n    utility function);\n3.  Based on prior knowledge, such as previous airborne surveys or other\n    data, an\n    [`Optimizer`](https://franckalbinet.github.io/trufl/optimizer.html#optimizer)\n    will **rank each administrative unit (grid cell) according to its\n    priority for sampling**;\n4.  We will then **perform random sampling on the designated units (grid\n    cells)** (using a\n    [`Sampler`](https://franckalbinet.github.io/trufl/sampler.html#sampler)).\n    To simulate the measurement process, we will use the ground truth to\n    emulate measurements at each location (using a\n    [`DataCollector`](https://franckalbinet.github.io/trufl/collector.html#datacollector));\n5.  We will **evaluate the new state of each unit based on the\n    measurements** and **pass it to a new round of optimization**. 
This\n    process will be repeated iteratively to refine the sampling\n    strategy.\n\n### Imports\n\n``` python\nimport matplotlib.pyplot as plt\nimport matplotlib.lines as mlines\n\nimport numpy as np\nimport pandas as pd\nimport rasterio\nimport geopandas as gpd\n\nfrom trufl.utils import gridder\nfrom trufl.sampler import Sampler, rank_to_sample\nfrom trufl.collector import DataCollector\nfrom trufl.callbacks import (State, MaxCB, MinCB, StdCB, CountCB, MoranICB, PriorCB)\nfrom trufl.optimizer import Optimizer\n\n# Colors used in the plots below\nred, black = '#BF360C', '#263238'\n```\n\n### Our simulated ground truth\n\nThe assumed ground truth shows a spatial pattern typical of a\ncontaminant such as `Cs137` after a nuclear accident:\n\n``` python\nfname_raster = './files/ground-truth-01-4326-simulated.tif'\nwith rasterio.open(fname_raster) as src:\n    plt.axis('off')\n    plt.imshow(src.read(1))  # read the first band of the raster\n    plt.title('Simulated Ground Truth')\n```\n\n![](index_files/figure-commonmark/cell-3-output-1.png)\n\n### Simulate administrative units\n\nThe sampling strategy is determined for each administrative unit, here\nsimulated as the cells of a grid. Below, we define a 10 x 10 grid over the\narea of interest:\n\n``` python\ngdf_grid = gridder(fname_raster, nrows=10, ncols=10)\ngdf_grid.head()\n```\n\n|        | geometry                                          |\n|--------|---------------------------------------------------|\n| loc_id |                                                   |\n| 0      | POLYGON ((-1.20830 43.26950, -1.20830 43.26042... |\n| 1      | POLYGON ((-1.20830 43.27858, -1.20830 43.26950... |\n| 2      | POLYGON ((-1.20830 43.28766, -1.20830 43.27858... |\n| 3      | POLYGON ((-1.20830 43.29673, -1.20830 43.28766... |\n| 4      | POLYGON ((-1.20830 43.30581, -1.20830 43.29673... 
|\n\n\u003e [!TIP]\n\u003e\n\u003e Note how each administrative unit is uniquely identified by its\n\u003e `loc_id`.\n\n``` python\ngdf_grid.boundary.plot(color=black, lw=0.5)\nplt.axis('off')\nplt.title('Simulated Administrative Units');\n```\n\n![](index_files/figure-commonmark/cell-5-output-1.png)\n\n### Round I: Optimize sampling based on prior at $t_0$\n\n#### What prior knowledge do we have?\n\nAt the initial time $t_0$, data sampling has not yet begun, but we can\noften **leverage existing prior knowledge of our phenomenon** of\ninterest to inform our sampling strategy/policy. In the context of\nnuclear remediation, this prior knowledge can often be obtained through\nmobile surveys, such as airborne or carborne surveys, which can provide\na **coarse estimation** of soil contamination levels.\n\nIn the example below, we **simulate prior information** about the soil\nproperty of interest by **calculating the average value of the property\nover each grid cell**.\n\nAt this stage, we have no measurements, so we simply create an empty\n[Geopandas\nGeoDataFrame](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.html).\n\n``` python\nsamples_t0 = gpd.GeoDataFrame(index=pd.Index([], name='loc_id'), \n                              geometry=None, data={'value': None})\n```\n\n\u003e [!TIP]\n\u003e\n\u003e The GeoDataFrame needs a `loc_id` index as well as `geometry` and\n\u003e `value` columns.\n\nNow we “sense” the state of our grid cells from the simulated prior\n(the mean over each grid cell, computed by\n[`PriorCB`](https://franckalbinet.github.io/trufl/callbacks.html#priorcb)):\n\n``` python\nstate = State(samples_t0, gdf_grid, cbs=[PriorCB(fname_raster)])\n\n# Call the instance to compute the state of each administrative unit\nstate_t0 = state(); state_t0.head()\n```\n\n|        | Prior    |\n|--------|----------|\n| loc_id |          |\n| 0      | 0.102492 |\n| 1      | 0.125727 |\n| 2      | 0.161802 |\n| 3      | 0.184432 |\n| 4      | 0.201405 |\n\n``` 
python\ngdf_grid.join(state_t0, how='left').plot(column='Prior',\n                                         cmap='viridis', \n                                         legend_kwds={'label': 'Value'}, \n                                         legend=True)\nplt.axis('off')\nplt.title('Prior: Mean value at Administrative Unit level');\n```\n\n![](index_files/figure-commonmark/cell-8-output-1.png)\n\n\u003e [!TIP]\n\u003e\n\u003e We get the `Prior` for each individual `loc_id` (here only the first 5\n\u003e shown). 
The current\n\u003e [`State`](https://franckalbinet.github.io/trufl/callbacks.html#state)\n\u003e consists of a single variable,\n\u003e [`PriorCB`](https://franckalbinet.github.io/trufl/callbacks.html#priorcb),\n\u003e but it can include many more variables, as we will see below.\n\n#### Sampling priority ranks\n\n``` python\n# A single criterion (the prior), treated as a benefit:\n# higher prior values mean a higher sampling priority\nbenefit_criteria = [True]\noptimizer = Optimizer(state=state_t0)\ndf_rank = optimizer.get_rank(is_benefit_x=benefit_criteria, w_vector=[1],\n                             n_method=None, c_method=None,\n                             w_method=None, s_method=\"CP\")\n\ndf_rank.head()\n```\n\n|        | rank |\n|--------|------|\n| loc_id |      |\n| 92     | 1    |\n| 93     | 2    |\n| 91     | 3    |\n| 94     | 4    |\n| 82     | 5    |\n\n\u003e [!TIP]\n\u003e\n\u003e For more information on how the\n\u003e [`Optimizer`](https://franckalbinet.github.io/trufl/optimizer.html#optimizer)\n\u003e operates, please see the section [Delving deeper into the optimization\n\u003e process](#delving-deeper-into-the-optimization-process).\n\n``` python\ngdf_grid.join(df_rank, how='left').plot(column='rank',\n                                        cmap='viridis_r', \n                                        legend_kwds={'label': 'Rank'}, \n                                        legend=True)\nplt.axis('off')\nplt.title('Sampling Priority Rank');\n```\n\n![](index_files/figure-commonmark/cell-10-output-1.png)\n\n#### Informed random sampling\n\n\u003e [!TIP]\n\u003e\n\u003e In the absence of any prior knowledge, a\n\u003e uniform sampling strategy over the area of interest may 
be used.\n\u003e However, this approach may not make the most efficient use of the\n\u003e available data collection and analysis budget.\n\nBased on the **ranks (sampling priority)** calculated by the\n[`Optimizer`](https://franckalbinet.github.io/trufl/optimizer.html#optimizer)\nand a given sampling **budget**, let’s calculate the number of samples to\nbe collected in each administrative unit (`loc_id`). **Different\nsampling policies** can be used (Weighted, Quantiles, …):\n\n``` python\nbudget_t0 = 600\n# Order the ranks by loc_id (the index) before allocating the budget\nn = rank_to_sample(df_rank['rank'].sort_index().values, \n                   budget=budget_t0, min=1, policy=\"quantiles\"); n\n```\n\n    array([ 1,  1,  1,  1,  1,  1,  1,  1,  1,  4,  1,  1,  1,  1,  1,  4,  4,\n            4,  4,  4,  1,  1,  1,  1,  1,  4,  4,  4,  4,  4,  1,  1,  1,  4,\n            4,  7,  7, 12,  7,  7,  1,  1,  4,  4,  7, 12, 12, 12, 12,  7,  1,\n            4,  4,  7,  7, 12, 12, 12,  7,  4,  4,  4,  7,  7,  7,  7,  7,  7,\n            4,  4,  4,  7, 12,  7, 12, 12,  7,  7,  4,  4,  7, 12, 12, 12, 12,\n           12, 12, 12,  7,  7, 12, 12, 12, 12, 12, 12, 12,  7,  7,  7])\n\nWe can now decide where to sample based on this sampling scheme:\n\n``` python\nsampler = Sampler(gdf_grid)\nsample_locs_t0 = sampler.sample(n, method='uniform')\n\nprint(sample_locs_t0.head())\nax = sample_locs_t0.plot(markersize=2, color=red)\n\ngdf_grid.boundary.plot(color=black, lw=0.5, ax=ax)\nplt.axis('off')\nplt.title('Ranked Random Samples Location');\n```\n\n                             geometry\n    loc_id                           \n    0       POINT (-1.21727 43.26778)\n    1       POINT (-1.22102 43.27806)\n    2       POINT (-1.21712 43.27979)\n    3       POINT (-1.22145 43.29287)\n    4       POINT (-1.21036 43.30109)\n\n![](index_files/figure-commonmark/cell-12-output-2.png)\n\n#### Emulating the measurement campaign\n\nIn the field, a data collector takes measurements at the random sampling\nlocations. 
In our case, we emulate this process by\nextracting measurements from the provided raster file.\n\n“Measuring” the variable of interest from the raster:\n\n``` python\ndc_emulator = DataCollector(fname_raster)\n# Emulate the measurements by reading the raster values at the sampling locations\nmeasurements_t0 = dc_emulator.collect(sample_locs_t0)\n\nprint(measurements_t0.head())\nax = measurements_t0.plot(column='value', s=2, legend=True)\ngdf_grid.boundary.plot(color=black, lw=0.5, ax=ax)\nplt.axis('off')\nplt.title('Measurements at Random Sampling Points');\n```\n\n                             geometry     value\n    loc_id                                     \n    0       POINT (-1.21727 43.26778)  0.137188\n    1       POINT (-1.22102 43.27806)  0.151005\n    2       POINT (-1.21712 43.27979)  0.164272\n    3       POINT (-1.22145 43.29287)  0.181001\n    4       POINT (-1.21036 43.30109)  0.168969\n\n![](index_files/figure-commonmark/cell-13-output-2.png)\n\nThis marks the **end of our initial measurement efforts**, based on our\nprior knowledge of the phenomenon. **Going forward, we can use the\nadditional insights gained during this phase** to better target future\nmeasurements.\n\n### Round II: Optimize sampling with additional insights at $t_1$\n\nFor each administrative unit, we now have, in addition to our prior\nknowledge, the measurements acquired during the previous campaign. 
**In\nthe current round**, sampling is **optimized based on** the\n**maximum**, **minimum**, and **standard deviation** of the measurements,\nthe **number of measurements** already conducted, our **prior\nknowledge**, and an estimate of the **presence of spatial trends** or\nspatial correlations (Moran’s I).\n\n\u003e [!TIP]\n\u003e\n\u003e You can use any quantitative or qualitative secondary geographical\n\u003e information as a variable in the state, such as population, whether\n\u003e any previous remediation actions have taken place, the economic impact\n\u003e of the contamination, and so on.\n\n#### Getting the new state of the administrative units\n\n``` python\n# State variables, in this order: Max, Min, Std, Count, Moran's I, Prior\nstate = State(measurements_t0, gdf_grid, cbs=[\n    MaxCB(), MinCB(), StdCB(), CountCB(), MoranICB(k=5), PriorCB(fname_raster)])\n```\n\n``` python\nstate().head()\n```\n\n|        | Max      | Min      | Standard Deviation | Count | Moran.I | Prior    |\n|--------|----------|----------|--------------------|-------|---------|----------|\n| loc_id |          |          |                    |       |         |          |\n| 0      | 0.137188 | 0.137188 | 0.0                | 1     | NaN     | 0.102492 |\n| 1      | 0.151005 | 0.151005 | 0.0                | 1     | NaN     | 0.125727 |\n| 2      | 0.164272 | 0.164272 | 0.0                | 1     | NaN     | 0.161802 |\n| 3      | 0.181001 | 0.181001 | 0.0                | 1     | NaN     | 0.184432 |\n| 4      | 0.168969 | 0.168969 | 0.0                | 1     | NaN     | 0.201405 |\n\nUnits with a single measurement have a standard deviation of 0 and an\nundefined (`NaN`) Moran’s I.\n\n\u003e [!TIP]\n\u003e\n\u003e The **Moran’s I index** is a statistical method used to determine 
if\n\u003e there is a **spatial correlation/trend** within each area of interest.\n\u003e For example, a **random field** would have a **Moran’s I index close\n\u003e to 0**, while a clear **gradient of low to high values**, such as from\n\u003e south to north, would be characterized by a **Moran’s I index close to\n\u003e 1**.\n\n#### Finding the optimal number of samples to collect\n\n1.  First, decide whether each variable of the State should be maximized\n    (**benefit**) or minimized (**cost**):\n\n``` python\n# Order: Max, Min, Std, Count, Moran's I, Prior\nbenefit_criteria = [True, True, True, False, False, True]\n```\n\n2.  Then assign an **importance weight** to each variable of the\n    [`State`](https://franckalbinet.github.io/trufl/callbacks.html#state),\n    in the same order (`Max`, `Min`, …):\n\n``` python\noptimizer = Optimizer(state=state())\n# Weights for Max, Min, Std, Count, Moran's I, Prior (they sum to 1)\ndf_rank = optimizer.get_rank(is_benefit_x=benefit_criteria, \n                             w_vector=[0.2, 0.1, 0.1, 0.2, 0.2, 0.2],  \n                             n_method=\"LINEAR1\", c_method=None, w_method=None, s_method=\"CP\")\n```\n\n``` python\ngdf_grid.join(df_rank, how='left').plot(column='rank',\n                                        cmap='viridis_r', \n                                        legend_kwds={'label': 'Rank'}, \n                                        legend=True)\nplt.axis('off')\nplt.title('Sampling Priority Rank');\n```\n\n![](index_files/figure-commonmark/cell-18-output-1.png)\n\n``` python\ndf_rank.head()\n```\n\n|        | rank |\n|--------|------|\n| loc_id |      |\n| 26     | 1    |\n| 73     | 2    |\n| 27     | 3    |\n| 78     | 4    |\n| 24     | 5    |\n\nBased on this ranking, we can again:\n\n1.  calculate, from the **ranks (sampling priority)** and the given\n    sampling **budget**, the number of samples to be collected in each\n    administrative unit;\n2.  **perform** the random **sampling**;\n3.  **carry out** the **measurements**.\n\n#### Informed random sampling\n\n``` python\nbudget_t1 = 400\nn = 
rank_to_sample(df_rank['rank'].sort_index().values, \n                   budget=budget_t1, min=1, policy=\"quantiles\"); n\n```\n\n    array([1, 1, 1, 1, 1, 1, 3, 4, 3, 1, 1, 1, 1, 1, 1, 1, 3, 1, 3, 8, 1, 1,\n           1, 1, 8, 8, 8, 8, 4, 8, 1, 1, 1, 3, 8, 4, 4, 4, 8, 3, 1, 1, 3, 4,\n           4, 4, 4, 4, 4, 3, 1, 3, 4, 4, 8, 3, 3, 4, 8, 8, 1, 8, 3, 3, 4, 4,\n           4, 3, 4, 8, 8, 8, 8, 8, 4, 3, 4, 8, 8, 8, 8, 4, 8, 8, 4, 3, 3, 3,\n           3, 3, 8, 4, 3, 3, 4, 4, 3, 8, 3, 3])\n\n``` python\nsampler = Sampler(gdf_grid)\nsample_locs_t1 = sampler.sample(n, method='uniform')\n\nax = sample_locs_t1.plot(markersize=2, color=red)\ngdf_grid.boundary.plot(color=black, lw=0.5, ax=ax)\nplt.axis('off')\nplt.title('Ranked Random Samples Location');\n```\n\n![](index_files/figure-commonmark/cell-21-output-1.png)\n\n#### Second measurement campaign\n\n``` python\ndc_emulator = DataCollector(fname_raster)\nmeasurements_t1 = dc_emulator.collect(sample_locs_t1)\n\nax = measurements_t1.plot(column='value', s=2, legend=True)\ngdf_grid.boundary.plot(color=black, lw=0.5, ax=ax)\nplt.axis('off')\nplt.title('Measurements at Random Sampling Points');\n```\n\n![](index_files/figure-commonmark/cell-22-output-1.png)\n\n``` python\n# Pool the measurements of both campaigns\nmeasurements_sofar = pd.concat([measurements_t0, measurements_t1])\n\nax = measurements_sofar.plot(column='value', s=2, legend=True)\ngdf_grid.boundary.plot(color=black, lw=0.5, ax=ax)\nplt.axis('off')\nplt.title('Measurements after \\n 2 informed measurement campaigns');\n```\n\n![](index_files/figure-commonmark/cell-23-output-1.png)\n\n## Delving deeper into the optimization process\n\n### Determine the 
ranking of the administrative polygons\n\nThe **ranking** reflects how important it is to increase sampling in\neach polygon. A multi-criteria decision-making methodology is used to\nrank the polygons from most important to least important, with **lower\nranks indicating a higher priority for sampling**.\n\n#### Criteria\n\nThe state variables of the polygons are used as criteria to determine\nthe rank:\n\n| Criterion            | State variable | Criterion type |\n|----------------------|----------------|----------------|\n| Estimated value      | PriorCB()      | Benefit        |\n| Maximum sample value | MaxCB()        | Benefit        |\n| Minimum sample value | MinCB()        | Benefit        |\n| Sample count         | CountCB()      | Cost           |\n| Standard deviation   | StdCB()        | Benefit        |\n| Moran’s I index      | MoranICB(k=5)  | Cost           |\n\n#### Criteria type\n\nCriteria can be of the benefit or cost type:\n\n- Benefit: **high values** mean **high importance** of sampling more;\n- Cost: **low values** mean **high importance** of sampling more.\n\n#### Weights\n\nA **weight vector** sets the **relative importance of the criteria**.\n\n#### MCDM techniques\n\n- **CP** (Compromise programming):\n  - Distance-based measure: the distance to the optimal (ideal) point is\n    computed, with lower values indicating better alternatives.\n- **TOPSIS** (Technique for Order Preference by Similarity to Ideal\n  Solution):\n  - Distance-based measure: the closeness to the optimal and\n    anti-optimal points is assessed, with higher values indicating\n    better alternatives.\n\n#### Rank\n\nThe polygons are then ranked based on their MCDM score.\n\n\u003e [!TIP]\n\u003e\n\u003e Start with equal weights for all criteria; later, explore the impact\n\u003e of changing the weight vector. 
Make sure the weights sum to 1.\n\nRanking of administrative units based on three criteria (`Max`, `Min`\nand standard deviation, all treated as benefits):\n\n``` python\nbenefit_criteria = [True, True, True]\nstate = State(measurements_sofar, gdf_grid, cbs=[MaxCB(), MinCB(), StdCB()])\nweight_vector = [0.3, 0.3, 0.4]  # weights for Max, Min and Std (sum to 1)\n\noptimizer = Optimizer(state=state())\ndf = optimizer.get_rank(is_benefit_x=benefit_criteria, w_vector=weight_vector,\n                        n_method=\"LINEAR1\", c_method=None, w_method=None, s_method=\"CP\")\n\ndf.head()\n```\n\n|        | rank |\n|--------|------|\n| loc_id |      |\n| 71     | 1    |\n| 72     | 2    |\n| 69     | 3    |\n| 59     | 4    |\n| 38     | 5    |\n\nBased on this ranking of the administrative units, an optimized sampling\nstrategy for the next campaign ($t_2$) can be determined.\n\n``` python\n# Attach the grid geometries to the ranks to map them\ncombined_df = pd.merge(df, gdf_grid[['geometry']], left_index=True, right_index=True, how='inner')\ncombined_gdf = gpd.GeoDataFrame(combined_df)\n\nfig, ax = plt.subplots(1, 1, figsize=(10, 8))\ncax = combined_gdf.plot(column='rank', cmap='Reds_r', legend=True, ax=ax)\nmeasurements_sofar.plot(column='value', ax=ax, cmap='viridis', s=1.5, legend=True)\n\n# Invert the rank colorbar so that rank 1 (highest priority) is at the top\ncbar = cax.get_figure().get_axes()[1]\ncbar.invert_yaxis()\n\nrank_legend = mlines.Line2D([], [], color='Red', marker='o', linestyle='None',\n                            markersize=10, label='High priority (low rank)')\nvalue_legend = mlines.Line2D([], [], color='Yellow', marker='o', linestyle='None',\n                             markersize=10, label='High measured value')\n\nax.legend(handles=[rank_legend, value_legend], loc='upper left', bbox_to_anchor=(1.5, 
1.25))\nplt.show()\n```\n\n![](index_files/figure-commonmark/cell-25-output-1.png)\n\n### Multi-year adaptive sampling approach\n\n- Sampling in year 0 is based on the prior;\n- Sampling in year $t$ is based on 6 state variables, with the\n  following weights:\n  - Max value: 0.2\n  - Min value: 0.1\n  - Standard deviation: 0.1\n  - Sample count: 0.2\n  - Moran’s I: 0.2\n  - Prior value: 0.2\n- The sampling policy is based on the point budget and on the quantile\n  in which each unit ranks:\n  - 1st: 50% of the point budget\n  - 2nd: 30% of the point budget\n  - 3rd: 20% of the point budget\n  - 4th: no extra sample points\n\n``` python\nnumber_of_years = 4\nyearly_sample_budget = 150\n```\n\n``` python\nfig, axs = plt.subplots(1, number_of_years, figsize=(12, 8))  # Adjust figsize as needed\naxs = axs.flatten()\n\nsampler = Sampler(gdf_grid)\ndc_emulator = DataCollector(fname_raster)\n\n# No measurements yet: start from an empty GeoDataFrame\nsamples_t0 = gpd.GeoDataFrame(index=pd.Index([], name='loc_id'), geometry=None, data={'value': None})\nsamples_t = None  # accumulates the measurements of all campaigns\n\n# Year 0: rank the units based on the prior only\nstate = State(samples_t0, gdf_grid, cbs=[PriorCB(fname_raster)])\nstate_t0 = state()  # call the instance to compute the state\n\nbenefit_criteria = [True]\noptimizer = Optimizer(state=state_t0)\ndf_rank = optimizer.get_rank(is_benefit_x=benefit_criteria, w_vector=[1],\n                             n_method=None, c_method=None,\n                             w_method=None, s_method=\"CP\")\n\n# Map the prior-based ranking computed just above (df_rank)\ncombined_df = pd.merge(df_rank, gdf_grid[['geometry']], left_index=True, right_index=True, how='inner')\ncombined_gdf = gpd.GeoDataFrame(combined_df)\ncombined_gdf.plot(column='rank', cmap='Reds_r', legend_kwds={'label': 'Rank'}, ax=axs[0])\n\n# The first panel shows the prior-based rank; each remaining panel is one year\nfor year, ax in enumerate(axs[1:], start=1):\n    # 
1. allocate the yearly budget by rank, sample, and emulate the measurements\n    n = rank_to_sample(combined_gdf['rank'].sort_index().values, \n                       budget=yearly_sample_budget, min=1, policy=\"quantiles\")\n    sample_locs_t = sampler.sample(n, method='uniform')\n    samples = dc_emulator.collect(sample_locs_t)\n    samples_t = samples if samples_t is None else pd.concat([samples_t, samples])\n    \n    # plot points versus rank of polygon\n    ax = combined_gdf.plot(column='rank', cmap='Reds_r', ax=ax)\n    samples_t.plot(column='value', ax=ax, cmap='viridis', s=1)\n    ax.set_title(f\"Year {year} (number of samples: {len(samples_t)})\")\n    \n    # new state, computed from all the measurements collected so far\n    state = State(samples_t, gdf_grid, cbs=[\n        MaxCB(), MinCB(), StdCB(), CountCB(), MoranICB(k=5), PriorCB(fname_raster)])\n    \n    optimizer = Optimizer(state=state())\n\n    # 2. rank polygons\n    benefit_criteria = [True, True, True, False, False, True]\n    df = optimizer.get_rank(is_benefit_x=benefit_criteria, w_vector=[0.2, 0.1, 0.1, 0.2, 0.2, 0.2],\n                            n_method=\"LINEAR1\", c_method=None, w_method=None, s_method=\"CP\")\n\n    # 3. map ranking (used to allocate the next year's samples)\n    combined_df = pd.merge(df, gdf_grid[['geometry']], left_index=True, right_index=True, how='inner')\n    combined_gdf = gpd.GeoDataFrame(combined_df)\n\nplt.tight_layout()\nplt.show()\n```\n\n![](index_files/figure-commonmark/cell-27-output-1.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffranckalbinet%2Ftrufl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffranckalbinet%2Ftrufl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffranckalbinet%2Ftrufl/lists"}