{"id":22442008,"url":"https://github.com/mggg/maup","last_synced_at":"2025-05-07T08:21:37.485Z","repository":{"id":34309408,"uuid":"169437433","full_name":"mggg/maup","owner":"mggg","description":"The geospatial toolkit for redistricting data.","archived":false,"fork":false,"pushed_at":"2024-07-10T22:44:11.000Z","size":31003,"stargazers_count":68,"open_issues_count":11,"forks_count":24,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-05-03T05:52:53.250Z","etag":null,"topics":["gis","python","redistricting","shapefile"],"latest_commit_sha":null,"homepage":"https://maup.readthedocs.io/en/latest/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mggg.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-02-06T16:29:11.000Z","updated_at":"2025-04-08T19:16:07.000Z","dependencies_parsed_at":"2023-10-28T08:27:24.144Z","dependency_job_id":"3029467f-2dc4-4596-a0fb-810377ee5adf","html_url":"https://github.com/mggg/maup","commit_stats":{"total_commits":94,"total_committers":7,"mean_commits":"13.428571428571429","dds":"0.37234042553191493","last_synced_commit":"4f2830fdd3f0d91c7b104ecf6c0798aafaa13b03"},"previous_names":[],"tags_count":14,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mggg%2Fmaup","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mggg%2Fmaup/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mggg%2Fmaup/releases","manifests_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/mggg%2Fmaup/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mggg","download_url":"https://codeload.github.com/mggg/maup/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252839685,"owners_count":21812149,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gis","python","redistricting","shapefile"],"created_at":"2024-12-06T02:17:18.408Z","updated_at":"2025-05-07T08:21:37.449Z","avatar_url":"https://github.com/mggg.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# maup\n\n[![maup tests](https://github.com/mggg/maup/actions/workflows/tests.yaml/badge.svg)](https://github.com/mggg/maup/actions/workflows/tests.yaml)\n[![codecov](https://codecov.io/gh/mggg/maup/branch/master/graph/badge.svg)](https://codecov.io/gh/mggg/maup)\n[![PyPI](https://img.shields.io/pypi/v/maup.svg?color=%23)](https://pypi.org/project/maup/)\n\n`maup` is the geospatial toolkit for redistricting data. 
The package streamlines\nthe basic workflows that arise when working with blocks, precincts, and\ndistricts, such as\n\n-   [Assigning precincts to districts](#assigning-precincts-to-districts),\n-   [Aggregating block data to precincts](#aggregating-block-data-to-precincts),\n-   [Disaggregating data from precincts down to blocks](#disaggregating-data-from-precincts-down-to-blocks),\n-   [Prorating data when units do not nest neatly](#prorating-data-when-units-do-not-nest-neatly),\n    and\n-   [Fixing topological issues, overlaps, and gaps](#fixing-topological-issues-overlaps-and-gaps)\n\nThe project's priorities are to be efficient by using spatial indices whenever\npossible and to integrate well with the existing ecosystem around\n[pandas](https://pandas.pydata.org/), [geopandas](https://geopandas.org) and\n[shapely](https://shapely.readthedocs.io/en/latest/). The package is distributed\nunder the MIT License.\n\n## Installation\n\nTo install `maup` from PyPI, run `pip install maup` from your terminal.\n\nFor development, `maup` uses [Poetry](https://python-poetry.org/docs/basic-usage/).\nTo develop new `maup` features, clone this repository and run `poetry install`.\n\n## Examples\n\nHere are some basic situations where you might find `maup` helpful. For these\nexamples, we use test data from Providence, Rhode Island, which you can find in\nour\n[Rhode Island shapefiles repo](https://github.com/mggg-states/RI-shapefiles), or\nin the `examples` folder of this repo, reprojected to a non-geographic coordinate\nreference system (CRS) optimized\nfor Rhode Island.\n\n** Many of maup's functions behave badly in geographic projections (i.e., lat/long \ncoordinates), which are the default for shapefiles from the U.S. Census Bureau. In \norder to find an appropriate CRS for a particular shapefile, consult the database\nat [https://epsg.org](https://epsg.org). 
**\n\n\n```python\n\u003e\u003e\u003e import geopandas\n\u003e\u003e\u003e import pandas\n\u003e\u003e\u003e\n\u003e\u003e\u003e blocks = geopandas.read_file(\"zip://./examples/blocks.zip\").to_crs(32030)\n\u003e\u003e\u003e precincts = geopandas.read_file(\"zip://./examples/precincts.zip\").to_crs(32030)\n\u003e\u003e\u003e districts = geopandas.read_file(\"zip://./examples/districts.zip\").to_crs(32030)\n\n```\n\n## Assigning precincts to districts\n\nThe `assign` function in `maup` takes two sets of geometries called `sources`\nand `targets` and returns a pandas `Series`. The Series maps each geometry in\n`sources` to the geometry in `targets` that covers it. (Here, geometry _A_\n_covers_ geometry _B_ if every point of _B_ and its boundary lies in _A_ or its\nboundary.) If a source geometry is not covered by one single target geometry, it\nis assigned to the target geometry that covers the largest portion of its area.\n\n```python\n\u003e\u003e\u003e import maup\n\u003e\u003e\u003e\n\u003e\u003e\u003e precinct_to_district_assignment = maup.assign(precincts, districts)\n\u003e\u003e\u003e # Add the assigned districts as a column of the `precincts` GeoDataFrame:\n\u003e\u003e\u003e precincts[\"DISTRICT\"] = precinct_to_district_assignment\n\u003e\u003e\u003e precinct_to_district_assignment.head()\n0     7\n1     5\n2    13\n3     6\n4     1\ndtype: int64\n\n```\n\nAs an aside, you can use that `precinct_to_district_assignment` object to create a\n[gerrychain](https://gerrychain.readthedocs.io/en/latest/) `Partition`\nrepresenting this districting plan.\n\n## Aggregating block data to precincts\n\nPrecinct shapefiles usually come with election data, but not demographic data.\nIn order to study their demographics, we need to aggregate demographic data from\ncensus blocks up to the precinct level. 
We can do this by assigning blocks to\nprecincts and then aggregating the data with a Pandas\n[`groupby`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html)\noperation:\n\n```python\n\u003e\u003e\u003e variables = [\"TOTPOP\", \"NH_BLACK\", \"NH_WHITE\"]\n\u003e\u003e\u003e\n\u003e\u003e\u003e blocks_to_precincts_assignment = maup.assign(blocks, precincts)\n\u003e\u003e\u003e precincts[variables] = blocks[variables].groupby(blocks_to_precincts_assignment).sum()\n\u003e\u003e\u003e precincts[variables].head()\n   TOTPOP  NH_BLACK  NH_WHITE\n0    5907       886       380\n1    5636       924      1301\n2    6549       584      4699\n3    6009       435      1053\n4    4962       156      3713\n\n```\n\nIf you want to move data from one set of geometries to another but your source\ngeometries do not nest cleanly into your target geometries, see\n[Prorating data when units do not nest neatly](#prorating-data-when-units-do-not-nest-neatly).\n\n## Disaggregating data from precincts down to blocks\n\nIt's common to have data at a coarser scale that you want to attach to\nfiner-scale geometries. For instance, this may happen when vote totals for a certain election are only reported at the county level, and we want to attach that data to precinct geometries.\n\nLet's say we want to prorate the vote totals in the columns `\"PRES16D\"`,\n`\"PRES16R\"` from our `precincts` GeoDataFrame down to our `blocks` GeoDataFrame.\nThe first crucial step is to decide how we want to distribute a precinct's data\nto the blocks within it. Since we're prorating election data, it makes sense to\nuse a block's total population or voting-age population. 
Here's how we might\nprorate by population (`\"TOTPOP\"`):\n\n```python\n\u003e\u003e\u003e election_columns = [\"PRES16D\", \"PRES16R\"]\n\u003e\u003e\u003e blocks_to_precincts_assignment = maup.assign(blocks, precincts)\n\u003e\u003e\u003e\n\u003e\u003e\u003e # We prorate the vote totals according to each block's share of the overall\n\u003e\u003e\u003e # precinct population:\n\u003e\u003e\u003e weights = blocks.TOTPOP / blocks_to_precincts_assignment.map(blocks.TOTPOP.groupby(blocks_to_precincts_assignment).sum())\n\u003e\u003e\u003e prorated = maup.prorate(blocks_to_precincts_assignment, precincts[election_columns], weights)\n\u003e\u003e\u003e\n\u003e\u003e\u003e # Add the prorated vote totals as columns on the `blocks` GeoDataFrame:\n\u003e\u003e\u003e blocks[election_columns] = prorated\n\u003e\u003e\u003e\n\u003e\u003e\u003e # We'll call .round(2) to round the values for display purposes, but note that the \n\u003e\u003e\u003e # actual values should NOT be rounded in order to avoid accumulation of rounding\n\u003e\u003e\u003e # errors.\n\u003e\u003e\u003e blocks[election_columns].round(2).head()\n   PRES16D  PRES16R\n0     0.00     0.00\n1    12.26     1.70\n2    15.20     2.62\n3    15.50     2.67\n4     3.28     0.45\n\n```\n\n#### Warning about areal interpolation\n\n**We strongly urge you _not_ to prorate by area!** The area of a census block is\n**not** a good predictor of its population. In fact, the correlation goes in the\nother direction: larger census blocks are _less_ populous than smaller ones.\n\n#### Warnings about data anomalies\n\n(1) Many states contain Census blocks and precincts that have zero population. In the\nexample above, a zero-population precinct leads to division by zero in the \ndefinition of the weights, which results in NaN values for some entries.\n\nAlthough it is not strictly necessary to resolve this in the example above, sometimes\nthis creates issues down the line.  
One option is to replace NaN values with zeros, \nusing\n\n```python\n\u003e\u003e\u003e weights = weights.fillna(0)\n```\n\n(2) In some cases, zero-population precincts may have a small nonzero number of recorded\nvotes in some elections. The procedure outlined above will lose these votes in the \nproration process due to the zero (or NaN) values for the weights corresponding to all\nthe blocks in those precincts. If it is crucial to keep vote totals perfectly accurate, \nthese votes will need to be assigned to the new units manually.\n\n## Prorating data when units do not nest neatly\n\nSuppose you have a shapefile of precincts with some election results data and\nyou want to join that data onto a different, more recent precincts shapefile.\nThe two sets of precincts will have overlaps, and will not nest neatly like the\nblocks and precincts did in the above examples. (Not that blocks and precincts\nalways nest neatly---in fact, they usually don't!)\n\nIn most cases, election data should be prorated from each old precinct to the new\nprecincts with weights proportional to the population of the intersections between\nthe old precinct and each new precinct.  The most straightforward way to accomplish \nthis is to first disaggregate the data from the old precincts to Census blocks as in the example above, and then reaggregate from blocks to the new precincts. 
\n\n```python\n\u003e\u003e\u003e old_precincts = precincts\n\u003e\u003e\u003e new_precincts = geopandas.read_file(\"zip://./examples/new_precincts.zip\").to_crs(32030)\n\u003e\u003e\u003e\n\u003e\u003e\u003e election_columns = [\"SEN18D\", \"SEN18R\"]\n\u003e\u003e\u003e\n\u003e\u003e\u003e blocks_to_old_precincts_assignment = maup.assign(blocks, old_precincts)\n\u003e\u003e\u003e blocks_to_new_precincts_assignment = maup.assign(blocks, new_precincts)\n\u003e\u003e\u003e\n\u003e\u003e\u003e # We prorate the vote totals according to each block's share of the overall\n\u003e\u003e\u003e # old precinct population, replacing NaN weights with zeros:\n\u003e\u003e\u003e weights = (blocks.TOTPOP / blocks_to_old_precincts_assignment.map(blocks.TOTPOP.groupby(blocks_to_old_precincts_assignment).sum())).fillna(0)\n\u003e\u003e\u003e prorated = maup.prorate(blocks_to_old_precincts_assignment, precincts[election_columns], weights)\n\u003e\u003e\u003e\n\u003e\u003e\u003e # Add the prorated vote totals as columns on the `blocks` GeoDataFrame:\n\u003e\u003e\u003e blocks[election_columns] = prorated\n\u003e\u003e\u003e\n\u003e\u003e\u003e new_precincts[election_columns] = blocks[election_columns].groupby(blocks_to_new_precincts_assignment).sum()\n\u003e\u003e\u003e new_precincts[election_columns].round(2).head()\n    SEN18D   SEN18R\n0   728.17    49.38\n1   370.00    21.00\n2    97.00    17.00\n3    91.16     5.55\n4   246.00    20.00\n```\n\nAs a sanity check, let's make sure that no votes were lost in either step.\nTotal votes in the old precincts, blocks, and new precincts:\n```python\n\u003e\u003e\u003e old_precincts[election_columns].sum()\nSEN18D    23401\nSEN18R     3302\ndtype: float64\n\u003e\u003e\u003e\n\u003e\u003e\u003e blocks[election_columns].sum()\nSEN18D    23401.0\nSEN18R     3302.0\ndtype: float64\n\u003e\u003e\u003e\n\u003e\u003e\u003e new_precincts[election_columns].sum()\nSEN18D    20565.656675\nSEN18R     2947.046857\ndtype: float64\n```\n\nOh no - what happened??? 
All votes were successfully disaggregated to blocks, but a\nsignificant percentage were lost when reaggregating to new precincts.\n\nIt turns out that when blocks were assigned to old and new precincts, many blocks were not assigned to any precinct.  We can count how many blocks were unassigned in each case:\n\n```python\n\u003e\u003e\u003e print(len(blocks))\n3014\n\u003e\u003e\u003e print(blocks_to_old_precincts_assignment.isna().sum())\n884\n\u003e\u003e\u003e print(blocks_to_new_precincts_assignment.isna().sum())\n1227\n```\n\nSo, out of 3,014 total Census blocks, 884 were not assigned to any old precinct and \n1,227 were not assigned to any new precinct.  If we plot the GeoDataFrames, we can see why:\n```python\n\u003e\u003e\u003e blocks.plot()\n```\n\n![Providence blocks](../_static/images/Providence_blocks_plot.png)\n\n```python\n\u003e\u003e\u003e old_precincts.plot()\n```\n\n![Providence old precincts](../_static/images/Providence_old_precincts_plot.png)\n\n```python\n\u003e\u003e\u003e new_precincts.plot()\n```\n\n![Providence new precincts](../_static/images/Providence_new_precincts_plot.png)\n\nThe boundaries of the regions covered by these shapefiles are substantially \ndifferent---and that doesn't even get into the possibility that the precinct shapefiles may have gaps between precinct polygons that some blocks may fall into.\n\nOnce we know to look for this issue, we can see that it affected the earlier aggregation example \nas well:\n```python\n\u003e\u003e\u003e blocks[variables].sum()\nTOTPOP      178040\nNH_BLACK     23398\nNH_WHITE     66909\ndtype: int64\n\u003e\u003e\u003e\n\u003e\u003e\u003e precincts[variables].sum()\nTOTPOP      140332\nNH_BLACK     19345\nNH_WHITE     46667\ndtype: int64\n```\n\n#### Moral: Precinct shapefiles often have _terrible_ topological issues!\nThese issues should be diagnosed and repaired to the greatest extent possible before\nmoving data around between shapefiles; see\n[Fixing topological issues, overlaps, and gaps](#fixing-topological-issues-overlaps-and-gaps)\nbelow for 
details about how maup can help with this.\n\n\n## Progress bars\n\nFor long-running operations, the user might want to see a progress bar to\nestimate how much longer a task will take (and whether to abandon it altogether).\n\n`maup` provides an optional progress bar for this purpose. To temporarily activate\na progress bar for a certain operation, use `with maup.progress():`:\n\n```python\n\u003e\u003e\u003e with maup.progress():\n...     assignment = maup.assign(precincts, districts)\n...\n\n```\n\nTo turn on progress bars for all applicable operations (e.g. for an entire script),\nset `maup.progress.enabled = True`:\n\n```python\n\u003e\u003e\u003e maup.progress.enabled = True\n\u003e\u003e\u003e # Now a progress bar will display while this function runs:\n\u003e\u003e\u003e assignment = maup.assign(precincts, districts)\n\u003e\u003e\u003e # And this one too:\n\u003e\u003e\u003e pieces = maup.intersections(old_precincts, new_precincts, area_cutoff=0)\n\n```\n\n## Fixing topological issues, overlaps, and gaps\n\nPrecinct shapefiles are often created by stitching together collections of\nprecinct geometries sourced from different counties or different years. As a\nresult, the shapefile often has gaps or overlaps between precincts where the\ndifferent sources disagree about the boundaries.  (And by \"often,\" we mean \"for almost every shapefile that isn't produced by the U.S. Census Bureau.\") \nAs we saw in the examples above, these issues can pose problems when moving data between shapefiles.\n\nEven when working with a single shapefile, gaps and overlaps may cause problems if you are interested in working with the adjacency graph of the precincts. \nThis adjacency information is especially important when studying redistricting, because districts are almost always expected to be contiguous.\n\nBefore doing anything else, it is wise to understand the current status of a shapefile with regard to topological issues.  
`maup` provides a `doctor` function to diagnose gaps, overlaps, and invalid geometries.  If a shapefile has none of these issues, `maup.doctor` returns a value of `True`; otherwise it returns `False` along with a brief summary of the problems that it found.\n\nThe blocks shapefile, like most shapefiles from the Census Bureau, is clean:\n```python\n\u003e\u003e\u003e maup.doctor(blocks)\nTrue\n```\n\nThe old precincts shapefile, however, has some minor issues:\n```python\n\u003e\u003e\u003e maup.doctor(old_precincts)\nThere are 2 overlaps.\nThere are 3 holes.\nFalse\n```\n\nAs of version 2.0.0, `maup` provides two repair functions with a variety of options for fixing these issues:  \n\n1. `quick_repair` is the new name for the `autorepair` function from version 1.x (and `autorepair` still works as a synonym).  This function makes fairly simplistic repairs to gaps and overlaps:\n    * Any polygon $Q$ created by the overlapping intersection of two geometries $P_1$ and $P_2$ is removed from both polygons and reassigned to the one with which it shares the greatest perimeter.\n    * Any polygon $Q$ representing a gap between geometries $P_1,\\ldots, P_n$ is assigned to the one with which it shares the greatest perimeter.\n\n    This function is probably sufficient when gaps and overlaps are all very small in area relative to the areas of the geometries, **AND** when the repaired file will only be used for operations like aggregating and prorating data.  But it should **NOT** be relied upon when it is important for the repaired file to accurately represent adjacency relations between neighboring geometries, such as when a precinct shapefile is used as a basis for creating districting plans with contiguous districts.  
\n  \n    For instance, when a gap adjoins many geometries (which happens frequently along county boundaries in precinct shapefiles!), whichever geometry the gap is adjoined to becomes \"adjacent\" to **all** the other geometries adjoining the gap, which can lead to the creation of discontiguous districts in plans based on the repaired shapefile. \n\n2. `smart_repair` is a more sophisticated repair function designed to reproduce the \"true\" adjacency relations between geometries as accurately as possible.  In the case of gaps that adjoin several geometries, this is accomplished by an algorithm that divides the gap into pieces to be assigned to different geometries instead of assigning the entire gap to a single geometry.  \n\n   In addition to repairing gaps and overlaps, `smart_repair` includes two optional features:\n    * In many cases, the shapefile geometries are intended to nest cleanly into some larger units; e.g., in many states, precincts should nest cleanly into counties.  `smart_repair` allows the user to optionally specify a second shapefile---e.g., a shapefile of county boundaries within a state---and then performs the repair process so that the repaired geometries nest cleanly into the units in the second shapefile.\n    * Whether as a result of inaccurate boundaries in the original map or as an artifact of the repair algorithm, it may happen that some units share boundaries with very short perimeter but should actually be considered \"queen adjacent\"---i.e., intersecting at only a single point---rather than \"rook adjacent\"---i.e., intersecting along a boundary of positive length.  `smart_repair` includes an optional step in which all rook adjacencies of length below a user-specified parameter are converted to queen adjacencies.\n\n`smart_repair` can accept either a GeoSeries or GeoDataFrame as input, and the output type will be the same as the input type.  
The input must be projected to a non-geographic coordinate reference system (CRS)---i.e., **not** lat/long coordinates---in order to have sufficient precision for the repair.  One option is to reproject a GeoDataFrame called `gdf` to a suitable UTM (Universal Transverse Mercator) projection via\n    \n```python\ngdf = gdf.to_crs(gdf.estimate_utm_crs())\n```\n\n\nAt a minimum, all overlaps will be repaired in the output. Optional arguments include:\n  * `snapped` (default value `True`): If `True`, all polygon vertices are snapped to a grid of size no more than $10^{-10}$ times the maximum of width/height of the entire shapefile extent. **HIGHLY RECOMMENDED**  to avoid topological exceptions due to rounding errors.\n  * `fill_gaps` (default value `True`): If `True`, all simply connected gaps with area less than `fill_gaps_threshold` times the largest area of all geometries adjoining the gap are filled.  Default threshold is $0.1$; setting `fill_gaps_threshold = None` will fill all simply connected gaps.\n  * `nest_within_regions` (default value `None`): If `nest_within_regions` is a secondary GeoSeries or GeoDataFrame of region boundaries (e.g., counties within a state) then the repair will be performed so that repaired geometries nest cleanly into the region boundaries; specifically, each repaired geometry will be contained in the region with which the original geometry has the largest area of intersection.  Note that the CRS for the region GeoSeries/GeoDataFrame must be the same as that for the primary input.\n  * `min_rook_length` (default value `None`): If `min_rook_length` is given a numerical value, all rook adjacencies with length below this value will be replaced with queen adjacencies.  
Note that this is an absolute value and not a relative value, so make sure that the value provided is in the correct units with respect to the input GeoSeries/GeoDataFrame's CRS.\n        \n\n### Examples\n\n#### First, we'll use `shapely` and `geopandas` to create a GeoDataFrame of \"toy precincts\" from scratch. \n\n```python\nimport random\nimport geopandas\nimport maup\nfrom shapely.geometry import Polygon\n\nrandom.seed(2023) # For reproducibility\n\nppolys = []\nfor i in range(4):\n    for j in range(4):\n        poly = Polygon(\n            [(0.5*i + 0.1*k, 0.5*j + (random.random() - 0.5)/12) for k in range(6)] +\n            [(0.5*(i+1) + (random.random() - 0.5)/12, 0.5*j + 0.1*k) for k in range(1,6)] +\n            [(0.5*(i+1) - 0.1*k, 0.5*(j+1) + (random.random() - 0.5)/12) for k in range(1,6)] +\n            [(0.5*i + (random.random() - 0.5)/12, 0.5*(j+1) - 0.1*k) for k in range(1,5)]\n        )\n        ppolys.append(poly)\n        \ntoy_precincts_df = geopandas.GeoDataFrame(geometry = geopandas.GeoSeries(ppolys))\ntoy_precincts_df.plot(cmap = \"tab20\", alpha=0.7)\n```\n\n![toy_precincts](../_static/images/toy_precincts.png)\n\nCheck for gaps and overlaps:\n```python\n\u003e\u003e\u003e maup.doctor(toy_precincts_df)\nThere are 28 overlaps.\nThere are 23 holes.\nFalse\n```\nAll the gaps between geometries in this example are below the default threshold, so a basic application of `smart_repair` will resolve all overlaps and fill all gaps:\n\n```python\ntoy_precincts_repaired_df = maup.smart_repair(toy_precincts_df)\ntoy_precincts_repaired_df.plot(cmap = \"tab20\", alpha=0.7)\n```\n\n![toy_precincts_repaired](../_static/images/toy_precincts_repaired.png)\n\nWe can check that the repair succeeded:\n```python\n\u003e\u003e\u003e maup.doctor(toy_precincts_repaired_df)\nTrue\n```\n\nNow suppose that the precincts are intended to nest cleanly into the following \"toy counties:\"\n\n```python\ncpoly1 = Polygon([(0,0), (1,0), (1,1), (0,1)])\ncpoly2 = Polygon([(1,0), (2,0), 
(2,1), (1,1)])\ncpoly3 = Polygon([(0,1), (1,1), (1,2), (0,2)])\ncpoly4 = Polygon([(1,1), (2,1), (2,2), (1,2)])\n\ntoy_counties_df = geopandas.GeoDataFrame(geometry = geopandas.GeoSeries([cpoly1, cpoly2, cpoly3, cpoly4]))\n\ntoy_counties_df.plot(cmap='tab20')\n```\n![toy_counties](../_static/images/toy_counties.png)\n\nWe can perform a \"county-aware\" repair as follows:\n```python\ntoy_precincts_repaired_county_aware_df = maup.smart_repair(toy_precincts_df, nest_within_regions = toy_counties_df)\ntoy_precincts_repaired_county_aware_df.plot(cmap = \"tab20\", alpha=0.7)\n```\n![toy_precincts_repaired_county_aware](../_static/images/toy_precincts_repaired_county_aware.png)\n\nNext, suppose that we'd like to get rid of small rook adjacencies at corner points where 4 precincts meet.  We might reasonably estimate that these all have length less than $0.1$, so we can accomplish this as follows:\n```python\ntoy_precincts_repaired_county_aware_rook_to_queen_df = maup.smart_repair(toy_precincts_df, nest_within_regions = toy_counties_df, min_rook_length = 0.1)\ntoy_precincts_repaired_county_aware_rook_to_queen_df.plot(cmap = \"tab20\", alpha=0.7)\n```\n![toy_precincts_repaired_county_aware_rook_to_queen](../_static/images/toy_precincts_repaired_county_aware_rook_to_queen.png)\n\nThe difference is hard to see, so let's zoom in on the gap between the 4 original precincts in the upper left-hand corner.\n\nOriginal precincts:\n\n![toy_precincts_corner](../_static/images/toy_precincts_corner.png)\n\nCounty-aware repair:\n\n![toy_precincts_corner_repaired](../_static/images/toy_precincts_corner_repaired.png)\n\nCounty-aware repair with rook adjacency converted to queen:\n\n![toy_precincts_corner_repaired_rook_to_queen](../_static/images/toy_precincts_corner_repaired_rook_to_queen.png)\n\n\n## Modifiable areal unit problem\n\nThe name of this package comes from the\n[modifiable areal unit problem (MAUP)](https://en.wikipedia.org/wiki/Modifiable_areal_unit_problem):\nthe same spatial 
data will look different depending on how you divide up the\nspace. Since `maup` is all about changing the way your data is aggregated and\npartitioned, we have named it after the MAUP to encourage users to use the\ntoolkit thoughtfully and responsibly.\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmggg%2Fmaup","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmggg%2Fmaup","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmggg%2Fmaup/lists"}