# A Deep Gravity model for mobility flows generation

## Table of contents
1. [Citing](#citing)
2. [Abstract](#abstract)
3. [Architecture of Deep Gravity](#architecture)
4. [Running Deep Gravity](#running)
    - [Setup](#setup)
    - [Experiments](#experiments)
    - [Plot of the results](#plot)
    - [Additional data](#additional_data)

<a id='citing'></a>
## Citing

If you use the code in this repository, please cite our paper:

F. Simini, G. Barlacchi, M. Luca, L. Pappalardo, *A Deep Gravity model for mobility flows generation*, Nature Communications 12, 6576 (2021). https://doi.org/10.1038/s41467-021-26752-4

```
@article{Simini2021,
author = {Simini, Filippo and Barlacchi, Gianni and Luca, Massimiliano and Pappalardo, Luca},
doi = {10.1038/s41467-021-26752-4},
issn = {2041-1723},
journal = {Nature Communications},
number = {1},
pages = {6576},
title = {{A Deep Gravity model for mobility flows generation}},
url = {https://doi.org/10.1038/s41467-021-26752-4},
volume = {12},
year = {2021}}
```

and the official code repository: [![DOI](https://zenodo.org/badge/379271033.svg)](https://zenodo.org/badge/latestdoi/379271033)

<a id='abstract'></a>
## Abstract
The movements of individuals within and among cities influence critical aspects of our society, such as well-being, the spreading of epidemics, and the quality of the environment. When information about mobility flows is not available for a particular region of interest, we must rely on mathematical models to generate them. We propose Deep Gravity, an effective model to generate flow probabilities that exploits many features (e.g., land use, road network, transport, food, health facilities) extracted from voluntary geographic data, and uses deep neural networks to discover non-linear relationships between those features and mobility flows.
Our experiments, conducted on mobility flows in England, Italy, and New York State, show that Deep Gravity achieves a significant increase in performance, especially in densely populated regions of interest, with respect to the classic gravity model and models that do not use deep neural networks or geographic data. Deep Gravity has good generalization capability, generating realistic flows also for geographic areas for which there is no data availability for training. Finally, we show how flows generated by Deep Gravity may be explained in terms of the geographic features and highlight crucial differences among the three considered countries interpreting the model’s prediction with explainable AI techniques.

![Performance of DG vs G in a highly populated area in England](https://github.com/scikit-mobility/DeepGravity/blob/master/imgs/plot.png?raw=true)
_Figure 1. Performance in terms of Common Part of Commuters (CPC) of Deep Gravity (DG) vs the gravity model (G) in a highly populated area in England_

<a id='architecture'></a>
## Architecture of Deep Gravity
To generate the flows from a given origin location (e.g., $l_i$), Deep Gravity uses a number of input features to compute the probability $p_{i,j}$ that any of the $n$ locations in the region of interest (e.g., $l_j$) is the destination of a trip from $l_i$.
Specifically, the model output is an $n$-dimensional vector of probabilities $p_{i,j}$ for $j = 1, \dots, n$. These probabilities are computed in three steps (see Figure 2).

![Architecture of Deep Gravity](https://github.com/scikit-mobility/DeepGravity/blob/master/imgs/architecture.png?raw=true)
_Figure 2. Architecture of Deep Gravity_

1. The input vectors $x(l_i, l_j) = \mathrm{concat}[x_i, x_j, r_{i,j}]$ for $j = 1, \dots, n$ are obtained by concatenating the following input features: $x_i$, the feature vector of the origin location $l_i$; $x_j$, the feature vector of the destination location $l_j$; and $r_{i,j}$, the distance between origin and destination.
For each origin location (e.g., $l_i$), $n$ input vectors $x(l_i, l_j)$ with $j = 1, \dots, n$ are created, one for each location in the region of interest that could be a potential destination.

2. The input vectors $x(l_i, l_j)$ are fed in parallel to the same feed-forward neural network. The network has 15 hidden layers of dimensions 256 (the bottom six layers) and 128 (the other layers) with the LeakyReLU activation function $a$. Specifically, the output of hidden layer $h$ is the vector $z^{(0)}(l_i, l_j) = a(W^{(0)} \cdot x(l_i, l_j))$ for the first layer ($h = 0$) and $z^{(h)}(l_i, l_j) = a(W^{(h)} \cdot z^{(h-1)}(l_i, l_j))$ for $h > 0$, where the $W^{(h)}$ are matrices whose entries are parameters learned during training.

3. The output of the last layer is a scalar $s(l_i, l_j) \in (-\infty, +\infty)$ called score: the higher the score for a pair of locations $(l_i, l_j)$, the higher the probability of observing a trip from $l_i$ to $l_j$ according to the model. Finally, the scores are transformed into probabilities using a softmax function, $p_{i,j} = e^{s(l_i, l_j)} / \sum_{k} e^{s(l_i, l_k)}$, which maps all scores to positive numbers that sum up to one.
The generated flow between two locations is then obtained by multiplying the probability (i.e., the model's output) by the origin's total outflow.

The location feature vector $x_i$ provides a spatial representation of an area and contains features describing properties of location $l_i$, e.g., the total length of residential roads or the number of restaurants therein. Its dimension, $d$, is equal to the total number of features considered. The location features we use include the population size of each location and geographical features extracted from OpenStreetMap belonging to the following categories:

- Land use areas (5 features): total area (in km²) for each land use class, i.e., residential, commercial, industrial, retail and natural;
- Road network (3 features): total length (in km) for each type of road, i.e., residential, main and other;
- Transport facilities (2 features): total count of Points Of Interest (POIs) and buildings related to each possible transport facility, e.g., bus/train station, bus stop, car parking;
- Food facilities (2 features): total count of POIs and buildings related to food facilities, e.g., bar, cafe, restaurant;
- Health facilities (2 features): total count of POIs and buildings related to health facilities, e.g., clinic, hospital, pharmacy;
- Education facilities (2 features): total count of POIs and buildings related to education facilities, e.g., school, college, kindergarten;
- Retail facilities (2 features): total count of POIs and buildings related to retail facilities, e.g., supermarket, department store, mall.

In addition, Deep Gravity includes as a feature the geographic distance $r_{i,j}$ between two locations $l_i$ and $l_j$. It is defined as the distance measured along the surface of the Earth between the centroids of the two polygons representing the locations. All feature values for a given location (excluding distance) are normalized by dividing them by the location's area.

Each flow in Deep Gravity is hence described by 39 features: 18 geographic features of the origin, 18 of the destination, the distance between origin and destination, and the two populations.

The loss function of Deep Gravity is the cross-entropy:

$$H = - \sum_{i} \sum_{j} \frac{y(l_i, l_j)}{O_i} \ln p_{i,j}$$

Here $O_i$ is the total observed outflow from $l_i$, so $y(l_i, l_j) / O_i$ is the fraction of observed flows from $l_i$ that go to $l_j$. The term $p_{i,j}$ is the model's probability of a unit flow from $l_i$ to $l_j$.
Summing the cross-entropies of the different origin locations over $i$ relies on the assumption that flows from different locations are independent events. This assumption allows us to apply the additive property of the cross-entropy for independent random variables.
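The following minimal, dependency-free Python sketch makes the three steps and the per-origin loss concrete. It is not the repository's PyTorch code, and several parts are illustrative assumptions:

- the function names;
- the toy network (two hidden layers of width 4 instead of 15 layers of width 256/128) and the absence of bias terms;
- a final linear layer producing the score $s(l_i, l_j)$;
- a LeakyReLU slope of 0.01, which is the default of PyTorch's `nn.LeakyReLU`;
- the toy features and distances.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]

def leaky_relu(v: Sequence[float], slope: float = 0.01) -> Vector:
    # Activation a(.) applied element-wise (slope 0.01 is an assumption).
    return [x if x > 0 else slope * x for x in v]

def matvec(W: Matrix, v: Sequence[float]) -> Vector:
    # W . v for a row-major weight matrix; biases omitted, as in the formulas above.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def input_vector(x_i: Sequence[float], x_j: Sequence[float], r_ij: float) -> Vector:
    # Step 1: x(l_i, l_j) = concat[x_i, x_j, r_ij]
    return list(x_i) + list(x_j) + [r_ij]

def score(x: Sequence[float], hidden: List[Matrix], w_out: Vector) -> float:
    # Step 2: z^(0) = a(W^(0) x), z^(h) = a(W^(h) z^(h-1)).
    z = list(x)
    for W in hidden:
        z = leaky_relu(matvec(W, z))
    # Step 3 (assumed): a final linear map turns the last hidden output into the scalar s.
    return sum(w * v for w, v in zip(w_out, z))

def softmax(scores: Sequence[float]) -> Vector:
    # Subtracting the max leaves p_ij unchanged but avoids overflow in exp.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def destination_probabilities(i, features, dist, hidden, w_out) -> Vector:
    # p_{i,j} for every candidate destination j, all scored by the same network.
    n = len(features)
    scores = [score(input_vector(features[i], features[j], dist[i][j]), hidden, w_out)
              for j in range(n)]
    return softmax(scores)

def generated_flows(probs: Sequence[float], outflow: float) -> Vector:
    # Generated flow = p_{i,j} times the origin's total outflow O_i.
    return [outflow * p for p in probs]

def origin_cross_entropy(observed: Sequence[float], probs: Sequence[float]) -> float:
    # Contribution of origin l_i to H: -sum_j (y_ij / O_i) ln p_ij, with O_i = sum_j y_ij.
    # Terms with y_ij = 0 contribute nothing, so they are skipped.
    O_i = sum(observed)
    return -sum((y / O_i) * math.log(p) for y, p in zip(observed, probs) if y > 0)

if __name__ == "__main__":
    rng = random.Random(0)
    feats = [[1.0, 0.5], [0.2, 0.8], [0.6, 0.1]]  # hypothetical x_i with d = 2
    dist = [[0.0, 1.2, 3.4], [1.2, 0.0, 2.0], [3.4, 2.0, 0.0]]  # hypothetical r_ij
    d_in, width = 2 * 2 + 1, 4
    hidden = [[[rng.gauss(0, 0.5) for _ in range(d_in)] for _ in range(width)],
              [[rng.gauss(0, 0.5) for _ in range(width)] for _ in range(width)]]
    w_out = [rng.gauss(0, 0.5) for _ in range(width)]
    p = destination_probabilities(0, feats, dist, hidden, w_out)
    print(p, generated_flows(p, 100.0), origin_cross_entropy([0.0, 70.0, 30.0], p))
```

Summing `origin_cross_entropy` over all origins yields $H$. A practical implementation would vectorize the loop over destinations, since the $n$ input vectors of an origin go through the network in parallel.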

The network is trained for 20 epochs with the RMSprop optimizer, using momentum 0.9, learning rate $5 \cdot 10^{-6}$ and batches of 64 origin locations. To reduce the training time, we use negative sampling and consider up to 512 randomly selected destinations for each origin location.

<a id='running'></a>
## Running Deep Gravity

<a id='setup'></a>
### Setup
Make sure you have the following dependencies installed:

- `pytorch 1.7.1`
- `numpy 1.19.2`
- `pandas 1.2.4`
- `geopandas 0.9.0`
- `scikit-mobility 1.1.0`
- `area`

<a id='experiments'></a>
### Experiments

Once you have installed all the packages, you can run the experiments.

We expect to find the datasets in a folder named `data/<country_name>`, where `<country_name>` is a parameter that can be passed to the model. In particular, we expect to find:

- `tessellation.geojson` or `tessellation.shp`. The tessellation can also be generated by using the parameters `tessellation-area` and `tessellation-size` when the model is called.
- `output_areas.geojson` or `output_areas.shp`, a file containing the location code and the geometry of the output areas. The column containing the location code can be specified using the parameter `oa-id-column` when calling the model.
- `flows.csv`, containing three columns that give the origin, the destination and the actual flow of people. The names of these columns can be specified with the parameters `flow-origin-column`, `flow-destination-column` and `flow-flows-column`. Due to GitHub policy, the file containing the flows for the running example of New York has to be downloaded from [here](https://drive.google.com/file/d/1rLJz5E0igbrmAnmnDmazdBl97UuQ0sch/view?usp=sharing).
Data are derived from the [GeoDS COVID-19 project](https://github.com/GeoDS/COVID19USFlows).
- `features.csv`, containing at least a column with the name given by `oa-id-column` and a set of other columns representing the features of the model.

An example dataset collected in New York is already included in the repository, and the following examples are based on it. Note that when `main.py` is launched for the first time, a set of additional files is generated in a folder called `processed`. These files should not be removed.

The model can be run with the following command:

`python main.py --dataset new_york --oa-id-column GEOID --flow-origin-column geoid_o --flow-destination-column geoid_d --flow-flows-column pop_flows --epochs 1 --device cpu --mode train`

You can also include some parameters related to the model:

- `batch-size`: the input batch size for training. Default is 1
- `test-batch-size`: the batch size at test time. Default is 1
- `epochs`: the number of training epochs. Default is 10
- `lr`: the learning rate. Default is 5e-6
- `momentum`: default is 0.9
- `seed`
- `device`: can be `cpu` or `gpu`
- `mode`: can be `train` or `test`

There are also some parameters related to the

Once your model is trained, you will find the results of the test phase in a file in the results directory. The file will be named `tile2cpc_<model-type>_<country>_<no-round>.csv`.
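Performance is reported as Common Part of Commuters (CPC), and the file name `tile2cpc_...` suggests one CPC value per tile. That per-tile reading is an assumption, not something the README states. The standard CPC definition is:

$$\mathrm{CPC} = \frac{2\sum_{i,j}\min(T_{ij}, G_{ij})}{\sum_{i,j} T_{ij} + \sum_{i,j} G_{ij}}$$

Here $T$ are observed flows and $G$ generated flows. The sketch below computes it with the hypothetical helper `common_part_of_commuters`, whose flows are stored as dictionaries keyed by `(origin, destination)`. It is not the repository's own evaluation code.

```python
from typing import Dict, Hashable, Tuple

Flows = Dict[Tuple[Hashable, Hashable], float]

def common_part_of_commuters(observed: Flows, generated: Flows) -> float:
    """CPC = 2 * sum_ij min(T_ij, G_ij) / (sum_ij T_ij + sum_ij G_ij).

    Returns 1.0 when the two flow sets coincide and 0.0 when they share no flow.
    """
    pairs = set(observed) | set(generated)
    # A pair missing from one of the dictionaries counts as a zero flow.
    shared = sum(min(observed.get(p, 0.0), generated.get(p, 0.0)) for p in pairs)
    total = sum(observed.values()) + sum(generated.values())
    return 2.0 * shared / total if total > 0 else 0.0

if __name__ == "__main__":
    observed = {("a", "b"): 10.0, ("a", "c"): 5.0}   # toy observed flows
    generated = {("a", "b"): 8.0, ("a", "c"): 7.0}   # toy generated flows
    print(common_part_of_commuters(observed, generated))  # 26/30, about 0.867
```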
In the same results directory, you will also find the trained model, named `model_<model-type>_<country>_<no-round>.pt`.

<a id='plot'></a>
### Plot of the results

Once you have the results for all four models in at least one country and for at least one no-round, you can reproduce Figure 3 and Table 1 of the paper using the notebook `plot_results.ipynb`.

<a id='additional_data'></a>
### Additional Data

The datasets used in the experiments can be found at:
- England
  - [https://census.ukdataservice.ac.uk/use-data/guides/flow-data.aspx](https://census.ukdataservice.ac.uk/use-data/guides/flow-data.aspx)
  - [https://census.ukdataservice.ac.uk/use-data/guides/boundary-data](https://census.ukdataservice.ac.uk/use-data/guides/boundary-data)
- Italy
  - [http://datiopen.istat.it/datasetPND.php](http://datiopen.istat.it/datasetPND.php)
  - [https://www.istat.it/it/archivio/104317#accordions](https://www.istat.it/it/archivio/104317#accordions)
- New York
  - [https://github.com/GeoDS/COVID19USFlows](https://github.com/GeoDS/COVID19USFlows)
  - [https://www.census.gov/cgi-bin/geo/shapefiles/index.php?year=2020&layergroup=Census+Tracts](https://www.census.gov/cgi-bin/geo/shapefiles/index.php?year=2020&layergroup=Census+Tracts)

Data related to POIs should be retrieved from appropriate services. Examples are the Overpass API, HOTosm or (suggested) a local copy of the OSM database loaded into a PostgreSQL instance and queried with appropriate queries. The query we used to retrieve POI information is available in `osm_query.yaml`.