{"id":13856950,"url":"https://github.com/scikit-mobility/scikit-mobility","last_synced_at":"2025-10-21T19:58:22.445Z","repository":{"id":38427693,"uuid":"184337448","full_name":"scikit-mobility/scikit-mobility","owner":"scikit-mobility","description":"scikit-mobility: mobility analysis in Python","archived":false,"fork":false,"pushed_at":"2024-05-25T09:25:41.000Z","size":36162,"stargazers_count":702,"open_issues_count":62,"forks_count":155,"subscribers_count":29,"default_branch":"master","last_synced_at":"2024-07-22T07:09:34.761Z","etag":null,"topics":["complex-systems","data-analysis","data-science","human-mobility","mobility-analysis","mobility-flows","network-science","risk-assessment","scikit-mobility","statistics","synthetic-flows"],"latest_commit_sha":null,"homepage":"https://scikit-mobility.github.io/scikit-mobility/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/scikit-mobility.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2019-04-30T22:02:13.000Z","updated_at":"2024-07-19T11:11:42.000Z","dependencies_parsed_at":"2023-02-12T04:03:23.147Z","dependency_job_id":"b373b027-0e0e-4b8b-8fa9-06d2e262e9e1","html_url":"https://github.com/scikit-mobility/scikit-mobility","commit_stats":{"total_commits":712,"total_committers":25,"mean_commits":28.48,"dds":0.6811797752808989,"last_synced_commit":"9433d05a4cf7f42144e2e92279098740521493e2"},"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scikit-mobility%2Fscikit-mobility","tags_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scikit-mobility%2Fscikit-mobility/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scikit-mobility%2Fscikit-mobility/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scikit-mobility%2Fscikit-mobility/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/scikit-mobility","download_url":"https://codeload.github.com/scikit-mobility/scikit-mobility/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":213988293,"owners_count":15666958,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["complex-systems","data-analysis","data-science","human-mobility","mobility-analysis","mobility-flows","network-science","risk-assessment","scikit-mobility","statistics","synthetic-flows"],"created_at":"2024-08-05T03:01:19.977Z","updated_at":"2025-10-21T19:58:17.420Z","avatar_url":"https://github.com/scikit-mobility.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"[![DOI](https://zenodo.org/badge/184337448.svg)](https://zenodo.org/badge/latestdoi/184337448)\n![GitHub release (latest by date)](https://img.shields.io/github/v/release/scikit-mobility/scikit-mobility)\n![GitHub milestones](https://img.shields.io/github/milestones/open/scikit-mobility/scikit-mobility)\n![GitHub](https://img.shields.io/github/license/scikit-mobility/scikit-mobility)\n![GitHub contributors](https://img.shields.io/github/contributors/scikit-mobility/scikit-mobility)\n\n# 
scikit-mobility - mobility analysis in Python\n\n\u003cimg src=\"logo_skmob.png\" width=300/\u003e\n\n###### Try `scikit-mobility` without installing it\n[![Twitter](https://img.shields.io/twitter/url/https/twitter.com/scikitmobility.svg?style=social\u0026label=Follow%20%40scikitmobility)](https://twitter.com/scikitmobility)\n\n- in a MyBinder notebook: [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/scikit-mobility/scikit-mobility/master)\n- on [Jovian](https://jovian.ai/jonpappalord/collections/scikit-mobility-tutorial)\n\n\n`scikit-mobility` is a library for human mobility analysis in Python. The library allows to:\n\n- represent trajectories and mobility flows with proper data structures, `TrajDataFrame` and `FlowDataFrame`.\n\n- manage and manipulate mobility data of various formats (call detail records, GPS data, data from social media, survey data, etc.);\n\n- extract mobility metrics and patterns from data, both at individual and collective level (e.g., length of displacements, characteristic distance, origin-destination matrix, etc.)\n\n- generate synthetic individual trajectories using standard mathematical models (random walk models, exploration and preferential return model, etc.)\n\n- generate synthetic mobility flows using standard migration models (gravity model, radiation model, etc.)\n\n- assess the privacy risk associated with a mobility data set\n\n## Table of contents\n1. [Documentation](#documentation)\n2. [Citing](#citing)\n3. [Collaborate with us](#collaborate)\n4. [Installation](#installation)\n\t- [with pip](#installation_pip)\n\t- [with conda](#installation_conda)\n\t- [known issues](#known_conda)\n\t- [test installation](#test_installation)\n\t- [Google Colab](#google_colab)\n5. [Tutorials](#tutorials)\n6. 
[Examples](#examples)\n\t- [TrajDataFrame](#trajdataframe)\n\t- [FlowDataFrame](#flowdataframe)\n\t- [Preprocessing](#preprocessing)\n\t- [Measures](#measures)\n\t- [Collective generative models](#collective_models)\n\t- [Individual generative models](#individual_models)\n\t- [Privacy](#privacy)\n\t- [Downloading datasets](#data)\n\n\n\u003ca id='documentation'\u003e\u003c/a\u003e\n## Documentation\nThe documentation of scikit-mobility's classes and functions is available at: https://scikit-mobility.github.io/scikit-mobility/\n\n\u003ca id='citing'\u003e\u003c/a\u003e\n## Citing\n\nIf you use scikit-mobility, please cite the following paper:\n\nPappalardo, L., Simini, F., Barlacchi, G., \u0026 Pellungrini, R. (2022). scikit-mobility: A Python Library for the Analysis, Generation, and Risk Assessment of Mobility Data. Journal of Statistical Software, 103(1), 1–38. https://doi.org/10.18637/jss.v103.i04\n\nBibTeX:\n```\n@article{JSSv103i04,\n title={scikit-mobility: A Python Library for the Analysis, Generation, and Risk Assessment of Mobility Data},\n volume={103},\n url={https://www.jstatsoft.org/index.php/jss/article/view/v103i04},\n doi={10.18637/jss.v103.i04},\n number={1},\n journal={Journal of Statistical Software},\n author={Pappalardo, Luca and Simini, Filippo and Barlacchi, Gianni and Pellungrini, Roberto},\n year={2022},\n pages={1–38}\n}\n```\n\n\u003ca id='collaborate'\u003e\u003c/a\u003e\n## Collaborate with us\n`scikit-mobility` is an active project and any contribution is welcome.\n\nIf you would like to include your algorithm in `scikit-mobility`, feel free to fork the project, open an issue and [contact us](mailto:scikit.mobility@gmail.com).\n\n\n\u003ca id='installation'\u003e\u003c/a\u003e\n## Installation\nscikit-mobility for Python \u003e= 3.8 and all its dependencies are available from conda-forge and can be installed using\n`conda install -c conda-forge scikit-mobility`.\n\nNote that it is **NOT recommended** to install scikit-mobility from 
PyPI! If you're on Windows or Mac, many GeoPandas / scikit-mobility dependencies cannot be pip installed (for details see the corresponding notes in the GeoPandas documentation).\n\n\u003ca id='installation_pip'\u003e\u003c/a\u003e\n### installation with pip (python \u003e= 3.8 required)\n\n1. Create an environment `skmob`\n\n        python3 -m venv skmob\n\n2. Activate\n\n        source skmob/bin/activate\n\n3. Install skmob\n\n        pip install scikit-mobility\n\n4. OPTIONAL to use `scikit-mobility` on the jupyter notebook\n\n\t- Activate the virtualenv:\n\n\t\t\tsource skmob/bin/activate\n\n\t- Install jupyter notebook:\n\n\t\t\tpip install jupyter\n\n\t- Run jupyter notebook\n\n\t\t\tjupyter notebook\n\n\t- (Optional) install the kernel with a specific name\n\n\t\t\tipython kernel install --user --name=skmob\n\n\n\u003ca id='installation_conda'\u003e\u003c/a\u003e\n### installation with conda - miniconda\n\n1. Create an environment `skmob` and install pip\n\n        conda create -n skmob pip python=3.9 rtree\n\n2. Activate\n\n        conda activate skmob\n\n3. Install skmob\n\n        conda install -c conda-forge scikit-mobility\n\n4. OPTIONAL to use `scikit-mobility` on the jupyter notebook\n\n    - Install the kernel\n\n          conda install jupyter -c conda-forge\n\n    - Open a notebook and check if the kernel `skmob` is on the kernel list. If not, run the following:\n    \t- On Mac and Linux\n\n          \t  env=$(basename `echo $CONDA_PREFIX`)\n          \t  python -m ipykernel install --user --name \"$env\" --display-name \"Python [conda env:\"$env\"]\"\n\n       - On Windows\n\n             python -m ipykernel install --user --name skmob --display-name \"Python [conda env: skmob]\"\n\n:exclamation: You may run into dependency issues if you try to import the package in Python. 
If so, try installing the following packages as follows.\n\n```\nconda install -n skmob pyproj urllib3 chardet markupsafe\n```\n\n\u003ca id='test_installation'\u003e\u003c/a\u003e\n### Test the installation\n\n```\n\u003e source activate skmob\n(skmob)\u003e python\n\u003e\u003e\u003e import skmob\n\u003e\u003e\u003e\n```\n\n\u003ca id='google_colab'\u003e\u003c/a\u003e\n## Google Colab\nscikit-mobility can be installed on \u003ca href=\"https://colab.research.google.com/notebooks/intro.ipynb#recent=true\"\u003eGoogle Colab\u003c/a\u003e using the following commands:\n```\n!apt-get install -qq curl g++ make\n!curl -L http://download.osgeo.org/libspatialindex/spatialindex-src-1.8.5.tar.gz | tar xz\nimport os\nos.chdir('spatialindex-src-1.8.5')\n!./configure\n!make\n!make install\n!pip install rtree\n!ldconfig\n!pip install scikit-mobility\n```\n\n\u003ca id='tutorials'\u003e\u003c/a\u003e\n## Tutorials\nYou can find some tutorials on scikit-mobility here: https://github.com/scikit-mobility/tutorials.\n\n\u003ca id='examples'\u003e\u003c/a\u003e\n## Examples\n\n\u003ca id='trajdataframe'\u003e\u003c/a\u003e\n### Create a `TrajDataFrame`\n\nIn scikit-mobility, a set of trajectories is described by a `TrajDataFrame`, an extension of the pandas `DataFrame` that has specific column names and data types. A `TrajDataFrame` can contain many trajectories, and each row in the `TrajDataFrame` represents a point of a trajectory, described by three mandatory fields (aka columns):\n- `latitude` (type: float);\n- `longitude` (type: float);\n- `datetime` (type: date-time).\n\nAdditionally, two optional columns can be specified:\n- `uid` (type: string) identifies the object associated with the point of the trajectory. If `uid` is not present, scikit-mobility assumes that the `TrajDataFrame` contains trajectories associated with a single moving object;\n- `tid` specifies the identifier of the trajectory to which the point belongs. 
If `tid` is not present, scikit-mobility assumes that all rows in the `TrajDataFrame` associated with a `uid` belong to the same trajectory;\n\nNote that, besides the mandatory columns, the user can add to a `TrajDataFrame` as many columns as they want since the data structures in scikit-mobility inherit all the pandas `DataFrame` functionalities.\n\nCreate a `TrajDataFrame` from a list:\n\n```python\n\u003e\u003e\u003e import skmob\n\u003e\u003e\u003e # create a TrajDataFrame from a list\n\u003e\u003e\u003e data_list = [[1, 39.984094, 116.319236, '2008-10-23 13:53:05'], [1, 39.984198, 116.319322, '2008-10-23 13:53:06'], [1, 39.984224, 116.319402, '2008-10-23 13:53:11'], [1, 39.984211, 116.319389, '2008-10-23 13:53:16']]\n\u003e\u003e\u003e tdf = skmob.TrajDataFrame(data_list, latitude=1, longitude=2, datetime=3)\n\u003e\u003e\u003e # print a portion of the TrajDataFrame\n\u003e\u003e\u003e print(tdf.head())\n```\n\t   0        lat         lng            datetime\n\t0  1  39.984094  116.319236 2008-10-23 13:53:05\n\t1  1  39.984198  116.319322 2008-10-23 13:53:06\n\t2  1  39.984224  116.319402 2008-10-23 13:53:11\n\t3  1  39.984211  116.319389 2008-10-23 13:53:16\n```python\n\u003e\u003e\u003e print(type(tdf))\n```\n\t\u003cclass 'skmob.core.trajectorydataframe.TrajDataFrame'\u003e\n\nCreate a `TrajDataFrame` from a [pandas](https://pandas.pydata.org/) `DataFrame`:\n\n```python\n\u003e\u003e\u003e import pandas as pd\n\u003e\u003e\u003e # create a DataFrame from the previous list\n\u003e\u003e\u003e data_df = pd.DataFrame(data_list, columns=['user', 'latitude', 'lng', 'hour'])\n\u003e\u003e\u003e # print the type of the object\n\u003e\u003e\u003e print(type(data_df))\n```\n\t\u003cclass 'pandas.core.frame.DataFrame'\u003e\n```python\n\u003e\u003e\u003e # now create a TrajDataFrame from the pandas DataFrame\n\u003e\u003e\u003e tdf = skmob.TrajDataFrame(data_df, latitude='latitude', datetime='hour', user_id='user')\n\u003e\u003e\u003e # print the type of the 
object\n\u003e\u003e\u003e print(type(tdf))\n```\n\t\u003cclass 'skmob.core.trajectorydataframe.TrajDataFrame'\u003e\n```python\n\u003e\u003e\u003e # print a portion of the TrajDataFrame\n\u003e\u003e\u003e print(tdf.head())\n```\n\t   uid        lat         lng            datetime\n\t0    1  39.984094  116.319236 2008-10-23 13:53:05\n\t1    1  39.984198  116.319322 2008-10-23 13:53:06\n\t2    1  39.984224  116.319402 2008-10-23 13:53:11\n\t3    1  39.984211  116.319389 2008-10-23 13:53:16\n\nWe can also create a `TrajDataFrame` from a file. For example, in the following we create a `TrajDataFrame` from a portion of a GPS trajectory dataset collected in the context of the [GeoLife](https://www.microsoft.com/en-us/research/publication/geolife-gps-trajectory-dataset-user-guide/) project by 178 users in a period of over four years from April 2007 to October 2011.\n\n```python\n\u003e\u003e\u003e # download the file from https://raw.githubusercontent.com/scikit-mobility/scikit-mobility/master/examples/geolife_sample.txt.gz\n\u003e\u003e\u003e # read the trajectory data (GeoLife, Beijing, China)\n\u003e\u003e\u003e tdf = skmob.TrajDataFrame.from_file('geolife_sample.txt.gz', latitude='lat', longitude='lon', user_id='user', datetime='datetime')\n\u003e\u003e\u003e # print a portion of the TrajDataFrame\n\u003e\u003e\u003e print(tdf.head())\n```\n\t\t lat         lng            datetime  uid\n\t0  39.984094  116.319236 2008-10-23 05:53:05    1\n\t1  39.984198  116.319322 2008-10-23 05:53:06    1\n\t2  39.984224  116.319402 2008-10-23 05:53:11    1\n\t3  39.984211  116.319389 2008-10-23 05:53:16    1\n\t4  39.984217  116.319422 2008-10-23 05:53:21    1\n\nA `TrajDataFrame` can be plotted on a [folium](https://python-visualization.github.io/folium/) interactive map using the `plot_trajectory` function.\n\n```python\n\u003e\u003e\u003e tdf.plot_trajectory(zoom=12, weight=3, opacity=0.9, tiles='Stamen Toner')\n```\n\n![Plot 
Trajectory](examples/plot_trajectory_example.png)\n\n\u003ca id='flowdataframe'\u003e\u003c/a\u003e\n### Create a `FlowDataFrame`\n\nIn scikit-mobility, an origin-destination matrix is described by the `FlowDataFrame` structure, an extension of the pandas `DataFrame` that has specific column names and data types. A row in a `FlowDataFrame` represents a flow of objects between two locations, described by three mandatory columns:\n- `origin` (type: string);\n- `destination` (type: string);\n- `flow` (type: integer).\n\nAgain, the user can add to a `FlowDataFrame` as many columns as they want since the `FlowDataFrame` data structure inherits all the pandas `DataFrame` functionalities.\n\nIn mobility tasks, the territory is often discretized by mapping the coordinates to a spatial tessellation, i.e., a covering of the bi-dimensional space using a countable number of geometric shapes (e.g., squares, hexagons), called tiles, with no overlaps and no gaps. For instance, for the analysis or prediction of mobility flows, a spatial tessellation is used to aggregate flows of people moving among locations (the tiles of the tessellation). 
For this reason, each `FlowDataFrame` is associated with a **spatial tessellation**, a [geopandas](http://geopandas.org/) `GeoDataFrame` that contains two mandatory columns:\n- `tile_ID` (type: integer) indicates the identifier of a location;\n- `geometry` indicates the polygon (or point) that describes the geometric shape of the location on a territory (e.g., a square, a voronoi shape, the shape of a neighborhood).\n\nNote that each location identifier in the `origin` and `destination` columns of a `FlowDataFrame` must be present in the associated spatial tessellation.\n\nCreate a spatial tessellation from a file describing counties in New York state:\n\n```python\n\u003e\u003e\u003e import skmob\n\u003e\u003e\u003e import geopandas as gpd\n\u003e\u003e\u003e # load a spatial tessellation\n\u003e\u003e\u003e url_tess = skmob.utils.constants.NY_COUNTIES_2011\n\u003e\u003e\u003e tessellation = gpd.read_file(url_tess).rename(columns={'tile_id': 'tile_ID'})\n\u003e\u003e\u003e # print a portion of the spatial tessellation\n\u003e\u003e\u003e print(tessellation.head())\n```\n\t  tile_ID  population                                           geometry\n\t0   36019       81716  POLYGON ((-74.006668 44.886017, -74.027389 44....\n\t1   36101       99145  POLYGON ((-77.099754 42.274215, -77.0996569999...\n\t2   36107       50872  POLYGON ((-76.25014899999999 42.296676, -76.24...\n\t3   36059     1346176  POLYGON ((-73.707662 40.727831, -73.700272 40....\n\t4   36011       79693  POLYGON ((-76.279067 42.785866, -76.2753479999...\n\nCreate a `FlowDataFrame` from a spatial tessellation and a file of real flows between counties in New York state:\n\n```python\n\u003e\u003e\u003e # load real flows into a FlowDataFrame\n\u003e\u003e\u003e fdf = skmob.FlowDataFrame.from_file(skmob.utils.constants.NY_FLOWS_2011,\n\t\t\t\ttessellation=tessellation,\n\t\t\t\ttile_id='tile_ID',\n\t\t\t\tsep=\",\")\n\u003e\u003e\u003e # print a portion of the flows\n\u003e\u003e\u003e 
print(fdf.head())\n```\n\t     flow origin destination\n\t0  121606  36001       36001\n\t1       5  36001       36005\n\t2      29  36001       36007\n\t3      11  36001       36017\n\t4      30  36001       36019\n\nA `FlowDataFrame` can be visualized on a [folium](https://python-visualization.github.io/folium/) interactive map using the `plot_flows` function, which plots the flows on a geographic map as lines between the centroids of the tiles in the `FlowDataFrame`'s spatial tessellation:\n\n```python\n\u003e\u003e\u003e fdf.plot_flows(flow_color='red')\n```\n\n![Plot Fluxes](examples/plot_flows_example.png)\n\nSimilarly, the spatial tessellation of a `FlowDataFrame` can be visualized using the `plot_tessellation` function. The argument `popup_features` (type:list, default:[`constants.TILE_ID`]) allows to enhance the plot's interactivity displaying popup windows that appear when the user clicks on a tile and includes information contained in the columns of the tessellation's `GeoDataFrame` specified in the argument’s list:\n\n```python\n\u003e\u003e\u003e fdf.plot_tessellation(popup_features=['tile_ID', 'population'])\n```\n\n![Plot Tessellation](examples/plot_tessellation_example.png)\n\nThe spatial tessellation and the flows can be visualized together using the `map_f` argument, which specifies the folium object on which to plot:\n\n```python\n\u003e\u003e\u003e m = fdf.plot_tessellation() # plot the tessellation\n\u003e\u003e\u003e fdf.plot_flows(flow_color='red', map_f=m) # plot the flows\n```\n\n![Plot Tessellation and Flows](examples/plot_tessellation_and_flows_example.png)\n\n\u003ca id='preprocessing'\u003e\u003c/a\u003e\n### Trajectory preprocessing\nAs any analytical process, mobility data analysis requires data cleaning and preprocessing steps. 
The `preprocessing` module allows the user to perform four main preprocessing steps:\n- noise filtering;\n- stop detection;\n- stop clustering;\n- trajectory compression;\n\nNote that, if a `TrajDataFrame` contains multiple trajectories from multiple users, the preprocessing methods automatically apply to the single trajectory and, when necessary, to the single moving object.\n\n#### Noise filtering\nIn scikit-mobility, the function `filter` filters out a point if the speed from the previous point is higher than the parameter `max_speed_kmh`, which is set to 500 km/h by default.\n\n```python\n\u003e\u003e\u003e from skmob.preprocessing import filtering\n\u003e\u003e\u003e # filter out all points with a speed (in km/h) from the previous point higher than 500 km/h\n\u003e\u003e\u003e ftdf = filtering.filter(tdf, max_speed_kmh=500.)\n\u003e\u003e\u003e print(ftdf.parameters)\n```\n\t{'from_file': 'geolife_sample.txt.gz', 'filter': {'function': 'filter', 'max_speed_kmh': 500.0, 'include_loops': False, 'speed_kmh': 5.0, 'max_loop': 6, 'ratio_max': 0.25}}\n```python\n\u003e\u003e\u003e n_deleted_points = len(tdf) - len(ftdf) # number of deleted points\n\u003e\u003e\u003e print(n_deleted_points)\n```\n\t54\n\nNote that the `TrajDataFrame` structure has the `parameters` attribute, which indicates the operations that have been applied to the `TrajDataFrame`. This attribute is a dictionary whose keys identify the functions applied and whose values record the arguments used.\n\n#### Stop detection\nSome points in a trajectory can represent Points of Interest (POIs) such as schools, restaurants, and bars, or they can represent user-specific places such as home and work locations. These points are usually called Stay Points or Stops, and they can be detected in different ways. A common approach is to apply spatial clustering algorithms to cluster trajectory points by looking at their spatial proximity. 
In scikit-mobility, the `stay_locations` function, contained in the `detection` module, finds the stay points visited by a moving object. For instance, to identify the stops where the object spent at least `minutes_for_a_stop` minutes within a distance of `spatial_radius_km * stop_radius_factor` km from a given point, we can use the following code:\n\n```python\n\u003e\u003e\u003e from skmob.preprocessing import detection\n\u003e\u003e\u003e # compute the stops for each individual in the TrajDataFrame\n\u003e\u003e\u003e stdf = detection.stay_locations(tdf, stop_radius_factor=0.5, minutes_for_a_stop=20.0, spatial_radius_km=0.2, leaving_time=True)\n\u003e\u003e\u003e # print a portion of the detected stops\n\u003e\u003e\u003e print(stdf.head())\n```\n\t\t lat         lng            datetime  uid    leaving_datetime\n\t0  39.978030  116.327481 2008-10-23 06:01:37    1 2008-10-23 10:32:53\n\t1  40.013820  116.306532 2008-10-23 11:10:19    1 2008-10-23 23:45:27\n\t2  39.978419  116.326870 2008-10-24 00:21:52    1 2008-10-24 01:47:30\n\t3  39.981166  116.308475 2008-10-24 02:02:31    1 2008-10-24 02:30:29\n\t4  39.981431  116.309902 2008-10-24 02:30:29    1 2008-10-24 03:16:35\n```python\n\u003e\u003e\u003e print('Points of the original trajectory:\\t%s'%len(tdf))\n\u003e\u003e\u003e print('Points of stops:\\t\\t\\t%s'%len(stdf))\n```\n\tPoints of the original trajectory:\t217653\n\tPoints of stops:\t\t\t391\n\nA new column `leaving_datetime` is added to the `TrajDataFrame` in order to indicate the time when the user left the stop location. 
We can then visualize the detected stops using the `plot_stops` function:\n\n```python\n\u003e\u003e\u003e m = stdf.plot_trajectory(max_users=1, start_end_markers=False)\n\u003e\u003e\u003e stdf.plot_stops(max_users=1, map_f=m)\n```\n\n![Plot Stops](examples/plot_stops_example_single_user.png)\n\n#### Trajectory compression\nThe goal of trajectory compression is to reduce the number of trajectory points while preserving the structure of the trajectory. This step results in a significant reduction of the number of trajectory points. In scikit-mobility, we can use one of the methods in the `compression` module under the `preprocessing` module. For instance, to merge all the points that are closer than 0.2km from each other, we can use the following code:\n\n```python\n\u003e\u003e\u003e from skmob.preprocessing import compression\n\u003e\u003e\u003e # compress the trajectory using a spatial radius of 0.2 km\n\u003e\u003e\u003e ctdf = compression.compress(tdf, spatial_radius_km=0.2)\n\u003e\u003e\u003e # print the difference in points between original and filtered TrajDataFrame\n\u003e\u003e\u003e print('Points of the original trajectory:\\t%s'%len(tdf))\n\u003e\u003e\u003e print('Points of the compressed trajectory:\\t%s'%len(ctdf))\n```\n\tPoints of the original trajectory:\t217653\n\tPoints of the compressed trajectory:\t6281\n\n\u003ca id='measures'\u003e\u003c/a\u003e\n### Mobility measures\nSeveral measures have been proposed in the literature to capture the patterns of human mobility, both at the individual and collective levels. Individual measures summarize the mobility patterns of a single moving object, while collective measures summarize mobility patterns of a population as a whole. scikit-mobility provides a wide set of [mobility measures](https://scikit-mobility.github.io/scikit-mobility/reference/measures.html), each implemented as a function that takes in input a `TrajDataFrame` and outputs a pandas `DataFrame`. 
Individual and collective measures are implemented in the `skmob.measures.individual` and `skmob.measures.collective` modules, respectively.\n\nFor example, the following code computes the *radius of gyration*, the *jump lengths* and the *home locations* of a `TrajDataFrame`:\n\n```python\n\u003e\u003e\u003e from skmob.measures.individual import jump_lengths, radius_of_gyration, home_location\n\u003e\u003e\u003e # load a TrajDataFrame from a URL\n\u003e\u003e\u003e url = \"https://snap.stanford.edu/data/loc-brightkite_totalCheckins.txt.gz\"\n\u003e\u003e\u003e df = pd.read_csv(url, sep='\\t', header=0, nrows=100000,\n     names=['user', 'check-in_time', 'latitude', 'longitude', 'location id'])\n\u003e\u003e\u003e tdf = skmob.TrajDataFrame(df, latitude='latitude', longitude='longitude', datetime='check-in_time', user_id='user')\n\u003e\u003e\u003e # compute the radius of gyration for each individual\n\u003e\u003e\u003e rg_df = radius_of_gyration(tdf)\n\u003e\u003e\u003e print(rg_df)\n```\n\t   uid  radius_of_gyration\n\t0    0         1564.436792\n\t1    1         2467.773523\n\t2    2         1439.649774\n\t3    3         1752.604191\n\t4    4         5380.503250\n```python\n\u003e\u003e\u003e # compute the jump lengths for each individual\n\u003e\u003e\u003e jl_df = jump_lengths(tdf.sort_values(by='datetime'))\n\u003e\u003e\u003e print(jl_df.head())\n```\n\t   uid                                       jump_lengths\n\t0    0  [19.640467328877936, 0.0, 0.0, 1.7434311010381...\n\t1    1  [6.505330424378251, 46.75436600375988, 53.9284...\n\t2    2  [0.0, 0.0, 0.0, 0.0, 3.6410097195943507, 0.0, ...\n\t3    3  [3861.2706300798827, 4.061631313492122, 5.9163...\n\t4    4  [15511.92758595804, 0.0, 15511.92758595804, 1....\n\nNote that for some measures, such as `jump_lengths`, the `TrajDataFrame` must be sorted in increasing order by the column `datetime` (see the documentation for the measures that require this condition 
https://scikit-mobility.github.io/scikit-mobility/reference/measures.html).\n\n```python\n\u003e\u003e\u003e # compute the home location for each individual\n\u003e\u003e\u003e hl_df = home_location(tdf)\n\u003e\u003e\u003e print(hl_df.head())\n```\n\t   uid        lat         lng\n\t0    0  39.891077 -105.068532\n\t1    1  37.630490 -122.411084\n\t2    2  39.739154 -104.984703\n\t3    3  37.748170 -122.459192\n\t4    4  60.180171   24.949728\n```python\n\u003e\u003e\u003e # now let's visualize a heat map of the home locations\n\u003e\u003e\u003e import folium\n\u003e\u003e\u003e from folium.plugins import HeatMap\n\u003e\u003e\u003e m = folium.Map(tiles = 'openstreetmap', zoom_start=12, control_scale=True)\n\u003e\u003e\u003e HeatMap(hl_df[['lat', 'lng']].values).add_to(m)\n\u003e\u003e\u003e m\n```\n\n![Heat map of home locations](examples/cloropleth_map_home_locations.png)\n\n\u003ca id='collective_models'\u003e\u003c/a\u003e\n### Collective generative models\nCollective generative models estimate spatial flows between a set of discrete locations. Examples of spatial flows estimated with collective generative models include commuting trips between neighborhoods, migration flows between municipalities, freight shipments between states, and phone calls between regions.\n\nIn scikit-mobility, a collective generative model takes as input a spatial tessellation, i.e., a geopandas `GeoDataFrame`. To be a valid input for a collective model, the spatial tessellation should contain two columns, `geometry` and `relevance`, which are necessary to compute the two variables used by collective algorithms: the distance between tiles and the importance (aka \"attractiveness\") of each tile. A collective algorithm produces a `FlowDataFrame` that contains the generated flows and the spatial tessellation. 
scikit-mobility implements the most common collective generative algorithms:\n- the `Gravity` model;\n- the `Radiation` model.\n\n#### Gravity model\nThe class `Gravity`, implementing the Gravity model, has two main methods:\n- `fit`, which calibrates the model's parameters using a `FlowDataFrame`;\n- `generate`, which generates the flows on a given spatial tessellation.\n\nLoad the spatial tessellation and a data set of real flows in a `FlowDataFrame`:\n\n```python\n\u003e\u003e\u003e from skmob.utils import utils, constants\n\u003e\u003e\u003e import geopandas as gpd\n\u003e\u003e\u003e from skmob.models.gravity import Gravity\n\u003e\u003e\u003e import numpy as np\n\u003e\u003e\u003e # load a spatial tessellation\n\u003e\u003e\u003e url_tess = skmob.utils.constants.NY_COUNTIES_2011\n\u003e\u003e\u003e tessellation = gpd.read_file(url_tess).rename(columns={'tile_id': 'tile_ID'})\n\u003e\u003e\u003e # load the file with the real flows\n\u003e\u003e\u003e fdf = skmob.FlowDataFrame.from_file(skmob.utils.constants.NY_FLOWS_2011,\n\t\t\t\t\ttessellation=tessellation,\n\t\t\t\t\ttile_id='tile_ID',\n\t\t\t\t\tsep=\",\")\n\u003e\u003e\u003e # compute the total outflows from each location of the tessellation (excluding self loops)\n\u003e\u003e\u003e tot_outflows = fdf[fdf['origin'] != fdf['destination']].groupby(by='origin', axis=0)[['flow']].sum().fillna(0)\n\u003e\u003e\u003e tessellation = tessellation.merge(tot_outflows, left_on='tile_ID', right_on='origin').rename(columns={'flow': 'tot_outflow'})\n```\n\nInstantiate a Gravity model object and generate synthetic flows:\n\n```python\n\u003e\u003e\u003e # instantiate a singly constrained Gravity model\n\u003e\u003e\u003e gravity_singly = Gravity(gravity_type='singly constrained')\n\u003e\u003e\u003e print(gravity_singly)\n```\n\tGravity(name=\"Gravity model\", deterrence_func_type=\"power_law\", deterrence_func_args=[-2.0], origin_exp=1.0, destination_exp=1.0, gravity_type=\"singly 
constrained\")\n```python\n\u003e\u003e\u003e # start the generation of the synthetic flows\n\u003e\u003e\u003e np.random.seed(0)\n\u003e\u003e\u003e synth_fdf = gravity_singly.generate(tessellation,\n\t\t\t\t   tile_id_column='tile_ID',\n\t\t\t\t   tot_outflows_column='tot_outflow',\n\t\t\t\t   relevance_column= 'population',\n\t\t\t\t   out_format='flows')\n\u003e\u003e\u003e # print a portion of the synthetic flows\n\u003e\u003e\u003e print(synth_fdf.head())\n```\n\t  origin destination  flow\n\t0  36019       36101   101\n\t1  36019       36107    66\n\t2  36019       36059  1041\n\t3  36019       36011   151\n\t4  36019       36123    33\n\nFit the parameters of the Gravity model from the `FlowDataFrame` and generate the synthetic flows:\n\n```python\n\u003e\u003e\u003e # instantiate a Gravity object (with default parameters)\n\u003e\u003e\u003e gravity_singly_fitted = Gravity(gravity_type='singly constrained')\n\u003e\u003e\u003e print(gravity_singly_fitted)\n```\n\tGravity(name=\"Gravity model\", deterrence_func_type=\"power_law\", deterrence_func_args=[-2.0], origin_exp=1.0, destination_exp=1.0, gravity_type=\"singly constrained\")\n```python\n\u003e\u003e\u003e # fit the parameters of the Gravity from the FlowDataFrame\n\u003e\u003e\u003e gravity_singly_fitted.fit(fdf, relevance_column='population')\n\u003e\u003e\u003e print(gravity_singly_fitted)\n```\n\tGravity(name=\"Gravity model\", deterrence_func_type=\"power_law\", deterrence_func_args=[-1.9947152031914186], origin_exp=1.0, destination_exp=0.6471759552223144, gravity_type=\"singly constrained\")\n```python\n\u003e\u003e\u003e # generate the synthetics flows\n\u003e\u003e\u003e np.random.seed(0)\n\u003e\u003e\u003e synth_fdf_fitted = gravity_singly_fitted.generate(tessellation,\n\t\t\t\t\t\t\ttile_id_column='tile_ID',\n\t\t\t\t\t\t\ttot_outflows_column='tot_outflow',\n\t\t\t\t\t\t\trelevance_column= 'population',\n\t\t\t\t\t\t\tout_format='flows')\n\u003e\u003e\u003e # print a portion of the 
synthetic flows\n\u003e\u003e\u003e print(synth_fdf_fitted.head())\n```\n\t  origin destination  flow\n\t0  36019       36101   102\n\t1  36019       36107    66\n\t2  36019       36059  1044\n\t3  36019       36011   152\n\t4  36019       36123    33\n\nPlot the real flows and the synthetic flows:\n\n```python\n\u003e\u003e\u003e m = fdf.plot_flows(min_flow=100, flow_exp=0.01, flow_color='blue')\n\u003e\u003e\u003e synth_fdf_fitted.plot_flows(min_flow=1000, flow_exp=0.01, map_f=m)\n```\n\n![Gravity model: real flows vs synthetic flows](examples/real_flows_vs_synth_flows.png)\n\n#### Radiation model\nThe Radiation model is parameter-free and has only one method: `generate`. Given a spatial tessellation, the synthetic flows can be generated using the `Radiation` class as follows:\n\n```python\n\u003e\u003e\u003e from skmob.models.radiation import Radiation\n\u003e\u003e\u003e # instantiate a Radiation object\n\u003e\u003e\u003e radiation = Radiation()\n\u003e\u003e\u003e # start the simulation\n\u003e\u003e\u003e np.random.seed(0)\n\u003e\u003e\u003e rad_flows = radiation.generate(tessellation,\n\t\t\t\ttile_id_column='tile_ID',\n\t\t\t\ttot_outflows_column='tot_outflow',\n\t\t\t\trelevance_column='population',\n\t\t\t\tout_format='flows_sample')\n\u003e\u003e\u003e # print a portion of the synthetic flows\n\u003e\u003e\u003e print(rad_flows.head())\n```\n\t  origin destination   flow\n\t0  36019       36033  11648\n\t1  36019       36031   4232\n\t2  36019       36089   5598\n\t3  36019       36113   1596\n\t4  36019       36041    117\n\n\u003ca id='individual_models'\u003e\u003c/a\u003e\n### Individual generative models\nThe goal of individual generative models of human mobility is to create a population of agents whose mobility patterns are statistically indistinguishable from those of real individuals. 
An individual generative model typically generates a synthetic trajectory corresponding to a single moving object, assuming that an object is independent of the others.\n\nscikit-mobility implements the most common individual generative models, such as the [Exploration and Preferential Return](https://www.nature.com/articles/nphys1760) model and its variants, and [DITRAS](https://link.springer.com/article/10.1007/s10618-017-0548-4). Each generative model is a Python class with a public method `generate`, which starts the generation of synthetic trajectories.\n\nThe following code generates synthetic trajectories using the `DensityEPR` model:\n\n```python\n\u003e\u003e\u003e import pandas as pd\n\u003e\u003e\u003e from skmob.models.epr import DensityEPR\n\u003e\u003e\u003e # load a spatial tessellation on which to perform the simulation\n\u003e\u003e\u003e url = skmob.utils.constants.NY_COUNTIES_2011\n\u003e\u003e\u003e tessellation = gpd.read_file(url)\n\u003e\u003e\u003e # start and end times of the simulation\n\u003e\u003e\u003e start_time = pd.to_datetime('2019/01/01 08:00:00')\n\u003e\u003e\u003e end_time = pd.to_datetime('2019/01/14 08:00:00')\n\u003e\u003e\u003e # instantiate a DensityEPR object\n\u003e\u003e\u003e depr = DensityEPR()\n\u003e\u003e\u003e # start the simulation\n\u003e\u003e\u003e tdf = depr.generate(start_time, end_time, tessellation, relevance_column='population', n_agents=100)\n\u003e\u003e\u003e print(tdf.head())\n```\n\t   uid                   datetime        lat        lng\n\t0    1 2019-01-01 08:00:00.000000  42.452018 -76.473618\n\t1    1 2019-01-01 08:32:30.108708  42.170344 -76.306260\n\t2    1 2019-01-01 09:09:11.760703  43.241550 -75.435903\n\t3    1 2019-01-01 10:00:22.832309  42.170344 -76.306260\n\t4    1 2019-01-01 14:00:25.923314  42.267915 -77.383591\n```python\n\u003e\u003e\u003e print(tdf.parameters)\n```\n\t{'model': {'class': \u003cfunction DensityEPR.__init__ at 0x7f70ce0a7e18\u003e, 'generate': {'start_date': Timestamp('2019-01-01 08:00:00'), 'end_date': 
Timestamp('2019-01-14 08:00:00'), 'gravity_singly': {}, 'n_agents': 100, 'relevance_column': 'population', 'random_state': None, 'verbose': True}}}\n\n\u003ca id='privacy'\u003e\u003c/a\u003e\n### Privacy\nMobility data is sensitive since the movements of individuals can reveal confidential personal information or allow the re-identification of individuals in a database, creating serious privacy risks. In the literature, privacy risk assessment relies on the concept of re-identification of a moving object in a database through an attack by a malicious adversary. A common framework for privacy risk assessment assumes that during the attack a malicious adversary gains access, in some way, to an anonymized mobility data set, i.e., a mobility data set in which the moving object associated with a trajectory is not known. Moreover, it is assumed that the malicious adversary acquires, in some way, information about the trajectory (or a portion of it) of an individual represented in the acquired data set. Based on this information, the risk of re-identification of that individual is computed by estimating how unique that individual's mobility data are with respect to the mobility data of the other individuals represented in the acquired data set.\n\nscikit-mobility provides several attack models, each implemented as a Python class. For example, in a location attack model, implemented in the `LocationAttack` class, the malicious adversary knows a certain number of locations visited by an individual, but does not know the temporal order of the visits. To instantiate a `LocationAttack` object we can run the following code:\n\n```python\n\u003e\u003e\u003e import skmob\n\u003e\u003e\u003e from skmob.privacy import attacks\n\u003e\u003e\u003e at = attacks.LocationAttack(knowledge_length=2)\n```\n\nThe argument `knowledge_length` specifies how many locations the malicious adversary knows of each object's movement. 
The re-identification risk is computed based on the worst possible combination of `knowledge_length` locations out of all possible combinations of locations.\n\nTo assess the re-identification risk associated with a mobility data set, represented as a `TrajDataFrame`, we specify it as input to the `assess_risk` method, which returns a pandas `DataFrame` that contains the `uid` of each object in the `TrajDataFrame` and the associated re-identification risk as the column `risk` (type: float, range: $[0,1]$, where 0 indicates minimum risk and 1 maximum risk).\n\n```python\n\u003e\u003e\u003e tdf = skmob.TrajDataFrame.from_file(filename=\"privacy_toy.csv\")\n\u003e\u003e\u003e tdf_risk = at.assess_risk(tdf)\n\u003e\u003e\u003e print(tdf_risk.head())\n```\n\t   uid      risk\n\t0    1  0.333333\n\t1    2  0.500000\n\t2    3  0.333333\n\t3    4  0.333333\n\t4    5  0.250000\n\nSince risk assessment may be time-consuming for large datasets, scikit-mobility provides the option to focus only on a subset of the objects with the argument `targets`. 
For example, in the following code, we compute the re-identification risk for the objects with `uid` 1 and 2 only:\n\n```python\n\u003e\u003e\u003e tdf_risk = at.assess_risk(tdf, targets=[1,2])\n\u003e\u003e\u003e print(tdf_risk)\n```\n\t   uid      risk\n\t0    1  0.333333\n\t1    2  0.500000\n\n\n\u003ca id=\"data\"\u003e\u003c/a\u003e\n### Downloading datasets\n\nThe `data` module of scikit-mobility provides an easy way to: 1) download ready-to-use mobility data (e.g., trajectories, flows, spatial tessellations, and auxiliary data); 2) load and transform the downloaded dataset into standard skmob structures (TrajDataFrame, GeoDataFrame, FlowDataFrame, DataFrame); 3) add new datasets to the library (for developers and contributors).\n\nThe `data` module provides three functions:\n - `list_datasets`\n - `get_dataset_info`\n - `load_dataset`\n\n\nThe user can retrieve the list of all datasets currently available in the library using `list_datasets`:\n\n```python\n\u003e\u003e\u003e import skmob\n\u003e\u003e\u003e from skmob.data.load import list_datasets\n\n\u003e\u003e\u003e list_datasets()\n```\n\t['flow_foursquare_nyc',\n\t 'foursquare_nyc',\n\t 'nyc_boundaries',\n\t 'parking_san_francisco',\n\t 'taxi_san_francisco']\n\n\nThe user can retrieve information about a specific dataset in the library using `get_dataset_info`:\n\n```python\n\u003e\u003e\u003e import skmob\n\u003e\u003e\u003e from skmob.data.load import get_dataset_info\n\n\u003e\u003e\u003e get_dataset_info(\"foursquare_nyc\")\n```\n\n\t{'name': 'Foursquare_NYC',\n\t 'description': 'Dataset containing the Foursquare checkins of individuals moving in New York City',\n\t 'url': 'http://www-public.it-sudparis.eu/~zhang_da/pub/dataset_tsmc2014.zip',\n\t 'hash': 'cbe3fdab373d24b09b5fc53509c8958c77ff72b6c1a68589ce337d4f9a80235b',\n\t 'auth': 'no',\n\t 'data_type': 'trajectory',\n\t 'download_format': 'zip',\n\t 'sep': '   ',\n\t 'encoding': 'ISO-8859-1'}\n\n\nFinally, the user can download a 
specific dataset using `load_dataset`:\n\n```python\n\u003e\u003e\u003e import skmob\n\u003e\u003e\u003e from skmob.data.load import load_dataset, list_datasets\n\n\u003e\u003e\u003e tdf_nyc = load_dataset(\"foursquare_nyc\", drop_columns=True)\n\u003e\u003e\u003e print(tdf_nyc.head())\n```\n\t   uid        lat        lng                  datetime\n\t0  470  40.719810 -74.002581 2012-04-03 18:00:09+00:00\n\t1  979  40.606800 -74.044170 2012-04-03 18:00:25+00:00\n\t2   69  40.716162 -73.883070 2012-04-03 18:02:24+00:00\n\t3  395  40.745164 -73.982519 2012-04-03 18:02:41+00:00\n\t4   87  40.740104 -73.989658 2012-04-03 18:03:00+00:00\n\n\n\n\n# Related packages\n[*movingpandas*](https://github.com/anitagraser/movingpandas) is a similar package that deals with movement data. Instead of implementing new data structures tailored for trajectories (`TrajDataFrame`) and mobility flows (`FlowDataFrame`), *movingpandas* describes a trajectory using a *geopandas* `GeoDataFrame`. There is little overlap in the covered use cases and implemented functionality (comparing [*scikit-mobility* tutorials](https://github.com/scikit-mobility/tutorials) and [*movingpandas* tutorials](https://github.com/anitagraser/movingpandas/tree/master/tutorials)): *scikit-mobility* focuses on computing human mobility metrics, generating synthetic trajectories and assessing privacy risks of mobility datasets. *movingpandas* on the other hand focuses on spatio-temporal data exploration with corresponding functions for data manipulation and analysis.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscikit-mobility%2Fscikit-mobility","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fscikit-mobility%2Fscikit-mobility","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscikit-mobility%2Fscikit-mobility/lists"}