Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/LibCity/Bigscity-LibCity-Datasets
Traffic data processing tools in LibCity
https://github.com/LibCity/Bigscity-LibCity-Datasets
dataset libcity spatio-temporal-data spatio-temporal-prediction traffic traffic-data traffic-prediction
Last synced: 3 months ago
JSON representation
Traffic data processing tools in LibCity
- Host: GitHub
- URL: https://github.com/LibCity/Bigscity-LibCity-Datasets
- Owner: LibCity
- License: apache-2.0
- Created: 2021-01-25T05:20:12.000Z (almost 4 years ago)
- Default Branch: master
- Last Pushed: 2023-06-22T13:02:42.000Z (over 1 year ago)
- Last Synced: 2024-06-19T07:41:51.031Z (5 months ago)
- Topics: dataset, libcity, spatio-temporal-data, spatio-temporal-prediction, traffic, traffic-data, traffic-prediction
- Language: Python
- Homepage:
- Size: 1.08 MB
- Stars: 172
- Watchers: 3
- Forks: 20
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
Awesome Lists containing this project
- awesome-open-transport - (dataset) LibCity Traffic Data Processing tool
README
# Dataset in LibCity
This repository is used to introduce the dataset in LibCity.
## Dataset Conversion Tools
The dataset used in LibCity is stored in a unified data storage format named [atomic files](https://bigscity-libcity-docs.readthedocs.io/en/latest/user_guide/data/atomic_files.html). In order to directly use [the datasets we collected](https://bigscity-libcity-docs.readthedocs.io/en/latest/user_guide/data/raw_data.html) in LibCity, we have converted all datasets into the format of atomic files, and provide the conversion tools in this repository.
All conversion tools take the original dataset in the `./input/` directory as input, and output the converted atomic files to the `./output/` directory. In addition, we provide a link to obtain the original dataset in the first line of each conversion tool. You can download the original dataset through this link and place it in the `./input/` directory. Imitating our conversion tools, you can easily convert your own traffic dataset to adapt it to LibCity.
Besides, you can simply download the datasets we have processed, the data link is [BaiduDisk with code 1231](https://pan.baidu.com/s/1qEfcXBO-QwZfiT0G3IYMpQ) or [Google Drive](https://drive.google.com/drive/folders/1g5v2Gq1tkOq8XO0HDCZ9nOTtRpB6-gPe?usp=sharing).
## Dataset Statistics Infomation
Here we present the statistics of the datasets we have processed.
- [Traffic State Datasets: Point-based Flow or Speed or Occupancy](#Traffic-State-Datasets-Point-based-Flow-or-Speed-or-Occupancy)
- [Traffic State Datasets: Grid-based In-Flow and Out-Flow](#Traffic-State-Datasets-Grid-based-In-Flow-and-Out-Flow)
- [Traffic State Datasets: OD-based Flow](#Traffic-State-Datasets-OD-based-Flow)
- [Traffic State Datasets: Grid-OD-based Flow](#Traffic-State-Datasets-Grid-OD-based-Flow)
- [Traffic State Datasets: Risk](#Traffic-State-Datasets-Risk)
- [GPS Point Trajectory Datasets](#GPS-Point-Trajectory-Datasets)
- [Road Segment-based Trajectory Datasets](#Road-Segment-based-Trajectory-Datasets)
- [POI-based Trajectory Datasets](#POI-based-Trajectory-Datasets)
- [Road Network Datasets](#Road-Network-Datasets)### Traffic State Datasets-Point-based Flow or Speed or Occupancy
>Collected from sensors or Pre-processed from trajectory data.
| DATASET | #GEO | #REL | #USR | #DYNA | PLACE | DURATION | INTERVAL |
| ------------------------- | ------ | ------- | ---- | ----------- | --------------------------- | -------------------------------- | -------- |
| METR_LA | 207 | 11,753 | — | 7,094,304 | Los Angeles, USA | Mar. 1, 2012 - Jun. 27, 2012 | 5min |
| LOS_LOOP | 207 | 42,849 | — | 7,094,304 | Los Angeles, USA | Mar. 1, 2012 - Jun. 27, 2012 | 5min |
| LOS_LOOP_SMALL | 207 | 42,849 | — | 417,312 | Los Angeles, USA | May. 1, 2012 - May. 5, 2012 | 5min |
| SZ_TAXI | 156 | 24,336 | — | 464,256 | Shenzhen, China | Jan. 1, 2015 - Jan. 31, 2015 | 15min |
| LOOP_SEATTLE | 323 | 104,329 | — | 33,953,760 | Greater Seattle Area, USA | over the entirely of 2015 | 5min |
| Q_TRAFFIC | 45,148 | 63,422 | — | 264,386,688 | Beijing, China | Apr. 1, 2017 - May 31, 2017 | 15min |
| PEMSD3 | 358 | 547 | — | 9,382,464 | California, USA | Sept. 1, 2018 - Nov. 30, 2018 | 5min |
| PEMSD4 | 307 | 340 | — | 5,216,544 | San Francisco Bay Area, USA | Jan. 1, 2018 - Feb. 28, 2018 | 5min |
| PEMSD7 | 883 | 866 | — | 24,921,792 | California, USA | May. 1, 2017 - Aug. 31, 2017 | 5min |
| PEMSD8 | 170 | 277 | — | 3,035,520 | San Bernardino Area, USA | Jul. 1, 2016 - Aug. 31, 2016 | 5min |
| PEMSD7(M) | 228 | 51,984 | — | 2,889,216 | California, USA | weekdays of May and June, 2012 | 5min |
| PEMS_BAY | 325 | 8,358 | — | 16,937,700 | San Francisco Bay Area, USA | Jan. 1, 2017 - Jun. 30, 2017 | 5min |
| BEIJING_SUBWAY | 276 | 76,176 | — | 248,400 | Beijing, China | Feb. 29, 2016 - Apr. 3, 2016 | 30min |
| M_DENSE | 30 | — | — | 525,600 | Madrid, Spain | Jan. 1, 2018 - Dec. 21, 2019 | 60min |
| ROTTERDAM | 208 | — | — | 4,813,536 | Rotterdam, Holland | 135 days of 2018 | 2min |
| SHMETRO | 288 | 82,944 | — | 1,934,208 | Shanghai, China | Jul. 1, 2016 - Sept. 30, 2016 | 15min |
| HZMETRO | 80 | 6,400 | — | 146,000 | Hangzhou, China | Jan. 1, 2019 - Jan. 25, 2019 | 15min |
| NYCTAXI202001-202003_DYNA | 263 | 69,169 | — | 574,392 | New York, USA | Jan. 1, 2020 - Mar. 30, 2020 | 60min |### Traffic State Datasets-Grid-based In-Flow and Out-Flow
> Pre-processed from trajectory data.
| DATASET | #GEO | #REL | #USR | #DYNA | PLACE | DURATION | INTERVAL |
| -------------------- | --------- | --------- | --------- | ----------- | --------------------------------------------------------- | -------------------------------- | -------- |
| TAXIBJ | 32*32 | — | — | 5,652,480 | Beijing, China | Mar. 1, 2015 - Jun. 30, 2015 et al. | 30min |
| T_DRIVE20150206 | 32*32 | 1,048,576 | — | 3,686,400 | Beijing, China | Feb. 1, 2015 - Jun. 30, 2015 | 60min |
| T_DRIVE_SMALL | 32*32 | — | — | 172,032 | Beijing, China | Feb. 2, 2008 - Feb. 8, 2008 | 60min |
| NYCTAXI201401-201403_GRID | 10*20 | — | — | 432,000 | New York, USA | Jan. 1, 2014 - Mar. 31, 2014 | 60min |
| NYCBIKE202007-202009 | 10*20 | — | — | 441,600 | New York, USA | Jul. 1, 2020 - Sept. 30, 2020 | 60min |
| PORTO201307-201309 | 20*10 | — | — | 441,600 | Porto, Portugal | Jul. 1, 2013 - Sept. 30, 2013 | 60min |
| AUSTINRIDE20160701-20160930 | 16*8 | — | — | 282,624 | Austin, USA | Jul. 1, 2016 - Sept. 30, 2016 | 60min |
| BIKEDC202007-202009 | 16*8 | — | — | 282,624 | Washington, USA | Jul. 1, 2020 - Sept. 30, 2020 | 60min |
| BIKECHI202007-202009-3600 | 15*18 | — | — | 596,160 | Chicago, USA | Jul. 1, 2020 - Sept. 30, 2020 | 60min |
| BIKECHI202007-202009 | 15*18 | — | — | 1,192,320 | Chicago, USA | Jul. 1, 2020 - Sept. 30, 2020 | 30min |
| NYCTaxi20140112 | 15*5 | — | — | 1,314,000 | New York, USA | Jan. 1, 2014 - Dec. 31, 2014 | 30min |
| NYCTaxi20150103 | 10*20 | — | — | 576,000 | New York, USA | Jan. 1, 2015 - Mar. 1, 2015 | 30min |
| NYCTaxi20160102 | 16*12 | — | — | 552,960 | New York, USA | Jan. 1, 2016 - Feb. 29, 2016 | 30min |
| NYCBike20140409 | 16*8 | — | — | 562,176 | New York, USA | Apr. 1, 2014 - Sept. 30, 2014 | 60min |
| NYCBike20160708 | 10*20 | — | — | 576,000 | New York, USA | Jul. 1, 2016 - Aug. 29, 2016 | 30min |
| NYCBike20160809 | 14*8 | — | — | 322,560 | New York, USA | Aug. 1, 2016 - Sept. 29, 2016 | 30min |### Traffic State Datasets-OD-based Flow
| DATASET | #GEO | #REL | #USR | #DYNA | PLACE | DURATION | INTERVAL |
| ----------------------- | ---- | ------ | ---- | ----------- | ------------- | ---------------------------- | -------- |
| NYCTAXI202004-202006_OD | 263 | 69,169 | — | 150,995,927 | New York, USA | Apr. 1, 2020 - Jun. 30, 2020 | 60min |### Traffic State Datasets-Grid-OD-based Flow
| DATASET | #GEO | #REL | #USR | #DYNA | PLACE | DURATION | INTERVAL |
| -------------------- | --------- | --------- | --------- | ----------- | --------------------------------------------------------- | -------------------------------- | -------- |
| NYC_TOD | 15\*5 | — | — | 98,550,000 | New York, USA | — |—|### Traffic State Datasets-Risk
| DATASET | #GEO | #REL | #USR | #DYNA | PLACE | DURATION | INTERVAL |
| -------------------- | --------- | --------- | --------- | ----------- | --------------------------------------------------------- | -------------------------------- | -------- |
| NYC_RISK | 243 | 59049 | — | 3504000 | New York, USA | Jan. 01, 2013 - Dec. 31, 2013 | 60min |
| CHICAGO_RISK | 197 | 38809 | — | 2332800 | Chicago, USA | Feb. 01, 2016 - Sep. 30, 2016 | 60min |### GPS Point Trajectory Datasets
| DATASET | #GEO | #REL | #USR | #DYNA | PLACE | DURATION | INTERVAL |
| -------------------- | --------- | --------- | --------- | ----------- | --------------------------------------------------------- | -------------------------------- | -------- |
| Chengdu_Taxi_Sample1 | — | — | 4565 | 712360 | Chengdu, China | Aug. 03, 2014 - Aug. 30, 2014 | — |
| Beijing_Taxi_Sample | 16384 | — | 76 | 518424 | Beijing, China | Oct. 01, 2013 - Oct. 31, 2013 | — |
| Seattle | 613645 | 857406 | 1 | 7531 | Seattle WA, USA | Jan.17,2009 20:27:37 - 22:34:28 | 1s |
| Global | 11045 | 18196 | 1 | 2502 | Neftekamsk, Republic of Bashkortostan, Russian Federation | — | 1s |### Road Segment-based Trajectory Datasets
| DATASET | #GEO | #REL | #USR | #DYNA | PLACE | DURATION | INTERVAL |
| -------------------- | --------- | --------- | --------- | ----------- | --------------------------------------------------------- | -------------------------------- | -------- |### POI-based Trajectory Datasets
| DATASET | #GEO | #REL | #USR | #DYNA | PLACE | DURATION | INTERVAL |
| -------------------- | --------- | --------- | --------- | ----------- | --------------------------------------------------------- | -------------------------------- | -------- |
| Foursquare_TKY | 61,858 | — | 2,293 | 573,703 | Tokyo, Japan | Apr. 4, 2012 - Feb. 16, 2013 | — |
| Foursquare_NYC | 38,333 | — | 1,083 | 227,428 | New York, USA | Apr. 3, 2012 - Feb. 15, 2013 | — |
| Gowalla | 1,280,969 | 913,660 | 107,092 | 6,442,892 | Global | Feb. 4, 2009 - Oct. 23, 2010 | — |
| BrightKite | 772,966 | 394,334 | 51,406 | 4,747,287 | Global | Mar. 21, 2008 - Oct. 18, 2010 | — |
| Instagram | 13,187 | — | 78,233 | 2,205,794 | New York, USA | Jun. 15, 2011 - Nov. 8, 2016 | — |### Road Network Datasets
| DATASET | #GEO | #REL | #USR | #DYNA | PLACE | DURATION | INTERVAL |
| -------------------- | --------- | --------- | --------- | ----------- | --------------------------------------------------------- | -------------------------------- | -------- |
| bj_roadmap_edge | 38027 | 95660 | — | — | Beijing, China | — | — |
| bj_roadmap_node | 16927 | 38027 | — | — | Beijing, China | — | — |Note:
- NYCTAXI_DYNA is a dataset that counts the inflow and outflow of the region with an irregular area division method.
- NYCTAXI_OD is a dataset that counts the origin-destination flow between regions with an irregular area division method.
- NYCTAXI_GRID is a dataset that counts the inflow and outflow of the region with a grid-base division method.
- NYC_TOD is a dataset that counts the origin-destination flow between regions with a grid-base division method.## Cite
Our paper is accepted by ACM SIGSPATIAL 2021. If you find LibCity useful for your research or development, please cite our [paper](https://dl.acm.org/doi/10.1145/3474717.3483923).
```
@inproceedings{10.1145/3474717.3483923,
author = {Wang, Jingyuan and Jiang, Jiawei and Jiang, Wenjun and Li, Chao and Zhao, Wayne Xin},
title = {LibCity: An Open Library for Traffic Prediction},
year = {2021},
isbn = {9781450386647},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3474717.3483923},
doi = {10.1145/3474717.3483923},
booktitle = {Proceedings of the 29th International Conference on Advances in Geographic Information Systems},
pages = {145–148},
numpages = {4},
keywords = {Spatial-temporal System, Reproducibility, Traffic Prediction},
location = {Beijing, China},
series = {SIGSPATIAL '21}
}
``````
Jingyuan Wang, Jiawei Jiang, Wenjun Jiang, Chao Li, and Wayne Xin Zhao. 2021. LibCity: An Open Library for Traffic Prediction. In Proceedings of the 29th International Conference on Advances in Geographic Information Systems (SIGSPATIAL '21). Association for Computing Machinery, New York, NY, USA, 145–148. DOI:https://doi.org/10.1145/3474717.3483923
```**04/27/2023 Update**: We published a [long paper](https://arxiv.org/abs/2304.14343) on LibCity, including (1) classification and base units of urban spatial-temporal data and proposed a unified storage format, i.e., atomic files, (2) a detailed review of urban spatial-temporal prediction field (including macro-group prediction, micro-individual prediction, and fundamental tasks), (3) proposed LibCity, an open source library for urban spatial-temporal prediction, detailing each module and use cases, and providing a web-based experiment management and visualization platform, (4) selected more than 20 models and datasets for comparison experiments based on LibCity, obtained model performance rankings and summarized promising future research directions. Please check this [link](https://arxiv.org/abs/2304.14343) for more details.
For the long paper, please cite it as follows:
```
@article{libcitylong,
title={Towards Efficient and Comprehensive Urban Spatial-Temporal Prediction: A Unified Library and Performance Benchmark},
author={Jingyuan Wang and Jiawei Jiang and Wenjun Jiang and Chengkai Han and Wayne Xin Zhao},
journal={arXiv preprint arXiv:2304.14343},
year={2023}
}
```