https://github.com/hkuds/urbangpt

[KDD'2024] "UrbanGPT: Spatio-Temporal Large Language Models"

# UrbanGPT: Spatio-Temporal Large Language Models

A PyTorch implementation for the paper: [UrbanGPT: Spatio-Temporal Large Language Models]

[Zhonghang Li](https://scholar.google.com/citations?user=__9uvQkAAAAJ), [Lianghao Xia](https://akaxlh.github.io/), [Jiabin Tang](https://tjb-tech.github.io/), [Yong Xu](https://scholar.google.com/citations?user=1hx5iwEAAAAJ), [Lei Shi](https://harryshil.github.io/), [Long Xia](https://scholar.google.com/citations?user=NRwerBAAAAAJ), [Dawei Yin](https://www.yindawei.com/), [Chao Huang](https://sites.google.com/view/chaoh)* (*Correspondence)

**[Data Intelligence Lab](https://sites.google.com/view/chaoh/home)@[University of Hong Kong](https://www.hku.hk/)**, [South China University of Technology](https://www.scut.edu.cn/en/), Baidu Inc

-----




[![YouTube](https://badges.aleen42.com/src/youtube.svg)](https://www.youtube.com/watch?v=4BIbQt-EIAM)
• 🌐 Chinese Blog

This repository hosts the code, data, and model weights of **UrbanGPT**.

-----
## 🎉 News

- [x] 🚀🔥 [2024.05] 🎯🎯📢📢 Exciting News! We are thrilled to announce that our 🌟UrbanGPT🌟 has been accepted by KDD'2024! 🎉🎉🎉 Thanks to all the team members 🤗

🎯🎯📢📢 We have uploaded the **models** and **data** used in UrbanGPT to 🤗 **Huggingface**. Please refer to the table below for details:

| 🤗 Huggingface Address | 🎯 Description |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| [https://huggingface.co/bjdwh/UrbanGPT](https://huggingface.co/bjdwh/UrbanGPT/tree/main) | Checkpoint of UrbanGPT, based on Vicuna-7B-v1.5-16k and tuned on the instruction data in [train-data](https://huggingface.co/datasets/bjdwh/ST_data_urbangpt/tree/main/train_data). |
| [https://huggingface.co/datasets/bjdwh/ST_data_urbangpt](https://huggingface.co/datasets/bjdwh/ST_data_urbangpt) | A portion of the instruction dataset, released for evaluation. |
| [https://huggingface.co/datasets/bjdwh/UrbanGPT_ori_stdata](https://huggingface.co/datasets/bjdwh/UrbanGPT_ori_stdata) | The original dataset used in UrbanGPT. |

- [x] [2024.02.23] 🚀🚀 Release the code of UrbanGPT.
- [x] [2024.02.29] Add video.
- [x] [2024.03.05] Release the full paper.
- [x] [2024.03.11] Upload the new checkpoint of our UrbanGPT.
- [x] [2024.06.07] Release the instruction generation code and the original dataset used in UrbanGPT.

## 👉 TODO
- [ ] Release baseline code.
- [ ] ...

-----------

## Introduction


In this work, we present a spatio-temporal large language model that generalizes across a wide range of downstream urban tasks.
To this end, UrbanGPT integrates a spatio-temporal dependency encoder with the instruction-tuning paradigm.
This integration enables large language models (LLMs) to comprehend the complex inter-dependencies across time and space, supporting more comprehensive and accurate predictions under data scarcity.
Extensive experiments highlight the potential of building LLMs for spatio-temporal learning, particularly in zero-shot scenarios.

![The detailed framework of the proposed UrbanGPT.](https://github.com/urban-gpt/urban-gpt.github.io/blob/main/images/urbangpt_framework.png)

### Demo Video
https://github.com/HKUDS/UrbanGPT/assets/90381931/9cd094b4-8fa3-486f-890d-631a08b19b4a

-----------

## Getting Started

### Table of Contents:
* 1. Code Structure
* 2. Environment
* 3. Training UrbanGPT
  * 3.1. Preparing Pre-trained Checkpoint
  * 3.2. Instruction Tuning
* 4. Evaluating UrbanGPT
  * 4.1. Preparing Checkpoints and Data
  * 4.2. Running Evaluation
  * 4.3. Evaluation Metric Calculation
* 5. Instructions Generation
****

### 1. Code Structure [Back to Top]

```
.
|   README.md
|   urbangpt_eval.sh
|   urbangpt_train.sh
|
+---checkpoints
|   \---st_encoder
|           pretrain_stencoder.pth
|
+---playground
|   |   inspect_conv.py
|   |
|   +---test_embedding
|   |       README.md
|   |       test_classification.py
|   |       test_semantic_search.py
|   |       test_sentence_similarity.py
|   |
|   \---test_openai_api
|           anthropic_api.py
|           openai_api.py
|
+---tests
|       test_openai_curl.sh
|       test_openai_langchain.py
|       test_openai_sdk.py
|
\---urbangpt
    |   constants.py
    |   conversation.py
    |   utils.py
    |   __init__.py
    |
    +---eval
    |   |   run_urbangpt.py # evaluation
    |   |   run_vicuna.py
    |   |
    |   \---script
    |           run_model_qa.yaml
    |
    +---model
    |   |   apply_delta.py
    |   |   apply_lora.py
    |   |   builder.py
    |   |   compression.py
    |   |   convert_fp16.py
    |   |   make_delta.py
    |   |   model_adapter.py
    |   |   model_registry.py
    |   |   monkey_patch_non_inplace.py
    |   |   STLlama.py # model
    |   |   utils.py
    |   |   __init__.py
    |   |
    |   \---st_layers
    |           args.py
    |           ST_Encoder.conf
    |           ST_Encoder.py # ST-Encoder
    |           __init__.py
    |
    +---protocol
    |       openai_api_protocol.py
    |
    +---serve
    |   |   api_provider.py
    |   |   bard_worker.py
    |   |   cacheflow_worker.py
    |   |   cli.py
    |   |   controller.py
    |   |   controller_graph.py
    |   |   gradio_block_arena_anony.py
    |   |   gradio_block_arena_named.py
    |   |   gradio_css.py
    |   |   gradio_patch.py
    |   |   gradio_web_server.py
    |   |   gradio_web_server_graph.py
    |   |   gradio_web_server_multi.py
    |   |   huggingface_api.py
    |   |   inference.py
    |   |   model_worker.py
    |   |   model_worker_graph.py
    |   |   openai_api_server.py
    |   |   register_worker.py
    |   |   test_message.py
    |   |   test_throughput.py
    |   |   __init__.py
    |   |
    |   +---examples
    |   |       extreme_ironing.jpg
    |   |       waterview.jpg
    |   |
    |   +---gateway
    |   |       nginx.conf
    |   |       README.md
    |   |
    |   \---monitor
    |           basic_stats.py
    |           clean_battle_data.py
    |           elo_analysis.py
    |           hf_space_leaderboard_app.py
    |           monitor.py
    |
    \---train
            llama2_flash_attn_monkey_patch.py
            llama_flash_attn_monkey_patch.py
            stchat_trainer.py
            train_lora.py
            train_mem.py
            train_st.py # train

```

### 2. Environment [Back to Top]
Please first clone the repo and install the required environment, which can be done by running the following commands:
```shell
conda create -n urbangpt python=3.9.13

conda activate urbangpt

# Torch with CUDA 11.7
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2

# To support vicuna base model
pip3 install "fschat[model_worker,webui]"

# To install pyg and pyg-relevant packages
pip install torch_geometric
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.1+cu117.html

# Clone our UrbanGPT or download it
git clone https://github.com/HKUDS/UrbanGPT.git
cd UrbanGPT

# Install required libraries
# (We recommend installing them one at a time, as follows)
pip install deepspeed
pip install ray
pip install einops
pip install wandb
# (There is a version compatibility issue between "flash-attn" and "transformers". See the flash-attn repository for details: https://github.com/Dao-AILab/flash-attention)
pip install flash-attn==2.3.5 # or download a wheel from https://github.com/Dao-AILab/flash-attention/releases, e.g. flash_attn-2.3.5+cu117torch2.0cxx11abiFALSE-cp39-cp39-linux_x86_64.whl
pip install transformers==4.34.0

# (Alternatively, install everything from the requirements file)
pip install -r requirements.txt
```
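
Because the setup pins several versions, it can help to confirm what actually got installed before training. The following is a minimal sketch, not part of the repository: the `PINNED` table is copied from the commands above, while `installed_version` and `find_mismatches` are hypothetical helper names. It relies only on the standard-library `importlib.metadata`.

```python
from importlib import metadata

# Pins copied from the install commands above.
PINNED = {
    "torch": "2.0.1",
    "torchvision": "0.15.2",
    "torchaudio": "2.0.2",
    "flash-attn": "2.3.5",
    "transformers": "4.34.0",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose version differs from its pin to (expected, found)."""
    mismatches = {}
    for package, expected in pinned.items():
        found = lookup(package)
        # Ignore local build tags such as "+cu117" when comparing.
        base = found.split("+")[0] if found else None
        if base != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in find_mismatches(PINNED).items():
        print(f"{package}: expected {expected}, found {found or 'not installed'}")
```

Running the script inside the `urbangpt` conda environment prints one line per mismatch and nothing when every pin is satisfied.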

### 3. Training UrbanGPT [Back to Top]

#### 3.1. Preparing Pre-trained Checkpoint [Back to Top]
UrbanGPT is built on the following existing models and data.
Please follow the instructions below to prepare the checkpoints.

- `Vicuna`:
Vicuna, an instruction-tuned chatbot, serves as the base model in our implementation. Please download its weights [here](https://github.com/lm-sys/FastChat#model-weights). We generally use the v1.5 and v1.5-16k models with 7B parameters. You should also update Vicuna's 'config.json'; for example, the 'config.json' for v1.5-16k can be found in [config.json](https://huggingface.co/datasets/bjdwh/checkpoints/blob/main/train_config/config.json).

- `Spatio-temporal Encoder`:
We employ a simple TCN-based spatio-temporal encoder to capture spatio-temporal dependencies. The weights of [st_encoder](./checkpoints/st_encoder/pretrain_stencoder.pth) are pre-trained on a typical multi-step spatio-temporal prediction task.

- `Spatio-temporal Train Data`:
We use pre-training data consisting of New York City taxi, bike, and crime data, including spatio-temporal statistics, recorded timestamps, and information about regional points of interest (POIs). These data are organized in [train_data](https://huggingface.co/datasets/bjdwh/ST_data_urbangpt/tree/main/train_data). Please download it and place it at ./UrbanGPT/ST_data_urbangpt/train_data.
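
Once the three items above are in place, a quick check can catch a misplaced file before a multi-GPU job starts. The sketch below is an illustration rather than repository code: `missing_files` is a hypothetical helper, and the relative paths are assumptions taken from the paths used elsewhere in this README (the Vicuna directory name and the `multi_NYC` file names come from the training example in Section 3.2).

```python
from pathlib import Path

# Relative paths as they appear in this README; adjust if your layout differs.
REQUIRED = [
    "checkpoints/vicuna-7b-v1.5-16k/config.json",
    "checkpoints/st_encoder/pretrain_stencoder.pth",
    "ST_data_urbangpt/train_data/multi_NYC.json",
    "ST_data_urbangpt/train_data/multi_NYC_pkl.pkl",
]


def missing_files(root, required=REQUIRED):
    """Return the required paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in required if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the repository root.
    for rel in missing_files("."):
        print(f"missing: {rel}")
```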

#### 3.2. Instruction Tuning [Back to Top]

* **Start tuning:** After completing the steps above, you can start instruction tuning by filling in the blanks in [urbangpt_train.sh](urbangpt_train.sh). An example is shown below:

```shell
# Fill in the following paths to run UrbanGPT
model_path=./checkpoints/vicuna-7b-v1.5-16k
instruct_ds=./ST_data_urbangpt/train_data/multi_NYC.json
st_data_path=./ST_data_urbangpt/train_data/multi_NYC_pkl.pkl
pretra_ste=ST_Encoder
output_model=./checkpoints/UrbanGPT

wandb offline
python -m torch.distributed.run --nnodes=1 --nproc_per_node=8 --master_port=20001 \
urbangpt/train/train_mem.py \
--model_name_or_path ${model_path} \
--version v1 \
--data_path ${instruct_ds} \
--st_content ./TAXI.json \
--st_data_path ${st_data_path} \
--st_tower ${pretra_ste} \
--tune_st_mlp_adapter True \
--st_select_layer -2 \
--use_st_start_end \
--bf16 True \
--output_dir ${output_model} \
--num_train_epochs 3 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 2400 \
--save_total_limit 1 \
--learning_rate 2e-3 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 True \
--model_max_length 2048 \
--gradient_checkpointing True \
--lazy_preprocess True \
--report_to wandb

```
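
The flags in the example interact: 8 processes with a per-device batch size of 4 and no gradient accumulation give an effective batch of 32 samples per optimizer step, and `--warmup_ratio 0.03` is applied to the total number of steps. The sketch below works through that arithmetic. It is an approximation under stated assumptions: `training_schedule` is a hypothetical helper, the sample count is a made-up input (the README does not state the dataset size), and the trainer's own step count may differ slightly.

```python
import math


def training_schedule(num_samples, num_gpus=8, per_device_batch=4,
                      grad_accum=1, epochs=3, warmup_ratio=0.03):
    """Estimate the optimizer steps implied by the flags in the script above."""
    # Samples consumed by one optimizer step across all processes.
    effective_batch = num_gpus * per_device_batch * grad_accum
    # Assumes the last partial batch of each epoch is kept.
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


if __name__ == "__main__":
    # 32000 is a hypothetical sample count used only for illustration.
    print(training_schedule(num_samples=32000))
```

Comparing `total_steps` with `--save_steps 2400` shows how many intermediate checkpoints to expect for a given dataset size.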

### 4. Evaluating UrbanGPT [Back to Top]

#### 4.1. Preparing Checkpoints and Data [Back to Top]

* **Checkpoints:** You can evaluate UrbanGPT with your own model or with our released checkpoints.
* **Data:** We split off test sets from the NYC-taxi dataset and build the instruction data for evaluation. Please refer to the [evaluation data](https://huggingface.co/datasets/bjdwh/ST_data_urbangpt).

#### 4.2. Running Evaluation [Back to Top]

You can start the evaluation by filling in the blanks in [urbangpt_eval.sh](urbangpt_eval.sh). An example is shown below:
```
# Fill in the following paths to run the evaluation
output_model=./checkpoints/tw2t_multi_reg-cla-gird
datapath=./ST_data_urbangpt/NYC_taxi_cross-region/NYC_taxi.json
st_data_path=./ST_data_urbangpt/NYC_taxi_cross-region/NYC_taxi_pkl.pkl
res_path=./result_test/cross-region/NYC_taxi
start_id=0
end_id=51920
num_gpus=8

python ./urbangpt/eval/run_urbangpt.py --model-name ${output_model} --prompting_file ${datapath} --st_data_path ${st_data_path} --output_res_path ${res_path} --start_id ${start_id} --end_id ${end_id} --num_gpus ${num_gpus}
```
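
The `start_id`, `end_id`, and `num_gpus` arguments suggest that the prompt range is divided among workers. How `run_urbangpt.py` actually splits it is not described here, so the following is only a sketch of one common scheme: contiguous, near-equal chunks over a half-open range. `split_range` is a hypothetical helper, and treating `end_id` as exclusive is an assumption.

```python
def split_range(start_id, end_id, num_gpus):
    """Split [start_id, end_id) into num_gpus contiguous, near-equal chunks."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    total = end_id - start_id
    base, extra = divmod(total, num_gpus)
    chunks = []
    cursor = start_id
    for rank in range(num_gpus):
        # The first `extra` workers take one additional item.
        size = base + (1 if rank < extra else 0)
        chunks.append((cursor, cursor + size))
        cursor += size
    return chunks


if __name__ == "__main__":
    # Values taken from the example script above.
    for rank, (lo, hi) in enumerate(split_range(0, 51920, 8)):
        print(f"worker {rank}: prompts {lo} to {hi - 1}")
```

With the values from the example, 51920 prompts divide evenly into 8 chunks of 6490 prompts each.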

#### 4.3. Evaluation Metric Calculation [Back to Top]

You can use [result_test.py](./metric_calculation/result_test.py) to calculate the performance metrics of the predicted results.
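
This README does not list which metrics `result_test.py` reports. As a point of reference, the sketch below computes MAE and RMSE, two metrics commonly used for spatio-temporal regression. It is an assumption-laden illustration, not the repository's implementation, and the function names are hypothetical.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


if __name__ == "__main__":
    # Toy values for illustration only; these are not results from the paper.
    truth = [10, 20, 30]
    predicted = [12, 18, 33]
    print(f"MAE={mae(truth, predicted):.4f} RMSE={rmse(truth, predicted):.4f}")
```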

---------

### 5. Instructions Generation [Back to Top]

You can use the code in [instruction_generate.py](./instruction_generate/instruction_generate.py) to generate the specific instructions you need. For example:

```
-dataset_name: Choose the dataset. # NYC_multi(for training) NYC_taxi NYC_bike NYC_crime1 NYC_crime2 CHI_taxi (for testing)
# Only one of the following options can be set to True
-for_zeroshot: for zero-shot prediction or not.
-for_supervised: for supervised prediction or not.
-for_ablation: for ablation study or not.

# Create the instruction data for training
python instruction_generate.py -dataset_name NYC_multi

# Create instruction data for the NYC_taxi dataset to facilitate testing in the zero-shot setting of UrbanGPT
python instruction_generate.py -dataset_name NYC_taxi -for_zeroshot True
```
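
Since only one of the three mode flags may be set to True, a small wrapper that accepts a single mode makes it impossible to enable two by mistake. The sketch below builds the argument list for the commands shown above. `build_command` and `MODES` are hypothetical names; the flag spellings are copied from the option list.

```python
import sys

# Mode flags from the option list above; at most one may be enabled per run.
MODES = ("for_zeroshot", "for_supervised", "for_ablation")


def build_command(dataset_name, mode=None, script="instruction_generate.py"):
    """Build the argv list for one instruction-generation run."""
    if mode is not None and mode not in MODES:
        raise ValueError(f"mode must be one of {MODES} or None, got {mode!r}")
    command = [sys.executable, script, "-dataset_name", dataset_name]
    if mode is not None:
        # Passing a single mode rules out enabling two flags at once.
        command += [f"-{mode}", "True"]
    return command


if __name__ == "__main__":
    # Print the commands instead of running them; pass a list to
    # subprocess.run(..., check=True) to execute it.
    print(build_command("NYC_multi"))
    print(build_command("NYC_taxi", mode="for_zeroshot"))
```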

---------

## Citation

If you find UrbanGPT useful in your research or applications, please kindly cite:

```
@misc{li2024urbangpt,
      title={UrbanGPT: Spatio-Temporal Large Language Models},
      author={Zhonghang Li and Lianghao Xia and Jiabin Tang and Yong Xu and Lei Shi and Long Xia and Dawei Yin and Chao Huang},
      year={2024},
      eprint={2403.00813},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

## Acknowledgements
You may refer to the related work that serves as the foundation for our framework and code repository,
[Vicuna](https://github.com/lm-sys/FastChat). We also drew inspiration from [GraphGPT](https://github.com/HKUDS/GraphGPT). The design of our website and README.md was inspired by [NExT-GPT](https://next-gpt.github.io/), and the design of our system deployment was inspired by [gradio](https://www.gradio.app) and [Baize](https://github.com/project-baize/baize-chatbot). We thank the authors for their wonderful work.