{"id":19401043,"url":"https://github.com/google-research/human-scene-transformer","last_synced_at":"2025-11-02T10:30:34.907Z","repository":{"id":253123391,"uuid":"633062171","full_name":"google-research/human-scene-transformer","owner":"google-research","description":"Human Scene Transformer: A framework for trajectory prediction and wrappers for reframing the JRDB dataset for the prediction task.","archived":false,"fork":false,"pushed_at":"2024-08-14T15:08:01.000Z","size":2286,"stargazers_count":59,"open_issues_count":2,"forks_count":10,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-02-09T23:11:20.401Z","etag":null,"topics":["deep-learning","tensorflow2","trajectory-prediction","transformer-architecture"],"latest_commit_sha":null,"homepage":"https://human-scene-transformer.github.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/google-research.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-26T17:43:25.000Z","updated_at":"2025-01-07T12:08:20.000Z","dependencies_parsed_at":"2024-08-14T16:52:27.219Z","dependency_job_id":null,"html_url":"https://github.com/google-research/human-scene-transformer","commit_stats":null,"previous_names":["google-research/human-scene-transformer"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Fhuman-scene-transformer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Fhuman-scene-t
ransformer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Fhuman-scene-transformer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Fhuman-scene-transformer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/google-research","download_url":"https://codeload.github.com/google-research/human-scene-transformer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239394723,"owners_count":19631121,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","tensorflow2","trajectory-prediction","transformer-architecture"],"created_at":"2024-11-10T11:16:52.731Z","updated_at":"2025-11-02T10:30:34.859Z","avatar_url":"https://github.com/google-research.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":":trophy: Winner of the [2023 JRDB Trajectory Prediction Challenge](https://jrdb.erc.monash.edu/leaderboards/trajectory) - [Reproduce](#jrdb-trajectory-prediction-challenge-results) our result!\n\n# Human Scene Transformer\n\nThe (Human) Scene Transformer architecture (as described [here](https://arxiv.org/pdf/2309.17209.pdf) and [here](https://arxiv.org/pdf/2106.08417.pdf)) is a general and extensible trajectory prediction framework that treats trajectory prediction as a sequence-to-sequence problem and models it with a Transformer architecture.\n\nIt is straightforward to extend with\n\n- additional input features\n- custom environment encoder\n- different 
loss functions\n- ...\n\n*This is not an officially supported Google product.*\n\n---\n\n![Human Scene Transformer](./human_scene_transformer/images/hero.png)\n\nAnticipating the motion of all humans in dynamic environments such as homes and offices is critical to enable safe and effective robot navigation. Such spaces remain challenging as humans do not follow strict rules of motion and there are often multiple occluded entry points such as corners and doors that create opportunities for sudden encounters. In this work, we present a Transformer-based architecture to predict human future trajectories in human-centric environments from input features including human positions, head orientations, and 3D skeletal keypoints from onboard in-the-wild sensory information. The resulting model captures the inherent uncertainty for future human trajectory prediction and achieves state-of-the-art performance on common prediction benchmarks and a human tracking dataset captured from a mobile robot adapted for the prediction task. Furthermore, we identify new agents with limited historical data as a major contributor to error and demonstrate the complementary nature of 3D skeletal poses in reducing prediction error in such challenging scenarios.\n\nIf you use this work, please cite our paper:\n\n```\n@article{salzmann2023hst,\n  title={Robots That Can See: Leveraging Human Pose for Trajectory Prediction},\n  author={Salzmann, Tim and Chiang, Lewis and Ryll, Markus and Sadigh, Dorsa and Parada, Carolina and Bewley, Alex},\n  journal={IEEE Robotics and Automation Letters},\n  year={2023}, volume={8}, number={11}, pages={7090-7097},\n  doi={10.1109/LRA.2023.3312035}\n}\n```\n\n---\n\n## Prerequisites\n\nInstall requirements via `pip install -r requirements.txt`.\n\nPlease note that this codebase is not compatible with the Intel MKL backend for\ntensorflow. 
The MKL backend supports tensors of up to 5 dimensions, which is\nnot sufficient for parts of this codebase. Should you have an MKL-backed\ntensorflow installation or are running into MKL-related\n[errors](https://github.com/google-research/human-scene-transformer/issues/11),\nplease disable the tensorflow MKL backend by setting the environment variables\n`TF_ENABLE_ONEDNN_OPTS=0` and `TF_DISABLE_MKL=1`.\n\n## Data\n\n### JRDB\n\nWe provide an extensive pre-processing pipeline to convert the JRDB dataset.\nJRDB was created as a detection and tracking dataset rather than a prediction\ndataset. To make the data suitable for a prediction task, we first extract the\nrobot motion from the raw sensor data to account for the robot's motion.\nFurther, on the JRDB training split we combine algorithmic detection with the\nground truth labels from the tracking dataset to create authentic tracks as\ninput and labels for HST.\nNote that we do not purely use the ground truth hand-labeled tracks in the JRDB\ntrain dataset, as we find them to be overly smoothed, giving away the future human\nmovement.\nTo adapt the JRDB dataset for prediction, please follow [this](/human_scene_transformer/data) README.\n\nMake sure to adapt `\u003cdata_path\u003e` in `config/\u003cjrdb/pedestrians\u003e/dataset_params.gin` accordingly.\n\nIf you want to use the JRDB dataset for trajectory prediction in PyTorch, we\nprovide a [PyTorch Dataset wrapper](/human_scene_transformer/jrdb/torch_dataset.py) for the processed dataset.\n\n### Pedestrians ETH/UCY\nPlease download the raw data [here](https://github.com/StanfordASL/Trajectron-plus-plus/tree/master/experiments/pedestrians/raw).\n\n## Training\n\n### JRDB\n```\npython train.py --model_base_dir=./models/jrdb --gin_files=./config/jrdb/training_params.gin --gin_files=./config/jrdb/model_params.gin --gin_files=./config/jrdb/dataset_params.gin --gin_files=./config/jrdb/metrics.gin --dataset=JRDB\n```\n\n### Pedestrians ETH/UCY\n```\npython train.py 
--model_base_dir=./models/pedestrians_eth --gin_files=./config/pedestrians/training_params.gin --gin_files=./config/pedestrians/model_params.gin --gin_files=./config/pedestrians/dataset_params.gin --gin_files=./config/pedestrians/metrics.gin --dataset=PEDESTRIANS\n```\n\n---\n\n## JRDB Trajectory Prediction Challenge Results\nTo reproduce our winning results in the [2023 JRDB Trajectory Prediction Challenge](https://jrdb.erc.monash.edu/leaderboards/trajectory):\n\n- Make sure that you follow the [data pre-processing instructions](/human_scene_transformer/data) and pay special attention to where the instructions differentiate between the JRDB Challenge dataset and the original paper dataset.\n\n- Download the trained challenge model [here](https://storage.googleapis.com/gresearch/human_scene_transformer/jrdb_challenge_checkpoint.zip)\n\n- Run\n\n```\npython jrdb/eval_challenge.py --model_path=\u003cpath_to_challenge_model_folder\u003e --checkpoint_path=\u003cpath_to_challenge_model_folder\u003e/ckpts/ckpt-20 --dataset_path=\u003cdataset_path\u003e --output_path=\u003cresult_folder\u003e\n```\n\n---\n\n## Evaluation\n\n### JRDB\n```\npython jrdb/eval.py --model_path=./models/jrdb/ --checkpoint_path=./models/jrdb/ckpts/ckpt-30\n```\n\n#### Keypoints Impact Evaluation\n```\npython jrdb/eval_keypoints.py --model_path=./models/jrdb/ --checkpoint_path=./models/jrdb/ckpts/ckpt-30\n```\n\nvs\n\n```\npython jrdb/eval_keypoints.py --model_path=./models/jrdb_no_keypoints/ --checkpoint_path=./models/jrdb_no_keypoints/ckpts/ckpt-30\n```\n\n### Pedestrians ETH/UCY\n```\npython pedestrians/eval.py --model_path=./models/pedestrians_eth/ --checkpoint_path=./models/pedestrians_eth/ckpts/ckpt-20\n```\n\n---\n\n## Results\n\nCompared to the published paper, we improved our data processing and fixed small\nbugs in this code release. 
If you compare against our method please use the\nfollowing updated results.\n\nOn the JRDB dataset with dataset options as set [here](/human_scene_transformer/config/jrdb/dataset_params.py):\n\n|        | AVG  |  @ 1s | @ 2s |  @ 3s | @ 4s  |\n|--------|------|-------|------|-------|-------|\n| MinADE | 0.26 | 0.12  | 0.20 | 0.28  | 0.37  |\n| MinFDE | 0.45 | 0.21  | 0.39 | 0.56  | 0.71  |\n|  NLL   |-0.59 | -0.90 | -0.65| -0.08 | 0.32  |\n\nOn the ETH/UCY Pedestrians Dataset:\n\n|        | ETH  | Hotel | Univ | Zara1 | Zara2 |  Avg  |\n|--------|------|-------|------|-------|-------|-------|\n| MinADE | 0.41 | 0.10  | 0.24 | 0.17  | 0.14  | 0.21  |\n| MinFDE | 0.73 | 0.14  | 0.44 | 0.30  | 0.24  | 0.37  |\n\n### JRDB Train / Test Split\nThe train / test split is implemented [here](/human_scene_transformer/config/jrdb/dataset_params.py).\n\n### Checkpoints\nYou can download trained model checkpoints for both `JRDB` and `Pedestrians (ETH/UCY)` datasets [here](https://storage.googleapis.com/gresearch/human_scene_transformer/checkpoints.zip).\n\nTo evaluate the pre-trained checkpoints you will have to adjust the path to the dataset in the respective `params/operative_config.gin` file.\n\n## Runtime\nEvaluation of forward inference runtime with single output mode:\n\n| #Humans | M1 - CPU | A100 - GPU |\n|---------|----------|------------|\n| 1       |   40Hz   |    12Hz    |\n| 10      |   30Hz   |    11Hz    |\n| 20      |   23Hz   |    11Hz    |\n| 50      |   12Hz   |    11Hz    |\n| 100     |    5Hz   |    11Hz    |\n| 150     |          |    11Hz    |","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle-research%2Fhuman-scene-transformer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgoogle-research%2Fhuman-scene-transformer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle-research%2Fhuman-scene-transformer/lists"}