https://github.com/brjathu/PHALP
Code repository for the paper "Tracking People by Predicting 3D Appearance, Location & Pose". (CVPR 2022 Oral)
- Host: GitHub
- URL: https://github.com/brjathu/PHALP
- Owner: brjathu
- License: other
- Created: 2022-02-23T03:53:00.000Z (almost 3 years ago)
- Default Branch: master
- Last Pushed: 2024-03-31T20:44:40.000Z (10 months ago)
- Last Synced: 2024-09-29T10:46:49.053Z (4 months ago)
- Language: Python
- Homepage:
- Size: 123 MB
- Stars: 279
- Watchers: 10
- Forks: 43
- Open Issues: 19
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-human-pose-estimation - PHALP, CVPR 22 (3D People Tracking / 2022)
README
# Tracking People by Predicting 3D Appearance, Location & Pose
Code repository for the paper "Tracking People by Predicting 3D Appearance, Location & Pose". \
[Jathushan Rajasegaran](http://people.eecs.berkeley.edu/~jathushan/), [Georgios Pavlakos](https://geopavlakos.github.io/), [Angjoo Kanazawa](https://people.eecs.berkeley.edu/~kanazawa/), [Jitendra Malik](http://people.eecs.berkeley.edu/~malik/). \
[![arXiv](https://img.shields.io/badge/arXiv-2112.04477-00ff00.svg)](https://arxiv.org/abs/2112.04477) [![Website shields.io](https://img.shields.io/website-up-down-green-red/http/shields.io.svg)](https://people.eecs.berkeley.edu/~jathushan/PHALP/) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1-zA9ntIkSj94zgfeeghg0lWEIihzk-XS?usp=sharing)

This repository provides the code implementation of our paper PHALP, including installation instructions, a demo that runs on any video, dataset preparation, and evaluation on datasets.
This branch contains code supporting our latest work: [4D-Humans](https://github.com/shubham-goel/4D-Humans).
For the original PHALP code, please see the [initial release branch](https://github.com/brjathu/PHALP/tree/initial_release).

## Installation
After installing the [PyTorch](https://pytorch.org/get-started/locally/) dependency, you may install our `phalp` package directly as:
```bash
pip install "phalp[all]@git+https://github.com/brjathu/PHALP.git"
```

Step-by-step instructions
```bash
git clone https://github.com/brjathu/PHALP.git
cd PHALP
conda create -n phalp python=3.10
conda activate phalp
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -e .[all]
```
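As an optional sanity check (not part of the original instructions; it assumes the `phalp` environment above is active and that the installed package is importable as `phalp`), you can confirm that PyTorch sees your GPU and that the package imports cleanly:

```bash
# verify that CUDA is visible to PyTorch and that the phalp package imports without errors
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import phalp; print('phalp imported OK')"
```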
## Demo
To run our code on a video, please specify the input video `video.source` and an output directory `video.output_dir`:
```bash
python scripts/demo.py video.source=assets/videos/gymnasts.mp4 video.output_dir='outputs'
```
The output directory will contain a video rendering of the tracklets and a `.pkl` file containing the tracklets with 3D pose and shape (see structure below).

## Command-line options
### Input Sources
You can specify various kinds of input sources, for example a video file, a YouTube video, or a directory of images:
```bash
# for a video file
python scripts/demo.py video.source=assets/videos/vid.mp4

# for a youtube video
python scripts/demo.py video.source=\'"https://www.youtube.com/watch?v=xEH_5T9jMVU"\'

# for a directory of images
python scripts/demo.py video.source=<path_to_image_folder>
```

Custom bounding boxes
In addition to these options, you can also give images and bounding boxes as input, so that the model only performs tracking on the given bounding boxes. To do this, specify `video.source` as a `.pkl` file in which each key is a frame name; the absolute path to the image is computed as `os.path.join(video.base_path, frame_name)`. The value for each key is a dictionary with the keys `gt_bbox`, `gt_class`, and `gt_track_id`, as in the following example. `gt_boxes` is a `np.ndarray` of shape `(N, 4)` where each row is a bounding box in the `[x1, y1, x2, y2]` format. You can also provide `gt_class` and `gt_track_id` to store them in the final output.
```python
# gt_boxes: np.ndarray of shape (N, 4), one [x1, y1, x2, y2] box per row
gt_data[frame_id] = {
    "gt_bbox": gt_boxes,
    "extra_data": {
        "gt_class": [],       # optional class labels, stored in the final output
        "gt_track_id": [],    # optional track ids, stored in the final output
    },
}
```

Here is an example of how to give bounding boxes and track IDs to the model and get the renderings; a minimal sketch of how such a `.pkl` file could be written follows the commands.
```bash
mkdir assets/videos/gymnasts
ffmpeg -i assets/videos/gymnasts.mp4 -q:v 2 assets/videos/gymnasts/%06d.jpg

python scripts/demo.py \
render.enable=True \
video.output_dir=test_gt_bbox \
use_gt=True \
video.base_path=assets/videos/gymnasts \
video.source=assets/videos/gt_tracks.pkl
```
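To make the expected input concrete, here is a minimal sketch of how such a file could be assembled. The box values are placeholders, and the use of `joblib.dump` is an assumption based on this repository's use of `joblib` for its output `.pkl` files, not a documented requirement; adapt it to your own detections.

```python
# Hypothetical sketch: build a gt_tracks.pkl to pass as video.source.
# Assumes per-frame boxes are already available for each extracted frame.
import glob
import os

import joblib
import numpy as np

base_path = "assets/videos/gymnasts"                 # becomes video.base_path
gt_data = {}
for frame_path in sorted(glob.glob(os.path.join(base_path, "*.jpg"))):
    frame_name = os.path.basename(frame_path)        # key: frame name relative to base_path
    boxes = np.array([[100.0, 50.0, 300.0, 400.0]])  # placeholder [x1, y1, x2, y2] boxes
    gt_data[frame_name] = {
        "gt_bbox": boxes,
        "extra_data": {
            "gt_class": [0] * len(boxes),            # optional class labels
            "gt_track_id": list(range(len(boxes))),  # optional track ids
        },
    }

joblib.dump(gt_data, "assets/videos/gt_tracks.pkl")  # pass this path as video.source
```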
### Running on a subset of frames

You can specify the start and end of the video to be tracked, e.g. track from frame 50 to 100:
```bash
python scripts/demo.py video.source=assets/videos/vid.mp4 video.start_frame=50 video.end_frame=100
```

Tracking without extracting frames
If the video is long and extracting the frames is too time-consuming, you can set `video.extract_video=False`. This uses the torchvision backend and only keeps the timestamps of the video in memory. If this is enabled, you can give the start and end times of the video in seconds.
```bash
python scripts/demo.py video.source=assets/videos/vid.mp4 video.extract_video=False video.start_time=1s video.end_time=2s
```
### Visualization type
We support multiple types of visualization in `render.type`: `HUMAN_MESH` (default) renders the full human mesh, `HUMAN_MASK` visualizes the segmentation masks, `HUMAN_BBOX` visualizes the bounding boxes with track IDs, and `TRACKID_<id>_MESH` renders the full human mesh for track `<id>` only:
```bash
# render full human mesh
python scripts/demo.py video.source=assets/videos/vid.mp4 render.type=HUMAN_MESH

# render segmentation mask
python scripts/demo.py video.source=assets/videos/vid.mp4 render.type=HUMAN_MASK

# render bounding boxes with track-ids
python scripts/demo.py video.source=assets/videos/vid.mp4 render.type=HUMAN_BBOX

# render a single track id, say 0
python scripts/demo.py video.source=assets/videos/vid.mp4 render.type=TRACKID_0_MESH
```
More rendering types
In addition to these settings, for rendering meshes PHALP supports a head-mask visualization, which renders only the upper body of the person, allowing users to see the actual person and the track in the same video. To enable this, please set `render.head_mask=True`.

```bash
# for rendering detected and occluded people
python scripts/demo.py video.source=assets/videos/vid.mp4 render.head_mask=True
```

You can also visualize the 2D projected keypoints by setting `render.show_keypoints=True` [TODO].
### Track through shot boundaries
By default, PHALP does not track through shot boundaries. To enable this, please set `detect_shots=True`.
```bash
# for tracking through shot boundaries
python scripts/demo.py video.source=assets/videos/vid.mp4 detect_shots=True
```

Additional Notes
* For debugging purposes, you can set `debug=True` to disable the rich progress bar.
## Output `.pkl` structure
The `.pkl` file containing tracks, 3D poses, etc. is stored under `<video.output_dir>/results`, and is a 2-level dictionary.

Detailed structure
```python
import joblib
results = joblib.load('<video.output_dir>/results/<video_name>.pkl')
results = {
# A dictionary for each frame.
'vid_frame0.jpg': {
'2d_joints': List[np.array(90,)], # 45x 2D joints for each detection
'3d_joints': List[np.array(45,3)], # 45x 3D joints for each detection
'annotations': List[Any], # custom annotations for each detection
'appe': List[np.array(4096,)], # appearance features for each detection
'bbox': List[[x0 y0 w h]], # 2D bounding box (top-left corner and dimensions) for each track (detections + ghosts)
'camera': List[[tx ty tz]], # camera translation (wrt image) for each detection
'camera_bbox': List[[tx ty tz]], # camera translation (wrt bbox) for each detection
'center': List[[cx cy]], # 2D center of bbox for each detection
'class_name': List[int], # class ID for each detection (0 for humans)
'conf': List[float], # confidence score for each detection
'frame_path': 'vid_frame0.jpg', # Frame identifier
'loca': List[np.array(99,)], # location features for each detection
'mask': List[mask], # RLE-compressed mask for each detection
'pose': List[np.array(229,)], # pose feature (concatenated SMPL params) for each detection
'scale': List[float], # max(width, height) for each detection
'shot': int, # Shot number
'size': List[[imgw imgh]], # Image dimensions for each detection
'smpl': List[Dict_SMPL], # SMPL parameters for each detection: betas (10), body_pose (23x3x3), global_orient (3x3)
'tid': List[int], # Track ID for each detection
'time': int, # Frame number
'tracked_bbox': List[[x0 y0 w h]], # 2D bounding box (top-left corner and dimensions) for each detection
'tracked_ids': List[int], # Track ID for each detection
'tracked_time': List[int], # for each detection, time since it was last seen
},
'vid_frame1.jpg': {
...
},
...
}
```
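As a usage example, here is a minimal sketch that regroups the per-frame detection lists into per-track 3D joint trajectories, assuming the structure above; the results path is a placeholder, not a file name produced by the demo.

```python
# Minimal sketch: regroup per-frame detections into per-track 3D joint trajectories.
from collections import defaultdict

import joblib

results = joblib.load("outputs/results/demo_video.pkl")  # hypothetical path; use your own results file

tracks_3d = defaultdict(list)  # track id -> list of (frame_path, 3d_joints)
for frame_path, frame in results.items():
    # 'tid' and '3d_joints' are parallel lists over the detections in this frame
    for tid, joints_3d in zip(frame["tid"], frame["3d_joints"]):
        tracks_3d[tid].append((frame_path, joints_3d))

for tid, trajectory in tracks_3d.items():
    print(f"track {tid}: {len(trajectory)} frames")
```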
## Postprocessing pipeline
Coming soon.
## Training and Evaluation
Coming soon.
## Acknowledgements
Parts of the code are taken or adapted from the following repos:
- [deep sort](https://github.com/nwojke/deep_sort)
- [SMPL-X](https://github.com/vchoutas/smplx)
- [SMPLify-X](https://github.com/vchoutas/smplify-x)
- [SPIN](https://github.com/nkolot/SPIN)
- [VIBE](https://github.com/mkocabas/VIBE)
- [SMALST](https://github.com/silviazuffi/smalst)
- [ProHMR](https://github.com/nkolot/ProHMR)
- [TrackEval](https://github.com/JonathonLuiten/TrackEval)

## Citation
If you find this code useful for your research, or use data generated by our method, please consider citing the following paper:
```bibtex
@inproceedings{rajasegaran2022tracking,
title={Tracking People by Predicting 3{D} Appearance, Location \& Pose},
author={Rajasegaran, Jathushan and Pavlakos, Georgios and Kanazawa, Angjoo and Malik, Jitendra},
booktitle={CVPR},
year={2022}
}
```