{"id":13572038,"url":"https://github.com/facebookresearch/co-tracker","last_synced_at":"2025-05-13T21:07:42.455Z","repository":{"id":181901941,"uuid":"666048093","full_name":"facebookresearch/co-tracker","owner":"facebookresearch","description":"CoTracker is a model for tracking any point (pixel) on a video.","archived":false,"fork":false,"pushed_at":"2025-01-21T21:30:41.000Z","size":55290,"stargazers_count":4280,"open_issues_count":82,"forks_count":293,"subscribers_count":35,"default_branch":"main","last_synced_at":"2025-04-28T12:15:30.710Z","etag":null,"topics":["optical-flow","point-tracking","track-anything"],"latest_commit_sha":null,"homepage":"https://co-tracker.github.io/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/facebookresearch.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-07-13T15:29:27.000Z","updated_at":"2025-04-28T10:17:59.000Z","dependencies_parsed_at":"2025-01-07T18:03:57.766Z","dependency_job_id":"d9b050da-4f81-4234-9e0a-42769c25f034","html_url":"https://github.com/facebookresearch/co-tracker","commit_stats":{"total_commits":34,"total_committers":8,"mean_commits":4.25,"dds":0.6470588235294117,"last_synced_commit":"0f9d32869ac51f3bd12c5ead9c206366cfb6caea"},"previous_names":["facebookresearch/co-tracker"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2Fco-tracker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fa
cebookresearch%2Fco-tracker/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2Fco-tracker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2Fco-tracker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/facebookresearch","download_url":"https://codeload.github.com/facebookresearch/co-tracker/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251311334,"owners_count":21569009,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["optical-flow","point-tracking","track-anything"],"created_at":"2024-08-01T14:01:11.810Z","updated_at":"2025-04-28T12:15:54.077Z","avatar_url":"https://github.com/facebookresearch.png","language":"Jupyter Notebook","readme":"# CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos\n\n**[Meta AI Research, GenAI](https://ai.facebook.com/research/)**; **[University of Oxford, VGG](https://www.robots.ox.ac.uk/~vgg/)**\n\n[Nikita Karaev](https://nikitakaraevv.github.io/), [Iurii Makarov](https://linkedin.com/in/lvoursl), [Jianyuan Wang](https://jytime.github.io/), [Ignacio Rocco](https://www.irocco.info/), [Benjamin Graham](https://ai.facebook.com/people/benjamin-graham/), [Natalia Neverova](https://nneverova.github.io/), [Andrea Vedaldi](https://www.robots.ox.ac.uk/~vedaldi/), [Christian Rupprecht](https://chrirupp.github.io/)\n\n### [Project Page](https://cotracker3.github.io/) | [Paper #1](https://arxiv.org/abs/2307.07635) | [Paper 
#2](https://arxiv.org/abs/2410.11831) |  [X Thread](https://twitter.com/n_karaev/status/1742638906355470772) | [BibTeX](#citing-cotracker)\n\n\u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/facebookresearch/co-tracker/blob/main/notebooks/demo.ipynb\"\u003e\n  \u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/\u003e\n\u003c/a\u003e\n\u003ca href=\"https://huggingface.co/spaces/facebook/cotracker\"\u003e\n  \u003cimg alt=\"Spaces\" src=\"https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue\"\u003e\n\u003c/a\u003e\n\n\u003cimg width=\"1100\" src=\"./assets/teaser.png\" /\u003e\n\n**CoTracker** is a fast transformer-based model that can track any point in a video. It brings to tracking some of the benefits of Optical Flow.\n\nCoTracker can track:\n\n- **Any pixel** in a video\n- A **quasi-dense** set of pixels together\n- Points can be manually selected or sampled on a grid in any video frame\n\nTry these tracking modes for yourself with our [Colab demo](https://colab.research.google.com/github/facebookresearch/co-tracker/blob/master/notebooks/demo.ipynb) or in the [Hugging Face Space 🤗](https://huggingface.co/spaces/facebook/cotracker).\n\n**Updates:**\n\n- [January 21, 2025] 📦 Kubric Dataset used for CoTracker3 now available! This dataset contains **6,000 high-resolution sequences** (512×512px, 120 frames) with slight camera motion, rendered using the Kubric engine. Check it out on [Hugging Face Dataset](https://huggingface.co/datasets/facebook/CoTracker3_Kubric).\n\n- [October 15, 2024] 📣 We're releasing CoTracker3! State-of-the-art point tracking with a lightweight architecture trained with 1000x less data than previous top-performing models. Code for baseline models and the pseudo-labeling pipeline are available in the repo, as well as model checkpoints. 
Check out our [paper](https://arxiv.org/abs/2410.11831) for more details.\n\n- [September 25, 2024]  CoTracker2.1 is now available! This model has better performance on TAP-Vid benchmarks and follows the architecture of the original CoTracker. Try it out!\n\n- [June 14, 2024]  We have released the code for [VGGSfM](https://github.com/facebookresearch/vggsfm), a model for recovering camera poses and 3D structure from any image sequences based on point tracking! VGGSfM is the first fully differentiable SfM framework that unlocks scalability and outperforms conventional SfM methods on standard benchmarks. \n\n- [December 27, 2023]  CoTracker2 is now available! It can now track many more (up to **265*265**!) points jointly and it has a cleaner and more memory-efficient implementation. It also supports online processing. See the [updated paper](https://arxiv.org/abs/2307.07635) for more details. The old version remains available [here](https://github.com/facebookresearch/co-tracker/tree/8d364031971f6b3efec945dd15c468a183e58212).\n\n- [September 5, 2023] You can now run our Gradio demo [locally](./gradio_demo/app.py).\n\n## Quick start\nThe easiest way to use CoTracker is to load a pretrained model from `torch.hub`:\n\n### Offline mode: \n```pip install imageio[ffmpeg]```, then:\n```python\nimport torch\n# Download the video\nurl = 'https://github.com/facebookresearch/co-tracker/raw/refs/heads/main/assets/apple.mp4'\n\nimport imageio.v3 as iio\nframes = iio.imread(url, plugin=\"FFMPEG\")  # plugin=\"pyav\"\n\ndevice = 'cuda'\ngrid_size = 10\nvideo = torch.tensor(frames).permute(0, 3, 1, 2)[None].float().to(device)  # B T C H W\n\n# Run Offline CoTracker:\ncotracker = torch.hub.load(\"facebookresearch/co-tracker\", \"cotracker3_offline\").to(device)\npred_tracks, pred_visibility = cotracker(video, grid_size=grid_size) # B T N 2,  B T N 1\n```\n### Online mode: \n```python\ncotracker = torch.hub.load(\"facebookresearch/co-tracker\", \"cotracker3_online\").to(device)\n\n# 
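Besides grid sampling, the predictor also accepts manually selected query points via a `queries` tensor of shape `B N 3`, where each row is `(frame_index, x, y)` in pixel coordinates. A minimal sketch of building such a tensor (the coordinates are made up for illustration; the commented call assumes the `cotracker`, `video`, and `device` from the snippet above):

```python
import torch

# Each query is (frame_index, x, y): start tracking that pixel from that frame.
queries = torch.tensor([
    [0.,  400., 350.],   # a point picked on the first frame
    [10., 600., 500.],   # a point picked on frame 10
])[None]  # B N 3

# With the model loaded above:
# pred_tracks, pred_visibility = cotracker(video, queries=queries.to(device))
# pred_tracks: B T N 2, pred_visibility: B T N 1
print(tuple(queries.shape))  # → (1, 2, 3)
```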
### Online mode:
```python
cotracker = torch.hub.load("facebookresearch/co-tracker", "cotracker3_online").to(device)

# Run online CoTracker, the same model with a different API:
# Initialize online processing
cotracker(video_chunk=video, is_first_step=True, grid_size=grid_size)

# Process the video
for ind in range(0, video.shape[1] - cotracker.step, cotracker.step):
    pred_tracks, pred_visibility = cotracker(
        video_chunk=video[:, ind : ind + cotracker.step * 2]
    )  # B T N 2,  B T N 1
```
Online processing is more memory-efficient and makes it possible to process longer videos. Note, however, that the example above assumes the video length is known in advance! See [the online demo](./online_demo.py) for an example of tracking a stream whose length is not known up front.
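The loop above advances by `cotracker.step` frames while feeding windows of `2 * step` frames, so consecutive chunks overlap by half a window. A stdlib-only sketch of the index arithmetic, with hypothetical values for the frame count and step:

```python
def chunk_bounds(num_frames: int, step: int):
    """Yield the (start, end) of each 2*step-frame window, advancing by step,
    mirroring the slicing pattern of the online loop above."""
    for ind in range(0, num_frames - step, step):
        yield ind, min(ind + 2 * step, num_frames)

# e.g. a 40-frame video with step=8 (hypothetical values):
print(list(chunk_bounds(40, 8)))
# → [(0, 16), (8, 24), (16, 32), (24, 40)]
```

Each window starts where the previous one reached its midpoint, which is what lets the model carry track state across chunk boundaries.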
### Visualize predicted tracks:
After [installing](#installation-instructions) CoTracker, you can visualize tracks with:
```python
from cotracker.utils.visualizer import Visualizer

vis = Visualizer(save_dir="./saved_videos", pad_value=120, linewidth=3)
vis.visualize(video, pred_tracks, pred_visibility)
```

We offer a number of other ways to interact with CoTracker:

1. Interactive Gradio demo:
   - A demo is available in the [`facebook/cotracker` Hugging Face Space 🤗](https://huggingface.co/spaces/facebook/cotracker).
   - You can run the Gradio demo locally with [`python -m gradio_demo.app`](./gradio_demo/app.py) after installing the required packages: `pip install -r gradio_demo/requirements.txt`.
2. Jupyter notebook:
   - You can run the notebook in [Google Colab](https://colab.research.google.com/github/facebookresearch/co-tracker/blob/main/notebooks/demo.ipynb).
   - Or explore the notebook located at [`notebooks/demo.ipynb`](./notebooks/demo.ipynb).
3. You can [install](#installation-instructions) CoTracker _locally_ and then:
   - Run an *offline* demo with 10 ⨉ 10 points sampled on a grid on the first frame of a video (results will be saved to `./saved_videos/demo.mp4`):

     ```bash
     python demo.py --grid_size 10
     ```
   - Run an *online* demo:

     ```bash
     python online_demo.py
     ```

A GPU is strongly recommended for using CoTracker locally.

<img width="500" src="./assets/bmx-bumps.gif" />

## Installation Instructions
You can use a pretrained model via PyTorch Hub, as described above, or install CoTracker from this GitHub repo. Installing from the repo is the best option if you need to run the local demos or evaluate/train CoTracker.

Ensure you have both _PyTorch_ and _TorchVision_ installed on your system; follow the instructions [here](https://pytorch.org/get-started/locally/). We strongly recommend installing both with CUDA support, although CoTracker can run on CPU for small tasks.

### Install a Development Version

```bash
git clone https://github.com/facebookresearch/co-tracker
cd co-tracker
pip install -e .
pip install matplotlib flow_vis tqdm tensorboard
```

You can manually download all CoTracker3 checkpoints (baseline and scaled models, with both single-window and sliding-window architectures) from the links below and place them in the `checkpoints` folder as follows:

```bash
mkdir -p checkpoints
cd checkpoints
# download the online (sliding window) model
wget https://huggingface.co/facebook/cotracker3/resolve/main/scaled_online.pth
# download the offline (single window) model
wget https://huggingface.co/facebook/cotracker3/resolve/main/scaled_offline.pth
cd ..
```
You can also download the CoTracker3 checkpoints trained only on Kubric:
```bash
# download the online (sliding window) model
wget https://huggingface.co/facebook/cotracker3/resolve/main/baseline_online.pth
# download the offline (single window) model
wget https://huggingface.co/facebook/cotracker3/resolve/main/baseline_offline.pth
```
For old checkpoints, see [this section](#previous-versions).

## Evaluation

To reproduce the results presented in the paper, download the following datasets:

- [TAP-Vid](https://github.com/deepmind/tapnet)
- [Dynamic Replica](https://dynamic-stereo.github.io/)

and install the necessary dependencies:

```bash
pip install hydra-core==1.1.0 mediapy
```

Then evaluate the online model on TAP-Vid DAVIS with:

```bash
python ./cotracker/evaluation/evaluate.py --config-name eval_tapvid_davis_first exp_dir=./eval_outputs dataset_root=your/tapvid/path
```
And the offline model:
```bash
python ./cotracker/evaluation/evaluate.py --config-name eval_tapvid_davis_first exp_dir=./eval_outputs dataset_root=your/tapvid/path offline_model=True window_len=60 checkpoint=./checkpoints/scaled_offline.pth
```
For faster inference, we evaluate all target points jointly rather than one at a time; with such evaluations, the numbers are close to those presented in the paper. To reproduce the exact paper numbers, add the flag `single_point=True`.
These are the numbers that you should be able to reproduce with the released checkpoints and the current version of the codebase:

|  | Kinetics, $\delta_\text{avg}^\text{vis}$ | DAVIS, $\delta_\text{avg}^\text{vis}$ | RoboTAP, $\delta_\text{avg}^\text{vis}$ | RGB-S, $\delta_\text{avg}^\text{vis}$ |
| :---: | :---: | :---: | :---: | :---: |
| CoTracker2, 27.12.23 | 61.8 | 74.6 | 69.6 | 73.4 |
| CoTracker2.1, 25.09.24 | 63.0 | 76.1 | 70.6 | 79.6 |
| CoTracker3 offline, 15.10.24 | 67.8 | **76.9** | 78.0 | **85.0** |
| CoTracker3 online, 15.10.24 | **68.3** | 76.7 | **78.8** | 82.7 |
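For reference, $\delta_\text{avg}^\text{vis}$ is the TAP-Vid position-accuracy metric: the fraction of visible points whose predicted location falls within a pixel threshold of the ground truth, averaged over the thresholds {1, 2, 4, 8, 16}. A stdlib-only sketch of the core computation (the point data is made up for illustration; the benchmark's exact protocol, e.g. resizing to 256×256 and per-video averaging, is omitted):

```python
def delta_avg(pred, gt, visible, thresholds=(1, 2, 4, 8, 16)):
    """Average, over the thresholds, of the fraction of visible points
    predicted within that many pixels of the ground truth."""
    pts = [(p, g) for p, g, v in zip(pred, gt, visible) if v]
    if not pts:
        return 0.0
    fracs = []
    for t in thresholds:
        within = sum(
            1 for (px, py), (gx, gy) in pts
            if (px - gx) ** 2 + (py - gy) ** 2 <= t ** 2
        )
        fracs.append(within / len(pts))
    return 100.0 * sum(fracs) / len(fracs)

# Toy example: two visible points, with errors of 0 px and 3 px.
print(delta_avg([(10, 10), (20, 23)], [(10, 10), (20, 20)], [True, True]))
# → 80.0  (0.5 at thresholds 1 and 2; 1.0 at 4, 8, and 16)
```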
## Training

### Baseline
To train CoTracker as described in our paper, you first need to generate annotations for the [Google Kubric](https://github.com/google-research/kubric) MOVi-f dataset. Instructions for annotation generation can be found [here](https://github.com/deepmind/tapnet). You can also find a discussion of dataset generation in [this issue](https://github.com/facebookresearch/co-tracker/issues/8).

Once you have the annotated dataset, make sure you have followed the evaluation setup above, then install the training dependencies:

```bash
pip install pip==24.0
pip install pytorch_lightning==1.6.0 tensorboard opencv-python
```

Now you can launch training on Kubric. Our model was trained for 50,000 iterations on 32 GPUs (4 nodes with 8 GPUs each). Modify _dataset_root_ and _ckpt_path_ accordingly before running the commands below. For training on 4 nodes, add `--num_nodes 4`.

Here is an example of how to launch training of the online model on Kubric:
```bash
python train_on_kubric.py --batch_size 1 --num_steps 50000 \
 --ckpt_path ./ --model_name cotracker_three --save_freq 200 --sequence_len 64 \
 --eval_datasets tapvid_davis_first tapvid_stacking --traj_per_sample 384 \
 --sliding_window_len 16 --train_datasets kubric --save_every_n_epoch 5 \
 --evaluate_every_n_epoch 5 --model_stride 4 --dataset_root ${path_to_your_dataset} \
 --num_nodes 4 --num_virtual_tracks 64 --mixed_precision --corr_radius 3 \
 --wdecay 0.0005 --linear_layer_for_vis_conf --validate_at_start --add_huber_loss
```

Training the offline model on Kubric:
```bash
python train_on_kubric.py --batch_size 1 --num_steps 50000 \
 --ckpt_path ./ --model_name cotracker_three --save_freq 200 --sequence_len 60 \
 --eval_datasets tapvid_davis_first tapvid_stacking --traj_per_sample 512 \
 --sliding_window_len 60 --train_datasets kubric --save_every_n_epoch 5 \
 --evaluate_every_n_epoch 5 --model_stride 4 --dataset_root ${path_to_your_dataset} \
 --num_nodes 4 --num_virtual_tracks 64 --mixed_precision --offline_model \
 --random_frame_rate --query_sampling_method random --corr_radius 3 \
 --wdecay 0.0005 --random_seq_len --linear_layer_for_vis_conf \
 --validate_at_start --add_huber_loss
```

### Fine-tuning with pseudo labels
To launch training with pseudo-labelling, you need to collect your own dataset of real videos. A sample class with the keyword-based filtering we used for training is available in [`cotracker/datasets/real_dataset.py`](./cotracker/datasets/real_dataset.py). Your class should load a video and store it as a field of the `CoTrackerData` class; pseudo labels are generated in `train_on_real_data.py`.

You need an existing Kubric-trained model to fine-tune with pseudo labels.
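The keyword-based filtering mentioned above can be as simple as keeping only videos whose captions mention the subjects you care about. A stdlib-only illustration of the idea (the function name and data are hypothetical, not the repo's actual implementation in `real_dataset.py`):

```python
def filter_by_keywords(captions, keywords):
    """Return the indices of videos whose caption contains any keyword
    (case-insensitive substring match)."""
    kws = [k.lower() for k in keywords]
    return [
        i for i, caption in enumerate(captions)
        if any(k in caption.lower() for k in kws)
    ]

captions = [
    "a dog chasing a ball in the park",
    "static shot of a brick wall",
    "skater doing a kickflip",
]
print(filter_by_keywords(captions, ["dog", "skater"]))  # → [0, 2]
```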
Here is an example of how to launch fine-tuning of the online model:
```bash
python ./train_on_real_data.py --batch_size 1 --num_steps 15000 \
 --ckpt_path ./ --model_name cotracker_three --save_freq 200 --sequence_len 64 \
 --eval_datasets tapvid_stacking tapvid_davis_first --traj_per_sample 384 \
 --save_every_n_epoch 15 --evaluate_every_n_epoch 15 --model_stride 4 \
 --dataset_root ${path_to_your_dataset} --num_nodes 4 --real_data_splits 0 \
 --num_virtual_tracks 64 --mixed_precision --random_frame_rate \
 --restore_ckpt ./checkpoints/baseline_online.pth \
 --lr 0.00005 --real_data_filter_sift --validate_at_start \
 --sliding_window_len 16 --limit_samples 15000
```
And the offline model:
```bash
python train_on_real_data.py --batch_size 1 --num_steps 15000 \
 --ckpt_path ./ --model_name cotracker_three --save_freq 200 --sequence_len 80 \
 --eval_datasets tapvid_stacking tapvid_davis_first --traj_per_sample 384 \
 --save_every_n_epoch 15 --evaluate_every_n_epoch 15 --model_stride 4 \
 --dataset_root ${path_to_your_dataset} --num_nodes 4 --real_data_splits 0 \
 --num_virtual_tracks 64 --mixed_precision --random_frame_rate \
 --restore_ckpt ./checkpoints/baseline_offline.pth --lr 0.00005 \
 --real_data_filter_sift --validate_at_start --offline_model --limit_samples 15000
```

## Development

### Building the documentation

To build the CoTracker documentation, first install the dependencies:

```bash
pip install sphinx sphinxcontrib-bibtex
```

Then generate the documentation in the `docs/_build/html` folder with:

```bash
make -C docs html
```

## Previous versions
### CoTracker v2
You can use CoTracker v2 with `torch.hub` in both offline and online modes.
#### Offline mode:
`pip install imageio[ffmpeg]`, then:
```python
import torch
import imageio.v3 as iio

# Download the video
url = 'https://github.com/facebookresearch/co-tracker/raw/refs/heads/main/assets/apple.mp4'
frames = iio.imread(url, plugin="FFMPEG")  # or plugin="pyav"

device = 'cuda'
grid_size = 10
video = torch.tensor(frames).permute(0, 3, 1, 2)[None].float().to(device)  # B T C H W

# Run offline CoTracker:
cotracker = torch.hub.load("facebookresearch/co-tracker", "cotracker2").to(device)
pred_tracks, pred_visibility = cotracker(video, grid_size=grid_size)  # B T N 2,  B T N 1
```
#### Online mode:
```python
cotracker = torch.hub.load("facebookresearch/co-tracker", "cotracker2_online").to(device)

# Run online CoTracker, the same model with a different API:
# Initialize online processing
cotracker(video_chunk=video, is_first_step=True, grid_size=grid_size)

# Process the video
for ind in range(0, video.shape[1] - cotracker.step, cotracker.step):
    pred_tracks, pred_visibility = cotracker(
        video_chunk=video[:, ind : ind + cotracker.step * 2]
    )  # B T N 2,  B T N 1
```

The v2 checkpoint can be downloaded with the following command:
```bash
wget https://huggingface.co/facebook/cotracker/resolve/main/cotracker2.pth
```

### CoTracker v1
It is directly available via PyTorch Hub:
```python
import torch
import einops  # einops, timm and tqdm are required by the hub entry point
import timm
import tqdm

cotracker = torch.hub.load("facebookresearch/co-tracker:v1.0", "cotracker_w8")
```
The old version of the code is available [here](https://github.com/facebookresearch/co-tracker/tree/8d364031971f6b3efec945dd15c468a183e58212).
You can also download the corresponding checkpoints:
```bash
wget https://dl.fbaipublicfiles.com/cotracker/cotracker_stride_4_wind_8.pth
wget https://dl.fbaipublicfiles.com/cotracker/cotracker_stride_4_wind_12.pth
wget https://dl.fbaipublicfiles.com/cotracker/cotracker_stride_8_wind_16.pth
```

## License

The majority of CoTracker is licensed under CC-BY-NC; however, portions of the project are available under separate license terms: Particle Video Revisited is licensed under the MIT license, and TAP-Vid and LocoTrack are licensed under the Apache 2.0 license.

## Acknowledgments

We would like to thank [PIPs](https://github.com/aharley/pips), [TAP-Vid](https://github.com/deepmind/tapnet), and [LocoTrack](https://github.com/cvlab-kaist/locotrack) for publicly releasing their code and data. We also want to thank [Luke Melas-Kyriazi](https://lukemelas.github.io/) for proofreading the paper, and [Jianyuan Wang](https://jytime.github.io/), [Roman Shapovalov](https://shapovalov.ro/) and [Adam W. Harley](https://adamharley.com/) for insightful discussions.

## Citing CoTracker

If you find our repository useful, please consider giving it a star ⭐ and citing our research papers in your work:
```bibtex
@inproceedings{karaev23cotracker,
  title     = {CoTracker: It is Better to Track Together},
  author    = {Nikita Karaev and Ignacio Rocco and Benjamin Graham and Natalia Neverova and Andrea Vedaldi and Christian Rupprecht},
  booktitle = {Proc. {ECCV}},
  year      = {2024}
}
```
```bibtex
@article{karaev24cotracker3,
  title   = {CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos},
  author  = {Nikita Karaev and Iurii Makarov and Jianyuan Wang and Natalia Neverova and Andrea Vedaldi and Christian Rupprecht},
  journal = {arXiv preprint arXiv:2410.11831},
  year    = {2024}
}
```