{"id":17716102,"url":"https://github.com/LTContext/LTContext","last_synced_at":"2025-03-14T01:30:32.363Z","repository":{"id":188736299,"uuid":"679282252","full_name":"LTContext/LTContext","owner":"LTContext","description":"[ICCV 2023] How Much Temporal Long-Term Context is Needed for Action Segmentation?","archived":false,"fork":false,"pushed_at":"2024-06-21T09:52:25.000Z","size":342,"stargazers_count":37,"open_issues_count":2,"forks_count":3,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-07-05T13:54:32.644Z","etag":null,"topics":["computer-vision","deep-learning","iccv2023","video-understanding"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2308.11358","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LTContext.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-08-16T13:47:46.000Z","updated_at":"2024-07-01T09:13:17.000Z","dependencies_parsed_at":"2024-02-21T17:39:07.291Z","dependency_job_id":null,"html_url":"https://github.com/LTContext/LTContext","commit_stats":null,"previous_names":["ltcontext/ltcontext"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LTContext%2FLTContext","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LTContext%2FLTContext/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LTContext%2FLTContext/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LTContext%2FLTContext/manifests","owner_url":"https://repos.ecosyste.
ms/api/v1/hosts/GitHub/owners/LTContext","download_url":"https://codeload.github.com/LTContext/LTContext/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":221427313,"owners_count":16819008,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","deep-learning","iccv2023","video-understanding"],"created_at":"2024-10-25T13:00:48.670Z","updated_at":"2024-10-25T13:02:10.409Z","avatar_url":"https://github.com/LTContext.png","language":"Python","funding_links":[],"categories":["Paper List"],"sub_categories":["Fully-Supervised"],"readme":"# LTContext\n\nLTContext is an approach for temporal action segmentation. It uses\nsparse attention to capture the long-term context of a video \nand windowed attention to model local information in neighboring frames.\n\nHere is an overview of the architecture:\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"figs/ltcontext_arch-1.jpg\" alt=\"LTContext\" width=\"80%\" height=\"auto\"/\u003e\n\u003c/p\u003e\n\nThe attention mechanism consists of Windowed Attention and Long-term Context Attention:\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"figs/attentions-1.jpg\" alt=\"Attentions\" width=\"80%\" height=\"auto\"/\u003e\n\u003c/p\u003e\n\n## Citation\nIf you use this code or our model, please cite our [paper](https://arxiv.org/abs/2308.11358):\n```latex\n@inproceedings{ltc2023bahrami,\n    author    = {Emad Bahrami and Gianpiero Francesca and Juergen Gall},\n    title     = {How Much Temporal Long-Term Context is Needed for Action Segmentation?},\n    
booktitle = {IEEE International Conference on Computer Vision (ICCV)},\n    year      = {2023}\n}\n```\n\n## Installation\n\nTo create the [conda](https://docs.conda.io/en/latest/) environment, run the following commands:\n```bash\nconda env create --name ltc --file environment.yml\nsource activate ltc \n```\n\n\n## Dataset Preparation\nThe features and annotations of the Breakfast dataset can be downloaded from \n[link 1](https://mega.nz/#!O6wXlSTS!wcEoDT4Ctq5HRq_hV-aWeVF1_JB3cacQBQqOLjCIbc8) \nor \n[link 2](https://zenodo.org/record/3625992#.Xiv9jGhKhPY).\n\n### Assembly101\nFollow the instructions at [Assembly101-Download-Scripts](https://github.com/assembly-101/assembly101-download-scripts) to download the TSM features.\nThe annotations for action segmentation can be downloaded from [assembly101-annotations](https://drive.google.com/drive/folders/1QoT-hIiKUrSHMxYBKHvWpW9Z9aCznJB7). \nAfter downloading the annotations, put `coarse-annotations` inside the `data/assembly101` folder.\nWe noticed that loading features from `numpy` can be faster, so you can convert the `.lmdb` features to `numpy` and use the `LTContext_Numpy.yaml` config. \n\n## Training and Evaluating the model\n\nHere is an example of the command to train the model. \n```bash\npython run_net.py \\\n  --cfg configs/Breakfast/LTContext.yaml \\\n  DATA.PATH_TO_DATA_DIR [path_to_your_dataset] \\\n  OUTPUT_DIR [path_to_logging_dir]\n```\nFor more options, see `ltc/config/defaults.py`.\n\nThe value of `DATA.PATH_TO_DATA_DIR` for Assembly101 should be the path to the folder containing the TSM features. 
\n\nTo evaluate a pretrained model, use the following command.\n```bash\npython run_net.py \\\n  --cfg configs/Breakfast/LTContext.yaml \\\n  DATA.PATH_TO_DATA_DIR [path_to_your_dataset] \\\n  TRAIN.ENABLE False \\\n  TEST.ENABLE True \\\n  TEST.DATASET 'breakfast' \\\n  TEST.CHECKPOINT_PATH  [path_to_trained_model] \\\n  TEST.SAVE_RESULT_PATH [path_to_save_result]\n```\nCheck the [model card](MODEL_CARD.md) to download the pretrained models.\n\n## Acknowledgement\nThe structure of the code is inspired by [SlowFast](https://github.com/facebookresearch/SlowFast). \nThe MultiHeadAttention is based on [xFormers](https://github.com/facebookresearch/xformers).\nWe thank the authors of these codebases.\n\n## License\n\n\u003ca rel=\"license\" href=\"http://creativecommons.org/licenses/by-nc/4.0/\"\u003e\u003cimg alt=\"Creative Commons License\" style=\"border-width:0\" src=\"https://i.creativecommons.org/l/by-nc/4.0/88x31.png\" /\u003e\u003c/a\u003e\u003cbr /\u003eThis work is licensed under a \u003ca rel=\"license\" href=\"http://creativecommons.org/licenses/by-nc/4.0/\"\u003eCreative Commons Attribution-NonCommercial 4.0 International License\u003c/a\u003e.\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FLTContext%2FLTContext","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FLTContext%2FLTContext","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FLTContext%2FLTContext/lists"}