{"id":20497197,"url":"https://github.com/fork123aniket/encoder-decoder-based-video-captioning","last_synced_at":"2025-08-28T22:26:38.496Z","repository":{"id":158619636,"uuid":"570890156","full_name":"fork123aniket/Encoder-Decoder-based-Video-Captioning","owner":"fork123aniket","description":"Implementation of Encoder-Decoder Model for Video Captioning in Tensorflow","archived":false,"fork":false,"pushed_at":"2022-11-27T07:38:31.000Z","size":80413,"stargazers_count":5,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-13T18:43:56.481Z","etag":null,"topics":["encoder-decoder","encoder-decoder-model","keras-model","keras-tensorflow","tensorflow","video-caption","video-captioning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fork123aniket.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-11-26T13:15:14.000Z","updated_at":"2025-01-13T23:31:10.000Z","dependencies_parsed_at":null,"dependency_job_id":"60375c1b-700e-4fab-af00-357b6480ee20","html_url":"https://github.com/fork123aniket/Encoder-Decoder-based-Video-Captioning","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/fork123aniket/Encoder-Decoder-based-Video-Captioning","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fork123aniket%2FEncoder-Decoder-based-Video-Captioning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fork123aniket
%2FEncoder-Decoder-based-Video-Captioning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fork123aniket%2FEncoder-Decoder-based-Video-Captioning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fork123aniket%2FEncoder-Decoder-based-Video-Captioning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fork123aniket","download_url":"https://codeload.github.com/fork123aniket/Encoder-Decoder-based-Video-Captioning/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fork123aniket%2FEncoder-Decoder-based-Video-Captioning/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272568136,"owners_count":24956955,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-28T02:00:10.768Z","response_time":74,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["encoder-decoder","encoder-decoder-model","keras-model","keras-tensorflow","tensorflow","video-caption","video-captioning"],"created_at":"2024-11-15T18:10:19.775Z","updated_at":"2025-08-28T22:26:38.491Z","avatar_url":"https://github.com/fork123aniket.png","language":"Python","readme":"# Encoder-Decoder-based Video Captioning\n\nThis repository provides an ***Encoder-Decoder Sequence-to-Sequence*** model to generate captions for input videos. 
Moreover, a ***pre-trained VGG16 model*** is used to extract features for every frame of the video.\n\n***Video Captioning***'s importance stems from its wide range of applications. For example, it can help search videos across web pages more efficiently, and it can also be used to cluster videos whose generated captions are highly similar.\n\n## Requirements\n- `TensorFlow`\n- `Keras`\n- `OpenCV`\n- `NumPy`\n- `functools`\n\n## Usage\n### Data\n- The ***MSVD*** dataset developed by Microsoft can be downloaded from [***here***](https://www.dropbox.com/sh/whatkfg5mr4dr63/AACKCO3LwSsHK4_GOmHn4oyYa?dl=0).\n- This dataset contains 1,450 manually labeled short YouTube clips for training and 100 videos for testing.\n- Each video is assigned a unique ID, and each ID has about 15–20 captions.\n### Training and Testing\n- To extract features for the frames of every input video using the pre-trained VGG16 model, run `Extract_Features_Using_VGG.py`.\n- To train the developed model, run `training_model.py`.\n- To use the trained ***Video Captioning*** model for inference, run `predict_model.py`.\n- To use the trained model for ***real-time Video-Caption generation***, run `Video_Captioning.py`.\n\n## Results\nFollowing are a few results of the developed ***Video Captioning*** approach on test videos:\n\n| Test Video        | Generated Caption           |\n| ------------------- |:----------------------------:|\n| ![alt text](https://github.com/fork123aniket/Encoder-Decoder-based-Video-Captioning/blob/main/input_videos/0lh_UWF9ZP4_62_69.gif) | a woman is mixing some food |\n| \u003cimg src=\"https://github.com/fork123aniket/Encoder-Decoder-based-Video-Captioning/blob/main/input_videos/7NNg0_n-bS8_21_30.gif\" width=\"320\"\u003e | a man is performing on a stage |\n| ![alt 
text](https://github.com/fork123aniket/Encoder-Decoder-based-Video-Captioning/blob/main/input_videos/ezgif-4-989de822710c.gif) | a man is mixing ingredients in a bowl |\n| \u003cimg src=\"https://github.com/fork123aniket/Encoder-Decoder-based-Video-Captioning/blob/main/input_videos/Je3V7U5Ctj4_569_576.gif\" width=\"320\"\u003e | a man is spreading a tortilla |\n| \u003cimg src=\"https://github.com/fork123aniket/Encoder-Decoder-based-Video-Captioning/blob/main/input_videos/qeKX-N1nKiM_0_5.gif\" width=\"320\"\u003e | a woman is seasoning some food |\n| ![alt text](https://github.com/fork123aniket/Encoder-Decoder-based-Video-Captioning/blob/main/input_videos/TZ860P4iTaM_15_28.gif) | a cat is playing the piano |\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffork123aniket%2Fencoder-decoder-based-video-captioning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffork123aniket%2Fencoder-decoder-based-video-captioning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffork123aniket%2Fencoder-decoder-based-video-captioning/lists"}