https://github.com/xaxm007/video-captioning-transformer

For understanding how the Transformer works.

Host: GitHub
URL: https://github.com/xaxm007/video-captioning-transformer
Owner: xaxm007
Created: 2024-06-10T09:49:54.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-06-29T16:07:38.000Z (over 1 year ago)
Last Synced: 2025-02-09T13:44:31.360Z (8 months ago)
Topics: deep-learning, note, progress, transformer, video-captioning, video-captioning-transformer
Homepage:
Size: 11.7 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Video-Captioning-Transformer
Notes for understanding the Transformer (a personal reference for implementing the original project).

# Video Captioning Transformer Project

This project aims to generate captions for videos using a Transformer model. The project integrates multiple repositories, datasets, and pre-trained models to create a comprehensive video captioning solution. Below is a detailed guide on setting up and using the project.

## Table of Contents

1. [Repositories](#repositories)
2. [Datasets](#datasets)
3. [Pre-trained Models](#pre-trained-models)
4. [Dependencies](#dependencies)
5. [Setup Instructions](#setup-instructions)
6. [Usage](#usage)
7. [Notes](#notes)

## Repositories

### Main Repositories

- **Video-Captioning-Transformer**
  - Repository: [Video-Captioning-Transformer](https://github.com/Kamino666/Video-Captioning-Transformer/tree/master)
  - Description: Transformer model for video captioning.

- **Video-Features**
  - Repository: [Video-Features](https://github.com/Kamino666/video_features/tree/master)
  - Description: Repository for extracting video features.

## Datasets

- **Dataloader**
  - Repository: [MSVD Dataloader](https://github.com/albanie/collaborative-experts/blob/master/misc/datasets/msvd/README.md)
  - Description: Dataloader for the MSVD dataset.

- **Baidu Dataset**
  - Link: [Baidu MSRVTT and MSVD Dataset](https://pan.baidu.com/s/1xG5F856VNEjNXD6JcG_4NA?pwd=aupi#list/path=%2Fsharelink3411495947-318895376070041%2FMSRVTT%20and%20MSVD&parentPath=%2Fsharelink3411495947-318895376070041)
  - Password: `aupi`
  - Description: MSRVTT and MSVD datasets available for download.

## Pre-trained Models

- **CLIP4Clip Model**
  - Model File: [clip4clip_msrvtt.pth](https://drive.google.com/file/d/1-aA6Zc-cK38TjC0JPfbttE009Bh3BtG_/view)
  - Paper: [CLIP4Clip Paper](https://arxiv.org/pdf/2104.08860)
  - Repository: [CLIP4Clip Repo](https://github.com/ArrowLuo/CLIP4Clip?tab=readme-ov-file)

- **I3D Model**
  - Repository: [I3D Model](https://github.com/hassony2/kinetics_i3d_pytorch)
  - Description: Pre-trained I3D model for extracting video features.

## Dependencies

- **mmcv**
  - Installation Guide: [mmcv Installation](https://mmcv.readthedocs.io/en/latest/get_started/installation.html)
  - Note: Follow the instructions carefully to avoid errors.
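
Before going further, it can help to confirm that the dependencies are actually importable in the active environment. The snippet below is a minimal sketch, not part of the original project: `check_packages` is a hypothetical helper, and checking `torch` alongside `mmcv` is an assumption about what the upstream code needs. A package that imports but is installed under a different distribution name (as can happen with some mmcv builds) is reported as `unknown` rather than missing.

```python
import importlib.util
from importlib import metadata


def check_packages(names):
    """Return {name: version, "unknown", or None} for each top-level package name."""
    report = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            report[name] = None  # not importable in this environment
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Importable, but no distribution metadata under this exact name
            report[name] = "unknown"
    return report


if __name__ == "__main__":
    # The package list is an assumption; adjust it to the upstream requirements.
    for pkg, version in check_packages(["torch", "mmcv"]).items():
        print(f"{pkg}: {version or 'MISSING'}")
```

Run it inside the environment you intend to train in; any line marked `MISSING` points to a package that still needs installing.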

## Setup Instructions

### Create Conda Environment

```sh
conda create -n video_captioning python=3.8
conda activate video_captioning
```

To ensure the project runs smoothly, follow these additional steps:

### Setting Up Data Loaders

1. Navigate to the `Video-Captioning-Transformer` repository.

2. Configure the data loader to use the MSVD dataset:
   - Edit the configuration file to set the path to your MSVD dataset.
   - Example:

     ```yaml
     dataset:
       name: MSVD
       path: /path/to/your/MSVD/dataset
     ```

3. Configure the data loader to use the MSRVTT dataset:
   - Edit the configuration file to set the path to your MSRVTT dataset.
   - Example:

     ```yaml
     dataset:
       name: MSRVTT
       path: /path/to/your/MSRVTT/dataset
     ```
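
A wrong dataset path usually only surfaces once training starts, so a quick check of the config beforehand can save time. The sketch below assumes the config has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`) and that it follows the `dataset` / `name` / `path` layout from the examples above. `validate_dataset_config` and `KNOWN_DATASETS` are hypothetical names for illustration, and the upstream repository may use a different config schema.

```python
from pathlib import Path

# Dataset names used in this note; an assumption, not an upstream constant.
KNOWN_DATASETS = {"MSVD", "MSRVTT"}


def validate_dataset_config(config):
    """Return a list of problems found in a parsed config dict (empty if none)."""
    dataset = config.get("dataset")
    if not isinstance(dataset, dict):
        return ["missing 'dataset' section"]

    problems = []
    name = dataset.get("name")
    if name not in KNOWN_DATASETS:
        problems.append(f"unknown dataset name: {name!r}")

    path = dataset.get("path")
    if not path:
        problems.append("missing dataset path")
    elif not Path(path).is_dir():
        problems.append(f"dataset path is not a directory: {path}")
    return problems


if __name__ == "__main__":
    example = {"dataset": {"name": "MSVD", "path": "/path/to/your/MSVD/dataset"}}
    for problem in validate_dataset_config(example):
        print("config problem:", problem)
```

Returning a list of problems rather than raising on the first one means a single run reports everything that needs fixing.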

### Training the Model

1. Ensure you are in the `Video-Captioning-Transformer` directory.

2. Run the training script with the appropriate configuration:

   ```sh
   python train.py --config configs/train_config.yaml
   ```

### Additional Transformer Repositories

In addition to the main repositories, the project also draws on the following Transformer repositories:

- **BMT (Bi-modal Transformer)**
  - Repository: [BMT](https://github.com/v-iashin/BMT)
  - Description: Bi-modal Transformer for dense video captioning, using audio and visual features.

- **MDVC (Multi-modal Dense Video Captioning)**
  - Repository: [MDVC](https://github.com/v-iashin/MDVC)
  - Description: Repository for multi-modal dense video captioning.

These repositories provide additional Transformer architectures to consult when working on the video captioning model.
