https://github.com/es7/adobe-behaviour-simulation-challenge
Problem Statements of InterIIT TechMeet 13.0 by Adobe
- Host: GitHub
- URL: https://github.com/es7/adobe-behaviour-simulation-challenge
- Owner: ES7
- License: apache-2.0
- Created: 2024-10-11T17:53:38.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-17T10:51:28.000Z (over 1 year ago)
- Last Synced: 2025-08-30T09:06:52.400Z (9 months ago)
- Topics: adobe, artificial-intelligence, interiit-techmeet, machine-learning
- Language: Jupyter Notebook
- Size: 1.37 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Adobe-Behaviour-Simulation-Challenge
This repository contains my team's solution for the Mid Prep Problem Statement of InterIIT TechMeet 13.0, **"Adobe Behaviour Simulation Challenge"** by Adobe.
## Data
- **1. Image_Captioning.ipynb:** In this notebook we downloaded the media files and generated captions for them using the **BLIP** model.
- **2. 100K_Data_Generation.ipynb:** In this notebook we reduced the original 300K-sample dataset to its first 100K samples.
- **3. Data_Cleaning.ipynb:** In this notebook we combined the tokenized content text with the generated captions and their embeddings, and then split the result into train and test datasets.
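
The three data-preparation steps above can be sketched in plain Python as follows. This is a minimal illustration under assumptions, not the notebooks' code: the record fields `id` and `content`, the `captions` lookup keyed by `id`, the 80/20 split and the seed 42 are all hypothetical, and the BLIP captioning and embedding steps are left out.

```python
import random
from typing import Dict, List, Tuple

Record = Dict[str, str]  # hypothetical record shape: {"id": ..., "content": ...}


def take_first_n(records: List[Record], n: int = 100_000) -> List[Record]:
    """Keep only the first n samples (e.g. 300K -> 100K)."""
    return records[:n]


def attach_captions(records: List[Record], captions: Dict[str, str]) -> List[Record]:
    """Append each record's generated caption (if any) to its content as "text"."""
    merged = []
    for rec in records:
        caption = captions.get(rec["id"], "")
        text = f"{rec['content']} {caption}" if caption else rec["content"]
        merged.append({**rec, "caption": caption, "text": text})
    return merged


def train_test_split(records: List[Record], test_frac: float = 0.2,
                     seed: int = 42) -> Tuple[List[Record], List[Record]]:
    """Shuffle a copy with a fixed seed; the first test_frac of it becomes the test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


# Toy usage with made-up records.
raw = [{"id": str(i), "content": f"post {i}"} for i in range(10)]
subset = take_first_n(raw, n=8)
merged = attach_captions(subset, {"0": "a red car"})
train, test = train_test_split(merged, test_frac=0.25)
print(len(train), len(test))  # 6 2
```

Seeding a private `random.Random` instance keeps the split reproducible without touching the global random state.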
## Task 1
- **Tokenization.ipynb:** In this notebook we tokenized the content (captions) of the given dataset using the **DistilBERT** tokenizer.
- **Creating_Prompts.ipynb:** In this notebook we wrote prompts that convert the data into a more organized format for the bucketing step.
- **Bucketing.ipynb:** In this notebook we created 7 buckets (classes) and trained a **DistilBERT** model on this 7-class classification task.
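
The README does not say how the 7 buckets are defined. As one plausible sketch, assuming the classes are ranges of a numeric engagement target such as like counts, equal-frequency (quantile) edges turn that target into class labels. The target, the quantile scheme and the names `quantile_edges` and `to_bucket` are hypothetical.

```python
import bisect
from typing import List, Sequence

NUM_BUCKETS = 7  # matches the 7 classes described above


def quantile_edges(values: Sequence[float], num_buckets: int = NUM_BUCKETS) -> List[float]:
    """Pick num_buckets - 1 cut points at evenly spaced ranks of the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // num_buckets] for i in range(1, num_buckets)]


def to_bucket(value: float, edges: Sequence[float]) -> int:
    """Return a class id in 0..len(edges); a value equal to an edge goes to the upper bucket."""
    return bisect.bisect_right(edges, value)


# Toy usage: 70 made-up engagement counts 0..69 fall into 7 buckets of 10 each.
counts = list(range(70))
edges = quantile_edges(counts)
labels = [to_bucket(c, edges) for c in counts]
print(edges)  # [10, 20, 30, 40, 50, 60]
```

With heavily tied counts (for example, many posts with zero likes), several edges can coincide and leave some buckets empty, so fixed or log-scaled edges may be preferable. Either way, the resulting integer labels are what a 7-way DistilBERT classifier would be trained on.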

## Task 2
- **FineTune.ipynb:** In this notebook we fine-tuned the **Bloom 7b** and **Qwen 1.5b** models using **LoRA**, a **PEFT** (parameter-efficient fine-tuning) technique.
- **Inference.ipynb:** In this notebook we used the fine-tuned models to generate predictions for the first 500 samples.
- **Test_Metrics.ipynb:** In this notebook we evaluated the models' predictions against the original captions using the **BLEU (1-4)**, **ROUGE (1, 2, L)** and **CIDEr** metrics.
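
For readers unfamiliar with LoRA, the standard formulation is summarized below: the pretrained weight $W_0$ stays frozen and only the low-rank factors $B$ and $A$ are trained, with $B$ usually initialized to zero so that training starts from the base model. The rank $r$, scaling $\alpha$ and target modules used in FineTune.ipynb are not stated here, so the 4096 × 4096 layer and $r = 8$ in the second line are illustrative assumptions only.

```math
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\qquad W_0 \in \mathbb{R}^{d \times k},\; B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
```

```math
\underbrace{d\,k}_{\text{full}} = 4096 \cdot 4096 = 16{,}777{,}216
\qquad\text{vs.}\qquad
\underbrace{r\,(d + k)}_{\text{LoRA}} = 8 \cdot (4096 + 4096) = 65{,}536 = \tfrac{1}{256}\,d\,k
```

This reduction in trainable parameters per adapted matrix is what makes fine-tuning a 7B-parameter model feasible on modest hardware.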
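
For reference, here is a minimal pure-Python sketch of sentence-level BLEU-1 to BLEU-4, ROUGE-1/2 and ROUGE-L against a single reference caption. It assumes whitespace tokenization and no smoothing, so its scores will differ from library implementations (such as NLTK or pycocoevalcap) that Test_Metrics.ipynb may have used. CIDEr is left out because it needs corpus-wide TF-IDF statistics rather than a single candidate/reference pair.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Sentence-level BLEU-max_n with uniform weights, one reference, no smoothing."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if overlap == 0:
            return 0.0  # without smoothing, a zero precision zeroes the score
        log_precision += math.log(overlap / total) / max_n
    # Brevity penalty for candidates not longer than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_precision)


def rouge_n(candidate: str, reference: str, n: int = 1) -> float:
    """ROUGE-N F1 from clipped n-gram overlap."""
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum count
    if overlap == 0:
        return 0.0
    precision, recall = overlap / sum(cand.values()), overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def lcs_length(a: List[str], b: List[str]) -> int:
    """Longest common subsequence length via a rolling dynamic-programming row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> float:
    """ROUGE-L F1 from the longest common subsequence of whitespace tokens."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Toy usage with a made-up caption pair.
reference = "a man riding a wave on top of a surfboard"
prediction = "a man riding a surfboard on a wave"
scores = {f"BLEU-{n}": bleu(prediction, reference, n) for n in range(1, 5)}
scores.update({"ROUGE-1": rouge_n(prediction, reference, 1),
               "ROUGE-2": rouge_n(prediction, reference, 2),
               "ROUGE-L": rouge_l(prediction, reference)})
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Averaging these per-sample scores over the 500 predictions gives one summary number per metric. Note that corpus-level BLEU, which pools n-gram counts across all samples before computing precisions, is not the same quantity.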

## Team
* [**Ebad Sayed**](https://github.com/ES7)
* [**Aaditya Jain**](https://github.com/a-aditya-jain)