{"id":26451072,"url":"https://github.com/adi2334/image-caption-generator","last_synced_at":"2026-04-21T10:02:40.771Z","repository":{"id":283081400,"uuid":"949506487","full_name":"Adi2334/Image-Caption-Generator","owner":"Adi2334","description":"This project implements an image captioning model using a CNN-LSTM architecture. The model takes an image as input and generates a descriptive caption using natural language processing techniques","archived":false,"fork":false,"pushed_at":"2025-03-18T13:14:21.000Z","size":39250,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-18T13:45:59.233Z","etag":null,"topics":["cnn","computer-vision","deep-learning","imagecaptioning","lstm","machine-learning","neural-network","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Adi2334.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-16T16:04:32.000Z","updated_at":"2025-03-18T13:14:25.000Z","dependencies_parsed_at":"2025-03-18T13:56:03.365Z","dependency_job_id":null,"html_url":"https://github.com/Adi2334/Image-Caption-Generator","commit_stats":null,"previous_names":["adi2334/image-caption-generator"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Adi2334/Image-Caption-Generator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Adi2334%2FImage-Caption-Generator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Adi2334%2FImage-C
aption-Generator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Adi2334%2FImage-Caption-Generator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Adi2334%2FImage-Caption-Generator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Adi2334","download_url":"https://codeload.github.com/Adi2334/Image-Caption-Generator/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Adi2334%2FImage-Caption-Generator/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265553168,"owners_count":23787032,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cnn","computer-vision","deep-learning","imagecaptioning","lstm","machine-learning","neural-network","tensorflow"],"created_at":"2025-03-18T16:31:35.341Z","updated_at":"2026-04-21T10:02:40.713Z","avatar_url":"https://github.com/Adi2334.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# **Image Caption Generator using Deep Learning**  \n\n## **Overview**  \nThis project implements an **image captioning model** using a **CNN-LSTM architecture**. The model takes an image as input and generates a descriptive caption using natural language processing techniques. It is trained on a dataset containing images and their corresponding textual descriptions.\n\n## **Dataset**  \n- The model is trained on the **Flickr8k** dataset.  \n- It consists of **8000** images with multiple captions per image.  
\n#### **Data Augmentation**  \nTo improve model performance, images were **horizontally flipped**.\n\n## **Model Architecture**  \nThe model consists of three main components:  \n1. **Image Feature Extractor (CNN)**  \n   - Uses **Xception** to extract features from images.  \n2. **Sequence Processor (LSTM)**  \n   - An **embedding layer** processes input text sequences.  \n   - An **LSTM network** learns dependencies between words in a sentence.  \n3. **Decoder (Dense Layer with Softmax)**  \n   - Combines image features and text sequences.  \n   - Generates the next word in the caption.  \n\n![model-image](./model_3.png)\nTo view the model architecture in detail, you can upload the saved model to [Netron](https://netron.app/).\n\n\u003c!-- ## **Training Details**  \n- **Optimizer:** Adam  \n- **Loss function:** Categorical Crossentropy  \n- **Batch size:** 64  \n- **Training \u0026 Validation split:** 30,000 images for training, 1,000 images for validation.   --\u003e\n\n## **Evaluation Metrics**  \nThe model is evaluated using the following metrics:  \n📌 **BLEU-1:** 0.6131  \n📌 **BLEU-2:** 0.5453  \n📌 **BLEU-3:** 0.4483  \n📌 **BLEU-4:** 0.3635  \n📌 **ROUGE-L:** 0.3314  \n📌 **CIDEr:** 0.0497  \n📌 **SPICE:** 0.0451  \n\n## **How to Use**  \n#### **1. Clone the Repository**  \n```bash\ngit clone https://github.com/Adi2334/Image-Caption-Generator.git\ncd Image-Caption-Generator\n```\n\n#### **2. Install Dependencies**  \n```bash\npip install -r requirements.txt\n```\n#### **3. Extract Features**  \n```bash\nmkdir data\npython utils/preprocess.py\npython utils/feature_extract.py\npython utils/data_loader.py\n```\n\n#### **4. Training**  \nYou can also use pretrained weights.\n```bash\npython train.py\n```\n\n#### **5. Run the Model**  \nTo test the model with your own images:  \n```bash\npython test.py --image_path path/to/image.jpg\n```\n\n#### **6. 
Streamlit Web App**  \nRun the **Streamlit** interface for uploading images and generating captions:  \n```bash\nstreamlit run Streamlit.py\n```\n\n#### **7. Evaluation of Model**  \nEvaluate the model using **NLP** metrics commonly used for image captioning:  \n```bash\npython evaluation/test_cap.py\npython evaluation/evaluation.py\n```\n\n\n## **Results**  \nExample output from the model:  \n\n| **Input Image** |  ![example-image](./OIP3.jpg) |\n|---------------|-------------------|\n|**Generated Caption** | *\"man in the water\"* |\n\n## **Future Improvements**  \n🔹 Train on a larger dataset for improved generalization.  \n🔹 Experiment with **Transformer-based models** (e.g., ViT + GPT-2, BLIP).  \n\n## **Contributor**  \n👤 **[Aditya Nikam](https://www.linkedin.com/in/aditya-nikam-4885bb232/)** student at IIT Kanpur\n    - contact (23alpha34@gmail.com / adityarn21@iitk.ac.in) \n  \n----------------------------------------------------------------\n\n\u003c!-- ### **Code documentation** --\u003e\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadi2334%2Fimage-caption-generator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fadi2334%2Fimage-caption-generator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadi2334%2Fimage-caption-generator/lists"}