{"id":20649628,"url":"https://github.com/rsn601kri/amazon_ml_hack-24","last_synced_at":"2025-10-03T19:08:48.917Z","repository":{"id":257130878,"uuid":"857403618","full_name":"RSN601KRI/Amazon_ML_Hack-24","owner":"RSN601KRI","description":"As digital marketplaces expand, many products lack detailed textual descriptions. This makes it essential to obtain key details directly from images. Our task is to build a model that can accurately identify and extract these details, providing valuable information for product listings.","archived":false,"fork":false,"pushed_at":"2024-09-14T16:47:16.000Z","size":7638,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-09T19:18:20.416Z","etag":null,"topics":["aiml","deep-learning","image-processing","ocr-recognition","python"],"latest_commit_sha":null,"homepage":"https://colab.research.google.com/drive/1F1H8yXJLJ-SHOBfyl2BfEPTx6Bokxoc6#scrollTo=dWy82Yp3YpZd","language":"Jupyter 
Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/RSN601KRI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-14T15:09:32.000Z","updated_at":"2024-09-14T16:50:52.000Z","dependencies_parsed_at":"2024-09-15T01:16:41.533Z","dependency_job_id":"3c1658f5-cfa8-4e80-8b08-f508e1241c21","html_url":"https://github.com/RSN601KRI/Amazon_ML_Hack-24","commit_stats":null,"previous_names":["rsn601kri/amazon_ml_hack-24"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/RSN601KRI/Amazon_ML_Hack-24","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RSN601KRI%2FAmazon_ML_Hack-24","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RSN601KRI%2FAmazon_ML_Hack-24/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RSN601KRI%2FAmazon_ML_Hack-24/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RSN601KRI%2FAmazon_ML_Hack-24/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/RSN601KRI","download_url":"https://codeload.github.com/RSN601KRI/Amazon_ML_Hack-24/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RSN601KRI%2FAmazon_ML_Hack-24/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278212556,"owners_count":25949142,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_check
ed_at":"2025-10-03T02:00:06.070Z","response_time":53,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aiml","deep-learning","image-processing","ocr-recognition","python"],"created_at":"2024-11-16T17:15:32.603Z","updated_at":"2025-10-03T19:08:48.882Z","avatar_url":"https://github.com/RSN601KRI.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Amazon-ML-Challenge\n\n## About Amazon ML Challenge\n![](https://he-s3.s3.amazonaws.com/media/cache/0a/be/0abe8c0908dcb9e67941600739f6d651.png)\u003c/br\u003e\nAmazon ML Challenge is a two-stage competition where students from all engineering campuses across India will get a unique opportunity to work on Amazon’s dataset to bring in fresh ideas and build innovative solutions for a real-world problem statement. The top three winning teams will receive pre-placement interviews (PPIs) for ML roles at Amazon along with cash prizes and certificates.\n\n## \u003cb\u003eImage Entity Extraction from Product Images\u003c/b\u003e\u003c/br\u003e\n\n## ML Challenge Stages\n![amazon mll](https://github.com/user-attachments/assets/b77bb2ad-9316-419e-aa2c-36cbd046b544)\n\n## Dataset Link🔗\nhttps://unstop.com/hackathons/amazon-ml-challenge-amazon-1100713\n\n## Overview\n\nThis project aims to develop a machine-learning model for extracting entity values from product images. The goal is to automate the extraction of key product details such as weight, dimensions, and other attributes directly from images. 
This capability is crucial for digital marketplaces where product information is often incomplete or missing.\n\n## Problem Statement\n\nAs digital marketplaces expand, many products lack detailed textual descriptions. This makes it essential to obtain key details directly from images. Our task is to build a model that can accurately identify and extract these details, providing valuable information for product listings.\n\n## Full Train/Test dataset details:\u003c/br\u003e\n\nindex: A unique identifier (ID) for the data sample.\u003c/br\u003e\nimage_link: Public URL where the product image is available for download. Example link - https://m.media-amazon.com/images/I/71XfHPR36-L.jpg  To download images, use the download_images function from src/utils.py. See sample code in src/test.ipynb.\u003c/br\u003e\ngroup_id: Category code of the product.\u003c/br\u003e\nentity_name: Product entity name. For example, “item_weight”.\u003c/br\u003e\nentity_value: Product entity value. For example, “34 gram”.\u003c/br\u003e\nNote: For test.csv, you will not see the column entity_value as it is the target variable.\u003c/br\u003e\n\n## Team Members\n1. [Roshni Kumari](https://github.com/RSN601KRI)\n2. [Antima Mishra](https://github.com/antima-123bit)\n3. [Raushan Kumar](https://github.com/raushan0422)\n4. 
[Varshita R](https://www.linkedin.com/in/varshitha-r-616b15241/)\n\n## Data Description\n\nThe dataset consists of the following files:\n\n- `dataset/train.csv`: Training data with labels.\n- `dataset/test.csv`: Test data without labels (for predictions).\n- `dataset/sample_test.csv`: Sample test input file.\n- `dataset/sample_test_out.csv`: Sample output file showing the correct format.\n\n**Columns:**\n\n- **index**: Unique identifier for each data sample.\n- **image_link**: URL to download the product image.\n- **group_id**: Category code of the product.\n- **entity_name**: Name of the product entity (e.g., \"item_weight\").\n- **entity_value**: Value of the product entity (e.g., \"34 gram\").\n\n## Objective\n\nDevelop a machine learning model to extract entity values from product images and generate predictions in the format \"x unit\", where `x` is a float number and `unit` is one of the allowed units.\n\n## Approach\n\n1. **Data Preparation:**\n   - **Image Downloading:** Use the `download_images` function from `src/utils.py` to fetch images.\n   - **Preprocessing:** Normalize and resize images for model input.\n\n2. **Feature Extraction:**\n   - Use Convolutional Neural Networks (CNNs) for feature extraction. Consider architectures like ResNet, Inception, or EfficientNet.\n   - Fine-tune pre-trained models if necessary.\n\n3. **Entity Extraction:**\n   - Implement a Multi-Label Classification model or object detection models (e.g., YOLO, Faster R-CNN) to identify and classify entities within images.\n\n4. 
**Post-Processing:**\n   - Format predictions to match the required output format and ensure predictions use only the valid units listed in `src/constants.py`.\n\n## Evaluation\n\n- **Metrics:** Performance is evaluated using the F1 score, the harmonic mean of Precision and Recall.\n- **Scoring Formula:**\n  \\[\n  \\text{F1 Score} = 2 \\times \\frac{\\text{Precision} \\times \\text{Recall}}{\\text{Precision} + \\text{Recall}}\n  \\]\n\n![Amazon ML](https://github.com/user-attachments/assets/fdfacc36-6b3c-444a-b72a-09865cfc062a)\n\n![output](https://github.com/user-attachments/assets/eea549b3-c5a1-46ce-9ee2-46f4a3452f71)\n\n## Resources\n\n- [TensorFlow Tutorial on CNNs](https://www.tensorflow.org/tutorials/images/cnn)\n- [PyTorch Image Classification](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html)\n- [YOLO (You Only Look Once)](https://pjreddie.com/darknet/yolo/)\n- [Faster R-CNN Tutorial](https://github.com/facebookresearch/detectron2/blob/main/tools/train_net.py)\n- [Precision, Recall, F1 Score Explained](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics)\n\n## Files\n\n- **`src/sanity.py`**: Ensures the final output file meets formatting requirements.\n- **`src/utils.py`**: Contains functions for downloading images.\n- **`src/constants.py`**: Lists allowed units for entity values.\n- **`sample_code.py`**: Sample code for generating output files (optional).\n\n## Submission\n\n- Generate predictions for `dataset/test.csv` and format them according to `dataset/sample_test_out.csv`.\n- Submit the `test_out.csv` file in the portal with the exact formatting.\n\n## Conclusion\n\nBy automating the extraction of product details from images, this project aims to enhance data accuracy, improve efficiency, and provide a better user experience in digital marketplaces.\n\n# Connect With Me\nLinkedIn : https://www.linkedin.com/in/roshnikumari1/\u003cbr/\u003e\nEmail : roshni06k2004@gmail.com\u003cbr/\u003e\nTwitter : 
www.twitter.com/RoshniK29147303\u003cbr/\u003e\nWebsite : https://bento.me/roshnikri \u003cbr/\u003e\n# Personal\nName: Roshni Kumari\u003cbr/\u003e\nUniversity: Galgotias University, Noida (UP)\n\n# Gratitude\nThank you! If you like it, please leave a star. ⭐\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frsn601kri%2Famazon_ml_hack-24","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frsn601kri%2Famazon_ml_hack-24","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frsn601kri%2Famazon_ml_hack-24/lists"}