https://github.com/saman-nia/tasks
https://github.com/saman-nia/tasks
Last synced: 2 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/saman-nia/tasks
- Owner: saman-nia
- Created: 2024-04-17T18:17:04.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-04-22T03:23:31.000Z (about 1 year ago)
- Last Synced: 2025-01-12T18:11:25.156Z (4 months ago)
- Language: Jupyter Notebook
- Size: 1.06 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Pinterest Fashion Dataset Analysis
This project is an extensive analysis of the Pinterest Fashion Dataset, focusing on data cleaning, exploratory data analysis, and building a retrieval-augmented generation system for product recommendation based on demographic data.
## Overview
The analysis is divided into two main parts:
1. **Data Analysis**: Involves cleaning, preprocessing, and exploratory data analysis (EDA) of the fashion dataset.
2. **Retrieval-Augmented Generation (RAG)**: Development of a system that utilizes user queries and demographic data to recommend fashion items.## Installation
Before running the notebooks, ensure you have the following libraries installed:
## Part 1: Data Analysis
The initial phase involves loading the dataset into a pandas DataFrame, followed by cleaning and preprocessing:
- Handling missing values by imputation or removal.
- Converting data formats, e.g., changing the price column to numeric.### Exploratory Data Analysis (EDA)
Key insights derived from the EDA include relationships between price, click-through rates, and ratings. Visualizations are created using histograms and scatter plots to understand these relationships better.
### Visualizations
Here are some of the key visualizations generated during the analysis:
## Histogram of Prices
![]()
## Histogram of Prices
![]()
## Scatter Plot of Click-through Rates vs. Ratings
![]()
## Part 2: Retrieval-Augmented Generation (RAG)
This section creates a function to retrieve products based on user queries:
### Product Recommendation System
A simple RAG system is developed that takes a user’s age, gender, and location as input to recommend relevant products. The system utilizes either a built retrieval model or existing data to make these recommendations.
## Distribution of Product Ratings
![]()
### Example Recommendations
**For a 35-year-old female in California interested in Shoes**
> Based on your interests in Shoes and considering your location in California, we recommend Converse because it has a high rating of 5 stars and is priced at just $49.50, fitting well within your budget.
**For a 55-year-old male in California interested in Sunglasses**
> Based on your interests in Sunglasses and considering your location in California, we recommend Burberry because it has a high rating of 5 stars and is priced at just $87.50, fitting well within your budget.
## Conclusion
This repository contains comprehensive analyses and a system capable of dynamically recommending products based on user preferences and demographics. The methodologies and technologies employed demonstrate effective data analysis and machine learning techniques in a real-world application.