https://github.com/karenwky/recommendation_system_allrecipes
recommending recipes with content-based filtering approach
- Host: GitHub
- URL: https://github.com/karenwky/recommendation_system_allrecipes
- Owner: karenwky
- License: MIT
- Created: 2019-09-27T14:34:14.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2024-09-05T13:24:57.000Z (about 1 year ago)
- Last Synced: 2024-09-18T06:43:12.993Z (about 1 year ago)
- Topics: content-based-filtering, pandas, recommendation-system
- Language: Jupyter Notebook
- Homepage:
- Size: 8.58 MB
- Stars: 6
- Watchers: 1
- Forks: 3
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Recommendation System: Allrecipes.com
Recommending recipes with a content-based filtering approach via feature extraction (nutrition values).

## Data Source
Data from [Kaggle](https://www.kaggle.com/elisaxxygao/foodrecsysv1) user elisaxxygao, containing two datasets: user interaction information and recipe information. Recipe images are also provided.

## Feature Extraction
To compare similarity between recipes, seven nutrients are selected and their percent daily values (based on a 2,000-calorie diet) are extracted.
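As a rough, hypothetical sketch of this step (the file name, `recipe_id` column, and nutrient names below are illustrative assumptions, not the dataset's actual schema):

```python
import pandas as pd

# Hypothetical file and column names; the actual Kaggle dataset schema may differ.
recipe = pd.read_csv("core-data_recipe.csv")

# Seven example nutrients, each stored as a percent daily value (2,000-calorie diet)
nutrients = ['calories', 'fat', 'carbohydrates', 'protein',
             'cholesterol', 'sodium', 'fiber']

# Keep only the nutrition features used for similarity, indexed by recipe id
df_nutrition = recipe.set_index('recipe_id')[nutrients]
```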
## Distance Calculation Methods
After normalizing the nutrition data, three distance calculation methods are applied and their Top 3 recommendation results are compared (a sketch of the normalization and distance computations follows the list below).
1. Cosine Distance

2. Euclidean Distance

3. Hamming Distance
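A minimal sketch of the normalization and the three distance calls, assuming min-max scaling for illustration (the notebook may use a different scheme); SciPy supplies the distance functions, and the two recipe ids are placeholders:

```python
from scipy.spatial.distance import cosine, euclidean, hamming

# Min-max scale each nutrient column to [0, 1] so no single nutrient dominates
df_normalized = (df_nutrition - df_nutrition.min()) / (df_nutrition.max() - df_nutrition.min())

# Compare two recipes' normalized nutrition profiles (placeholder ids)
a = df_normalized.loc[1001]
b = df_normalized.loc[1002]

print('cosine distance:   ', cosine(a, b))
print('euclidean distance:', euclidean(a, b))
print('hamming distance:  ', hamming(a, b))
```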
Although the nutrition data are similar, the various distance calculation methods generate different results. Thus, a hybrid recommender is created to integrate the recommendations from all three approaches.
## Hybrid Recommender
```python
from time import time

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
from scipy.spatial.distance import cosine, euclidean, hamming
from keras.preprocessing import image

# `df_normalized` (normalized nutrition data, indexed by recipe_id) and `recipe`
# (recipe metadata) are assumed to be defined earlier in the notebook.


def nutrition_hybrid_recommender(recipe_id, sort_order, N):
    """
    Hybrid nutrition recommender that integrates the Top 2 recommendations from
    3 distance approaches (cosine, euclidean, hamming) and sorts the results by
    the selected criteria.

    recipe_id: find similar recipes based on the selected recipe
    sort_order: must be a list; 4 options available: ['aver_rate'],
        ['review_nums'], ['aver_rate', 'review_nums'], ['review_nums', 'aver_rate']
    N: Top N recipe(s)

    Returns:
        1) recipe id, recipe name and image of the Top N recommendations,
        2) nutrition data of the selected recipe and the Top N recommendations,
        3) average rating and number of reviews of the Top N recommendations
    """
    start = time()

    # Distance from the selected recipe to every other recipe, per metric
    allRecipes_cosine = pd.DataFrame(df_normalized.index)
    allRecipes_cosine = allRecipes_cosine[allRecipes_cosine.recipe_id != recipe_id]
    allRecipes_cosine["distance"] = allRecipes_cosine["recipe_id"].apply(lambda x: cosine(df_normalized.loc[recipe_id], df_normalized.loc[x]))

    allRecipes_euclidean = pd.DataFrame(df_normalized.index)
    allRecipes_euclidean = allRecipes_euclidean[allRecipes_euclidean.recipe_id != recipe_id]
    allRecipes_euclidean["distance"] = allRecipes_euclidean["recipe_id"].apply(lambda x: euclidean(df_normalized.loc[recipe_id], df_normalized.loc[x]))

    allRecipes_hamming = pd.DataFrame(df_normalized.index)
    allRecipes_hamming = allRecipes_hamming[allRecipes_hamming.recipe_id != recipe_id]
    allRecipes_hamming["distance"] = allRecipes_hamming["recipe_id"].apply(lambda x: hamming(df_normalized.loc[recipe_id], df_normalized.loc[x]))

    # Top 2 recommendations per metric, ties broken by recipe_id
    Top2Recommendation_cosine = allRecipes_cosine.sort_values(["distance"]).head(2).sort_values(by=['distance', 'recipe_id'])
    Top2Recommendation_euclidean = allRecipes_euclidean.sort_values(["distance"]).head(2).sort_values(by=['distance', 'recipe_id'])
    Top2Recommendation_hamming = allRecipes_hamming.sort_values(["distance"]).head(2).sort_values(by=['distance', 'recipe_id'])

    recipe_df = recipe.set_index('recipe_id')

    # Pool the three candidate sets and attach popularity information
    hybrid_Top6Recommendation = pd.concat([Top2Recommendation_cosine, Top2Recommendation_euclidean, Top2Recommendation_hamming])
    aver_rate_list = []
    review_nums_list = []
    for recipeid in hybrid_Top6Recommendation.recipe_id:
        aver_rate_list.append(recipe_df.at[recipeid, 'aver_rate'])
        review_nums_list.append(recipe_df.at[recipeid, 'review_nums'])
    hybrid_Top6Recommendation['aver_rate'] = aver_rate_list
    hybrid_Top6Recommendation['review_nums'] = review_nums_list

    # Final Top N, sorted by the requested criteria
    TopNRecommendation = hybrid_Top6Recommendation.sort_values(by=sort_order, ascending=False).head(N).drop(columns=['distance'])

    recipe_id = [recipe_id]
    recipe_list = []
    image_list = []
    image_path = "./foodrecsysv1/raw-data-images/{}.jpg"
    for recipeid in TopNRecommendation.recipe_id:
        recipe_id.append(recipeid)  # ids of the selected recipe and the recommended recipe(s)
        recipe_list.append("{} {}".format(recipeid, recipe_df.at[recipeid, 'recipe_name']))
        image_list.append(image_path.format(recipeid))

    # Load recipe images and show them side by side with their titles
    image_array = []
    for imagepath in image_list:
        img = image.load_img(imagepath)
        img = image.img_to_array(img, dtype='int')
        image_array.append(img)
    fig = plt.figure(figsize=(15, 15))
    gs1 = gridspec.GridSpec(1, N)
    axs = []
    for x in range(N):
        axs.append(fig.add_subplot(gs1[x]))
        axs[-1].imshow(image_array[x])
    [axi.set_axis_off() for axi in axs]
    for axi, x in zip(axs, recipe_list):
        axi.set_title(x)

    end = time()
    running_time = end - start
    print('time cost: %.5f sec' % running_time)

    return df_normalized.loc[recipe_id, :], TopNRecommendation
```
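A hypothetical call (the recipe id is a placeholder), assuming `df_normalized` and `recipe` are already loaded:

```python
# Top 3 recommendations for one recipe, sorted by average rating
nutrition_data, top_n = nutrition_hybrid_recommender(
    recipe_id=1001,            # placeholder id
    sort_order=['aver_rate'],  # or ['review_nums'], or both
    N=3,
)
print(top_n)
```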
1. Sort by average rating

2. Sort by number of reviews

Average rating and number of reviews are different popularity measures, yet these two sorting criteria generate similar results. Surprisingly, using nutrition information alone, even alcoholic-drink recipes can be detected and recommended.

## Deployment
The deployment integrates the Top 10 recommendations from the three distance calculation approaches, then generates the Top N recommendations sorted by various criteria, e.g. average rating or number of reviews.
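A minimal sketch of what such a Flask endpoint might look like, assuming the recommender function above is importable; the route, query parameters, and response shape are illustrative assumptions, not the repository's actual [Flask Deployment](./flask_deployment) code:

```python
from flask import Flask, request

app = Flask(__name__)

@app.route('/recommend')  # hypothetical route
def recommend():
    # Illustrative query string: /recommend?recipe_id=1001&sort=aver_rate&n=3
    recipe_id = int(request.args['recipe_id'])
    sort_order = request.args.get('sort', 'aver_rate').split(',')
    n = int(request.args.get('n', 3))

    # Reuse the hybrid recommender defined above (the plotting part would
    # typically be skipped or adapted in a server context)
    _, top_n = nutrition_hybrid_recommender(recipe_id, sort_order, n)
    return app.response_class(top_n.to_json(orient='records'),
                              mimetype='application/json')

if __name__ == '__main__':
    app.run(debug=True)
```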
## Detailed Presentation

* Check out the complete workflow in the [Jupyter Notebook](./code).
* Check out the complete code of the [Flask Deployment](./flask_deployment).

## Skills Acquired
* Pandas: feature extraction, data cleaning and data imputation
* Keras: image processing (converting image files to arrays and displaying them for the recommended recipes)
* Matplotlib: using GridSpec to build subplot visualizations within a for loop
* Flask: deployment of the recommender engine as a web application

## Acknowledgements
Subplot code referenced from Stack Overflow users [armatita](https://stackoverflow.com/questions/46713186/matplotlib-loop-make-subplot-for-each-category?rq=1) and [Nirmal](https://stackoverflow.com/questions/25862026/turn-off-axes-in-subplots). Thank you, coders, for sharing your experience! =]