https://github.com/karenwky/recommendation_system_allrecipes
recommending recipes with content-based filtering approach
- Host: GitHub
- URL: https://github.com/karenwky/recommendation_system_allrecipes
- Owner: karenwky
- License: MIT
- Created: 2019-09-27T14:34:14.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2024-09-05T13:24:57.000Z (about 1 year ago)
- Last Synced: 2024-09-18T06:43:12.993Z (about 1 year ago)
- Topics: content-based-filtering, pandas, recommendation-system
- Language: Jupyter Notebook
- Homepage:
- Size: 8.58 MB
- Stars: 6
- Watchers: 1
- Forks: 3
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Recommendation System: Allrecipes.com
Recommending recipes with a content-based filtering approach via feature extraction (nutrition values).

## Data Source
Data from [Kaggle](https://www.kaggle.com/elisaxxygao/foodrecsysv1) user elisaxxygao, containing two datasets: user interaction information and recipe information. Recipe images are also provided.

## Feature Extraction
To compare similarity between recipes, seven nutrients are selected and their percent daily values (based on a 2,000-calorie diet) are extracted.
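As a rough, hypothetical sketch of this step (the file name, `recipe_id` column, and nutrient names below are illustrative assumptions, not the dataset's actual schema):

```python
import pandas as pd

# Hypothetical file and column names; the actual Kaggle dataset schema may differ.
recipe = pd.read_csv("core-data_recipe.csv")

# Seven example nutrients, each stored as a percent daily value (2,000-calorie diet)
nutrients = ['calories', 'fat', 'carbohydrates', 'protein',
             'cholesterol', 'sodium', 'fiber']

# Keep only the nutrition features used for similarity, indexed by recipe id
df_nutrition = recipe.set_index('recipe_id')[nutrients]
```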
## Distance Calculation Methods
After normalizing the nutrition data, three distance calculation methods are applied and their Top 3 recommendation results are compared (a sketch of the normalization and distance computations follows the list below).
1. Cosine Distance

2. Euclidean Distance

3. Hamming Distance
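A minimal sketch of the normalization and the three distance calls, assuming min-max scaling for illustration (the notebook may use a different scheme); SciPy supplies the distance functions, and the two recipe ids are placeholders:

```python
from scipy.spatial.distance import cosine, euclidean, hamming

# Min-max scale each nutrient column to [0, 1] so no single nutrient dominates
df_normalized = (df_nutrition - df_nutrition.min()) / (df_nutrition.max() - df_nutrition.min())

# Compare two recipes' normalized nutrition profiles (placeholder ids)
a = df_normalized.loc[1001]
b = df_normalized.loc[1002]

print('cosine distance:   ', cosine(a, b))
print('euclidean distance:', euclidean(a, b))
print('hamming distance:  ', hamming(a, b))
```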
Although the nutrition data are similar, the various distance calculation methods generate different results. Thus, a hybrid recommender is created to integrate the recommendations from all three approaches.
## Hybrid Recommender
```python
from time import time

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
from scipy.spatial.distance import cosine, euclidean, hamming
from keras.preprocessing import image

# `df_normalized` (normalized nutrition data, indexed by recipe_id) and `recipe`
# (recipe metadata) are assumed to be defined earlier in the notebook.


def nutrition_hybrid_recommender(recipe_id, sort_order, N):
    """
    Hybrid nutrition recommender that integrates the Top 2 recommendations from
    3 distance approaches (cosine, euclidean, hamming) and sorts the results by
    the selected criteria.

    recipe_id: find similar recipes based on the selected recipe
    sort_order: must be a list; 4 options available: ['aver_rate'],
        ['review_nums'], ['aver_rate', 'review_nums'], ['review_nums', 'aver_rate']
    N: Top N recipe(s)

    Returns:
        1) recipe id, recipe name and image of the Top N recommendations,
        2) nutrition data of the selected recipe and the Top N recommendations,
        3) average rating and number of reviews of the Top N recommendations
    """
    start = time()

    # Distance from the selected recipe to every other recipe, per metric
    allRecipes_cosine = pd.DataFrame(df_normalized.index)
    allRecipes_cosine = allRecipes_cosine[allRecipes_cosine.recipe_id != recipe_id]
    allRecipes_cosine["distance"] = allRecipes_cosine["recipe_id"].apply(lambda x: cosine(df_normalized.loc[recipe_id], df_normalized.loc[x]))

    allRecipes_euclidean = pd.DataFrame(df_normalized.index)
    allRecipes_euclidean = allRecipes_euclidean[allRecipes_euclidean.recipe_id != recipe_id]
    allRecipes_euclidean["distance"] = allRecipes_euclidean["recipe_id"].apply(lambda x: euclidean(df_normalized.loc[recipe_id], df_normalized.loc[x]))

    allRecipes_hamming = pd.DataFrame(df_normalized.index)
    allRecipes_hamming = allRecipes_hamming[allRecipes_hamming.recipe_id != recipe_id]
    allRecipes_hamming["distance"] = allRecipes_hamming["recipe_id"].apply(lambda x: hamming(df_normalized.loc[recipe_id], df_normalized.loc[x]))

    # Top 2 recommendations per metric, ties broken by recipe_id
    Top2Recommendation_cosine = allRecipes_cosine.sort_values(["distance"]).head(2).sort_values(by=['distance', 'recipe_id'])
    Top2Recommendation_euclidean = allRecipes_euclidean.sort_values(["distance"]).head(2).sort_values(by=['distance', 'recipe_id'])
    Top2Recommendation_hamming = allRecipes_hamming.sort_values(["distance"]).head(2).sort_values(by=['distance', 'recipe_id'])

    recipe_df = recipe.set_index('recipe_id')

    # Pool the three candidate sets and attach popularity information
    hybrid_Top6Recommendation = pd.concat([Top2Recommendation_cosine, Top2Recommendation_euclidean, Top2Recommendation_hamming])
    aver_rate_list = []
    review_nums_list = []
    for recipeid in hybrid_Top6Recommendation.recipe_id:
        aver_rate_list.append(recipe_df.at[recipeid, 'aver_rate'])
        review_nums_list.append(recipe_df.at[recipeid, 'review_nums'])
    hybrid_Top6Recommendation['aver_rate'] = aver_rate_list
    hybrid_Top6Recommendation['review_nums'] = review_nums_list

    # Final Top N, sorted by the requested criteria
    TopNRecommendation = hybrid_Top6Recommendation.sort_values(by=sort_order, ascending=False).head(N).drop(columns=['distance'])

    recipe_id = [recipe_id]
    recipe_list = []
    image_list = []
    image_path = "./foodrecsysv1/raw-data-images/{}.jpg"
    for recipeid in TopNRecommendation.recipe_id:
        recipe_id.append(recipeid)  # ids of the selected recipe and the recommended recipe(s)
        recipe_list.append("{} {}".format(recipeid, recipe_df.at[recipeid, 'recipe_name']))
        image_list.append(image_path.format(recipeid))

    # Load recipe images and show them side by side with their titles
    image_array = []
    for imagepath in image_list:
        img = image.load_img(imagepath)
        img = image.img_to_array(img, dtype='int')
        image_array.append(img)
    fig = plt.figure(figsize=(15, 15))
    gs1 = gridspec.GridSpec(1, N)
    axs = []
    for x in range(N):
        axs.append(fig.add_subplot(gs1[x]))
        axs[-1].imshow(image_array[x])
    [axi.set_axis_off() for axi in axs]
    for axi, x in zip(axs, recipe_list):
        axi.set_title(x)

    end = time()
    running_time = end - start
    print('time cost: %.5f sec' % running_time)

    return df_normalized.loc[recipe_id, :], TopNRecommendation
```
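A hypothetical call (the recipe id is a placeholder), assuming `df_normalized` and `recipe` are already loaded:

```python
# Top 3 recommendations for one recipe, sorted by average rating
nutrition_data, top_n = nutrition_hybrid_recommender(
    recipe_id=1001,            # placeholder id
    sort_order=['aver_rate'],  # or ['review_nums'], or both
    N=3,
)
print(top_n)
```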
1. Sort by average rating

2. Sort by number of reviews

Average rating and number of reviews are different popularity measures, yet these two sorting criteria generate similar results. Surprisingly, using nutrition information alone, even alcoholic-drink recipes can be detected and recommended.

## Deployment
The deployment integrates the Top 10 recommendations from the three distance calculation approaches, then generates the Top N recommendations sorted by various criteria, e.g. average rating or number of reviews.
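A minimal sketch of what such a Flask endpoint might look like, assuming the recommender function above is importable; the route, query parameters, and response shape are illustrative assumptions, not the repository's actual [Flask Deployment](./flask_deployment) code:

```python
from flask import Flask, request

app = Flask(__name__)

@app.route('/recommend')  # hypothetical route
def recommend():
    # Illustrative query string: /recommend?recipe_id=1001&sort=aver_rate&n=3
    recipe_id = int(request.args['recipe_id'])
    sort_order = request.args.get('sort', 'aver_rate').split(',')
    n = int(request.args.get('n', 3))

    # Reuse the hybrid recommender defined above (the plotting part would
    # typically be skipped or adapted in a server context)
    _, top_n = nutrition_hybrid_recommender(recipe_id, sort_order, n)
    return app.response_class(top_n.to_json(orient='records'),
                              mimetype='application/json')

if __name__ == '__main__':
    app.run(debug=True)
```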
## Detailed Presentation

* Check out the complete workflow in the [Jupyter Notebook](./code).
* Check out the complete code of the [Flask Deployment](./flask_deployment).

## Skills Acquired
* Pandas: feature extraction, data cleaning and data imputation
* Keras: image processing (converting image files to arrays and displaying them for the recommended recipes)
* Matplotlib: using GridSpec to build subplot visualizations within a for loop
* Flask: deployment of the recommender engine as a web application

## Acknowledgements
Subplot code referenced from Stack Overflow users [armatita](https://stackoverflow.com/questions/46713186/matplotlib-loop-make-subplot-for-each-category?rq=1) and [Nirmal](https://stackoverflow.com/questions/25862026/turn-off-axes-in-subplots). Thank you, coders, for sharing your experience! =]