In the article Using Scikit-Surprise to Create a Simple Recipe Collaborative Filtering Recommender System, we built the simplest possible recommender system with the scikit-surprise package and saw how to use its built-in algorithms, such as KNN or SVD.
I’d like to take my recommender systems practice a step further and attempt to create my own prediction algorithm. Surprise allows you to override its core classes and methods in order to tailor your own algorithm and try to improve the recommender system’s outcomes, or at the very least get it closer to what you want from your own recommender system. It’s important to remember that recommender systems aren’t only about accuracy; they’re also about knowing the recommendations you want to make to your clients, which can differ from one company to the next.
In the end, the only reliable metric for a recommender system is testing it with real users and seeing how they react to your recommendations. So in this post I'll focus on building my own recommender system that recommends recipes similar in content to the ones a user has rated previously (a Content-Based recommender system).
We'll use the content of the recipe collection to determine the degree of similarity. We could assess similarity in a variety of ways, but I'd like to use some NLP methods here, so we'll base our algorithm on the similarity of the recipe text, which includes the name (title), steps, and ingredients.
The first step is to use WordNet to tokenize and lemmatize the words in the recipes; then we'll use TfidfVectorizer to build a vector from the lemmatized vocabulary and calculate the recipes' cosine similarity. Finally, we'll customize our Surprise algorithm to find the recipes most similar to a given one and make recommendations based on them.
The first two sections (data loading and preparation) are identical to those described in our prior post. The model creation section contains the new content.
The first step is importing the necessary libraries.
import pandas as pd
import difflib
import numpy as np
import pickle
Loading Data
Let's load the two datasets we need again.
recipe_data = pd.read_csv('/work/RAW_recipes.csv',header=0,sep=",")
recipe_data.head()
user_data = pd.read_csv('/work/PP_users.csv',header=0,sep=",")
user_data.head()
Okay, we can see the data on each file. The column names are self-explanatory, so we can get started.
Data Preparation and Exploration
We must first prepare the data in a dataset that is compatible with Surprise. The Surprise algorithm will use this dataset to read the users, items, and recipe ratings. The ratings are required by the dataset, but we won't actually use them; I'll explain why later in this post.
The first step is to write a function that reads the items (recipes) and user ratings.
def getRecipeRatings(idx):
    # Parse the stringified item and rating lists stored for this user
    user_items = [int(s) for s in user_data.loc[idx]['items'].replace('[','').replace(']','').replace(',','').split()]
    user_ratings = [float(s) for s in user_data.loc[idx]['ratings'].replace('[','').replace(']','').replace(',','').split()]
    df = pd.DataFrame(list(zip(user_items, user_ratings)), columns=['Item','Rating'])
    df.insert(loc=0, column='User', value=user_data.loc[idx].u)
    return df
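As a quick sanity check (this call is just illustrative, not part of the original pipeline), we can look at the parsed ratings of the first user:
# Quick sanity check: parsed items and ratings for the first user
getRecipeRatings(0).head()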
In this step we build a dataset with one row per User, Item, and Rating combination. We only need to run this piece of code once, so it is commented out; for subsequent runs the saved dataset is read back with pickle.
#recipe_ratings = pd.DataFrame(columns = ['User','Item','Rating'])
#for idx,row in user_data.iterrows():
# recipe_ratings = recipe_ratings.append(getRecipeRatings(row['u']),ignore_index=True)
Pickle saves the dataset to disk (first run), then reads it for subsequent runs.
#recipe_ratings.to_pickle('/work/recipe_ratings.pkl')
recipe_ratings = pd.read_pickle('/work/recipe_ratings.pkl')
Let's check the rating distribution.
import seaborn as sns
sns.barplot(x=recipe_ratings.Rating.value_counts().index, y=recipe_ratings.Rating.value_counts())
Good, we see that the majority of the ratings are 5.0, indicating that many users are satisfied with the recipes.
To reduce the dataset size and save time, we keep only the recipes with more than 30 ratings.
recipe_counts = recipe_ratings.groupby(['Item']).size()
filtered_recipes = recipe_counts[recipe_counts>30]
filtered_recipes_list = filtered_recipes.index.tolist()
len(filtered_recipes_list)
recipe_ratings = recipe_ratings[recipe_ratings['Item'].isin(filtered_recipes_list)]
recipe_ratings.count()
The ratings distribution in the filtered dataset is similar to the distribution in the entire dataset.
sns.barplot(x=recipe_ratings.Rating.value_counts().index, y=recipe_ratings.Rating.value_counts())
Identifying the Similarity Between Recipes
Let's create our custom model with scikit-surprise. The first step is creating a dataset with the filtered recipes.
recipe_filtered = recipe_data.loc[filtered_recipes_list]
recipe_filtered
len(recipe_filtered)
Let's import the nltk libraries and download the necessary packages.
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')
nltk.download('words')
nltk.download('omw-1.4')
Let's load the words corpus from nltk, a collection of known English words, so we can later filter out non-English and odd words from our recipe text.
words = set(nltk.corpus.words.words())
Let's start by creating the functions that extract the terms we're looking for. These functions do the following:
The first stage is tokenizing the sentences: we extract the words using RegexpTokenizer and then remove the stopwords (words that are very common but carry little meaning, such as conjunctions or prepositions).
The second stage is lemmatizing the sentence, which reduces each word form to its base word; in this case we keep only verbs and nouns (using the nltk POS tagger).
lemmatizer = WordNetLemmatizer()
def nltk_pos_tagger(nltk_tag):
    # Map an nltk POS tag to the corresponding WordNet tag (verbs and nouns only)
    if nltk_tag.startswith('V'):
        return wordnet.VERB
    elif nltk_tag.startswith('N'):
        return wordnet.NOUN
    else:
        return None
def tokenize_sentence(sentence):
    # Extract alphabetic tokens and drop English stopwords
    tokenizer = nltk.RegexpTokenizer(r"[^\d\W]+")
    tokenized = tokenizer.tokenize(sentence)
    stopwords = nltk.corpus.stopwords.words('english')
    finalsentence = [word for word in tokenized if word not in stopwords]
    return finalsentence
def lemmatize_sentence(sentence):
    # POS-tag the tokens and lemmatize, keeping only verbs and nouns
    nltk_tagged = nltk.pos_tag(sentence)
    wordnet_tagged = map(lambda x: (x[0], nltk_pos_tagger(x[1])), nltk_tagged)
    lemmatized_sentence = []
    for word, tag in wordnet_tagged:
        if tag is not None:
            lemmatized_sentence.append(lemmatizer.lemmatize(word, tag))
    return lemmatized_sentence
def tokenize_lemmatize(sentence):
    # Tokenize, lemmatize, keep only known English words, and deduplicate
    tokenized = tokenize_sentence(sentence)
    lemmatized = lemmatize_sentence(tokenized)
    selectedwords = [word for word in lemmatized if word in words]
    final = list(dict.fromkeys(selectedwords))
    return final
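As a quick illustration of what the pipeline produces (the example sentence is mine, and the exact output depends on the nltk data you downloaded):
# Only nouns and verbs survive, lemmatized, filtered to known English words, and deduplicated
tokenize_lemmatize("preheat the oven and bake the chicken breasts until golden")
# expected output is something like ['oven', 'bake', 'chicken', 'breast']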
Let's use the previous functions to construct a new column containing the lemmatized words from the recipe name, steps, and ingredients.
recipe_filtered['recsys'] = recipe_filtered.apply(lambda row: tokenize_lemmatize(row['name']+row['steps']+row['ingredients']),axis=1)
The next step is to calculate a similarity score between the recipes. To do so, we vectorize the words obtained in the previous steps, that is, we assign each recipe a numeric vector based on the words it contains. We can accomplish this with scikit-learn's TfidfVectorizer. The result is a matrix with one row (vector) per recipe.
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf = TfidfVectorizer(stop_words='english')
recipe_filtered['recsys'] = recipe_filtered['recsys'].fillna('')
tfidf_matrix = tfidf.fit_transform(recipe_filtered['recsys'].astype(str))
tfidf_matrix.shape
We can check the words from the recipes that we've vectorized.
tfidf.get_feature_names_out()[1:100]
We can now use the linear_kernel function, which is equivalent to cosine similarity in this case, to calculate the similarity between the recipes. The cosine similarity between two vectors is the cosine of the angle between them. Visit the sklearn metrics page for further information.
# Import linear_kernel
from sklearn.metrics.pairwise import linear_kernel
recipe_cs = linear_kernel(tfidf_matrix, tfidf_matrix)
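Because TfidfVectorizer L2-normalizes its rows by default, the plain dot product computed by linear_kernel is exactly the cosine similarity, just cheaper to compute. We can verify this if we want (optional check, not part of the original pipeline):
# Optional check: for L2-normalized TF-IDF vectors, linear_kernel equals cosine_similarity
from sklearn.metrics.pairwise import cosine_similarity
np.allclose(recipe_cs, cosine_similarity(tfidf_matrix, tfidf_matrix))  # should evaluate to True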
We can now see which recipes are the most similar to a particular one by sorting one of the rows of our matrix.
idx = (-recipe_cs[4]).argsort()[:10]
idx
As can be seen below, the recipes obtained appear to be similar. There's a lot of cheesecake!
recipe_filtered.iloc[idx]
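If you want to explore other recipes, the same argsort logic can be wrapped in a small helper; similar_recipes below is a hypothetical convenience function (the name and signature are mine), which also skips the recipe itself:
def similar_recipes(pos, n=10):
    # pos is the positional index of a recipe in recipe_filtered
    idx = (-recipe_cs[pos]).argsort()[1:n+1]  # position 0 is the recipe itself (similarity 1.0)
    return recipe_filtered.iloc[idx][['name']]
similar_recipes(4)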
Customizing our Surprise Algorithm
Let's make some changes to the Surprise base algorithm. The first step is to load the required libraries.
from surprise import Dataset
from surprise import Reader
from surprise import PredictionImpossible
from surprise import AlgoBase
To override the base algorithm, we first create a class that inherits from AlgoBase and then override its init, fit, and estimate methods.
The init method simply calls the base init method.
The fit method handles the calculation of similarities. We must call the base fit method before working with the similarity matrix. Since we have already precalculated the similarity matrix, we simply assign it to a class attribute. (We could also compute the similarities inside this method.)
Finally, the estimate method must return an estimated rating for a given user-item (recipe) pair. To accomplish this, we compute the total similarity of the given item to the other items rated by the user.
Note: The proper way to do this would be to compute an average of the similarities weighted by the user's ratings. The problem with our dataset is that the vast majority of ratings are 4 or 5, which produces very high estimated ratings (lots of estimated 5s) and prevents us from building a properly ordered list of recommended recipes. So our estimated rating will simply be the total similarity of the given recipe to the recipes the user has rated. The resulting estimates are extremely low, but the value itself is unimportant; we only need a "ranking" of the best recipes for this user, and that is enough for it to work here.
class recipeAlgo(AlgoBase):

    def __init__(self):
        # Always call base method before doing anything.
        AlgoBase.__init__(self)

    def fit(self, trainset):
        # Here again: call base method before doing anything.
        AlgoBase.fit(self, trainset)
        # Assign the precalculated content-based similarity matrix
        self.similarities = recipe_cs
        return self

    def estimate(self, u, i):
        if not (self.trainset.knows_user(u) and self.trainset.knows_item(i)):
            raise PredictionImpossible('User and/or item is unknown.')
        sim_recipes = []
        # Look up the similarity between the input item (i)
        # and the recipes the user (u) has rated
        item_idx = recipe_filtered.index.get_loc(self.trainset.to_raw_iid(i))
        for rating in self.trainset.ur[u]:
            rating_idx = recipe_filtered.index.get_loc(self.trainset.to_raw_iid(rating[0]))
            recipeSimilarity = self.similarities[item_idx, rating_idx]
            sim_recipes.append((recipeSimilarity, rating[1]))
        # Sort in descending order so the most similar recipes come first
        highest_sims = sorted(sim_recipes, key=lambda x: x[0], reverse=True)
        totalSimilarity = 0
        # Now we use the similarities to predict a rating
        for (similarity, rating) in highest_sims[:10]:
            totalSimilarity += similarity
        return totalSimilarity
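For reference, the similarity-weighted average mentioned in the note above could look roughly like the sketch below (we don't use it here because the 4-5 star skew flattens the ranking; the function name and signature are mine):
# Sketch of the alternative estimate: average of the user's ratings weighted by similarity
def weighted_estimate(highest_sims, k=10):
    top = highest_sims[:k]
    total_sim = sum(sim for sim, _ in top)
    if total_sim == 0:
        raise PredictionImpossible('No similar items rated by this user.')
    return sum(sim * rating for sim, rating in top) / total_sim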
OK, let's fit our algorithm. All we need to do is create our data object, build a Surprise trainset from it, and fit the algorithm.
reader = Reader(rating_scale=(0, 5))
data = Dataset.load_from_df(recipe_ratings[['User', 'Item', 'Rating']], reader)
trainSet = data.build_full_trainset()
algo = recipeAlgo()
algo.fit(trainSet)
We have successfully fitted our algorithm. Let's now create a test set with only one user to see how our recommender system performs. Our test set will include all of the recipes that the user hasn't rated yet, allowing our recommender system to assign a predicted rating to each of them and sort the best recipes for this user.
anti_testset_user = []
targetUser = 0 #inner_id of the target user
fillValue = trainSet.global_mean
user_item_ratings = trainSet.ur[targetUser]
user_items = [item for (item,_) in (user_item_ratings)]
user_items
ratings = trainSet.all_ratings()
for iid in trainSet.all_items():
    if iid not in user_items:
        anti_testset_user.append((trainSet.to_raw_uid(targetUser), trainSet.to_raw_iid(iid), fillValue))
len(anti_testset_user)
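As a side note, if we needed an anti test set for every user instead of a single one, Surprise already provides a built-in helper (commented out here because the full anti test set can be very large):
#anti_testset_all = trainSet.build_anti_testset()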
Let's call the test method of our recommender system.
predictions = algo.test(anti_testset_user)
As you can see below, the estimated ratings are extremely low; our algorithm computes the estimated rating by adding up the similarities between each candidate recipe and the most similar recipes the user has already rated. Even so, we get a good ranking of the best recipes for our test user based on content similarity.
pred = pd.DataFrame(predictions)
pred.sort_values(by=['est'],inplace=True,ascending = False)
pred
recipe_list = pred.head(10)['iid'].to_list()
recipe_data.loc[recipe_list]
Above this text are the suggested recipes, and below are the actual recipes our user has rated. Just by looking at the titles we can see similarities: the user appears to enjoy braised meat, garlic, and chicken, and these ingredients appear in the recommended recipes.
recipe_data.loc[recipe_ratings[recipe_ratings['User']==0]['Item']]
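If we want to reuse this flow for other users, everything above can be packaged into a small helper; recommend_for_user below is a hypothetical convenience function (name and signature are mine) built only from the objects already defined in this post:
def recommend_for_user(inner_uid, n=10):
    # Build the anti test set for one user, predict, and return the top-n recipes
    rated = {item for (item, _) in trainSet.ur[inner_uid]}
    fill = trainSet.global_mean
    testset = [(trainSet.to_raw_uid(inner_uid), trainSet.to_raw_iid(iid), fill)
               for iid in trainSet.all_items() if iid not in rated]
    preds = pd.DataFrame(algo.test(testset)).sort_values(by='est', ascending=False)
    return recipe_data.loc[preds.head(n)['iid'].to_list()]
recommend_for_user(0)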
Conclusion
We've seen how to use the Surprise package to create a recommender system. We concentrated on the data content and used NLP techniques to determine the degree of similarity between our documents (recipes).
We learned how to easily customize the base surprise algorithm to help us get recommendations for users and build a recommender system.
As we've seen, all we need to do is return an estimated rating for the items the user hasn't rated. This can be accomplished in a variety of ways; we chose to calculate it from the similarity between items. The rating does not have to be a "real" rating; we simply need a list sorted by rating in order to generate recommendations that meet our criteria.
The data for this project was obtained via Kaggle. Please see the following citation:
Generating Personalized Recipes from Historical User Preferences
Bodhisattwa Prasad Majumder, Shuyang Li, Jianmo Ni, Julian McAuley
EMNLP, 2019