Companies all over the world increasingly rely on recommender systems. Online stores, streaming services, and social networks use these algorithms to recommend items to users based on their previous behavior (items they have consumed or searched for).
There are several approaches to developing recommendation systems. We can build a recommender system based on the content of the item so that the system recommends similar items to the ones the user usually likes (Content-Based recommender systems), or we can use user similarity to recommend items that other users have rated highly (Collaborative-filtering recommender systems).
In this post we will create a really simple recommender system using the Surprise package, relying on its standard functions to build a collaborative filtering recommender system based on user ratings. The dataset I chose for this exercise is Recipes from Food.com, which is available on Kaggle and contains over 180K recipes and 700K recipe reviews. It’s a massive dataset that’s ideal for experimenting with recommender systems.
The dataset consists of several files containing both the raw data and the processed data, which is great for our purpose (thanks to the authors of the paper; you can find the citation at the end of the post).
I also want to experiment with different evaluation metrics for the recommender system. Evaluating a recommender system is difficult because user behavior changes over time, but we have to work with the metrics available. I’ll experiment with MAE (Mean Absolute Error) and RMSE (Root Mean Squared Error). Both metrics aggregate the errors (the differences between the predicted and the actual ratings), but RMSE gives more weight to large errors: if our predictions contain a few large errors, the RMSE will be noticeably higher than the MAE.
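As a quick illustration of the difference between the two metrics, here is a minimal sketch with made-up ratings (the values are invented for this example only):

import numpy as np
actual = np.array([5.0, 4.0, 5.0, 3.0])      # hypothetical true ratings
predicted = np.array([4.5, 4.0, 2.0, 3.5])   # hypothetical predictions
errors = predicted - actual
mae = np.mean(np.abs(errors))                # 1.0
rmse = np.sqrt(np.mean(errors ** 2))         # ~1.54: the single large error (3.0) dominates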
This notebook was created in the DeepNote environment, following the tutorials in the scikit-surprise documentation.
Let’s start!
The first step is importing the necessary libraries.
import pandas as pd
import difflib
import numpy as np
import pickle
Loading Data
The dataset is divided into several files covering recipes, users, and interactions. Some of them contain the RAW data, while the others contain processed data. We will use the processed user data and the raw recipe data for this recommender system. It simply works best for our needs.
Let's load and check the data:
recipe_data = pd.read_csv('/work/RAW_recipes.csv',header=0,sep=",")
recipe_data.head()
user_data = pd.read_csv('/work/PP_users.csv',header=0,sep=",")
user_data.head()
Okay, we can see the data on each file. The column names are self-explanatory, so we can get started.
Data Preparation and Exploration
To build this simple recommender system, we must first prepare the data in a Surprise-compatible dataset. We're only interested in user ratings, so we'll extract them from the processed user data.
The first step is to write a function that reads the items (recipes) and user ratings.
def getRecipeRatings(idx):
    # the 'items' and 'ratings' columns are stored as strings like '[1, 2, 3]',
    # so we strip the brackets and commas and parse the individual values
    user_items = [int(s) for s in user_data.loc[idx]['items'].replace('[','').replace(']','').replace(',','').split()]
    user_ratings = [float(s) for s in user_data.loc[idx]['ratings'].replace('[','').replace(']','').replace(',','').split()]
    # build one row per (User, Item, Rating) triple
    df = pd.DataFrame(list(zip(user_items, user_ratings)), columns=['Item', 'Rating'])
    df.insert(loc=0, column='User', value=user_data.loc[idx].u)
    return df
Then, create a dataset with one row for each User, Item, and Rating.
# build the full ratings table; this takes a while, so we only run it once
# recipe_ratings = pd.concat(
#     [getRecipeRatings(row['u']) for _, row in user_data.iterrows()],
#     ignore_index=True)
Because the dataset is large and the previous code takes time to execute, we only want to run it once, so we can use pickle to save the result to disk and read it back whenever we need it. This saves us a significant amount of time.
#recipe_ratings.to_pickle('/work/recipe_ratings.pkl')
recipe_ratings = pd.read_pickle('/work/recipe_ratings.pkl')
It's a good idea to do some data exploration, so let's get started. We know this is high-quality data, so we'll just make a bar chart to see how the ratings are distributed.
import seaborn as sns
sns.barplot(x=recipe_ratings.Rating.value_counts().index, y=recipe_ratings.Rating.value_counts())
Good, we see that the majority of the ratings are 5.0, indicating that most users are satisfied with the recipes.
Because the dataset is large, we will reduce it to save time and avoid running out of memory. Let's only keep the recipes with the most ratings, and get rid of those with 30 or fewer ratings.
recipe_counts = recipe_ratings.groupby(['Item']).size()
filtered_recipes = recipe_counts[recipe_counts>30]
filtered_recipes_list = filtered_recipes.index.tolist()
len(filtered_recipes_list)
recipe_ratings = recipe_ratings[recipe_ratings['Item'].isin(filtered_recipes_list)]
recipe_ratings.count()
Let's take a look at the new rating distribution. As we can see, it is similar to the distribution of the entire dataset.
sns.barplot(x=recipe_ratings.Rating.value_counts().index, y=recipe_ratings.Rating.value_counts())
Okay, we now have a dataset with over 300,000 ratings and approximately 11,000 recipes. That's enough for our purposes and manageable in a hosted notebook environment. Let's get to work on the model!
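These figures are easy to double-check directly from the filtered DataFrame:

len(recipe_ratings)               # total number of ratings kept
recipe_ratings['Item'].nunique()  # number of distinct recipes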
Model Creation
Before we can start working with Surprise, we have to install the package in our notebook environment.
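If it isn't already available, the package can be installed with pip (in a notebook cell, prefix the command with !):

!pip install scikit-surprise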
The package surprise includes a number of prediction algorithms that will assist us in developing the recommendation system and selecting a number of recipes that a given user might enjoy. We have the option of using basic collaborative filtering algorithms (KNN) or Matrix Factorization algorithms such as SVD or SVDpp.
KNN-based algorithms choose user or item neighbors based on similarity (some variants also take into account the mean or z-score normalization of each user's or item's ratings). We can specify whether we want to run the user-based or item-based algorithm using the user_based parameter.
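For instance, a user-based configuration could look like this sketch (later in the post we'll use the item-based variant):

from surprise import KNNBasic
sim_options = {'name': 'pearson',   # similarity measure: 'cosine', 'msd', 'pearson', ...
               'user_based': True}  # compute similarities between users
algo = KNNBasic(k=40, sim_options=sim_options)  # k nearest neighbors (40 is the default)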
Matrix Factorization algorithms translate the user-item matrix into a lower-dimensional space and predict ratings from there.
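For example, Surprise's SVD algorithm estimates each rating as r̂_ui = μ + b_u + b_i + q_i·p_u, where μ is the global mean, b_u and b_i are the user and item biases, and p_u and q_i are the latent factor vectors learned for the user and the item during training.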
More information on the definition and behavior of the algorithms can be found on the surprise documentation site.
We'll run some of them through cross-validation to compare the metrics (RMSE) and (MAE) and see how they work with this dataset.
As a baseline, let's run the most basic algorithm (NormalPredictor), which makes random predictions based on the distribution of the training ratings, and then see how the other algorithms improve the evaluation metrics.
from surprise import NormalPredictor
from surprise import Dataset
from surprise import Reader
from surprise import SVD
from surprise import SVDpp
from surprise import KNNBasic
from surprise.model_selection import cross_validate
reader = Reader(rating_scale=(0, 5))
data = Dataset.load_from_df(recipe_ratings[['User', 'Item', 'Rating']], reader)
trainSet = data.build_full_trainset()
algo = NormalPredictor()
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
Let's see the predictions this algorithm yields for a given user. We need to fit the algorithm on the whole trainset and then make predictions on a test set containing the user-item pairs that do not exist in the training set. Such a test set can easily be built with the function build_anti_testset(), but in this case, to save resources and time, we are going to build a test set for just one user. We iterate over all the items in the trainSet and select those the user has not rated. We also need to fill in a rating value for those (user, item) pairs, so we use the trainSet global mean (which is the default value used by Surprise).
anti_testset_user = []
targetUser = 0  # inner_id of the target user
fillValue = trainSet.global_mean  # placeholder rating for unseen items
user_item_ratings = trainSet.ur[targetUser]  # (item_inner_id, rating) pairs
user_items = [item for (item, _) in user_item_ratings]
user_items
for iid in trainSet.all_items():
    if iid not in user_items:
        # store raw ids, since algo.test() expects raw user/item ids
        anti_testset_user.append((trainSet.to_raw_uid(targetUser), trainSet.to_raw_iid(iid), fillValue))
algo.fit(trainSet)  # fit on the full trainset before predicting
predictions = algo.test(anti_testset_user)
predictions[0]
Let's see the 10 recipes with the highest estimated ratings for this user. I like to convert the predictions object into a DataFrame so that I can work with it more easily.
pred = pd.DataFrame(predictions)
pred.sort_values(by=['est'],inplace=True,ascending = False)
recipe_list = pred.head(10)['iid'].to_list()
recipe_data.loc[recipe_list]
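Since we will repeat these prediction steps for every algorithm we try, a small helper like this sketch could wrap them up (a hypothetical convenience; the cells below repeat the steps explicitly to keep each one self-contained):

def top_n_recipes(algo, testset, n=10):
    # predict ratings for the unseen (user, item) pairs and return
    # the n recipes with the highest estimated rating
    preds = pd.DataFrame(algo.test(testset))
    preds = preds.sort_values(by='est', ascending=False)
    return recipe_data.loc[preds.head(n)['iid'].to_list()]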
OK, with that baseline in place, let's check whether other algorithms can improve the metrics. Let's try a neighbourhood-based algorithm (KNNBasic), computing similarities between items.
sim_options = {'name': 'cosine',
'user_based': False # compute similarities between items
}
algo = KNNBasic(sim_options=sim_options)
# Run 5-fold cross-validation and print results
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
This algorithm clearly outperforms our baseline. As can be seen, the MAE and RMSE means are better (lower) than those of NormalPredictor. Let's see which recipes this algorithm recommends.
algo.fit(trainSet)  # again, fit on the full trainset before predicting
predictions = algo.test(anti_testset_user)
pred = pd.DataFrame(predictions)
pred.sort_values(by=['est'],inplace=True,ascending = False)
recipe_list = pred.head(10)['iid'].to_list()
recipe_data.loc[recipe_list]
Let's take a look at a Matrix Factorization algorithm now.
algo = SVD()
# Run 5-fold cross-validation and print results
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
We appear to have improved slightly on the KNNBasic algorithm. The mean MAE is similar, but the RMSE improved, meaning fewer large errors in our rating predictions.
algo.fit(trainSet)
predictions = algo.test(anti_testset_user)
pred = pd.DataFrame(predictions)
pred.sort_values(by=['est'],inplace=True,ascending = False)
recipe_list = pred.head(10)['iid'].to_list()
recipe_data.loc[recipe_list]
Parameter tuning with GridSearchCV
Scikit-surprise also allows us to tune the algorithms through GridSearchCV, which executes the algorithm repeatedly over a predefined grid of parameter values and returns the best set of parameters according to the chosen error metrics.
from surprise.model_selection import GridSearchCV
param_grid = {'n_factors': [100,150],
'n_epochs': [20,25,30],
'lr_all':[0.005,0.01,0.1],
'reg_all':[0.02,0.05,0.1]}
grid_search = GridSearchCV(SVD, param_grid, measures=['rmse','mae'], cv=3)
grid_search.fit(data)
Let's see the scores for the best parameters found.
print(grid_search.best_score['rmse'])
print(grid_search.best_score['mae'])
Because the model takes time to run, it's a good idea to save it to disk so we can reuse it and save time.
# save the model to disk
pickle.dump(grid_search, open('/work/surprise_grid_search_svd.sav', 'wb'))
#Load the model from disk
grid_search = pickle.load(open('/work/surprise_grid_search_svd.sav', 'rb'))
Let's take a look at the best parameters found by GridSearchCV.
print(grid_search.best_params['rmse'])
We can now repeat the cross-validation with the best parameters and compare the results.
algo = grid_search.best_estimator['rmse']
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
By tuning the model's parameters, we were able to slightly outperform the previous SVD results.
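As a final step, we could fit the tuned model on the full trainset and look at its recommendations for our target user, mirroring the earlier cells:

algo.fit(trainSet)
predictions = algo.test(anti_testset_user)
pred = pd.DataFrame(predictions)
pred.sort_values(by=['est'], inplace=True, ascending=False)
recipe_data.loc[pred.head(10)['iid'].to_list()]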
Conclusion
With scikit-surprise, we learned how to create a simple Recommender System model.
The only data required to build a recommender system are a list of items and a list of the ratings users gave to those items. For this, we downloaded an appropriate dataset.
We learned how to prepare data and generate a dataset suitable for scikit-surprise in order to compute user or item similarity, estimate user ratings for items, and build recommendations from there.
We’ve also experimented with several algorithms to see what metrics they provide and how to fine-tune the algorithms’ settings to improve our metrics.
More information on how to customize Surprise algorithms to create more reliable recommender systems can be found in our next post.
The data for this project was obtained via Kaggle. Please see the following citation:
Bodhisattwa Prasad Majumder, Shuyang Li, Jianmo Ni, Julian McAuley. Generating Personalized Recipes from Historical User Preferences. EMNLP, 2019.