Today we will build a data-driven beer recommendation system.
We have learnt regression in BUS41000. However, there are many other types of problems in data science.
Hard to do science!
Engineering is advancing fast, but note the danger of Alchemy!
Let’s build a recommendation system for beers!
Data source: https://www.beeradvocate.com/
Respect Beer!
## Observations: 1,586,614
## Variables: 13
## $ brewery_id         <int> 10325, 10325, 10325, 10325, 1075, 1075, 107...
## $ brewery_name       <chr> "Vecchio Birraio", "Vecchio Birraio", "Vecc...
## $ review_time        <int> 1234817823, 1235915097, 1235916604, 1234725...
## $ review_overall     <dbl> 1.5, 3.0, 3.0, 3.0, 4.0, 3.0, 3.5, 3.0, 4.0...
## $ review_aroma       <dbl> 2.0, 2.5, 2.5, 3.0, 4.5, 3.5, 3.5, 2.5, 3.0...
## $ review_appearance  <dbl> 2.5, 3.0, 3.0, 3.5, 4.0, 3.5, 3.5, 3.5, 3.5...
## $ review_profilename <chr> "stcules", "stcules", "stcules", "stcules",...
## $ beer_style         <chr> "Hefeweizen", "English Strong Ale", "Foreig...
## $ review_palate      <dbl> 1.5, 3.0, 3.0, 2.5, 4.0, 3.0, 4.0, 2.0, 3.5...
## $ review_taste       <dbl> 1.5, 3.0, 3.0, 3.0, 4.5, 3.5, 4.0, 3.5, 4.0...
## $ beer_name          <chr> "Sausa Weizen", "Red Moon", "Black Horse Bl...
## $ beer_abv           <dbl> 5.0, 6.2, 6.5, 5.0, 7.7, 4.7, 4.7, 4.7, 4.7...
## $ beer_beerid        <int> 47986, 48213, 48215, 47969, 64883, 52159, 5...
The data consists of ~1.5 million reviews posted on BeerAdvocate from 1999 to 2011.
About 180MB, “moderate” data.
Core idea of collaborative filtering: if two beers receive similar ratings from many common reviewers, the beers are similar. So if you like beer1, you will likely enjoy beer2 too.
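The first step is to find the reviewers that two beers share. Below is a minimal sketch in base R, assuming a review table with the `review_profilename` and `beer_name` columns shown earlier. The helper `common_reviewers_sketch` and the toy data are hypothetical and are not the deck's own `common_reviewers_by_name`.

```r
# Toy review table; column names mirror the BeerAdvocate data, values are made up.
toy_reviews <- data.frame(
  review_profilename = c("ann", "bob", "cat", "ann", "cat", "dan"),
  beer_name          = c("Beer A", "Beer A", "Beer A", "Beer B", "Beer B", "Beer B"),
  review_overall     = c(4.0, 3.5, 4.5, 2.0, 1.5, 3.0),
  stringsAsFactors   = FALSE
)

# Hypothetical helper: reviewers who rated both beers.
common_reviewers_sketch <- function(reviews, name1, name2) {
  r1 <- unique(reviews$review_profilename[reviews$beer_name == name1])
  r2 <- unique(reviews$review_profilename[reviews$beer_name == name2])
  intersect(r1, r2)
}

common_reviewers_sketch(toy_reviews, "Beer A", "Beer B")
# [1] "ann" "cat"
```

Only the ratings from these shared reviewers enter the similarity calculation, which keeps the comparison between the two beers like-for-like.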
common_reviewers_by_name("Founders Double Trouble", "Coors Light")
##   [1] "JoEBoBpr"         "ZAP"              "connecticutpoet"
##   [4] "brewdlyhooked13"  "northyorksammy"   "Foxman"
##   [7] "NJpadreFan"       "Phyl21ca"         "garymuchow"
##  [10] "WVbeergeek"       "NeroFiddled"      "Suds"
##  [13] "cokes"            "Brent"            "coldmeat23"
##  [16] "Reaper16"         "notchucknorris"   "Rifugium"
##  [19] "FosterJM"         "psuKinger"        "Slatetank"
##  [22] "ZenAgnostic"      "BierFan"          "stewart124"
##  [25] "heebes"           "NODAK"            "Shrews629"
##  [28] "bnes09"           "JayS2629"         "greenmonstah"
##  [31] "vandemonian"      "beerprovedwright" "Vdubb86"
##  [34] "TheKingofWichita" "scottfrie"        "lacqueredmouse"
##  [37] "xnicknj"          "akorsak"          "TMoney2591"
##  [40] "perrymarcus"      "kbutler1"         "Beerandraiderfan"
##  [43] "kwjd"             "DrewCapzz"        "DrJay"
##  [46] "jdhilt"           "Bitterbill"       "Soonami"
##  [49] "Chico1985"        "MrHungryMonkey"   "lovindahops"
##  [52] "wcintula"         "Gtreid"           "wordemupg"
##  [55] "Bfarr"            "buschbeer"        "MrStark"
##  [58] "jmich24"          "garuda"           "Tilley4"
##  [61] "zimm421"          "drabmuh"          "tigg924"
##  [64] "Wasatch"          "gtermi"           "Scotchboy"
##  [67] "rhoadsrage"       "Clydesdale"       "DNICE555"
##  [70] "hopheadjuice"     "jdklks"           "katan"
##  [73] "Gavage"           "hardy008"         "Strix"
##  [76] "alleykatking"     "rfgetz"           "womencantsail"
##  [79] "woosterbill"      "scottyshades"     "jimj21"
##  [82] "projectflam86"    "pmcadamis"        "jrallen34"
##  [85] "jjanega08"        "dasenebler"       "Jesse13713"
##  [88] "JoeyBeerBelly"    "Jimmys"           "Onenote81"
##  [91] "civilizedpsycho"  "Blakaeris"        "bashiba"
##  [94] "Mdog"             "ColForbinBC"      "nlmartin"
##  [97] "ChopperSmith"     "Duhast500"        "mothman"
## [100] "sarahspat"        "biboergosum"      "LordAdmNelson"
## [103] "Metalmonk"        "BeerFMAndy"       "Thorpe429"
## [106] "BDLbrewster"      "sonicdescent"     "CHickman"
## [109] "Hojaminbag"       "champ103"         "youngleo"
## [112] "woodske1"         "BirdFlu"          "Stunner97"
## [115] "onix1agr"         "cnally"           "match1112"
## [118] "WesWes"           "KBoudreau66"      "youngblood"
## [121] "CampusCrew"       "BradLikesBrew"    "ChainGangGuy"
## [124] "flipper2gv"       "Mistofminn"       "WakeandBake"
## [127] "colts9016"        "largadeer"        "BeerCon5"
## [130] "ThreeWiseMen"     "PerzentRizen"     "bark"
## [133] "illidurit"        "Goliath"          "oakbluff"
## [136] "philbe311"        "cvstrickland"     "beerthulhu"
## [139] "PatrickJR"        "biglobo8971"      "jayhawk73"
## [142] "argock"           "Badbobx"          "Haybeerman"
## [145] "happygnome"       "TheManiacalOne"   "PhillyStyle"
## [148] "ckeegan04"        "ibbjamin"         "Lexluthor33"
## [151] "CrellMoset"       "Doomcifer"        "cdkrenz"
## [154] "buckeyesox"       "daledeee"         "baos"
## [157] "johnnnniee"       "Overlord"         "clayrock81"
## [160] "dsa7783"          "berserker256"     "Huhzubendah"
## [163] "ktrillionaire"    "kirok1999"        "Brad007"
## [166] "TheBierBand"      "drpimento"        "MrMcGibblets"
## [169] "DarkerTheBetter"  "zdk9"             "beveritt"
## [172] "prototypic"       "mikesgroove"      "soupyman10"
## [175] "Gmann"            "Risser09"         "RoamingGnome"
## [178] "UCLABrewN84"      "rayjay"           "zeff80"
## [181] "gunnerman"        "jwc215"           "neenerzig"
## [184] "AltBock"          "AmericanGothic"   "mdaschaf"
## [187] "mdfb79"           "ummswimmin"       "Drew966"
## [190] "Viggo"            "aforbes10"        "feloniousmonk"
## [193] "JerzDevl2000"     "tempest"          "BEERchitect"
## [196] "chilidog"         "BuckeyeNation"    "ommegangpbr"
## [199] "Billolick"        "gskitt"
reviews_beer1 = get_review_metrics(beer_name_to_id("Founders Double Trouble"), common_reviewers)
glimpse(reviews_beer1)
## Observations: 200
## Variables: 6
## $ beer_name          <chr> "Founders Double Trouble", "Founders Double...
## $ review_profilename <chr> "aforbes10", "akorsak", "alleykatking", "Al...
## $ review_overall     <dbl> 4.0, 3.5, 4.5, 4.0, 4.5, 4.0, 4.5, 4.0, 5.0...
## $ review_aroma       <dbl> 3.5, 4.0, 3.5, 4.0, 4.0, 4.5, 4.5, 4.0, 4.5...
## $ review_palate      <dbl> 4.0, 3.5, 3.5, 3.5, 4.5, 4.0, 4.0, 4.0, 5.0...
## $ review_taste       <dbl> 4.0, 4.0, 4.0, 4.0, 4.5, 4.0, 4.5, 4.5, 4.5...
head(reviews_beer1)
## # A tibble: 6 x 6
##   beer_name      review_profilen…  review_overall  review_aroma  review_palate
##   <chr>          <chr>                      <dbl>         <dbl>          <dbl>
## 1 Founders Dou…  aforbes10                    4             3.5            4
## 2 Founders Dou…  akorsak                      3.5           4              3.5
## 3 Founders Dou…  alleykatking                 4.5           3.5            3.5
## 4 Founders Dou…  AltBock                      4             4              3.5
## 5 Founders Dou…  AmericanGothic               4.5           4              4.5
## 6 Founders Dou…  argock                       4             4.5            4
## # ... with 1 more variable: review_taste <dbl>
reviews_beer2 = get_review_metrics(beer_name_to_id("Coors Light"), common_reviewers)
glimpse(reviews_beer2)
## Observations: 200
## Variables: 6
## $ beer_name          <chr> "Coors Light", "Coors Light", "Coors Light"...
## $ review_profilename <chr> "aforbes10", "akorsak", "alleykatking", "Al...
## $ review_overall     <dbl> 2.0, 3.5, 1.5, 1.5, 2.0, 3.0, 3.5, 1.0, 1.5...
## $ review_aroma       <dbl> 2.0, 3.0, 1.5, 1.0, 2.0, 2.0, 3.5, 1.5, 2.5...
## $ review_palate      <dbl> 2.0, 3.0, 2.0, 1.5, 2.0, 2.0, 4.0, 1.0, 2.0...
## $ review_taste       <dbl> 2.0, 3.0, 2.0, 1.5, 2.0, 2.0, 4.0, 1.0, 2.0...
head(reviews_beer2)
## # A tibble: 6 x 6
##   beer_name    review_profilename  review_overall  review_aroma  review_palate
##   <chr>        <chr>                        <dbl>         <dbl>          <dbl>
## 1 Coors Light  aforbes10                      2             2              2
## 2 Coors Light  akorsak                        3.5           3              3
## 3 Coors Light  alleykatking                   1.5           1.5            2
## 4 Coors Light  AltBock                        1.5           1              1.5
## 5 Coors Light  AmericanGothic                 2             2              2
## 6 Coors Light  argock                         3             2              2
## # ... with 1 more variable: review_taste <dbl>
name1 = "Coors Light"
name2 = "Founders Double Trouble"
visual_beer_scatterplots(name1, name2)
Weighted/Aggregated version of Pearson Correlation!
\[ Sim_{\color{red} aroma}(beer1, beer2) = corr\left(R_{\color{red} aroma}(\cdot, beer1), R_{\color{red} aroma}(\cdot, beer2)\right) \] where \(R_{\color{red} aroma}(\cdot, beer)\) is the vector of aroma ratings given to \(beer\) by the common reviewers.
Finally,
\[ \scriptsize Sim_{\bf aggre}(beer1, beer2) = \alpha_{\color{red} aroma} \cdot Sim_{\color{red} aroma}(beer1, beer2) + \alpha_{\color{blue} taste} \cdot Sim_{\color{blue} taste}(beer1, beer2) + \ldots \]
\(Sim_{\bf aggre}(beer1, beer2)\) measures how similar two beers are overall. It becomes more reliable as the number of common reviewers increases.
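The aggregated similarity can be sketched in a few lines of base R. This is an illustration under assumptions: the ratings are made up, only two attributes are used, and the weights in `alpha` are placeholders because the deck does not state its \(\alpha\) values. The function `sim_aggregate` is hypothetical and is not the deck's `calc_similarity`.

```r
# Ratings from the same five (made-up) reviewers, in the same order, for two beers.
beer1 <- data.frame(
  review_aroma = c(4.0, 3.5, 4.5, 3.0, 4.0),
  review_taste = c(4.0, 3.0, 4.5, 3.5, 4.5)
)
beer2 <- data.frame(
  review_aroma = c(3.5, 3.0, 4.5, 2.5, 4.0),
  review_taste = c(4.0, 2.5, 4.0, 3.0, 4.5)
)

# Assumed weights (alpha); the source does not give the values it uses.
alpha <- c(review_aroma = 0.5, review_taste = 0.5)

# Pearson correlation per attribute, then a weighted sum across attributes.
sim_aggregate <- function(r1, r2, weights) {
  per_metric <- sapply(names(weights), function(m) cor(r1[[m]], r2[[m]]))
  list(per_metric = per_metric, aggregate = sum(weights * per_metric))
}

sim <- sim_aggregate(beer1, beer2, alpha)
sim$per_metric   # one correlation per attribute
sim$aggregate    # weighted combination
```

With weights that sum to one the aggregate stays within [-1, 1]; other weightings simply rescale the score.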
Other metrics/similarity measures?
Fat Tire Amber Ale vs Dale’s Pale Ale
calc_similarity(b1, b2)
## [1] 0.7294634
Fat Tire Amber Ale vs Michelob Ultra
calc_similarity(b1, b2)
## [1] 0.2099679
Let’s pick the 40 most widely rated beers and build a recommendation system from them.
Build a beer network in which every pair of beers is linked by its similarity score!
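A rough sketch of how such a network could be assembled, assuming a single rating attribute and a made-up reviewer-by-beer matrix (the deck's own pipeline aggregates several attributes). `cor()` with `use = "pairwise.complete.obs"` computes each correlation from the reviewers who rated both beers in the pair.

```r
# Made-up overall ratings: rows are reviewers, columns are beers, NA = not reviewed.
ratings <- matrix(
  c(4.5, 4.0, 2.0, NA,
    4.0, 4.5, 1.5, 2.0,
    3.0, 3.5, 3.5, 3.0,
    5.0, 4.5, NA,  1.5,
    3.5, 3.0, 3.0, 3.5),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("user", 1:5), c("IPA A", "IPA B", "Lager C", "Lager D"))
)

# Each entry uses only the reviewers who rated both beers in that pair.
sim_matrix <- cor(ratings, use = "pairwise.complete.obs")

# Edge list of the network: one row per unordered pair of beers.
pairs <- which(upper.tri(sim_matrix), arr.ind = TRUE)
edges <- data.frame(
  beer1 = rownames(sim_matrix)[pairs[, "row"]],
  beer2 = colnames(sim_matrix)[pairs[, "col"]],
  sim   = sim_matrix[pairs]
)

# Most similar pairs first.
edges[order(-edges$sim), ]
```

Recommending beers similar to a given one then amounts to sorting that beer's row of `sim_matrix` in decreasing order.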
> mybeer = "Stone IPA (India Pale Ale)"
> find_similar_beers(mybeer)
         sim  beer_name                      beer_style                      brewery_name
33 1.3288956  Stone Ruination IPA            American Double / Imperial IPA  Stone Brewing Co.
3  1.2825522  Arrogant Bastard Ale           American Strong Ale             Stone Brewing Co.
28 1.0833418  Sierra Nevada Celebration Ale  American IPA                    Sierra Nevada Brewing Co.
36 0.9583520  Tröegs Nugget Nectar           American Amber / Red Ale        Tröegs Brewing Company
1  0.9496483  60 Minute IPA                  American IPA                    Dogfish Head Brewery
Try a Stout?
> mybeer = "Young's Double Chocolate Stout"
> find_similar_beers(mybeer)
         sim  beer_name                       beer_style               brewery_name
25 0.6735954  Samuel Smith's Oatmeal Stout    Oatmeal Stout            Samuel Smith Old Brewery (Tadcaster)
4  0.6000354  Ayinger Celebrator Doppelbock   Doppelbock               Privatbrauerei Franz Inselkammer KG / Brauerei Aying
13 0.5952654  Guinness Draught                Irish Dry Stout          Guinness Ltd.
6  0.5816315  Brooklyn Black Chocolate Stout  Russian Imperial Stout   Brooklyn Brewery
10 0.5803285  Duvel                           Belgian Strong Pale Ale  Brouwerij Duvel Moortgat NV
Limit the results to a certain beer_style
> mybeer = "Sierra Nevada Pale Ale"
> find_similar_beers(mybeer, "American IPA")
         sim  beer_name                        beer_style    brewery_name
28 1.0959539  Sierra Nevada Celebration Ale    American IPA  Sierra Nevada Brewing Co.
1  0.9040252  60 Minute IPA                    American IPA  Dogfish Head Brewery
32 0.7870552  Stone IPA (India Pale Ale)       American IPA  Stone Brewing Co.
29 0.7696222  Sierra Nevada Torpedo Extra IPA  American IPA  Sierra Nevada Brewing Co.
22 0.7258228  Racer 5 India Pale Ale           American IPA  Bear Republic Brewing Co.
> find_similar_beers(mybeer)
         sim  beer_name                                   beer_style           brewery_name
23 1.1295512  Samuel Adams Boston Lager                   Vienna Lager         Boston Beer Company (Samuel Adams)
28 1.0959539  Sierra Nevada Celebration Ale               American IPA         Sierra Nevada Brewing Co.
1  0.9040252  60 Minute IPA                               American IPA         Dogfish Head Brewery
27 0.8722255  Sierra Nevada Bigfoot Barleywine Style Ale  American Barleywine  Sierra Nevada Brewing Co.
32 0.7870552  Stone IPA (India Pale Ale)                  American IPA         Stone Brewing Co.
How can we visualize the relationships among beers?
What do we have? Pairwise similarities.
Spectral Clustering!
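A minimal spectral clustering sketch in base R, assuming a symmetric, non-negative similarity matrix. The matrix `S` below is made up, and the function `spectral_clusters` is a hypothetical illustration of the normalized-Laplacian variant, not necessarily the procedure used to produce the deck's figures. Correlation-based similarities can be negative, so they would need to be shifted or thresholded before being used this way.

```r
# Made-up pairwise similarities for six beers: two tight groups, weak links between.
beers <- c("IPA A", "IPA B", "IPA C", "Stout D", "Stout E", "Stout F")
S <- matrix(c(
  0.0, 0.9, 0.8, 0.1, 0.2, 0.1,
  0.9, 0.0, 0.7, 0.2, 0.1, 0.1,
  0.8, 0.7, 0.0, 0.1, 0.1, 0.2,
  0.1, 0.2, 0.1, 0.0, 0.8, 0.9,
  0.2, 0.1, 0.1, 0.8, 0.0, 0.7,
  0.1, 0.1, 0.2, 0.9, 0.7, 0.0
), nrow = 6, byrow = TRUE, dimnames = list(beers, beers))

spectral_clusters <- function(S, k) {
  n <- nrow(S)
  d_inv_sqrt <- diag(1 / sqrt(rowSums(S)))
  # Symmetric normalized graph Laplacian: I - D^{-1/2} S D^{-1/2}
  L <- diag(n) - d_inv_sqrt %*% S %*% d_inv_sqrt
  eig <- eigen(L, symmetric = TRUE)
  # eigen() returns eigenvalues in decreasing order, so the last k columns
  # are the eigenvectors belonging to the k smallest eigenvalues.
  U <- eig$vectors[, (n - k + 1):n, drop = FALSE]
  U <- U / sqrt(rowSums(U^2))        # normalize each row to unit length
  km <- kmeans(U, centers = k, nstart = 20)
  setNames(km$cluster, rownames(S))
}

set.seed(1)  # k-means uses random starts
clusters <- spectral_clusters(S, k = 2)
clusters
```

The rows of `U` also give low-dimensional coordinates for each beer, which is one way to draw the network as a scatterplot.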
Thousands of reviews, news articles, financial reports …
How to teach machines to learn automatically:
Topics of an article: topic modeling, Latent Dirichlet Allocation (LDA)
Relationships/similarities among words: word embedding, e.g. word2vec, which uses a neural network with one hidden layer
With this tool, one can leverage beer reviews to build a better beer recommendation system!
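Once words are represented as vectors, their similarity is commonly measured by the cosine of the angle between them. The sketch below uses made-up three-dimensional vectors purely for illustration; real word2vec embeddings are learned from text and typically have far more dimensions. The function `cosine_similarity` is a hypothetical helper.

```r
# Made-up 3-dimensional "embeddings"; real word vectors are learned from a corpus.
embeddings <- rbind(
  hoppy   = c(0.9, 0.1, 0.2),
  bitter  = c(0.8, 0.2, 0.1),
  roasted = c(0.1, 0.9, 0.3)
)

# Cosine similarity: dot product divided by the product of vector lengths.
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

sim_hoppy_bitter  <- cosine_similarity(embeddings["hoppy", ], embeddings["bitter", ])
sim_hoppy_roasted <- cosine_similarity(embeddings["hoppy", ], embeddings["roasted", ])

# In this toy example "hoppy" is closer to "bitter" than to "roasted".
sim_hoppy_bitter > sim_hoppy_roasted
```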