Today we will build a data-driven beer recommendation system.

Overview

We have learned regression in BUS41000. However, there are many other types of problems in data science.

  • Regression, supervised machine learning
  • Clustering, unsupervised machine learning
  • Algorithms for massive data: online, streaming, sketching
  • Networks and relational data
  • Topic modeling
  • Deep learning for images/video, audio/NLP

Challenges

Hard to

  • define/select features
  • select models
  • compute, due to the non-convex, high-dimensional, large-scale nature of the problems
  • scale algorithms up to larger data

Hard to do science!

Engineering is advancing fast, but note the danger of Alchemy!

Take a concrete task

Glimpse of the data

## Observations: 1,586,614
## Variables: 13
## $ brewery_id         <int> 10325, 10325, 10325, 10325, 1075, 1075, 107...
## $ brewery_name       <chr> "Vecchio Birraio", "Vecchio Birraio", "Vecc...
## $ review_time        <int> 1234817823, 1235915097, 1235916604, 1234725...
## $ review_overall     <dbl> 1.5, 3.0, 3.0, 3.0, 4.0, 3.0, 3.5, 3.0, 4.0...
## $ review_aroma       <dbl> 2.0, 2.5, 2.5, 3.0, 4.5, 3.5, 3.5, 2.5, 3.0...
## $ review_appearance  <dbl> 2.5, 3.0, 3.0, 3.5, 4.0, 3.5, 3.5, 3.5, 3.5...
## $ review_profilename <chr> "stcules", "stcules", "stcules", "stcules",...
## $ beer_style         <chr> "Hefeweizen", "English Strong Ale", "Foreig...
## $ review_palate      <dbl> 1.5, 3.0, 3.0, 2.5, 4.0, 3.0, 4.0, 2.0, 3.5...
## $ review_taste       <dbl> 1.5, 3.0, 3.0, 3.0, 4.5, 3.5, 4.0, 3.5, 4.0...
## $ beer_name          <chr> "Sausa Weizen", "Red Moon", "Black Horse Bl...
## $ beer_abv           <dbl> 5.0, 6.2, 6.5, 5.0, 7.7, 4.7, 4.7, 4.7, 4.7...
## $ beer_beerid        <int> 47986, 48213, 48215, 47969, 64883, 52159, 5...


It consists of about 1.6 million reviews (1,586,614) posted on BeerAdvocate from 1999 to 2011.

About 180 MB: “moderate” data.

Idea: similarity between a pair of beers

Core idea of collaborative filtering: if two beers receive close ratings from many common users, they are similar! If you like beer1, you will likely like beer2 too.
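To make the idea concrete, here is a minimal sketch on a made-up ratings matrix (reviewers in rows, beers in columns, `NA` for beers a reviewer did not rate). It uses base R's `cor()` with `use = "pairwise.complete.obs"`, so each pair of beers is compared only on the reviewers who rated both. The data and names are hypothetical; this is not the deck's own code.

```r
# Made-up ratings: rows are reviewers, columns are beers, NA = not rated.
ratings <- matrix(
  c(4.5, 4.0, 1.5,  NA,
    4.0, 4.5, 2.0, 3.0,
    2.0, 2.5, 4.5, 4.0,
    3.5,  NA, 3.0, 3.5,
    1.5, 2.0, 4.0, 4.5),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("user", 1:5), c("beerA", "beerB", "beerC", "beerD"))
)

# Item-item similarity: correlate every pair of beer columns, using only
# the reviewers who rated both beers in that pair.
beer_sim <- cor(ratings, use = "pairwise.complete.obs")
print(round(beer_sim, 2))

# "If you like beerA ...": recommend the most similar other beer.
sims_to_A <- beer_sim["beerA", colnames(beer_sim) != "beerA"]
names(which.max(sims_to_A))
# [1] "beerB"
```

Here `beerB` comes out on top because the reviewers who rated both beers gave them close scores, while `beerC` is negatively correlated with `beerA`.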

Find common reviewers

common_reviewers = common_reviewers_by_name("Founders Double Trouble", "Coors Light")
common_reviewers
##   [1] "JoEBoBpr"         "ZAP"              "connecticutpoet" 
##   [4] "brewdlyhooked13"  "northyorksammy"   "Foxman"          
##   [7] "NJpadreFan"       "Phyl21ca"         "garymuchow"      
##  [10] "WVbeergeek"       "NeroFiddled"      "Suds"            
##  [13] "cokes"            "Brent"            "coldmeat23"      
##  [16] "Reaper16"         "notchucknorris"   "Rifugium"        
##  [19] "FosterJM"         "psuKinger"        "Slatetank"       
##  [22] "ZenAgnostic"      "BierFan"          "stewart124"      
##  [25] "heebes"           "NODAK"            "Shrews629"       
##  [28] "bnes09"           "JayS2629"         "greenmonstah"    
##  [31] "vandemonian"      "beerprovedwright" "Vdubb86"         
##  [34] "TheKingofWichita" "scottfrie"        "lacqueredmouse"  
##  [37] "xnicknj"          "akorsak"          "TMoney2591"      
##  [40] "perrymarcus"      "kbutler1"         "Beerandraiderfan"
##  [43] "kwjd"             "DrewCapzz"        "DrJay"           
##  [46] "jdhilt"           "Bitterbill"       "Soonami"         
##  [49] "Chico1985"        "MrHungryMonkey"   "lovindahops"     
##  [52] "wcintula"         "Gtreid"           "wordemupg"       
##  [55] "Bfarr"            "buschbeer"        "MrStark"         
##  [58] "jmich24"          "garuda"           "Tilley4"         
##  [61] "zimm421"          "drabmuh"          "tigg924"         
##  [64] "Wasatch"          "gtermi"           "Scotchboy"       
##  [67] "rhoadsrage"       "Clydesdale"       "DNICE555"        
##  [70] "hopheadjuice"     "jdklks"           "katan"           
##  [73] "Gavage"           "hardy008"         "Strix"           
##  [76] "alleykatking"     "rfgetz"           "womencantsail"   
##  [79] "woosterbill"      "scottyshades"     "jimj21"          
##  [82] "projectflam86"    "pmcadamis"        "jrallen34"       
##  [85] "jjanega08"        "dasenebler"       "Jesse13713"      
##  [88] "JoeyBeerBelly"    "Jimmys"           "Onenote81"       
##  [91] "civilizedpsycho"  "Blakaeris"        "bashiba"         
##  [94] "Mdog"             "ColForbinBC"      "nlmartin"        
##  [97] "ChopperSmith"     "Duhast500"        "mothman"         
## [100] "sarahspat"        "biboergosum"      "LordAdmNelson"   
## [103] "Metalmonk"        "BeerFMAndy"       "Thorpe429"       
## [106] "BDLbrewster"      "sonicdescent"     "CHickman"        
## [109] "Hojaminbag"       "champ103"         "youngleo"        
## [112] "woodske1"         "BirdFlu"          "Stunner97"       
## [115] "onix1agr"         "cnally"           "match1112"       
## [118] "WesWes"           "KBoudreau66"      "youngblood"      
## [121] "CampusCrew"       "BradLikesBrew"    "ChainGangGuy"    
## [124] "flipper2gv"       "Mistofminn"       "WakeandBake"     
## [127] "colts9016"        "largadeer"        "BeerCon5"        
## [130] "ThreeWiseMen"     "PerzentRizen"     "bark"            
## [133] "illidurit"        "Goliath"          "oakbluff"        
## [136] "philbe311"        "cvstrickland"     "beerthulhu"      
## [139] "PatrickJR"        "biglobo8971"      "jayhawk73"       
## [142] "argock"           "Badbobx"          "Haybeerman"      
## [145] "happygnome"       "TheManiacalOne"   "PhillyStyle"     
## [148] "ckeegan04"        "ibbjamin"         "Lexluthor33"     
## [151] "CrellMoset"       "Doomcifer"        "cdkrenz"         
## [154] "buckeyesox"       "daledeee"         "baos"            
## [157] "johnnnniee"       "Overlord"         "clayrock81"      
## [160] "dsa7783"          "berserker256"     "Huhzubendah"     
## [163] "ktrillionaire"    "kirok1999"        "Brad007"         
## [166] "TheBierBand"      "drpimento"        "MrMcGibblets"    
## [169] "DarkerTheBetter"  "zdk9"             "beveritt"        
## [172] "prototypic"       "mikesgroove"      "soupyman10"      
## [175] "Gmann"            "Risser09"         "RoamingGnome"    
## [178] "UCLABrewN84"      "rayjay"           "zeff80"          
## [181] "gunnerman"        "jwc215"           "neenerzig"       
## [184] "AltBock"          "AmericanGothic"   "mdaschaf"        
## [187] "mdfb79"           "ummswimmin"       "Drew966"         
## [190] "Viggo"            "aforbes10"        "feloniousmonk"   
## [193] "JerzDevl2000"     "tempest"          "BEERchitect"     
## [196] "chilidog"         "BuckeyeNation"    "ommegangpbr"     
## [199] "Billolick"        "gskitt"
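A minimal sketch of what a function like `common_reviewers_by_name()` might do, assuming the reviews sit in a long table with one row per review and the column names shown in the glimpse above. `common_reviewers_sketch()` and the toy rows are hypothetical; the deck's actual helper may differ (for example, it may work with beer IDs).

```r
# Toy long-format reviews (one row per review); the rows are invented.
reviews <- data.frame(
  review_profilename = c("ann", "bob", "cat", "ann", "cat", "dan", "bob"),
  beer_name = c("Beer X", "Beer X", "Beer X", "Beer Y", "Beer Y", "Beer Y", "Beer Z"),
  review_overall = c(4.0, 3.5, 4.5, 2.0, 1.5, 3.0, 4.0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: reviewers who reviewed both beers.
common_reviewers_sketch <- function(reviews, name1, name2) {
  r1 <- unique(reviews$review_profilename[reviews$beer_name == name1])
  r2 <- unique(reviews$review_profilename[reviews$beer_name == name2])
  intersect(r1, r2)
}

common_reviewers_sketch(reviews, "Beer X", "Beer Y")
# [1] "ann" "cat"
```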

Extract review features

reviews_beer1 = get_review_metrics(beer_name_to_id("Founders Double Trouble"), 
    common_reviewers)
glimpse(reviews_beer1)
## Observations: 200
## Variables: 6
## $ beer_name          <chr> "Founders Double Trouble", "Founders Double...
## $ review_profilename <chr> "aforbes10", "akorsak", "alleykatking", "Al...
## $ review_overall     <dbl> 4.0, 3.5, 4.5, 4.0, 4.5, 4.0, 4.5, 4.0, 5.0...
## $ review_aroma       <dbl> 3.5, 4.0, 3.5, 4.0, 4.0, 4.5, 4.5, 4.0, 4.5...
## $ review_palate      <dbl> 4.0, 3.5, 3.5, 3.5, 4.5, 4.0, 4.0, 4.0, 5.0...
## $ review_taste       <dbl> 4.0, 4.0, 4.0, 4.0, 4.5, 4.0, 4.5, 4.5, 4.5...
head(reviews_beer1)
## # A tibble: 6 x 6
##   beer_name     review_profilen… review_overall review_aroma review_palate
##   <chr>         <chr>                     <dbl>        <dbl>         <dbl>
## 1 Founders Dou… aforbes10                   4            3.5           4  
## 2 Founders Dou… akorsak                     3.5          4             3.5
## 3 Founders Dou… alleykatking                4.5          3.5           3.5
## 4 Founders Dou… AltBock                     4            4             3.5
## 5 Founders Dou… AmericanGothic              4.5          4             4.5
## 6 Founders Dou… argock                      4            4.5           4  
## # ... with 1 more variable: review_taste <dbl>


reviews_beer2 = get_review_metrics(beer_name_to_id("Coors Light"), 
    common_reviewers)
glimpse(reviews_beer2)
## Observations: 200
## Variables: 6
## $ beer_name          <chr> "Coors Light", "Coors Light", "Coors Light"...
## $ review_profilename <chr> "aforbes10", "akorsak", "alleykatking", "Al...
## $ review_overall     <dbl> 2.0, 3.5, 1.5, 1.5, 2.0, 3.0, 3.5, 1.0, 1.5...
## $ review_aroma       <dbl> 2.0, 3.0, 1.5, 1.0, 2.0, 2.0, 3.5, 1.5, 2.5...
## $ review_palate      <dbl> 2.0, 3.0, 2.0, 1.5, 2.0, 2.0, 4.0, 1.0, 2.0...
## $ review_taste       <dbl> 2.0, 3.0, 2.0, 1.5, 2.0, 2.0, 4.0, 1.0, 2.0...
head(reviews_beer2)
## # A tibble: 6 x 6
##   beer_name   review_profilename review_overall review_aroma review_palate
##   <chr>       <chr>                       <dbl>        <dbl>         <dbl>
## 1 Coors Light aforbes10                     2            2             2  
## 2 Coors Light akorsak                       3.5          3             3  
## 3 Coors Light alleykatking                  1.5          1.5           2  
## 4 Coors Light AltBock                       1.5          1             1.5
## 5 Coors Light AmericanGothic                2            2             2  
## 6 Coors Light argock                        3            2             2  
## # ... with 1 more variable: review_taste <dbl>
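Both tables list the same 200 reviewers in the same order, which is what allows the ratings to be compared reviewer by reviewer. A hypothetical sketch of such an extraction step follows; `get_review_metrics_sketch()` and its toy data are assumptions, not the deck's `get_review_metrics()`. It orders rows with `match()`, so the output follows the reviewer vector rather than relying on locale-dependent sorting.

```r
# Toy reviews with the four rating columns used in the deck; values invented.
reviews <- data.frame(
  review_profilename = c("ann", "bob", "cat", "cat", "ann", "dan"),
  beer_beerid        = c(1, 1, 1, 2, 2, 2),
  review_overall     = c(4.0, 3.5, 4.5, 1.5, 2.0, 3.0),
  review_aroma       = c(3.5, 4.0, 4.0, 1.0, 2.0, 3.0),
  review_palate      = c(4.0, 3.5, 4.5, 1.5, 2.0, 2.5),
  review_taste       = c(4.0, 4.0, 4.5, 1.5, 2.0, 3.0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per reviewer in `reviewers`, in that order.
# match() takes the first review if a reviewer rated the beer twice and
# gives an NA row if they never rated it.
get_review_metrics_sketch <- function(reviews, beer_id, reviewers,
                                      metrics = c("review_overall", "review_aroma",
                                                  "review_palate", "review_taste")) {
  sub <- reviews[reviews$beer_beerid == beer_id, ]
  out <- sub[match(reviewers, sub$review_profilename),
             c("review_profilename", metrics)]
  rownames(out) <- NULL
  out
}

common <- c("ann", "cat")
m1 <- get_review_metrics_sketch(reviews, 1, common)
m2 <- get_review_metrics_sketch(reviews, 2, common)
m1$review_aroma  # [1] 3.5 4.0
m2$review_aroma  # [1] 2 1
```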

Coors Light vs Founders Double Trouble

name1 = "Coors Light"
name2 = "Founders Double Trouble"
visual_beer_scatterplots(name1, name2)

Similarity metric for a pair of beers

A weighted/aggregated version of Pearson correlation!

\[ Sim_{\color{red} aroma}(beer1, beer2) = corr\left(R_{\color{red} aroma}(\cdot, beer1), R_{\color{red} aroma}(\cdot, beer2)\right) \] where \(R_{\color{red} aroma}(\cdot, beer)\) is the vector of aroma ratings that the common reviewers gave to \(beer\).
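For a single aspect, the similarity is just the Pearson correlation of two aligned rating vectors. The sketch below writes the formula out and checks it against base R's `cor()`; the vectors are the first five aroma ratings printed above for Founders Double Trouble and Coors Light, used only as an illustration. The helper name `pearson()` is hypothetical.

```r
# First five aroma ratings printed above, same reviewers in the same order.
aroma_beer1 <- c(3.5, 4.0, 3.5, 4.0, 4.0)  # Founders Double Trouble
aroma_beer2 <- c(2.0, 3.0, 1.5, 1.0, 2.0)  # Coors Light

# Pearson correlation from its definition:
# sum of co-deviations divided by the product of root sums of squares.
pearson <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

sim_aroma <- pearson(aroma_beer1, aroma_beer2)
all.equal(sim_aroma, cor(aroma_beer1, aroma_beer2))
# [1] TRUE
```

On these five reviewers the aroma similarity is about 0.18 (0.15 / sqrt(0.3 × 2.2)).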

Finally,

\[ \scriptsize Sim_{\bf aggre}(beer1, beer2) = \alpha_{\color{red} aroma} \cdot Sim_{\color{red} aroma}(beer1, beer2) + \alpha_{\color{blue} taste} \cdot Sim_{\color{blue} taste}(beer1, beer2) + \ldots \]

\(Sim_{\bf aggre}(beer1, beer2)\) measures how similar two beers are overall; the estimate becomes more reliable as the number of common reviewers increases.
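A minimal sketch of the aggregated score, assuming the per-aspect ratings of the common reviewers are already aligned row by row. The weights in `alpha` are placeholders chosen for illustration (they are non-negative and sum to 1, which keeps the score in [-1, 1]); the deck does not state which weights it uses. The ratings are the first five rows printed earlier for the two beers, and `sim_aggre()` is a hypothetical helper.

```r
# First five rows printed earlier (same reviewers, same order).
beer1 <- data.frame(aroma  = c(3.5, 4.0, 3.5, 4.0, 4.0),
                    palate = c(4.0, 3.5, 3.5, 3.5, 4.5),
                    taste  = c(4.0, 4.0, 4.0, 4.0, 4.5))  # Founders Double Trouble
beer2 <- data.frame(aroma  = c(2.0, 3.0, 1.5, 1.0, 2.0),
                    palate = c(2.0, 3.0, 2.0, 1.5, 2.0),
                    taste  = c(2.0, 3.0, 2.0, 1.5, 2.0))  # Coors Light

# Placeholder weights (assumed, not from the deck).
alpha <- c(aroma = 0.3, palate = 0.2, taste = 0.5)

# Weighted sum of per-aspect Pearson correlations.
sim_aggre <- function(b1, b2, alpha) {
  per_aspect <- vapply(names(alpha), function(a) cor(b1[[a]], b2[[a]]), numeric(1))
  sum(alpha * per_aspect)
}

sim_aggre(beer1, beer2, alpha)
```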

Other metrics/similarity measures?
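One possible alternative, shown here only for comparison, is cosine similarity on the raw ratings. The short sketch below (hypothetical helper `cosine_sim()`, using the first five aroma ratings printed earlier for the two beers) shows why centering matters: ratings are all positive, so raw cosine similarity tends to be high, while the mean-centered version equals the Pearson correlation.

```r
# Hypothetical helper: cosine of the angle between two rating vectors.
cosine_sim <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

x <- c(3.5, 4.0, 3.5, 4.0, 4.0)  # aroma, Founders Double Trouble
y <- c(2.0, 3.0, 1.5, 1.0, 2.0)  # aroma, Coors Light

cosine_sim(x, y)                      # raw ratings: about 0.95
cosine_sim(x - mean(x), y - mean(y))  # centered: identical to cor(x, y)
```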

Fat Tire Amber Ale vs Dale’s Pale Ale

calc_similarity(b1, b2)
## [1] 0.7294634

Fat Tire Amber Ale vs Michelob Ultra

calc_similarity(b1, b2)
## [1] 0.2099679

Measure all pairs and build “Beer Recommender System”

Let’s pick the 40 most widely rated beers, and then build a recommendation system.

Build a beer network in which each pair of beers is weighted by its similarity!
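A minimal sketch of these two steps, under stated assumptions: `top_n_beers()` counts reviews per beer name, and `similarity_matrix()` fills a symmetric matrix from any pairwise similarity function (in the deck's setting, something like `calc_similarity()`). Both helpers and the stand-in `toy_sim()` are hypothetical. With 40 beers there are choose(40, 2) = 780 pairs to score.

```r
# Toy review log: one beer name per review (counts are invented).
beer_names <- rep(c("A", "B", "C", "D", "E"), times = c(50, 40, 30, 20, 10))

# Hypothetical helper: the n beers with the most reviews.
top_n_beers <- function(beer_names, n) {
  counts <- sort(table(beer_names), decreasing = TRUE)
  names(counts)[seq_len(min(n, length(counts)))]
}

# Hypothetical helper: symmetric matrix of pairwise similarities (needs >= 2 beers).
# The diagonal is left NA because a beer is never recommended to itself.
similarity_matrix <- function(beers, pair_sim) {
  k <- length(beers)
  S <- matrix(NA_real_, k, k, dimnames = list(beers, beers))
  for (pair in combn(k, 2, simplify = FALSE)) {
    i <- pair[1]
    j <- pair[2]
    S[i, j] <- S[j, i] <- pair_sim(beers[i], beers[j])
  }
  S
}

# Stand-in similarity for the demo only.
toy_sim <- function(a, b) 1 / (1 + abs(match(a, LETTERS) - match(b, LETTERS)))

top <- top_n_beers(beer_names, 3)   # "A" "B" "C"
S <- similarity_matrix(top, toy_sim)
choose(40, 2)                       # 780 pairs for 40 beers
```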

> mybeer = "Stone IPA (India Pale Ale)"
> find_similar_beers(mybeer)
         sim                     beer_name                     beer_style              brewery_name
33 1.3288956           Stone Ruination IPA American Double / Imperial IPA         Stone Brewing Co.
3  1.2825522          Arrogant Bastard Ale            American Strong Ale         Stone Brewing Co.
28 1.0833418 Sierra Nevada Celebration Ale                   American IPA Sierra Nevada Brewing Co.
36 0.9583520          Tröegs Nugget Nectar       American Amber / Red Ale    Tröegs Brewing Company
1  0.9496483                 60 Minute IPA                   American IPA      Dogfish Head Brewery
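Given a matrix of pairwise similarities, a recommender like `find_similar_beers()` can be as simple as sorting one row. The hypothetical `find_similar_sketch()` below drops the beer itself, optionally keeps only one `beer_style` (as in the "American IPA" query later in this section), and returns the top `k`. The toy matrix and style labels are invented.

```r
# Hypothetical helper: top-k most similar beers, optionally within one style.
find_similar_sketch <- function(S, beer, styles, style = NULL, k = 5) {
  sims <- S[beer, ]
  sims <- sims[names(sims) != beer]                # never recommend the beer itself
  if (!is.null(style)) {
    sims <- sims[styles[names(sims)] == style]     # keep only the requested style
  }
  sims <- sort(sims, decreasing = TRUE)            # sort() also drops NA scores
  top <- head(sims, k)
  data.frame(beer_name = names(top), sim = unname(top),
             beer_style = unname(styles[names(top)]),
             stringsAsFactors = FALSE)
}

beers  <- c("IPA 1", "IPA 2", "Stout 1", "Lager 1")
styles <- c("IPA 1" = "American IPA", "IPA 2" = "American IPA",
            "Stout 1" = "Oatmeal Stout", "Lager 1" = "Vienna Lager")
S <- matrix(c( NA, 0.9, 0.2, 1.1,
              0.9,  NA, 0.3, 0.6,
              0.2, 0.3,  NA, 0.1,
              1.1, 0.6, 0.1,  NA),
            nrow = 4, byrow = TRUE, dimnames = list(beers, beers))

find_similar_sketch(S, "IPA 1", styles, k = 2)                   # Lager 1, then IPA 2
find_similar_sketch(S, "IPA 1", styles, style = "American IPA")  # IPA 2 only
```

The style filter mirrors the Sierra Nevada example: without it, a lager can outrank same-style beers.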

Beer recommendation

Try a Stout?

> mybeer = "Young's Double Chocolate Stout"
> find_similar_beers(mybeer)
         sim                      beer_name              beer_style                                         brewery_name
25 0.6735954   Samuel Smith's Oatmeal Stout           Oatmeal Stout                 Samuel Smith Old Brewery (Tadcaster)
4  0.6000354  Ayinger Celebrator Doppelbock              Doppelbock Privatbrauerei Franz Inselkammer KG / Brauerei Aying
13 0.5952654               Guinness Draught         Irish Dry Stout                                        Guinness Ltd.
6  0.5816315 Brooklyn Black Chocolate Stout  Russian Imperial Stout                                     Brooklyn Brewery
10 0.5803285                          Duvel Belgian Strong Pale Ale                          Brouwerij Duvel Moortgat NV


Limited to a certain beer_style

> mybeer = "Sierra Nevada Pale Ale"
> find_similar_beers(mybeer, "American IPA")
         sim                       beer_name   beer_style              brewery_name
28 1.0959539   Sierra Nevada Celebration Ale American IPA Sierra Nevada Brewing Co.
1  0.9040252                   60 Minute IPA American IPA      Dogfish Head Brewery
32 0.7870552      Stone IPA (India Pale Ale) American IPA         Stone Brewing Co.
29 0.7696222 Sierra Nevada Torpedo Extra IPA American IPA Sierra Nevada Brewing Co.
22 0.7258228          Racer 5 India Pale Ale American IPA Bear Republic Brewing Co.
> find_similar_beers(mybeer)
         sim                                  beer_name          beer_style                       brewery_name
23 1.1295512                  Samuel Adams Boston Lager        Vienna Lager Boston Beer Company (Samuel Adams)
28 1.0959539              Sierra Nevada Celebration Ale        American IPA          Sierra Nevada Brewing Co.
1  0.9040252                              60 Minute IPA        American IPA               Dogfish Head Brewery
27 0.8722255 Sierra Nevada Bigfoot Barleywine Style Ale American Barleywine          Sierra Nevada Brewing Co.
32 0.7870552                 Stone IPA (India Pale Ale)        American IPA                  Stone Brewing Co.

Beer clusters?

How can we visualize the relationships among beers?

What do we have? Pairwise similarities.

Spectral Clustering!

  • Drawback: doesn’t scale well computationally, since it needs all \(O(n^2)\) pairwise similarities.
  • At large scale, Locality-Sensitive Hashing (LSH), roughly \(O(n)\), is much better for approximately finding near neighbors.
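To illustrate the LSH idea, here is a minimal sketch of one common family, random-hyperplane hashing for cosine similarity (a swapped-in example; the deck does not specify which LSH scheme it has in mind). Each item vector gets a short bit signature from the signs of its projections onto random hyperplanes, so hashing takes time linear in the number of items; only items that share a bucket are compared, instead of all pairs. Real systems typically use several hash tables to raise recall; this sketch uses one, and all sizes are arbitrary.

```r
set.seed(42)
n_items <- 1000
d <- 50        # dimension of each item vector
n_bits <- 8    # signature length, i.e. up to 2^8 buckets

X <- matrix(rnorm(n_items * d), nrow = n_items, ncol = d)
X[2, ] <- 2 * X[1, ]   # item 2 points the same way as item 1 (cosine = 1)

# One random hyperplane per bit; the bit is the sign of the projection.
planes <- matrix(rnorm(d * n_bits), nrow = d, ncol = n_bits)
bits <- (X %*% planes) > 0
signature <- apply(bits, 1, function(b) paste(as.integer(b), collapse = ""))

# Group item indices by signature; each group is a bucket.
buckets <- split(seq_len(n_items), signature)

# Candidate near neighbours of item 1: the other items in its bucket.
candidates <- setdiff(buckets[[signature[1]]], 1)
length(candidates)
```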

Clustering the 40 most-reviewed beers
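A minimal spectral clustering sketch in base R, assuming a symmetric matrix of pairwise beer similarities. It clips negative similarities to zero, uses the normalized Laplacian with row-normalized eigenvectors (the Ng-Jordan-Weiss variant), and finishes with `kmeans()`; it also assumes every beer has positive similarity to at least one other beer. `spectral_clusters()` and the block-structured toy matrix are hypothetical, and the deck's own choice of Laplacian and number of clusters is not stated.

```r
# Spectral clustering sketch (normalized Laplacian + k-means).
spectral_clusters <- function(S, k) {
  W <- pmax(S, 0)                     # affinities must be non-negative
  diag(W) <- 0                        # no self-loops (also replaces an NA diagonal)
  deg <- rowSums(W)
  D_inv_sqrt <- diag(1 / sqrt(deg), nrow = length(deg))
  L_sym <- diag(nrow(W)) - D_inv_sqrt %*% W %*% D_inv_sqrt
  ev <- eigen(L_sym, symmetric = TRUE)
  # eigen() returns eigenvalues in decreasing order: the k smallest are last.
  U <- ev$vectors[, ncol(ev$vectors) - seq_len(k) + 1, drop = FALSE]
  U <- U / sqrt(rowSums(U^2))         # scale each row to unit length
  kmeans(U, centers = k, nstart = 20)$cluster
}

# Invented similarity matrix with two obvious groups of three beers.
S <- matrix(0.05, 6, 6)
S[1:3, 1:3] <- 0.9
S[4:6, 4:6] <- 0.8
diag(S) <- NA

set.seed(1)
cl <- spectral_clusters(S, 2)
cl
```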

Textual data

Thousands of reviews, news articles, financial reports …

How can we teach machines to learn from text automatically?

  • Topics of an article: topic modeling, Latent Dirichlet Allocation (LDA)

  • Relationships/similarities among words: word embeddings, e.g. word2vec, trained with a one-hidden-layer neural network

With these tools, one can leverage the text of beer reviews to build a better beer recommendation system!

Reference

  1. Data and part of the code were obtained from here and here.

  2. Collaborative filtering GIF illustration courtesy of here.