class: inverse, center, middle # 36-315: Statistical Graphics and Visualization ## Lecture 13 Meghan Hall <br> Department of Statistics & Data Science <br> Carnegie Mellon University <br> June 23, 2021 --- layout: true <div class="my-footer"><span>cmu-36315.netlify.app</span></div> --- # Logistics <br> .large[Group project assignments] <br> .medium[check-in due Friday] <br> .large[Assignments left] <br> .medium[two labs & one HW] <br> .large[Dropping second graphic critique] --- # Today <br> .large[Text analysis] <br> .medium[text mining & natural language processing is an entire field] <br> .medium[just touching a piece with the `tidytext` package] <br> .medium[and more data manipulation practice] <br> .large[Tables] <br> .medium[for presentations and for reports] <br> .medium[using `gt` and `kableExtra`] --- # The `tidytext` package <br> .large[We've used the `separate` function before] --- # `separate` <table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> dessert </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> prunes, animal crackers, cream cheese </td> </tr> <tr> <td style="text-align:left;"> phyllo dough, gorgonzola cheese, pineapple rings, blueberries </td> </tr> <tr> <td style="text-align:left;"> brioche, cantaloupe, pecans, avocados </td> </tr> </tbody> </table> -- ```r separate(dessert, into = c("ingredient 1", "ingredient 2", "ingredient 3", "ingredient 4"), sep = ", ") ``` <table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> ingredient 1 </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> ingredient 2 </th> <th 
style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> ingredient 3 </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> ingredient 4 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> prunes </td> <td style="text-align:left;"> animal crackers </td> <td style="text-align:left;"> cream cheese </td> <td style="text-align:left;"> </td> </tr> <tr> <td style="text-align:left;"> phyllo dough </td> <td style="text-align:left;"> gorgonzola cheese </td> <td style="text-align:left;"> pineapple rings </td> <td style="text-align:left;"> blueberries </td> </tr> <tr> <td style="text-align:left;"> brioche </td> <td style="text-align:left;"> cantaloupe </td> <td style="text-align:left;"> pecans </td> <td style="text-align:left;"> avocados </td> </tr> </tbody> </table> --- # Today's data <table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> points </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> price </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> variety </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> description </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> title </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 87 </td> <td style="text-align:right;"> 65 </td> <td style="text-align:left;"> Pinot Noir </td> <td style="text-align:left;"> Much like the regular bottling from 2012, this comes across as rather rough and tannic, with rustic, earthy, herbal characteristics. 
Nonetheless, if you think of it as a pleasantly unfussy country wine, it's a good companion to a hearty winter stew. </td> <td style="text-align:left;"> Sweet Cheeks 2012 Vintner's Reserve Wild Child Block Pinot Noir (Willamette Valley) </td> </tr> <tr> <td style="text-align:right;"> 87 </td> <td style="text-align:right;"> 19 </td> <td style="text-align:left;"> Cabernet Sauvignon </td> <td style="text-align:left;"> Soft, supple plum envelopes an oaky structure in this Cabernet, supported by 15% Merlot. Coffee and chocolate complete the picture, finishing strong at the end, resulting in a value-priced wine of attractive flavor and immediate accessibility. </td> <td style="text-align:left;"> Kirkland Signature 2011 Mountain Cuvée Cabernet Sauvignon (Napa Valley) </td> </tr> <tr> <td style="text-align:right;"> 87 </td> <td style="text-align:right;"> 34 </td> <td style="text-align:left;"> Cabernet Sauvignon </td> <td style="text-align:left;"> Slightly reduced, this wine offers a chalky, tannic backbone to an otherwise juicy explosion of rich black cherry, the whole accented throughout by firm oak and cigar box. </td> <td style="text-align:left;"> Louis M. 
Martini 2012 Cabernet Sauvignon (Alexander Valley) </td> </tr> </tbody> </table> --- # The `tidytext` package <br> .large[We've used the `separate` function before] <br> .medium[but that becomes unwieldy very quickly] <br> .large[`tidytext` has `unnest_tokens`] <br> .medium[token: a meaningful unit of text (here, a word)] <br> .medium["tidy text" has one row per token] --- # Tokens ```r wine_ratings %>% unnest_tokens(word, description) ``` <table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> points </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> price </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> variety </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> title </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> word </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 87 </td> <td style="text-align:right;"> 65 </td> <td style="text-align:left;"> Pinot Noir </td> <td style="text-align:left;"> Sweet Cheeks 2012 Vintner's Reserve Wild Child Block Pinot Noir (Willamette Valley) </td> <td style="text-align:left;"> much </td> </tr> <tr> <td style="text-align:right;"> 87 </td> <td style="text-align:right;"> 65 </td> <td style="text-align:left;"> Pinot Noir </td> <td style="text-align:left;"> Sweet Cheeks 2012 Vintner's Reserve Wild Child Block Pinot Noir (Willamette Valley) </td> <td style="text-align:left;"> like </td> </tr> <tr> <td style="text-align:right;"> 87 </td> <td style="text-align:right;"> 65 </td> <td style="text-align:left;"> Pinot Noir </td> <td style="text-align:left;"> Sweet Cheeks 2012 Vintner's Reserve Wild 
Child Block Pinot Noir (Willamette Valley) </td> <td style="text-align:left;"> the </td> </tr> <tr> <td style="text-align:right;"> 87 </td> <td style="text-align:right;"> 65 </td> <td style="text-align:left;"> Pinot Noir </td> <td style="text-align:left;"> Sweet Cheeks 2012 Vintner's Reserve Wild Child Block Pinot Noir (Willamette Valley) </td> <td style="text-align:left;"> regular </td> </tr> <tr> <td style="text-align:right;"> 87 </td> <td style="text-align:right;"> 65 </td> <td style="text-align:left;"> Pinot Noir </td> <td style="text-align:left;"> Sweet Cheeks 2012 Vintner's Reserve Wild Child Block Pinot Noir (Willamette Valley) </td> <td style="text-align:left;"> bottling </td> </tr> </tbody> </table> --- # The `tidytext` package <br> .large[We've used the `separate` function before] <br> .medium[but that becomes unwieldy very quickly] <br> .large[`tidytext` has `unnest_tokens`] <br> .medium[token: a meaningful unit of text (here, a word)] <br> .medium["tidy text" has one row per token] <br> .large[What about "stop words"?] --- # Stop words ```r wine_ratings %>% unnest_tokens(word, description) %>% count(word, sort = TRUE) ``` -- <table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> word </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> n </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> and </td> <td style="text-align:right;"> 85755 </td> </tr> <tr> <td style="text-align:left;"> the </td> <td style="text-align:right;"> 56821 </td> </tr> <tr> <td style="text-align:left;"> a </td> <td style="text-align:right;"> 46048 </td> </tr> <tr> <td style="text-align:left;"> of </td> <td style="text-align:right;"> 41492 </td> </tr> <tr> <td style="text-align:left;"> with </td> <td style="text-align:right;"> 28442 </td> 
</tr> <tr> <td style="text-align:left;"> this </td> <td style="text-align:right;"> 28259 </td> </tr> <tr> <td style="text-align:left;"> is </td> <td style="text-align:right;"> 22377 </td> </tr> <tr> <td style="text-align:left;"> in </td> <td style="text-align:right;"> 19742 </td> </tr> <tr> <td style="text-align:left;"> wine </td> <td style="text-align:right;"> 19195 </td> </tr> <tr> <td style="text-align:left;"> flavors </td> <td style="text-align:right;"> 17075 </td> </tr> </tbody> </table> --- # Stop words `anti_join` is a **filtering join**: drops all observations in the initial data frame that have a match in the second data frame ```r wine_ratings %>% unnest_tokens(word, description) %>% anti_join(stop_words) %>% count(word, sort = TRUE) ``` -- <table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> word </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> n </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> wine </td> <td style="text-align:right;"> 19195 </td> </tr> <tr> <td style="text-align:left;"> flavors </td> <td style="text-align:right;"> 17075 </td> </tr> <tr> <td style="text-align:left;"> fruit </td> <td style="text-align:right;"> 12242 </td> </tr> <tr> <td style="text-align:left;"> cherry </td> <td style="text-align:right;"> 8531 </td> </tr> <tr> <td style="text-align:left;"> acidity </td> <td style="text-align:right;"> 8337 </td> </tr> <tr> <td style="text-align:left;"> finish </td> <td style="text-align:right;"> 7888 </td> </tr> </tbody> </table> --- # Stop words `stop_words` <table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 
!important;"> word </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> lexicon </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> a </td> <td style="text-align:left;"> SMART </td> </tr> <tr> <td style="text-align:left;"> a's </td> <td style="text-align:left;"> SMART </td> </tr> <tr> <td style="text-align:left;"> able </td> <td style="text-align:left;"> SMART </td> </tr> <tr> <td style="text-align:left;"> about </td> <td style="text-align:left;"> SMART </td> </tr> <tr> <td style="text-align:left;"> above </td> <td style="text-align:left;"> SMART </td> </tr> <tr> <td style="text-align:left;"> according </td> <td style="text-align:left;"> SMART </td> </tr> <tr> <td style="text-align:left;"> accordingly </td> <td style="text-align:left;"> SMART </td> </tr> <tr> <td style="text-align:left;"> across </td> <td style="text-align:left;"> SMART </td> </tr> <tr> <td style="text-align:left;"> actually </td> <td style="text-align:left;"> SMART </td> </tr> <tr> <td style="text-align:left;"> after </td> <td style="text-align:left;"> SMART </td> </tr> </tbody> </table> --- # Custom stop words ```r wine_stop_words <- tribble( ~"word", ~"lexicon", "aromas", "custom", "drink", "custom", "finish", "custom", "flavors", "custom", "wine", "custom", "acidity", "custom", "notes", "custom", "palate", "custom", "red", "custom", "chardonnay", "custom", "nose", "custom", "vineyard", "custom", "pinot", "custom", "noir", "custom", "cabernet", "custom", "black", "custom" ) ``` --- # Custom stop words `bind_rows` stacks data frames together if they have the same variables ```r stop_words_custom <- bind_rows(wine_stop_words, stop_words) ``` --- # Custom stop words ```r wine_ratings %>% unnest_tokens(word, description) %>% anti_join(stop_words_custom) %>% count(word, sort = TRUE) ``` -- <table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th 
style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> word </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> n </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> fruit </td> <td style="text-align:right;"> 12242 </td> </tr> <tr> <td style="text-align:left;"> cherry </td> <td style="text-align:right;"> 8531 </td> </tr> <tr> <td style="text-align:left;"> oak </td> <td style="text-align:right;"> 7194 </td> </tr> <tr> <td style="text-align:left;"> tannins </td> <td style="text-align:right;"> 6440 </td> </tr> <tr> <td style="text-align:left;"> ripe </td> <td style="text-align:right;"> 6224 </td> </tr> <tr> <td style="text-align:left;"> rich </td> <td style="text-align:right;"> 4783 </td> </tr> <tr> <td style="text-align:left;"> apple </td> <td style="text-align:right;"> 3712 </td> </tr> <tr> <td style="text-align:left;"> dry </td> <td style="text-align:right;"> 3661 </td> </tr> <tr> <td style="text-align:left;"> spice </td> <td style="text-align:right;"> 3633 </td> </tr> <tr> <td style="text-align:left;"> texture </td> <td style="text-align:right;"> 3594 </td> </tr> </tbody> </table> --- # Graphing word frequency use `slice_max` (new version of `top_n`) to filter by `n` ```r top_15 <- wine_ratings %>% unnest_tokens(word, description) %>% anti_join(stop_words_custom) %>% count(word) %>% * slice_max(n, n = 15) ``` -- ```r top_15 %>% ggplot(aes(x = n, y = reorder(word, n))) + geom_bar(stat = "identity") + scale_x_continuous(expand = expansion(mult = c(0, .1)), name = "Count") + labs(y = NULL, title = "Common words in wine reviews") + cmu_theme() + theme(panel.grid.major.y = element_blank(), plot.title.position = "plot") ``` --- # Graphing word frequency <img src="figs/Lec13/wine-14-1.png" width="504" style="display: block; margin: auto;" /> --- # Graphing word frequency `slice_max` respects `group_by` ```r top_15_variety <- wine_ratings %>% 
unnest_tokens(word, description) %>% anti_join(stop_words_custom) %>% * group_by(variety) %>% count(word) %>% slice_max(n, n = 15) ``` -- ```r top_15_variety %>% ggplot(aes(x = n, y = reorder(word, n))) + geom_bar(stat = "identity") + scale_x_continuous(expand = expansion(mult = c(0, .1)), name = "Count") + * facet_wrap(~variety) + labs(y = NULL, title = "Common words in wine reviews") + cmu_theme() + theme(panel.grid.major.y = element_blank(), plot.title.position = "plot") ``` --- # Graphing word frequency <img src="figs/Lec13/wine-16-1.png" width="504" style="display: block; margin: auto;" /> --- # Graphing word frequency add `scales = "free"` as an argument to `facet_wrap` ```r top_15_variety %>% ggplot(aes(x = n, y = reorder(word, n))) + geom_bar(stat = "identity") + scale_x_continuous(expand = expansion(mult = c(0, .1)), name = "Count") + * facet_wrap(~variety, scales = "free") + labs(y = NULL, title = "Common words in wine reviews") + cmu_theme() + theme(panel.grid.major.y = element_blank(), plot.title.position = "plot", strip.background = element_rect(fill = "light gray"), strip.text = element_text(color = "black")) ``` --- # Graphing word frequency <img src="figs/Lec13/wine-17-1.png" width="504" style="display: block; margin: auto;" /> --- # Graphing word frequency `reorder_within` is a nifty function within `tidytext` ```r top_15_variety %>% ggplot(aes(x = n, * y = reorder_within(word, n, variety))) + geom_bar(stat = "identity") + scale_x_continuous(expand = expansion(mult = c(0, .1)), name = "Count", * labels = label_number_si()) + * scale_y_reordered(sep = "___") + facet_wrap(~variety, scales = "free") + labs(y = NULL, title = "Common words in wine reviews") + cmu_theme() + theme(panel.grid.major.y = element_blank(), plot.title.position = "plot", strip.background = element_rect(fill = "light gray"), strip.text = element_text(color = "black")) ``` --- # Graphing word frequency <img src="figs/Lec13/wine-18-1.png" width="504" style="display: block; margin: 
auto;" /> --- # Comparing proportions ```r wine_ratings %>% unnest_tokens(word, description) %>% anti_join(stop_words_custom) %>% * mutate(word = str_extract(word, "[a-z']+")) %>% count(word, variety) %>% filter(n > 10) ``` -- <table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> word </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> variety </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> n </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> ability </td> <td style="text-align:left;"> Cabernet Sauvignon </td> <td style="text-align:right;"> 14 </td> </tr> <tr> <td style="text-align:left;"> ability </td> <td style="text-align:left;"> Chardonnay </td> <td style="text-align:right;"> 13 </td> </tr> <tr> <td style="text-align:left;"> ability </td> <td style="text-align:left;"> Pinot Noir </td> <td style="text-align:right;"> 11 </td> </tr> <tr> <td style="text-align:left;"> abound </td> <td style="text-align:left;"> Chardonnay </td> <td style="text-align:right;"> 31 </td> </tr> <tr> <td style="text-align:left;"> abound </td> <td style="text-align:left;"> Pinot Noir </td> <td style="text-align:right;"> 18 </td> </tr> <tr> <td style="text-align:left;"> abrasive </td> <td style="text-align:left;"> Cabernet Sauvignon </td> <td style="text-align:right;"> 15 </td> </tr> <tr> <td style="text-align:left;"> absolutely </td> <td style="text-align:left;"> Cabernet Sauvignon </td> <td style="text-align:right;"> 22 </td> </tr> <tr> <td style="text-align:left;"> absolutely </td> <td style="text-align:left;"> Chardonnay </td> <td style="text-align:right;"> 13 </td> </tr> </tbody> </table> --- # Comparing proportions ```r wine_ratings %>% unnest_tokens(word, description) 
%>% anti_join(stop_words_custom) %>% mutate(word = str_extract(word, "[a-z']+")) %>% count(word, variety) %>% filter(n > 10) %>% * group_by(variety) %>% * mutate(proportion = n / sum(n)) ``` -- <table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> word </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> variety </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> n </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> proportion </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> ability </td> <td style="text-align:left;"> Cabernet Sauvignon </td> <td style="text-align:right;"> 14 </td> <td style="text-align:right;"> 0.0000903 </td> </tr> <tr> <td style="text-align:left;"> ability </td> <td style="text-align:left;"> Chardonnay </td> <td style="text-align:right;"> 13 </td> <td style="text-align:right;"> 0.0000727 </td> </tr> <tr> <td style="text-align:left;"> ability </td> <td style="text-align:left;"> Pinot Noir </td> <td style="text-align:right;"> 11 </td> <td style="text-align:right;"> 0.0000497 </td> </tr> <tr> <td style="text-align:left;"> abound </td> <td style="text-align:left;"> Chardonnay </td> <td style="text-align:right;"> 31 </td> <td style="text-align:right;"> 0.0001735 </td> </tr> <tr> <td style="text-align:left;"> abound </td> <td style="text-align:left;"> Pinot Noir </td> <td style="text-align:right;"> 18 </td> <td style="text-align:right;"> 0.0000813 </td> </tr> <tr> <td style="text-align:left;"> abrasive </td> <td style="text-align:left;"> Cabernet Sauvignon </td> <td style="text-align:right;"> 15 </td> <td style="text-align:right;"> 0.0000967 </td> </tr> <tr> <td 
style="text-align:left;"> absolutely </td> <td style="text-align:left;"> Cabernet Sauvignon </td> <td style="text-align:right;"> 22 </td> <td style="text-align:right;"> 0.0001418 </td> </tr> </tbody> </table> --- # Comparing proportions ```r proportions <- wine_ratings %>% unnest_tokens(word, description) %>% anti_join(stop_words_custom) %>% mutate(word = str_extract(word, "[a-z']+")) %>% count(word, variety) %>% filter(n > 10) %>% group_by(variety) %>% mutate(proportion = n / sum(n)) %>% select(-c(n)) %>% pivot_wider(names_from = variety, values_from = proportion) ``` -- <table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> word </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> Cabernet Sauvignon </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> Chardonnay </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> Pinot Noir </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> ability </td> <td style="text-align:right;"> 0.0000903 </td> <td style="text-align:right;"> 0.0000727 </td> <td style="text-align:right;"> 0.0000497 </td> </tr> <tr> <td style="text-align:left;"> abound </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> 0.0001735 </td> <td style="text-align:right;"> 0.0000813 </td> </tr> <tr> <td style="text-align:left;"> abrasive </td> <td style="text-align:right;"> 0.0000967 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> <tr> <td style="text-align:left;"> absolutely </td> <td style="text-align:right;"> 0.0001418 </td> <td style="text-align:right;"> 0.0000727 </td> <td style="text-align:right;"> 0.0001627 </td> </tr> <tr> <td 
style="text-align:left;"> abundance </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> 0.0001287 </td> <td style="text-align:right;"> 0.0001175 </td> </tr> </tbody> </table> --- # Comparing proportions ```r proportions %>% ggplot(aes(x = Chardonnay, y = `Pinot Noir`)) + geom_jitter(alpha = 0.2) + geom_text(aes(label = word), check_overlap = TRUE) + geom_abline(color = "light gray", linetype = 2) + cmu_theme() ``` --- # Comparing proportions <img src="figs/Lec13/wine-25-1.png" width="504" style="display: block; margin: auto;" /> --- # Sentiment analysis ```r sentiment <- get_sentiments("bing") ``` -- <table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> word </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> sentiment </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 2-faces </td> <td style="text-align:left;"> negative </td> </tr> <tr> <td style="text-align:left;"> abnormal </td> <td style="text-align:left;"> negative </td> </tr> <tr> <td style="text-align:left;"> abolish </td> <td style="text-align:left;"> negative </td> </tr> <tr> <td style="text-align:left;"> abominable </td> <td style="text-align:left;"> negative </td> </tr> <tr> <td style="text-align:left;"> abominably </td> <td style="text-align:left;"> negative </td> </tr> <tr> <td style="text-align:left;"> abominate </td> <td style="text-align:left;"> negative </td> </tr> <tr> <td style="text-align:left;"> abomination </td> <td style="text-align:left;"> negative </td> </tr> <tr> <td style="text-align:left;"> abort </td> <td style="text-align:left;"> negative </td> </tr> <tr> <td style="text-align:left;"> aborted </td> <td style="text-align:left;"> negative </td> </tr> <tr> <td style="text-align:left;"> aborts </td> <td 
style="text-align:left;"> negative </td> </tr> </tbody> </table> --- # Sentiment analysis `inner_join` keeps only the words that appear in both data frames, so any word missing from the sentiment lexicon is dropped (be careful) ```r word_sentiment <- wine_ratings %>% unnest_tokens(word, description) %>% anti_join(stop_words_custom) %>% mutate(word = str_extract(word, "[a-z']+")) %>% inner_join(sentiment) %>% count(word, sentiment, sort = TRUE) ``` -- <table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> word </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> sentiment </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> n </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> rich </td> <td style="text-align:left;"> positive </td> <td style="text-align:right;"> 4783 </td> </tr> <tr> <td style="text-align:left;"> soft </td> <td style="text-align:left;"> positive </td> <td style="text-align:right;"> 3552 </td> </tr> <tr> <td style="text-align:left;"> sweet </td> <td style="text-align:left;"> positive </td> <td style="text-align:right;"> 3228 </td> </tr> <tr> <td style="text-align:left;"> fresh </td> <td style="text-align:left;"> positive </td> <td style="text-align:right;"> 3019 </td> </tr> <tr> <td style="text-align:left;"> crisp </td> <td style="text-align:left;"> positive </td> <td style="text-align:right;"> 2928 </td> </tr> </tbody> </table> --- # Sentiment analysis ```r word_sentiment %>% * group_by(sentiment) %>% * slice_max(n, n = 15) %>% mutate(sentiment = factor(sentiment, levels = c("positive","negative"))) %>% ggplot(aes(x = n, * y = reorder_within(word, n, sentiment))) + geom_bar(stat = "identity") + scale_x_continuous(expand = expansion(mult = c(0, .1)), name = "Count", labels = label_number_si()) + * scale_y_reordered(sep = 
"___") + facet_wrap(~sentiment, scales = "free") + labs(y = NULL, title = "Common words in wine reviews, by sentiment") + cmu_theme() + theme(panel.grid.major.y = element_blank(), plot.title.position = "plot", strip.background = element_rect(fill = "light gray"), strip.text = element_text(color = "black")) ``` --- # Sentiment analysis <img src="figs/Lec13/sentiment-5-1.png" width="504" style="display: block; margin: auto;" /> --- # Sentiment analysis ```r word_sentiment %>% slice_max(n, n = 15) %>% ggplot(aes(x = n, y = reorder(word, n), * fill = sentiment)) + geom_bar(stat = "identity") + * scale_fill_brewer(type = "qual", name = NULL, * guide = guide_legend(reverse = TRUE)) + scale_x_continuous(expand = expansion(mult = c(0, .1)), name = "Count", labels = label_number_si()) + labs(y = NULL, title = "Common words in wine reviews, by sentiment") + cmu_theme() + theme(panel.grid.major.y = element_blank(), plot.title.position = "plot", legend.position = c(0.8, 0.2)) ``` --- # Sentiment analysis <img src="figs/Lec13/sentiment-6-1.png" width="504" style="display: block; margin: auto;" /> --- # Bigrams Some text analysis (including sentiment analysis) is more effective when the "token" is larger than a single word. A bigram is a pair of consecutive words. ```r wine_ratings %>% unnest_tokens(bigram, description, token = "ngrams", n = 2) %>% count(bigram, sort = TRUE) ``` -- <table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> bigram </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> n </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> on the </td> <td style="text-align:right;"> 8445 </td> </tr> <tr> <td style="text-align:left;"> this is </td> <td style="text-align:right;"> 5934 </td> </tr> <tr> <td style="text-align:left;"> with a </td> <td 
style="text-align:right;"> 5734 </td> </tr> <tr> <td style="text-align:left;"> the palate </td> <td style="text-align:right;"> 5278 </td> </tr> <tr> <td style="text-align:left;"> this wine </td> <td style="text-align:right;"> 4896 </td> </tr> <tr> <td style="text-align:left;"> is a </td> <td style="text-align:right;"> 4777 </td> </tr> <tr> <td style="text-align:left;"> in the </td> <td style="text-align:right;"> 4694 </td> </tr> </tbody> </table> --- # Bigrams ```r wine_ratings %>% unnest_tokens(bigram, description, token = "ngrams", n = 2) %>% separate(bigram, into = c("first_word", "second_word"), remove = FALSE, sep = " ") ``` -- <table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> points </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> price </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> variety </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> title </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> bigram </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> first_word </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> second_word </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 87 </td> <td style="text-align:right;"> 65 </td> <td style="text-align:left;"> Pinot Noir </td> <td style="text-align:left;"> Sweet Cheeks 2012 Vintner's Reserve Wild Child Block Pinot Noir (Willamette Valley) </td> <td style="text-align:left;"> much like </td> <td style="text-align:left;"> much </td> <td 
style="text-align:left;"> like </td> </tr> <tr> <td style="text-align:right;"> 87 </td> <td style="text-align:right;"> 65 </td> <td style="text-align:left;"> Pinot Noir </td> <td style="text-align:left;"> Sweet Cheeks 2012 Vintner's Reserve Wild Child Block Pinot Noir (Willamette Valley) </td> <td style="text-align:left;"> like the </td> <td style="text-align:left;"> like </td> <td style="text-align:left;"> the </td> </tr> <tr> <td style="text-align:right;"> 87 </td> <td style="text-align:right;"> 65 </td> <td style="text-align:left;"> Pinot Noir </td> <td style="text-align:left;"> Sweet Cheeks 2012 Vintner's Reserve Wild Child Block Pinot Noir (Willamette Valley) </td> <td style="text-align:left;"> the regular </td> <td style="text-align:left;"> the </td> <td style="text-align:left;"> regular </td> </tr> <tr> <td style="text-align:right;"> 87 </td> <td style="text-align:right;"> 65 </td> <td style="text-align:left;"> Pinot Noir </td> <td style="text-align:left;"> Sweet Cheeks 2012 Vintner's Reserve Wild Child Block Pinot Noir (Willamette Valley) </td> <td style="text-align:left;"> regular bottling </td> <td style="text-align:left;"> regular </td> <td style="text-align:left;"> bottling </td> </tr> <tr> <td style="text-align:right;"> 87 </td> <td style="text-align:right;"> 65 </td> <td style="text-align:left;"> Pinot Noir </td> <td style="text-align:left;"> Sweet Cheeks 2012 Vintner's Reserve Wild Child Block Pinot Noir (Willamette Valley) </td> <td style="text-align:left;"> bottling from </td> <td style="text-align:left;"> bottling </td> <td style="text-align:left;"> from </td> </tr> </tbody> </table> --- # Bigrams ```r wine_ratings %>% unnest_tokens(bigram, description, token = "ngrams", n = 2) %>% separate(bigram, into = c("first_word", "second_word"), remove = FALSE, sep = " ") %>% filter(!first_word %in% stop_words_custom$word & !second_word %in% stop_words_custom$word) ``` -- <table class="table" style="font-size: 16px; width: auto !important; margin-left: 
auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> points </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> price </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> variety </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> title </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> bigram </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> first_word </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> second_word </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 87 </td> <td style="text-align:right;"> 65 </td> <td style="text-align:left;"> Pinot Noir </td> <td style="text-align:left;"> Sweet Cheeks 2012 Vintner's Reserve Wild Child Block Pinot Noir (Willamette Valley) </td> <td style="text-align:left;"> regular bottling </td> <td style="text-align:left;"> regular </td> <td style="text-align:left;"> bottling </td> </tr> <tr> <td style="text-align:right;"> 87 </td> <td style="text-align:right;"> 65 </td> <td style="text-align:left;"> Pinot Noir </td> <td style="text-align:left;"> Sweet Cheeks 2012 Vintner's Reserve Wild Child Block Pinot Noir (Willamette Valley) </td> <td style="text-align:left;"> rustic earthy </td> <td style="text-align:left;"> rustic </td> <td style="text-align:left;"> earthy </td> </tr> <tr> <td style="text-align:right;"> 87 </td> <td style="text-align:right;"> 65 </td> <td style="text-align:left;"> Pinot Noir </td> <td style="text-align:left;"> Sweet Cheeks 2012 Vintner's Reserve Wild Child Block Pinot Noir (Willamette Valley) </td> <td 
style="text-align:left;"> earthy herbal </td> <td style="text-align:left;"> earthy </td> <td style="text-align:left;"> herbal </td> </tr> <tr> <td style="text-align:right;"> 87 </td> <td style="text-align:right;"> 65 </td> <td style="text-align:left;"> Pinot Noir </td> <td style="text-align:left;"> Sweet Cheeks 2012 Vintner's Reserve Wild Child Block Pinot Noir (Willamette Valley) </td> <td style="text-align:left;"> herbal characteristics </td> <td style="text-align:left;"> herbal </td> <td style="text-align:left;"> characteristics </td> </tr> </tbody> </table> --- # Bigrams ```r wine_ratings %>% unnest_tokens(bigram, description, token = "ngrams", n = 2) %>% separate(bigram, into = c("first_word", "second_word"), remove = FALSE, sep = " ") %>% filter(!first_word %in% stop_words_custom$word & !second_word %in% stop_words_custom$word) %>% count(bigram, sort = TRUE) ``` -- <table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> bigram </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> n </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> medium bodied </td> <td style="text-align:right;"> 1073 </td> </tr> <tr> <td style="text-align:left;"> cherry fruit </td> <td style="text-align:right;"> 1009 </td> </tr> <tr> <td style="text-align:left;"> buttered toast </td> <td style="text-align:right;"> 792 </td> </tr> <tr> <td style="text-align:left;"> french oak </td> <td style="text-align:right;"> 760 </td> </tr> <tr> <td style="text-align:left;"> tropical fruit </td> <td style="text-align:right;"> 641 </td> </tr> </tbody> </table> --- .h1[# Bigrams] .tiny[ ```r display %>% filter(variety != "Cabernet Sauvignon") %>% count(bigram, variety) %>% * group_by(variety) %>% * slice_max(n, n = 15) %>% ggplot(aes(x = n, y = 
reorder_within(bigram, n, variety))) + geom_bar(stat = "identity") + scale_x_continuous(expand = expansion(mult = c(0, .1)), name = "Count", labels = label_number_si()) + * scale_y_reordered(sep = "___") + facet_wrap(~variety, scales = "free") + labs(y = NULL, title = "Common bigrams in wine reviews") + cmu_theme() + theme(panel.grid.major.y = element_blank(), plot.title.position = "plot", strip.background = element_rect(fill = "light gray"), strip.text = element_text(color = "black")) ``` ] --- # Bigrams <img src="figs/Lec13/bigram-9-1.png" width="504" style="display: block; margin: auto;" /> --- # Another example **What elements are in the bakes of those who were eliminated and those who were named star baker?** .center[![bakeoff](figs/Lec13/bakeoff.png)] --- # Bake Off <table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> series </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> episode </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> baker </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> result </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> signature </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> showstopper </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Lea </td> <td style="text-align:left;"> OUT </td> <td style="text-align:left;"> Cranberry and Pistachio Cakewith Orange Flower Water Icing </td> <td style="text-align:left;"> Raspberries and Cream filled Chocolatewith 
Chocolate-dipped Fresh Fruit </td> </tr> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Mark </td> <td style="text-align:left;"> OUT </td> <td style="text-align:left;"> Sticky Marmalade Tea Loaf </td> <td style="text-align:left;"> Heart-shaped Chocolate and Beetroot Cake with Store-Bought silver chocolate hearts and chocolate red and white roses. </td> </tr> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Annetha </td> <td style="text-align:left;"> OUT </td> <td style="text-align:left;"> Rose Petal Shortbread </td> <td style="text-align:left;"> Pink Swirl Macarons / Eclairs </td> </tr> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Louise </td> <td style="text-align:left;"> OUT </td> <td style="text-align:left;"> Stained Glass Window Shortbread </td> <td style="text-align:left;"> Strawberry, Mint, and Cream Meringues Chocolate Eclairs / Orange, Yellow and Pink Macarons </td> </tr> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:left;"> Jonathan </td> <td style="text-align:left;"> OUT </td> <td style="text-align:left;"> Anchovy, Sweet Paprika and Oregano Bread </td> <td style="text-align:left;"> Sticky Lemon Honey Bun Olive and Anchovy RollSundried Tomatoes and Fresh Herbs RollStilton, Walnut and Apple RollCinnamon and Cardamom Chelsea Bun </td> </tr> </tbody> </table> --- # Bake Off ```r challenge_results %>% filter(result == "STAR BAKER") %>% pivot_longer(c(signature, showstopper), names_to = NULL, values_to = "bake") %>% unnest_tokens(word, bake) %>% anti_join(stop_words) %>% count(word, sort = TRUE) %>% slice_max(n, n = 15) ``` -- <table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;font-weight: bold;color: white 
!important;background-color: #bb0000 !important;"> word </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> n </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> cake </td> <td style="text-align:right;"> 22 </td> </tr> <tr> <td style="text-align:left;"> chocolate </td> <td style="text-align:right;"> 21 </td> </tr> <tr> <td style="text-align:left;"> orange </td> <td style="text-align:right;"> 11 </td> </tr> <tr> <td style="text-align:left;"> pie </td> <td style="text-align:right;"> 10 </td> </tr> <tr> <td style="text-align:left;"> apple </td> <td style="text-align:right;"> 9 </td> </tr> <tr> <td style="text-align:left;"> caramel </td> <td style="text-align:right;"> 8 </td> </tr> <tr> <td style="text-align:left;"> hazelnut </td> <td style="text-align:right;"> 7 </td> </tr> </tbody> </table> --- # Bake Off ```r plot <- challenge_results %>% * filter(result %in% c("STAR BAKER","OUT")) %>% pivot_longer(c(signature, showstopper), names_to = NULL, values_to = "bake") %>% unnest_tokens(word, bake) %>% anti_join(stop_words) %>% * count(word, result, sort = TRUE) %>% * group_by(result) %>% slice_max(n, n = 15) ``` -- <table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> word </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> result </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> n </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> cake </td> <td style="text-align:left;"> OUT </td> <td style="text-align:right;"> 27 </td> </tr> <tr> <td style="text-align:left;"> chocolate </td> <td style="text-align:left;"> OUT </td> <td style="text-align:right;"> 27 </td> </tr> <tr> <td 
style="text-align:left;"> orange </td> <td style="text-align:left;"> OUT </td> <td style="text-align:right;"> 13 </td> </tr> <tr> <td style="text-align:left;"> fruit </td> <td style="text-align:left;"> OUT </td> <td style="text-align:right;"> 12 </td> </tr> <tr> <td style="text-align:left;"> pie </td> <td style="text-align:left;"> OUT </td> <td style="text-align:right;"> 12 </td> </tr> </tbody> </table> --- # Bake Off ```r plot %>% * mutate(result = factor(result, levels = c("STAR BAKER","OUT"))) %>% ggplot(aes(x = n, * y = reorder_within(word, n, result))) + geom_bar(stat = "identity") + scale_x_continuous(expand = expansion(mult = c(0, .1)), name = "Count", * labels = label_number_si()) + * scale_y_reordered(sep = "___") + facet_wrap(~result, scales = "free") + labs(y = NULL, title = "Bake details by result") + cmu_theme() + theme(panel.grid.major.y = element_blank(), plot.title.position = "plot", strip.background = element_rect(fill = "light gray"), strip.text = element_text(color = "black")) ``` --- # Bake Off <img src="figs/Lec13/bake-6-1.png" width="504" style="display: block; margin: auto;" /> --- # Bake Off ```r compare <- challenge_results %>% filter(result %in% c("STAR BAKER","OUT")) %>% pivot_longer(c(signature, showstopper), names_to = NULL, values_to = "bake") %>% unnest_tokens(word, bake) %>% anti_join(stop_words) %>% count(word, result, sort = TRUE) %>% group_by(result) %>% mutate(proportion = n / sum(n)) %>% select(-c(n)) %>% pivot_wider(names_from = result, values_from = proportion) ``` -- <table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> word </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> OUT </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 
!important;"> STAR BAKER </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> cake </td> <td style="text-align:right;"> 0.0334572 </td> <td style="text-align:right;"> 0.0324484 </td> </tr> <tr> <td style="text-align:left;"> chocolate </td> <td style="text-align:right;"> 0.0334572 </td> <td style="text-align:right;"> 0.0309735 </td> </tr> <tr> <td style="text-align:left;"> orange </td> <td style="text-align:right;"> 0.0161090 </td> <td style="text-align:right;"> 0.0162242 </td> </tr> <tr> <td style="text-align:left;"> fruit </td> <td style="text-align:right;"> 0.0148699 </td> <td style="text-align:right;"> 0.0058997 </td> </tr> </tbody> </table> --- # Bake Off ```r compare %>% ggplot(aes(x = OUT, y = `STAR BAKER`)) + geom_jitter(alpha = 0.2) + geom_text(aes(label = word), check_overlap = TRUE) + geom_abline(color = "light gray", linetype = 2) + labs(x = "Eliminated", y = "Star Baker") + cmu_theme() ``` --- # Bake Off <img src="figs/Lec13/bake-9-1.png" width="504" style="display: block; margin: auto;" /> --- # Tables .large[When are plots necessary?] <br> .medium[when position & shape are important (to spot trends, outliers, etc.)] <br> .medium[when you need to see the data as a whole] <br> -- .large[When can tables be useful?] 
<br> .medium[when you want to look up a specific value] <br> .medium[example: qualitative value against 2 categorical variables] -- <br> .large[Package options (my preference)] <br> .medium[`kableExtra` for presentations] <br> .medium[`gt` for reports/documents] <br> .medium[both use `%>%`] --- .h1[# `kableExtra`] .tiny[ ```r wine_ratings %>% select(title, winery, variety, points, price) %>% head(3) %>% kable("html") %>% kable_styling(font_size = 16, position = "center", full_width = F) %>% row_spec(0, bold = T, color = "white", background = "#bb0000") ``` <table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> title </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> winery </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> variety </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> points </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> price </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Sweet Cheeks 2012 Vintner's Reserve Wild Child Block Pinot Noir (Willamette Valley) </td> <td style="text-align:left;"> Sweet Cheeks </td> <td style="text-align:left;"> Pinot Noir </td> <td style="text-align:right;"> 87 </td> <td style="text-align:right;"> 65 </td> </tr> <tr> <td style="text-align:left;"> Kirkland Signature 2011 Mountain Cuvée Cabernet Sauvignon (Napa Valley) </td> <td style="text-align:left;"> Kirkland Signature </td> <td style="text-align:left;"> Cabernet Sauvignon </td> <td style="text-align:right;"> 87 </td> <td style="text-align:right;"> 19 </td> </tr> <tr> <td style="text-align:left;"> Louis M. 
Martini 2012 Cabernet Sauvignon (Alexander Valley) </td> <td style="text-align:left;"> Louis M. Martini </td> <td style="text-align:left;"> Cabernet Sauvignon </td> <td style="text-align:right;"> 87 </td> <td style="text-align:right;"> 34 </td> </tr> </tbody> </table> ] --- .h1[# `kableExtra`] .tiny[ ```r wine_ratings %>% select(title, winery, variety, points, price) %>% head(3) %>% kable("html", col.names = c("Wine Name", "Winery", "Variety", "Rating", "Price")) %>% kable_styling(font_size = 16, position = "center", full_width = F) %>% row_spec(0, bold = T, color = "white", background = "#bb0000") ``` <table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> Wine Name </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> Winery </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> Variety </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> Rating </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> Price </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Sweet Cheeks 2012 Vintner's Reserve Wild Child Block Pinot Noir (Willamette Valley) </td> <td style="text-align:left;"> Sweet Cheeks </td> <td style="text-align:left;"> Pinot Noir </td> <td style="text-align:right;"> 87 </td> <td style="text-align:right;"> 65 </td> </tr> <tr> <td style="text-align:left;"> Kirkland Signature 2011 Mountain Cuvée Cabernet Sauvignon (Napa Valley) </td> <td style="text-align:left;"> Kirkland Signature </td> <td style="text-align:left;"> Cabernet Sauvignon </td> <td style="text-align:right;"> 87 </td> <td style="text-align:right;"> 19 </td> </tr> <tr> <td 
style="text-align:left;"> Louis M. Martini 2012 Cabernet Sauvignon (Alexander Valley) </td> <td style="text-align:left;"> Louis M. Martini </td> <td style="text-align:left;"> Cabernet Sauvignon </td> <td style="text-align:right;"> 87 </td> <td style="text-align:right;"> 34 </td> </tr> </tbody> </table> ] --- .h1[# `kableExtra`] .tiny[ ```r wine_ratings %>% select(title, winery, variety, points, price) %>% head(3) %>% kable("html", col.names = c("Wine Name", "Winery", "Variety", "Rating", "Price")) %>% kable_styling(font_size = 16, position = "center", full_width = F) %>% row_spec(0, bold = T, color = "white", background = "#bb0000") %>% footnote(general = "Data from WineEnthusiast, via #TidyTuesday.") ``` <table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;border-bottom: 0;"> <thead> <tr> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> Wine Name </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> Winery </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> Variety </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> Rating </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> Price </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Sweet Cheeks 2012 Vintner's Reserve Wild Child Block Pinot Noir (Willamette Valley) </td> <td style="text-align:left;"> Sweet Cheeks </td> <td style="text-align:left;"> Pinot Noir </td> <td style="text-align:right;"> 87 </td> <td style="text-align:right;"> 65 </td> </tr> <tr> <td style="text-align:left;"> Kirkland Signature 2011 Mountain Cuvée Cabernet Sauvignon (Napa Valley) </td> <td style="text-align:left;"> Kirkland Signature </td> <td style="text-align:left;"> 
Cabernet Sauvignon </td> <td style="text-align:right;"> 87 </td> <td style="text-align:right;"> 19 </td> </tr> <tr> <td style="text-align:left;"> Louis M. Martini 2012 Cabernet Sauvignon (Alexander Valley) </td> <td style="text-align:left;"> Louis M. Martini </td> <td style="text-align:left;"> Cabernet Sauvignon </td> <td style="text-align:right;"> 87 </td> <td style="text-align:right;"> 34 </td> </tr> </tbody> <tfoot> <tr><td style="padding: 0; " colspan="100%"><span style="font-style: italic;">Note: </span></td></tr> <tr><td style="padding: 0; " colspan="100%"> <sup></sup> Data from WineEnthusiast, via #TidyTuesday.</td></tr> </tfoot> </table> ] --- .h1[# `kableExtra`] .tiny[ ```r wine_ratings %>% select(title, winery, variety, points, price) %>% mutate(price = dollar(price)) %>% head(3) %>% kable("html", col.names = c("Wine Name", "Winery", "Variety", "Rating", "Price")) %>% kable_styling(font_size = 16, position = "center", full_width = F) %>% row_spec(0, bold = T, color = "white", background = "#bb0000") %>% footnote(general = "Data from WineEnthusiast, via #TidyTuesday.") ``` <table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;border-bottom: 0;"> <thead> <tr> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> Wine Name </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> Winery </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> Variety </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> Rating </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> Price </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Sweet Cheeks 2012 Vintner's Reserve Wild Child Block Pinot Noir (Willamette Valley) 
</td> <td style="text-align:left;"> Sweet Cheeks </td> <td style="text-align:left;"> Pinot Noir </td> <td style="text-align:right;"> 87 </td> <td style="text-align:left;"> $65 </td> </tr> <tr> <td style="text-align:left;"> Kirkland Signature 2011 Mountain Cuvée Cabernet Sauvignon (Napa Valley) </td> <td style="text-align:left;"> Kirkland Signature </td> <td style="text-align:left;"> Cabernet Sauvignon </td> <td style="text-align:right;"> 87 </td> <td style="text-align:left;"> $19 </td> </tr> <tr> <td style="text-align:left;"> Louis M. Martini 2012 Cabernet Sauvignon (Alexander Valley) </td> <td style="text-align:left;"> Louis M. Martini </td> <td style="text-align:left;"> Cabernet Sauvignon </td> <td style="text-align:right;"> 87 </td> <td style="text-align:left;"> $34 </td> </tr> </tbody> <tfoot> <tr><td style="padding: 0; " colspan="100%"><span style="font-style: italic;">Note: </span></td></tr> <tr><td style="padding: 0; " colspan="100%"> <sup></sup> Data from WineEnthusiast, via #TidyTuesday.</td></tr> </tfoot> </table> ] --- .h1[# `kableExtra`] .tiny[ ```r wine_ratings %>% select(title, winery, variety, points, price) %>% mutate(price = dollar(price)) %>% head(5) %>% kable("html", col.names = c("Wine Name", "Winery", "Variety", "Rating", "Price")) %>% kable_styling(font_size = 16, position = "center", full_width = F) %>% row_spec(0, bold = T, color = "white", background = "#bb0000") %>% footnote(general = "Data from WineEnthusiast, via #TidyTuesday.") %>% scroll_box(width = "800px", height = "200px") ``` <div style="border: 1px solid #ddd; padding: 0px; overflow-y: scroll; height:200px; overflow-x: scroll; width:800px; "><table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;border-bottom: 0;"> <thead> <tr> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;position: sticky; top:0; background-color: #FFFFFF;"> Wine Name </th> <th 
style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;position: sticky; top:0; background-color: #FFFFFF;"> Winery </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;position: sticky; top:0; background-color: #FFFFFF;"> Variety </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;position: sticky; top:0; background-color: #FFFFFF;"> Rating </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;position: sticky; top:0; background-color: #FFFFFF;"> Price </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Sweet Cheeks 2012 Vintner's Reserve Wild Child Block Pinot Noir (Willamette Valley) </td> <td style="text-align:left;"> Sweet Cheeks </td> <td style="text-align:left;"> Pinot Noir </td> <td style="text-align:right;"> 87 </td> <td style="text-align:left;"> $65 </td> </tr> <tr> <td style="text-align:left;"> Kirkland Signature 2011 Mountain Cuvée Cabernet Sauvignon (Napa Valley) </td> <td style="text-align:left;"> Kirkland Signature </td> <td style="text-align:left;"> Cabernet Sauvignon </td> <td style="text-align:right;"> 87 </td> <td style="text-align:left;"> $19 </td> </tr> <tr> <td style="text-align:left;"> Louis M. Martini 2012 Cabernet Sauvignon (Alexander Valley) </td> <td style="text-align:left;"> Louis M. 
Martini </td> <td style="text-align:left;"> Cabernet Sauvignon </td> <td style="text-align:right;"> 87 </td> <td style="text-align:left;"> $34 </td> </tr> <tr> <td style="text-align:left;"> Mirassou 2012 Chardonnay (Central Coast) </td> <td style="text-align:left;"> Mirassou </td> <td style="text-align:left;"> Chardonnay </td> <td style="text-align:right;"> 87 </td> <td style="text-align:left;"> $12 </td> </tr> <tr> <td style="text-align:left;"> Acrobat 2013 Pinot Noir (Oregon) </td> <td style="text-align:left;"> Acrobat </td> <td style="text-align:left;"> Pinot Noir </td> <td style="text-align:right;"> 87 </td> <td style="text-align:left;"> $20 </td> </tr> </tbody> <tfoot> <tr><td style="padding: 0; " colspan="100%"><span style="font-style: italic;">Note: </span></td></tr> <tr><td style="padding: 0; " colspan="100%"> <sup></sup> Data from WineEnthusiast, via #TidyTuesday.</td></tr> </tfoot> </table></div> ] --- .h1[# `kableExtra`] .tiny[ ```r wine_ratings %>% group_by(variety) %>% summarize(avg_rating = mean(points, na.rm = TRUE), avg_price = mean(price, na.rm = TRUE)) %>% mutate(avg_rating = round(avg_rating, 1), avg_price = dollar(avg_price)) %>% kable("html", col.names = c("Variety", "Avg. Rating", "Avg. Price")) %>% kable_styling(font_size = 16, position = "center", full_width = F) %>% row_spec(0, bold = T, color = "white", background = "#bb0000") %>% footnote(general = "Data from WineEnthusiast, via #TidyTuesday.") ``` <table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;border-bottom: 0;"> <thead> <tr> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> Variety </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> Avg. Rating </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> Avg. 
Price </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Cabernet Sauvignon </td> <td style="text-align:right;"> 88.6 </td> <td style="text-align:left;"> $48.25 </td> </tr> <tr> <td style="text-align:left;"> Chardonnay </td> <td style="text-align:right;"> 88.3 </td> <td style="text-align:left;"> $34.70 </td> </tr> <tr> <td style="text-align:left;"> Pinot Noir </td> <td style="text-align:right;"> 89.4 </td> <td style="text-align:left;"> $47.88 </td> </tr> </tbody> <tfoot> <tr><td style="padding: 0; " colspan="100%"><span style="font-style: italic;">Note: </span></td></tr> <tr><td style="padding: 0; " colspan="100%"> <sup></sup> Data from WineEnthusiast, via #TidyTuesday.</td></tr> </tfoot> </table> ] --- .h1[# `kableExtra`] .tiny[ ```r wine_ratings %>% group_by(variety) %>% summarize(avg_rating = mean(points, na.rm = TRUE), avg_price = mean(price, na.rm = TRUE)) %>% mutate(avg_rating = round(avg_rating, 1), avg_price = dollar(avg_price)) %>% kable("html", col.names = c("Variety", "Avg. Rating", "Avg. Price")) %>% kable_styling(font_size = 16, position = "center", full_width = F) %>% row_spec(c(0, 3), bold = T, color = "white", background = "#bb0000") %>% footnote(general = "Data from WineEnthusiast, via #TidyTuesday.") ``` <table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;border-bottom: 0;"> <thead> <tr> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> Variety </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> Avg. Rating </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> Avg. 
Price </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Cabernet Sauvignon </td> <td style="text-align:right;"> 88.6 </td> <td style="text-align:left;"> $48.25 </td> </tr> <tr> <td style="text-align:left;"> Chardonnay </td> <td style="text-align:right;"> 88.3 </td> <td style="text-align:left;"> $34.70 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> Pinot Noir </td> <td style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> 89.4 </td> <td style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> $47.88 </td> </tr> </tbody> <tfoot> <tr><td style="padding: 0; " colspan="100%"><span style="font-style: italic;">Note: </span></td></tr> <tr><td style="padding: 0; " colspan="100%"> <sup></sup> Data from WineEnthusiast, via #TidyTuesday.</td></tr> </tfoot> </table> ] --- .h1[# `kableExtra`] .tiny[ ```r wine_ratings %>% group_by(variety) %>% summarize(avg_rating = mean(points, na.rm = TRUE), avg_price = mean(price, na.rm = TRUE)) %>% mutate(avg_rating = round(avg_rating, 1), avg_price = dollar(avg_price)) %>% kable("html", col.names = c("Variety", "Avg. Rating", "Avg. 
Price"), caption = "Pinot Noir is, on average, the highest-rated wine") %>% kable_styling(font_size = 16, position = "center", full_width = F) %>% row_spec(c(0, 3), bold = T, color = "white", background = "#bb0000") %>% footnote(general = "Data from WineEnthusiast, via #TidyTuesday.") ``` <table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;border-bottom: 0;"> <caption style="font-size: initial !important;">Pinot Noir is, on average, the highest-rated wine</caption> <thead> <tr> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> Variety </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> Avg. Rating </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> Avg. Price </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Cabernet Sauvignon </td> <td style="text-align:right;"> 88.6 </td> <td style="text-align:left;"> $48.25 </td> </tr> <tr> <td style="text-align:left;"> Chardonnay </td> <td style="text-align:right;"> 88.3 </td> <td style="text-align:left;"> $34.70 </td> </tr> <tr> <td style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> Pinot Noir </td> <td style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> 89.4 </td> <td style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> $47.88 </td> </tr> </tbody> <tfoot> <tr><td style="padding: 0; " colspan="100%"><span style="font-style: italic;">Note: </span></td></tr> <tr><td style="padding: 0; " colspan="100%"> <sup></sup> Data from WineEnthusiast, via #TidyTuesday.</td></tr> </tfoot> </table> ] --- # `gt` .center[![gt](figs/Lec13/gt.svg)] .right[image source: [https://gt.rstudio.com/](https://gt.rstudio.com/)] .left[*more details 
in the .html!*] --- # Upcoming <br> .large[Lab 9 on Thursday, June 24] <br> .large[Lecture 14 on Friday, June 25] <br> .medium[presentations with `xaringan`] <br> .large[Group project check-in due Friday]
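---

# `gt`: a minimal sketch

.medium[The `kableExtra` summary table from earlier could be approximated in `gt` roughly as follows. This is only a sketch under stated assumptions, not the code from the course .html: `wine_sample` is a hypothetical stand-in for `wine_ratings`, built from the five wines shown on the earlier slides, and the column labels simply mirror the `kableExtra` examples.]

```r
library(dplyr)
library(gt)

# Hypothetical stand-in for wine_ratings: the five wines shown on earlier slides
wine_sample <- tibble(
  variety = c("Pinot Noir", "Cabernet Sauvignon", "Cabernet Sauvignon",
              "Chardonnay", "Pinot Noir"),
  points  = c(87, 87, 87, 87, 87),
  price   = c(65, 19, 34, 12, 20)
)

# Same summary step as in the kableExtra slides, but the columns stay numeric
summary_tbl <- wine_sample %>%
  group_by(variety) %>%
  summarize(avg_rating = mean(points, na.rm = TRUE),
            avg_price = mean(price, na.rm = TRUE))

# Build the gt table: labels, number formatting, and a source note
wine_gt <- summary_tbl %>%
  gt() %>%
  cols_label(variety = "Variety",
             avg_rating = "Avg. Rating",
             avg_price = "Avg. Price") %>%
  fmt_number(columns = avg_rating, decimals = 1) %>%
  fmt_currency(columns = avg_price, currency = "USD") %>%
  tab_source_note(source_note = "Data from WineEnthusiast, via #TidyTuesday.")

wine_gt
```

.medium[Here `tab_source_note()` plays the role that `footnote()` played in `kableExtra`, and the formatting lives in `fmt_*()` calls instead of a `mutate()` step, so the price column is not converted to text before the table is built.]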