class: inverse, center, middle

# 36-315: Statistical Graphics and Visualization

## Lecture 8

Meghan Hall <br> Department of Statistics & Data Science <br> Carnegie Mellon University <br> June 9, 2021

---
layout: true
<div class="my-footer"><span>cmu-36315.netlify.app</span></div>

---

# From last time

<br>
.large[Grab bag!] <br>
.medium[maps, pie charts, heat maps] <br>
.large[Data manipulation] <br>
.medium[aggregating, joining, pivoting]

---

# Today

<br>
.large[Taking plots to the next level] <br>
.medium[effective *and* elegant] <br>
.large[Redoing plots we've seen before] <br>
.medium[get familiar with the syntax and the options]

---

# Intro

<br>
<br>
.large[Previously: exploratory data viz] <br>
.medium[aesthetics are less important, as long as you're not distorting your data] <br>
<br>

--

<br>
.large[Moving toward explanatory data viz for presentation] <br>
.medium[data viz = communication] <br>
<br>

--

<br>
.large[What's necessary vs. what's aesthetic preference] <br>
.medium[there are always bad options, but there's rarely only one good option]

---

# Data-ink ratio

<br>
<br>
.large[The ratio of your data to everything else] <br>
.medium[popularized by Edward Tufte] <br>
<br>

--

<br>
.large[Be mindful, but don't go too far] <br>
<br>

--

<br>
.large[The "extras" may add visual clutter, technically, but they're essential for orienting yourself] <br>
.medium[gridlines, axes, etc.] <br>
<br>

---

# Titles

<br>
<br>
.large[The purpose is to make a point!] <br>
<br>

--

<br>
.large[Don't be afraid to put your conclusion in your title] <br>
.medium[guide your reader to the purpose of the graph] <br>
.medium[why am I making this plot?] <br>
<br>

--

<br>
.large[Use subtitles and captions to add extra context or data source info] <br>
<br>

---

# Axes

<br>
<br>
.large[When axis labels aren't necessary] <br>
.medium[generally: time, regions, many categorical variables] <br>
<br>

--

<br>
.large[Adjust font size!]
<br>
.medium[frequently knit to your desired output to check scale] <br>
.medium[axis labels are often too small] <br>
<br>

---

# Legends

<br>
<br>
.large[Limit their presence when possible] <br>
.medium[move onto the plot if there's space] <br>
.medium[label relevant points instead] <br>
.medium[or match colors to words in the title] <br>
<br>

--

<br>
.large[Order in a meaningful way] <br>
.medium[match ordering of lines to the legend] <br>
<br>

---

# Multiple plots

<br>
<br>
.large[Maintain the same color scheme throughout] <br>
.medium[reuse/share legends where possible] <br>
<br>

--

<br>
.large[Move from simple to more complex] <br>
<br>

--

<br>
.large[Diversify chart types] <br>
<br>

--

<br>
.large[Generally one "point" per graph]

---

# Syntax

<br>
<br>
.large[Never worry about memorizing all the syntax] <br>
.medium[Google is your friend] <br>
<br>

--

<br>
.large[Be aware of the guidelines we'll discuss] <br>
.medium[and how out-of-the-box ggplots can fall short] <br>
<br>

--

<br>
.large[Know what options are available so you can look up details]

---

# Some golden rules of graphs

<br>
<br>
Don’t add complexity without a good reason.

Everything (everything!) must be readable.

Don’t distort data, intentionally or not.

Be mindful of the data-to-ink ratio.

All axes, labels, etc. should have real titles, not code variable names.

Always strive for clarity.

Titles, subtitles, and captions should add information.
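
---

# Golden rules in code: a sketch

Several of these rules come down to a few repeated `theme()` and `labs()` settings. The sketch below bundles the theme settings that recur in the examples on the following slides into one helper and pairs it with a title that states a conclusion and axis titles that are not variable names. `theme_lecture()` is a hypothetical name, not part of ggplot2 or the course materials, and the `linewidth` argument assumes ggplot2 3.4.0 or later (the slides use the older `size` argument for the same purpose).

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): collects the recurring theme
# settings so each plot only has to state what is unique to it.
theme_lecture <- function(base_size = 12) {
  theme_grey(base_size = base_size) +
    theme(
      panel.background = element_blank(),                       # less non-data ink
      panel.grid.major.y = element_line(color = "grey90",
                                        linewidth = 0.2),       # light gridlines for orientation
      panel.border = element_rect(color = "black", fill = NA,
                                  linewidth = 0.5),
      axis.ticks = element_blank(),
      plot.title.position = "plot"                              # align title with the plot edge
    )
}

# The title states the conclusion; axis titles use real words with units.
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  labs(title = "Larger engines tend to get fewer highway miles per gallon",
       subtitle = "Among the vehicles in ggplot2's mpg data",
       x = "Engine displacement (liters)",
       y = "Highway miles per gallon") +
  theme_lecture()

p
```

A helper like this is one option among several; settings that vary from plot to plot (legend position, text sizes) can still be added with a further `theme()` call.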
---

# 1: HW 1, Problem 2

<img src="figs/Lec8/1-1-1.png" width="504" style="display: block; margin: auto;" />

---

# 1: HW 1, Problem 2

<img src="figs/Lec8/1-2-1.png" width="504" style="display: block; margin: auto;" />

---

.h-1[# 1: HW 1, Problem 2]

.tiny[
```r
mpg %>%
  filter(year == 2008 & class == "suv") %>%
  group_by(manufacturer) %>%
  summarize(median = median(cty), n = n()) %>%
* mutate(manufacturer = str_to_title(manufacturer),
*        label = paste0(manufacturer, "\n", n,
*                       ifelse(n > 1, " models", " model"))) %>%
  ggplot(aes(x = reorder(label, -median), y = median)) +
  geom_bar(stat = "identity") +
* scale_y_continuous(expand = expansion(mult = c(0, .1))) +
* labs(title = "Subaru SUVs have highest median city mpg",
       subtitle = "Among manufacturers in 2008",
       x = NULL, y = "Miles per gallon") +
* theme(axis.ticks = element_blank(),
*       panel.background = element_blank(),
*       panel.grid.major.y = element_line(color = "grey90", size = 0.2),
*       panel.border = element_rect(color = "black", fill = NA, size = 0.5))
```
]

---

# 2: HW 1, Problem 3

<img src="figs/Lec8/2-1-1.png" width="504" style="display: block; margin: auto;" />

---

# 2: HW 1, Problem 3

<img src="figs/Lec8/2-2-1.png" width="504" style="display: block; margin: auto;" />

---

.h-1[# 2: HW 1, Problem 3]

.tiny[
```r
mpg %>%
  filter(manufacturer %in% c("toyota","dodge","audi","nissan") & year == 2008) %>%
  group_by(manufacturer, drv) %>%
  summarize(mean = mean(hwy)) %>%
* mutate(manufacturer = str_to_title(manufacturer),
*        drv = fct_recode(drv, "4-wheel drive" = "4", "front-wheel drive" = "f")) %>%
  ggplot(aes(x = manufacturer, y = mean, fill = drv)) +
  geom_bar(stat = "identity", position = "dodge") +
* scale_fill_manual(values = c("#707078", "#BB0000"), name = NULL) +
  geom_text(aes(label = round(mean, 1)), position = position_dodge(width = 0.9),
*           vjust = 1.5, color = "white", fontface = 2) +
* labs(title = "Average highway miles per gallon is higher in front-wheel drive vehicles",
       subtitle = "Among selected manufacturers in 2008",
       x = NULL, y = "Miles per gallon") +
  theme(axis.ticks = element_blank(),
        panel.background = element_blank(),
        panel.grid.major.y = element_line(color = "grey90", size = 0.2),
        panel.border = element_rect(color = "black", fill = NA, size = 0.5),
*       legend.position = c(0.44, 0.85),
*       legend.text = element_text(size = 11))
```
]

---

# 3: Lab 3, Problem 3

<img src="figs/Lec8/3-1-1.png" width="504" style="display: block; margin: auto;" />

---

# 3: Lab 3, Problem 3

<img src="figs/Lec8/3-2-1.png" width="504" style="display: block; margin: auto;" />

---

.h-1[# 3: Lab 3, Problem 3]

.tiny[
```r
txhousing %>%
  filter(year >= 2010) %>%
  ggplot(aes(x = as.character(year), y = median)) +
  geom_violin(draw_quantiles = c(0.25, 0.5, 0.75)) +
* geom_jitter(alpha = .25, width = .3, size = 0.5, color = "#bb0000") +
* scale_y_continuous(labels = dollar, breaks = seq(100000, 300000, 50000)) +
  labs(title = "The distribution of median home prices by city in Texas") +
  theme(axis.title = element_blank(),
        panel.background = element_blank(),
        panel.grid.major.y = element_line(color = "grey90", size = 0.2),
        panel.border = element_rect(color = "black", fill = NA, size = 0.5),
        axis.ticks = element_blank(),
*       axis.text = element_text(size = 10, face = 2),
*       plot.title.position = "plot")
```
]

---

# 4: Lecture 5

<img src="figs/Lec8/4-1-1.png" width="504" style="display: block; margin: auto;" />

---

# 4: Lecture 5

<img src="figs/Lec8/4-2-1.png" width="504" style="display: block; margin: auto;" />

---

.h-1[# 4: Lecture 5]

.tiny[
```r
constructor_pts %>%
  filter(year == 2020) %>%
  mutate(third = ifelse(name %in% c("McLaren","Renault","Racing Point"), name, "z"),
         label = if_else(round == max(round) & third != "z", name, NA_character_)) %>%
  ggplot(aes(x = round, y = points, color = third, group = name,
             size = third, alpha = third)) +
  geom_line() +
  geom_label_repel(aes(label = label), size = 4.5) +
  scale_x_continuous(breaks = seq(1, 17, 1)) +
  scale_color_manual(values = c("#E0610E","#F596C8",
                                "#FFF500","dark grey")) +
  scale_size_manual(values = c(1.5, 1.5, 1.5, 0.75)) +
  scale_alpha_manual(values = c(1, 1, 1, 0.3)) +
* labs(title = "The race for third place during the 2020 F1 season",
*      subtitle = "While Mercedes and Red Bull ran off with the first two placings, three teams battled all year long for third place",
       x = "Race Round", y = "Accumulated Points") +
  theme(legend.position = "none",
        panel.background = element_blank(),
*       panel.grid.major.y = element_line(color = "grey90", size = 0.2),
        axis.ticks = element_blank(),
        panel.border = element_rect(color = "black", fill = NA, size = 0.5))
```
]

---

# 5: Lecture 6

<img src="figs/Lec8/5-1-1.png" width="504" style="display: block; margin: auto;" />

---

# 5: Lecture 6

<img src="figs/Lec8/5-2-1.png" width="504" style="display: block; margin: auto;" />

---

.h-1[# 5: Lecture 6]

.tiny[
```r
friends_info %>%
* mutate(label = ifelse(us_views_millions > 50, title, NA_character_)) %>%
  ggplot(aes(x = us_views_millions, y = imdb_rating, color = season)) +
  geom_jitter(size = 2) +
  scale_colour_gradient(low = "#fafafa",high = "#191970",breaks = seq(1, 10, 1),
                        name = "Season") +
* geom_label_repel(aes(label = label, x = us_views_millions,
*                      y = imdb_rating), size = 4, inherit.aes = FALSE) +
* scale_x_continuous(labels = label_number(suffix = "M")) +
  scale_y_continuous(breaks = seq(7, 10, 0.5)) +
* labs(title = "Two Friends episodes were viewed far more than the rest",
*      subtitle = "Both highlighted episodes were two-part episodes",
       x = "US views", y = "IMDB rating") +
* theme(legend.position = c(0.75, 0.08),
*       legend.direction = "horizontal",
*       legend.background = element_blank(),
*       legend.title = element_text(color = "#353839", size = 11, face = "bold", vjust = 0.75),
        axis.ticks = element_blank(),
        panel.background = element_blank(),
        panel.border = element_rect(color = "black", fill = NA, size = 0.5),
        panel.grid.major = element_line(color = "grey90", size = 0.3))
```
]

---

# 6: Lecture 6

<img src="figs/Lec8/6-1-1.png" width="504" style="display: block; margin: auto;" />

---

# 6: Lecture 6

<img src="figs/Lec8/6-2-1.png" width="504" style="display: block; margin: auto;" />

---

.h-1[# 6: Lecture 6]

.tiny[
```r
penguins %>%
  filter(species != "Gentoo") %>%
  mutate(label = case_when(flipper_length_mm == 192 & body_mass_g == 2700 ~ "Chinstrap",
                           flipper_length_mm == 184 & body_mass_g == 4650 ~ "Adelie")) %>%
  ggplot(aes(x = flipper_length_mm, y = body_mass_g, size = bill_length_mm,
             color = species)) +
  geom_point(alpha = 0.5) +
  scale_size(range = c(0.1, 7), breaks = c(35, 40, 45, 50, 55),
             name = "Bill Length (mm)") +
* geom_label_repel(aes(x = flipper_length_mm, y = body_mass_g,
*                      color = species, label = label), inherit.aes = FALSE) +
* scale_color_discrete(guide = "none") +
  labs(x = "Flipper Length (mm)", y = "Body Mass (g)",
       title = "Chinstrap penguins tend to have longer flippers and longer bills",
*      caption = "Data from the palmerpenguins package") +
  theme(legend.position = "top",
        panel.background = element_blank(),
*       panel.grid.major = element_line(color = "grey90", size = 0.2),
*       axis.ticks = element_line(color = "grey90", size = 0.2),
*       legend.key = element_rect(fill = "transparent"))
```
]

---

# 7: Lecture 7

<img src="figs/Lec8/7-1-1.png" width="504" style="display: block; margin: auto;" />

---

# 7: Lecture 7

<img src="figs/Lec8/7-2-1.png" width="504" style="display: block; margin: auto;" />

---

.h-1[# 7: Lecture 7]

.tiny[
```r
lincoln_weather %>%
  select(CST, temp = `Max Temperature [F]`) %>%
  mutate(date = ymd(CST),
*        month = month(date, label = TRUE),
         day = day(date)) %>%
  ggplot(aes(x = month, y = day, fill = temp)) +
  geom_tile(color = "white") +
  scale_y_continuous(trans = "reverse", breaks = seq(1, 31, 5)) +
  labs(title = "Maximum Temperature by day in Lincoln, NE in 2016") +
* geom_text(aes(color = temp < 40, label = temp), size = 2) +
* scale_color_manual(guide = "none", values = c("black", "white")) +
  scale_fill_gradient(low = "blue", high = "yellow", breaks = seq(20, 90, 10),
                      name = " °F") +
* guides(fill = guide_colorsteps()) +
  theme(axis.ticks = element_blank(),
*       panel.background = element_blank(),
        axis.title = element_blank(),
*       axis.text = element_text(face = 2)) +
  coord_cartesian(expand = FALSE)
```
]

---

# 8: Lecture 7

<img src="figs/Lec8/8-1-1.png" width="504" style="display: block; margin: auto;" />

---

# 8: Lecture 7

<img src="figs/Lec8/8-2-1.png" width="504" style="display: block; margin: auto;" />

---

.h-1[# 8: Lecture 7]

.tiny[
```r
penguins %>%
  ggplot(aes(x = body_mass_g, y = ..count..)) +
  geom_density_line(data = select(penguins, -species),
*                   aes(fill = "all penguins"), color = "transparent") +
* geom_density_line(aes(fill = "species"), color = "transparent") +
  facet_wrap(~species, nrow = 1) +
  scale_fill_manual(values = c("grey","#0C8346"), name = NULL,
                    guide = guide_legend(direction = "horizontal")) +
  labs(x = "Body Mass (g)",
*      title = "Comparing the distribution of body mass by penguin species",
*      subtitle = "Gentoo penguins tend to be the heaviest") +
  theme(legend.position = "bottom",
        panel.background = element_blank(),
*       panel.grid.major = element_line(color = "grey90", size = 0.3),
*       strip.text = element_text(face = "bold", color = "white"),
*       strip.background = element_rect(fill = "#0C8346"),
        panel.border = element_rect(color = "black", fill = NA, size = 0.5),
        axis.ticks = element_blank(),
*       plot.title.position = "plot")
```
]

---

# 9: HW 2, Problem 5

<img src="figs/Lec8/9-1-1.png" width="504" style="display: block; margin: auto;" />

---

# 9: HW 2, Problem 5

<img src="figs/Lec8/9-2-1.png" width="504" style="display: block; margin: auto;" />

---

.h-1[# 9: HW 2, Problem 5]

.tiny[
```r
penguins %>%
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() +
* geom_smooth(method = "lm", show.legend = FALSE, alpha = 0.3) +
* labs(title = "Bill depth is positively correlated with bill length, regardless of species",
       y = "Bill Depth (mm)", x = "Bill Length (mm)") +
* guides(color = guide_legend(override.aes = list(size = 5))) +
  theme(legend.position = c(0.1, 0.1),
        legend.title = element_blank(),
        panel.background = element_blank(),
        panel.grid.major = element_line(color = "grey90", size = 0.3),
        axis.ticks = element_line(color = "grey90", size = 0.3),
*       legend.key = element_rect(fill = "transparent"))
```
]

---

# 10: Lab 5, Problem 1

<img src="figs/Lec8/10-1-1.png" width="504" style="display: block; margin: auto;" />

---

# 10: Lab 5, Problem 1

<img src="figs/Lec8/10-2-1.png" width="504" style="display: block; margin: auto;" />

---

.h-1[# 10: Lab 5, Problem 1]

.tiny[
```r
msleep %>%
  filter(vore %in% c("carni","herbi")) %>%
  mutate(name = fct_reorder(name, sleep_total),
         name = fct_reorder(name, vore),
*        vore = fct_recode(vore, "herbivore" = "herbi", "carnivore" = "carni")) %>%
  group_by(vore) %>%
  mutate(mean = mean(sleep_total)) %>%
  ggplot(aes(x = sleep_total, y = name, color = vore)) +
  geom_point(size = 2) +
  scale_color_manual(values = c("#bb0000","#098641")) +
* scale_x_continuous(name = NULL, labels = number_format(suffix = " hrs", accuracy = 1)) +
  labs(y = NULL, x = "Total sleep per day (hrs)",
*      title = "<span style = 'color:#bb0000;'>**Carnivores**</span> sleep slightly more per day than <span style = 'color:#098641;'>**herbivores**</span>",
       subtitle = "Among these species, carnivores sleep one hour more per day on average") +
* theme(plot.title = element_markdown(),
*       plot.subtitle = element_text(margin = margin(-5, 0, 10, 0)),
        plot.title.position = "plot",
        legend.position = "none",
        panel.background = element_blank(),
        plot.background = element_rect(fill = "transparent", color = NA),
        panel.grid.major = element_line(color = "black", size = 0.1),
        axis.ticks = element_blank(),
*       axis.text.y = element_text(size = 6.75))
```
]

---

# Upcoming

<br>
.large[Lab 6 on Thursday June 10] <br>
.large[Lecture 9 on Friday June 11] <br>
.large[Homework 2 due Tuesday June 15]