class: inverse, center, middle

# 36-315: Statistical Graphics and Visualization

## Lecture 5

Meghan Hall <br> Department of Statistics & Data Science <br> Carnegie Mellon University <br> June 2, 2021

---
layout: true
<div class="my-footer"><span>cmu-36315.netlify.app</span></div>

---

# From last time

<br>
.large[Graphing distributions] <br>
.medium[Various techniques and considerations] <br>
.large[Histograms and box plots] <br>
.medium[And density plots and violin plots]

---

# Updates

.large[Homework] <br>
.medium[solution posted] <br>
.medium[how to "describe" a graph]

--

<img src="figs/Lec5/homework-1.png" width="504" style="display: block; margin: auto;" />

---

# `object not found`

<br>
.large[The consequence of `group_by` and `summarize`] <br>
.medium[Only grouping variables and the new summary columns are available afterward]

```r
mpg %>% 
  group_by(drv) %>% 
  summarize(mean_hwy_mpg = mean(hwy))
```

--

<table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> drv </th>
   <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> mean_hwy_mpg </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> 4 </td>
   <td style="text-align:right;"> 19.17476 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> f </td>
   <td style="text-align:right;"> 28.16038 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> r </td>
   <td style="text-align:right;"> 21.00000 </td>
  </tr>
</tbody>
</table>

---

# `object not found`

<br>
.large[The consequence of `group_by` and `summarize`] <br>
.medium[Only grouping variables and the new summary columns are available afterward] <br>
.large[Search for missing `%>%` and `+`]

```r
mpg %>% 
  filter(year == 2008) %>% 
  group_by(manufacturer)   # missing %>% here, so summarize() below never receives the data
  summarize(median_size = median(cty, na.rm = TRUE))
```

---

# Today

<br>
.large[Line graphs] <br>
.medium[Various techniques and considerations] <br>
.large[Working with time] <br>
.medium[`lubridate` package]

---
class: left

# Today's agenda

<br>
.large[
1. the basics of line graphs
2. using multiple groups, avoiding "spaghetti graphs"
3. working with `lubridate`
4. the line graph debate
5. slope graphs
]

---
class: left

# Today's agenda

<br>
.large[
1. **the basics of line graphs**
2. using multiple groups, avoiding "spaghetti graphs"
3. working with `lubridate`
4. the line graph debate
5. slope graphs
]

---

# When to use a line graph

<br>
.large[Most commonly: when looking at values over time] <br>
.medium[can also be used with ordered categorical data (e.g., month)] <br> <br>

--

<br>
.large[`y`: values of whatever we're measuring] <br>
.large[`x`: when the measurement was taken] <br>
.medium[most often chronological, or time represented through another variable] <br> <br>

--

<br>
.large[Most important: a meaningful relationship between successive points on the x-axis]

---

# Axis of a line graph

<br>
.large[The axis of a bar graph **must** start at zero] <br>
.medium[data is encoded in the bars, which are compared by length] <br>
.medium[otherwise the graph can be misleading] <br> <br>

--

<br> <br>
.large[But what about a line graph?]
<br>
.medium[encoding data by position, not length] <br>
.medium[include zero if it makes sense]

---

# Axis of a line graph

<br>
.huge[*Don't* start the axis at zero if:] <br> <br> <br> <br>
.large[the range of data is small but the distance from the bottom of the range to zero is large] <br> <br> <br>

---

# Axis of a line graph

<img src="figs/Lec5/stock-1-1.png" width="504" style="display: block; margin: auto;" />

---

# Axis of a line graph

<img src="figs/Lec5/stock-2-1.png" width="504" style="display: block; margin: auto;" />

---

# Axis of a line graph

<br>
.huge[*Don't* start the axis at zero if:] <br> <br> <br> <br>
.large[the range of data is small but the distance from the bottom of the range to zero is large] <br> <br> <br>
.large[the relationship to zero is not meaningful] <br> <br>

---

# Axis of a line graph

<img src="figs/Lec5/fever-1-1.png" width="504" style="display: block; margin: auto;" />

---

# Axis of a line graph

<img src="figs/Lec5/fever-2-1.png" width="504" style="display: block; margin: auto;" />

---

# Today's data

.center[![F1](figs/Lec5/F1.png)] <br> <br>
.center[`constructor_pts`]<br>
.center[`BritishGP`]<br>
.center[`driver_pts`]<br>

---

# Today's data

<br>
.center[`constructor_pts`]<br>

<table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> name </th>
   <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> year </th>
   <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> date </th>
   <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> round </th>
   <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> points </th>
  </tr>
 </thead>
<tbody> <tr> <td 
style="text-align:left;"> Mercedes </td> <td style="text-align:right;"> 2020 </td> <td style="text-align:left;"> 2020-07-05 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 37 </td> </tr> <tr> <td style="text-align:left;"> Mercedes </td> <td style="text-align:right;"> 2020 </td> <td style="text-align:left;"> 2020-07-12 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 80 </td> </tr> <tr> <td style="text-align:left;"> Mercedes </td> <td style="text-align:right;"> 2020 </td> <td style="text-align:left;"> 2020-07-19 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 121 </td> </tr> <tr> <td style="text-align:left;"> Mercedes </td> <td style="text-align:right;"> 2020 </td> <td style="text-align:left;"> 2020-08-02 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 146 </td> </tr> <tr> <td style="text-align:left;"> Mercedes </td> <td style="text-align:right;"> 2020 </td> <td style="text-align:left;"> 2020-08-09 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 180 </td> </tr> <tr> <td style="text-align:left;"> Mercedes </td> <td style="text-align:right;"> 2020 </td> <td style="text-align:left;"> 2020-08-16 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 221 </td> </tr> <tr> <td style="text-align:left;"> Mercedes </td> <td style="text-align:right;"> 2020 </td> <td style="text-align:left;"> 2020-08-30 </td> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 264 </td> </tr> <tr> <td style="text-align:left;"> Mercedes </td> <td style="text-align:right;"> 2020 </td> <td style="text-align:left;"> 2020-09-06 </td> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 281 </td> </tr> <tr> <td style="text-align:left;"> Mercedes </td> <td style="text-align:right;"> 2020 </td> <td style="text-align:left;"> 2020-09-13 </td> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 
325 </td> </tr> <tr> <td style="text-align:left;"> Mercedes </td> <td style="text-align:right;"> 2020 </td> <td style="text-align:left;"> 2020-09-27 </td> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 366 </td> </tr> </tbody> </table>

---

# Line graph basics

```r
constructor_pts %>% 
  filter(year == 2020 & name == "McLaren") %>% 
  ggplot(aes(x = round, y = points)) +
  geom_line()
```

<img src="figs/Lec5/basic-1-1.png" width="504" style="display: block; margin: auto;" />

---

# Line graph basics

```r
constructor_pts %>% 
  filter(year == 2020 & name == "McLaren") %>% 
  ggplot(aes(x = round, y = points)) +
  geom_area(fill = "#E0610E")
```

<img src="figs/Lec5/basic-2-1.png" width="504" style="display: block; margin: auto;" />

---

# Line graph basics

```r
constructor_pts %>% 
  filter(year == 2020 & name == "McLaren") %>% 
  ggplot(aes(x = round, y = points)) +
  geom_line(color = "#E0610E", size = 1.5, linetype = "dotted")
```

<img src="figs/Lec5/basic-3-1.png" width="504" style="display: block; margin: auto;" />

---

# Line graph basics

```r
constructor_pts %>% 
  filter(year == 2020 & name == "McLaren") %>% 
  ggplot(aes(x = round, y = points)) +
  geom_line(color = "#E0610E", size = 1.5) +
  geom_point(alpha = 0.5, color = "black", size = 2)
```

<img src="figs/Lec5/basic-4-1.png" width="504" style="display: block; margin: auto;" />

---

# Line graph basics

```r
constructor_pts %>% 
  filter(year == 2020 & name == "McLaren") %>% 
  ggplot(aes(x = round, y = points)) +
  geom_line() +
* scale_x_continuous(breaks = seq(1, 17, 1))
```

<img src="figs/Lec5/basic-5-1.png" width="504" style="display: block; margin: auto;" />

---
class: left

# Today's agenda

<br>
.large[
1. the basics of line graphs
2. **using multiple groups, avoiding "spaghetti graphs"**
3. working with `lubridate`
4. the line graph debate
5. 
slope graphs
]

---

# Multiple groups

```r
constructor_pts %>% 
  filter(year == 2020) %>% 
  ggplot(aes(x = round, y = points, color = name)) +
  geom_line() +
  scale_x_continuous(breaks = seq(1, 17, 1))
```

<img src="figs/Lec5/multi-1-1.png" width="504" style="display: block; margin: auto;" />

---

# Multiple groups

**one way to reorder a legend**

```r
constructor_pts %>% 
  filter(year == 2020) %>% 
  ggplot(aes(x = round, y = points,
             color = reorder(name, -points))) +
  geom_line() +
  scale_x_continuous(breaks = seq(1, 17, 1)) +
  labs(color = "")
```

<img src="figs/Lec5/multi-2-1.png" width="504" style="display: block; margin: auto;" />

---

# Multiple groups

```r
constructor_pts %>% 
  filter(year == 2020) %>% 
* mutate(mclaren = ifelse(name == "McLaren","highlight","normal")) %>% 
  ggplot(aes(x = round, y = points, color = mclaren)) +
  geom_line() +
  scale_x_continuous(breaks = seq(1, 17, 1))
```

--

<img src="figs/Lec5/multi-3-1.png" width="504" style="display: block; margin: auto;" />

---

# Multiple groups

**when color doesn't distinguish all lines, need group**

```r
constructor_pts %>% 
  filter(year == 2020) %>% 
* mutate(mclaren = ifelse(name == "McLaren","highlight","normal")) %>% 
  ggplot(aes(x = round, y = points, color = mclaren, group = name)) +
  geom_line() +
  scale_x_continuous(breaks = seq(1, 17, 1)) +
* scale_color_manual(values = c("#E0610E", "black"))
```

---

# Multiple groups

<img src="figs/Lec5/multi-4-1.png" width="504" style="display: block; margin: auto;" />

---

# Multiple groups

**further distinguish highlighted group with size and alpha**

```r
constructor_pts %>% 
  filter(year == 2020) %>% 
  mutate(mclaren = ifelse(name == "McLaren","highlight","normal")) %>% 
  ggplot(aes(x = round, y = points, color = mclaren, group = name,
*            size = mclaren, alpha = mclaren)) +
  geom_line() +
  scale_x_continuous(breaks = seq(1, 17, 1)) +
  scale_color_manual(values = c("#E0610E", "dark grey")) +
* scale_size_manual(values = c(1.5, 0.75)) +
* scale_alpha_manual(values = c(1, 0.3))
```

---

# Multiple 
groups <img src="figs/Lec5/multi-5-1.png" width="504" style="display: block; margin: auto;" /> --- # Multiple groups ```r constructor_pts %>% filter(year == 2020) %>% mutate(mclaren = ifelse(name == "McLaren","highlight","normal")) %>% ggplot(aes(x = round, y = points, color = mclaren, group = name, size = mclaren, alpha = mclaren)) + geom_line() + scale_x_continuous(breaks = seq(1, 17, 1)) + scale_color_manual(values = c("#E0610E", "dark grey")) + scale_size_manual(values = c(1.5, 0.75)) + scale_alpha_manual(values = c(1, 0.3)) + * labs(title = "Accumulated points by McLaren during the 2020 season", * x = "Race Round", * y = "Accumulated Points") + * theme(legend.position = "none") ``` --- # Multiple groups <img src="figs/Lec5/multi-6-1.png" width="504" style="display: block; margin: auto;" /> --- # Multiple groups ```r constructor_pts %>% filter(year == 2020) %>% * mutate(third = ifelse(name %in% * c("McLaren","Renault","Racing Point"), name, "z")) %>% ggplot(aes(x = round, y = points, color = third, group = name, size = third, alpha = third)) + geom_line() + scale_x_continuous(breaks = seq(1, 17, 1)) + scale_color_manual(values = c("#E0610E","#F596C8", "#FFF500","dark grey")) + scale_size_manual(values = c(1.5, 1.5, 1.5, 0.75)) + scale_alpha_manual(values = c(1, 1, 1, 0.3)) + labs(title = "The race for third place during the 2020 season", x = "Race Round", y = "Accumulated Points") ``` --- # Multiple groups <img src="figs/Lec5/multi-7-1.png" width="504" style="display: block; margin: auto;" /> --- # Multiple groups ```r constructor_pts %>% filter(year == 2020) %>% mutate(third = ifelse(name %in% c("McLaren","Renault","Racing Point"), name, "z")) %>% ggplot(aes(x = round, y = points, color = third, group = name, size = third, alpha = third)) + geom_line() + * geom_text(aes(label = name), size = 4) + scale_x_continuous(breaks = seq(1, 17, 1)) + scale_color_manual(values = c("#E0610E","#F596C8", "#FFF500","dark grey")) + scale_size_manual(values = c(1.5, 1.5, 1.5, 
0.75)) + scale_alpha_manual(values = c(1, 1, 1, 0.3)) + labs(title = "The race for third place during the 2020 season", x = "Race Round", y = "Accumulated Points") + theme(legend.position = "none") ``` --- # Multiple groups <img src="figs/Lec5/multi-8-1.png" width="504" style="display: block; margin: auto;" /> --- # Multiple groups ```r constructor_pts %>% filter(year == 2020) %>% mutate(third = ifelse(name %in% c("McLaren","Renault","Racing Point"), name, "z"), * label = if_else(round == max(round) & * third != "z", name, NA_character_)) %>% ggplot(aes(x = round, y = points, color = third, group = name, size = third, alpha = third)) + geom_line() + * geom_text(aes(label = label), size = 4) + scale_x_continuous(breaks = seq(1, 17, 1)) + scale_color_manual(values = c("#E0610E","#F596C8", "#FFF500","dark grey")) + scale_size_manual(values = c(1.5, 1.5, 1.5, 0.75)) + scale_alpha_manual(values = c(1, 1, 1, 0.3)) + labs(title = "The race for third place during the 2020 season", x = "Race Round", y = "Accumulated Points") + theme(legend.position = "none") ``` --- # Multiple groups <img src="figs/Lec5/multi-9-1.png" width="504" style="display: block; margin: auto;" /> --- # Multiple groups ```r constructor_pts %>% filter(year == 2020) %>% mutate(third = ifelse(name %in% c("McLaren","Renault","Racing Point"), name, "z"), label = if_else(round == max(round) & third != "z", name, NA_character_)) %>% ggplot(aes(x = round, y = points, color = third, group = name, size = third, alpha = third)) + geom_line() + * geom_label_repel(aes(label = label), size = 4) + scale_x_continuous(breaks = seq(1, 17, 1)) + scale_color_manual(values = c("#E0610E","#F596C8", "#FFF500","dark grey")) + scale_size_manual(values = c(1.5, 1.5, 1.5, 0.75)) + scale_alpha_manual(values = c(1, 1, 1, 0.3)) + labs(title = "The race for third place during the 2020 season", x = "Race Round", y = "Accumulated Points") + theme(legend.position = "none") ``` --- # Multiple groups <img src="figs/Lec5/multi-10-1.png" 
width="504" style="display: block; margin: auto;" /> --- class: left # Today's agenda <br> .large[ 1. the basics of line graphs 2. using multiple groups, avoiding "spaghetti graphs" 3. **working with `lubridate`** 4. the line graph debate 5. slope graphs ] --- # Today's data <br> .center[`BritishGP`]<br> <table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> race_name </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> surname </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> name </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> year </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> round </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> q1 </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> q2 </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> q3 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> British Grand Prix </td> <td style="text-align:left;"> Hamilton </td> <td style="text-align:left;"> Mercedes </td> <td style="text-align:right;"> 2020 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:left;"> 1:25.900 </td> <td style="text-align:left;"> 1:25.347 </td> <td style="text-align:left;"> 1:24.303 </td> </tr> <tr> <td style="text-align:left;"> British Grand Prix </td> <td style="text-align:left;"> Bottas </td> <td style="text-align:left;"> Mercedes </td> <td style="text-align:right;"> 2020 </td> <td 
style="text-align:right;"> 4 </td> <td style="text-align:left;"> 1:25.801 </td> <td style="text-align:left;"> 1:25.015 </td> <td style="text-align:left;"> 1:24.616 </td> </tr> <tr> <td style="text-align:left;"> British Grand Prix </td> <td style="text-align:left;"> Verstappen </td> <td style="text-align:left;"> Red Bull </td> <td style="text-align:right;"> 2020 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:left;"> 1:26.115 </td> <td style="text-align:left;"> 1:26.144 </td> <td style="text-align:left;"> 1:25.325 </td> </tr> <tr> <td style="text-align:left;"> British Grand Prix </td> <td style="text-align:left;"> Leclerc </td> <td style="text-align:left;"> Ferrari </td> <td style="text-align:right;"> 2020 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:left;"> 1:26.550 </td> <td style="text-align:left;"> 1:26.203 </td> <td style="text-align:left;"> 1:25.427 </td> </tr> <tr> <td style="text-align:left;"> British Grand Prix </td> <td style="text-align:left;"> Norris </td> <td style="text-align:left;"> McLaren </td> <td style="text-align:right;"> 2020 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:left;"> 1:26.855 </td> <td style="text-align:left;"> 1:26.420 </td> <td style="text-align:left;"> 1:25.782 </td> </tr> <tr> <td style="text-align:left;"> British Grand Prix </td> <td style="text-align:left;"> Stroll </td> <td style="text-align:left;"> Racing Point </td> <td style="text-align:right;"> 2020 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:left;"> 1:26.243 </td> <td style="text-align:left;"> 1:26.501 </td> <td style="text-align:left;"> 1:25.839 </td> </tr> <tr> <td style="text-align:left;"> British Grand Prix </td> <td style="text-align:left;"> Sainz </td> <td style="text-align:left;"> McLaren </td> <td style="text-align:right;"> 2020 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:left;"> 1:26.715 </td> <td style="text-align:left;"> 1:26.149 </td> <td 
style="text-align:left;"> 1:25.965 </td> </tr> <tr> <td style="text-align:left;"> British Grand Prix </td> <td style="text-align:left;"> Ricciardo </td> <td style="text-align:left;"> Renault </td> <td style="text-align:right;"> 2020 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:left;"> 1:26.677 </td> <td style="text-align:left;"> 1:26.339 </td> <td style="text-align:left;"> 1:26.009 </td> </tr> <tr> <td style="text-align:left;"> British Grand Prix </td> <td style="text-align:left;"> Ocon </td> <td style="text-align:left;"> Renault </td> <td style="text-align:right;"> 2020 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:left;"> 1:26.396 </td> <td style="text-align:left;"> 1:26.252 </td> <td style="text-align:left;"> 1:26.209 </td> </tr> <tr> <td style="text-align:left;"> British Grand Prix </td> <td style="text-align:left;"> Vettel </td> <td style="text-align:left;"> Ferrari </td> <td style="text-align:right;"> 2020 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:left;"> 1:26.469 </td> <td style="text-align:left;"> 1:26.455 </td> <td style="text-align:left;"> 1:26.339 </td> </tr> </tbody> </table> --- # Working with `lubridate` **how do lap times change through qualifying?** ```r BritishGP %>% filter(q3 != "\\N") %>% pivot_longer(q1:q3, values_to = "time", names_to = "qround") ``` -- <table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> race_name </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> surname </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> name </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> qround </th> <th 
style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> time </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> British Grand Prix </td> <td style="text-align:left;"> Hamilton </td> <td style="text-align:left;"> Mercedes </td> <td style="text-align:left;"> q1 </td> <td style="text-align:left;"> 1:25.900 </td> </tr> <tr> <td style="text-align:left;"> British Grand Prix </td> <td style="text-align:left;"> Hamilton </td> <td style="text-align:left;"> Mercedes </td> <td style="text-align:left;"> q2 </td> <td style="text-align:left;"> 1:25.347 </td> </tr> <tr> <td style="text-align:left;"> British Grand Prix </td> <td style="text-align:left;"> Hamilton </td> <td style="text-align:left;"> Mercedes </td> <td style="text-align:left;"> q3 </td> <td style="text-align:left;"> 1:24.303 </td> </tr> <tr> <td style="text-align:left;"> British Grand Prix </td> <td style="text-align:left;"> Bottas </td> <td style="text-align:left;"> Mercedes </td> <td style="text-align:left;"> q1 </td> <td style="text-align:left;"> 1:25.801 </td> </tr> <tr> <td style="text-align:left;"> British Grand Prix </td> <td style="text-align:left;"> Bottas </td> <td style="text-align:left;"> Mercedes </td> <td style="text-align:left;"> q2 </td> <td style="text-align:left;"> 1:25.015 </td> </tr> <tr> <td style="text-align:left;"> British Grand Prix </td> <td style="text-align:left;"> Bottas </td> <td style="text-align:left;"> Mercedes </td> <td style="text-align:left;"> q3 </td> <td style="text-align:left;"> 1:24.616 </td> </tr> </tbody> </table> --- # Working with `lubridate` **how do lap times change through qualifying?** ```r BritishGP %>% filter(q3 != "\\N") %>% pivot_longer(q1:q3, values_to = "time", names_to = "qround") %>% * ggplot(aes(x = qround, y = time, color = name, group = surname)) + geom_line() ``` -- <img src="figs/Lec5/lubridate-1-1.png" width="504" style="display: block; margin: auto;" /> --- # Working with `lubridate` ```r 
class(BritishGP$q1) ``` ``` ## [1] "character" ``` -- .center[![lubridate](figs/Lec5/lubridate.png)] <br> .center[.medium[working with dates & times in R:]] <br> .center[[https://lubridate.tidyverse.org/](https://lubridate.tidyverse.org/)] --- # Working with `lubridate` **how do lap times change through qualifying?** ```r BritishGP %>% filter(q3 != "\\N") %>% pivot_longer(q1:q3, values_to = "time", names_to = "qround") %>% * mutate(time_format = ms(time), * seconds = seconds(time_format), * duration = as.duration(time_format)) ``` -- ``` ## # A tibble: 5 x 6 ## surname qround time time_format seconds duration ## <chr> <chr> <chr> <Period> <Period> <Duration> ## 1 Hamilton q1 1:25.900 1M 25.9S 85.9S 85.9s (~1.43 minutes) ## 2 Hamilton q2 1:25.347 1M 25.347S 85.347S 85.347s (~1.42 minutes) ## 3 Hamilton q3 1:24.303 1M 24.303S 84.303S 84.303s (~1.41 minutes) ## 4 Bottas q1 1:25.801 1M 25.801S 85.801S 85.801s (~1.43 minutes) ## 5 Bottas q2 1:25.015 1M 25.015S 85.015S 85.015s (~1.42 minutes) ``` --- # Working with `lubridate` **how do lap times change through qualifying?** ```r BritishGP %>% filter(q3 != "\\N") %>% pivot_longer(q1:q3, values_to = "time", names_to = "qround") %>% mutate(time_format = ms(time), seconds = seconds(time_format), duration = as.duration(time_format)) %>% * ggplot(aes(x = qround, y = duration, group = surname)) + geom_line() ``` --- # Working with `lubridate` <img src="figs/Lec5/lubridate-5-1.png" width="504" style="display: block; margin: auto;" /> --- # Working with `lubridate` **how do lap times change through qualifying?** ```r BritishGP %>% filter(q3 != "\\N") %>% pivot_longer(q1:q3, values_to = "time", names_to = "qround") %>% mutate(time_format = ms(time), seconds = seconds(time_format), duration = as.duration(time_format), * mercedes = ifelse(name == "Mercedes","highlight","normal")) %>% ggplot(aes(x = qround, y = duration, group = surname, size = mercedes, alpha = mercedes, color = mercedes)) + scale_color_manual(values = 
c("#00D2BE","black")) + scale_size_manual(values = c(1.5, 0.75)) + scale_alpha_manual(values = c(1, 0.3)) + geom_line() ``` --- # Working with `lubridate` <img src="figs/Lec5/lubridate-6-1.png" width="504" style="display: block; margin: auto;" /> --- # Working with `lubridate` ```r BritishGP %>% filter(q3 != "\\N") %>% pivot_longer(q1:q3, values_to = "time", names_to = "qround") %>% mutate(time_format = ms(time), seconds = seconds(time_format), duration = as.duration(time_format), mercedes = ifelse(name == "Mercedes","highlight","normal"), * qround = fct_recode(qround, "Qualifying 1" = "q1", * "Qualifying 2" = "q2", "Qualifying 3" = "q3"), * label = ifelse(surname == "Hamilton" & qround == "Qualifying 3", * name, NA_character_)) %>% ggplot(aes(x = qround, y = duration, group = surname, size = mercedes, alpha = mercedes, color = mercedes)) + scale_color_manual(values = c("#00D2BE","black")) + scale_size_manual(values = c(1.5, 0.75)) + scale_alpha_manual(values = c(1, 0.3)) + geom_line() + * geom_label_repel(aes(label = label), size = 4) + theme(legend.position = "none") ``` --- # Working with `lubridate` <img src="figs/Lec5/lubridate-7-1.png" width="504" style="display: block; margin: auto;" /> --- # Working with `lubridate` ```r *BritishGP_cleaned %>% ggplot(aes(x = qround, y = duration, group = surname, size = mercedes, alpha = mercedes, color = mercedes)) + scale_color_manual(values = c("#00D2BE","black")) + scale_size_manual(values = c(1.5, 0.75)) + scale_alpha_manual(values = c(1, 0.3)) + geom_line() + geom_label_repel(aes(label = label), size = 4) + * scale_y_continuous(trans = "reverse", breaks = seq(84, 87, 0.5), * labels = c("1:24.0","1:24.5","1:25.0","1:25.5", * "1:26.0","1:26.5","1:27.0")) + labs(y = "Lap Time (sec)", x = "", title = "Qualifying Times in the 2020 British Grand Prix") + theme(legend.position = "none") ``` --- # Working with `lubridate` <img src="figs/Lec5/lubridate-8-1.png" width="504" style="display: block; margin: auto;" /> --- # Working 
with `lubridate`

**a preview of annotations**

```r
BritishGP_cleaned %>% 
  ggplot(aes(x = qround, y = duration, group = surname,
             size = mercedes, alpha = mercedes, color = mercedes)) +
  scale_color_manual(values = c("#00D2BE","black")) +
  scale_size_manual(values = c(1.5, 0.75)) +
  scale_alpha_manual(values = c(1, 0.3)) +
  geom_line() +
  geom_label_repel(aes(label = label), size = 4) +
  scale_y_continuous(trans = "reverse", breaks = seq(84, 87, 0.5),
                     labels = c("1:24.0","1:24.5","1:25.0","1:25.5",
                                "1:26.0","1:26.5","1:27.0")) +
  labs(y = "Lap Time (sec)", x = "",
       title = "Qualifying Times in the 2020 British Grand Prix") +
  theme(legend.position = "none") +
* annotate("text", x = 0.65, y = 84.2, label = "Faster",
*          size = 3.5, fontface = 3) +
* annotate("segment", x = 0.65, xend = 0.65, y = 84.7, yend = 84.3,
*          color = "black", size = 1, arrow = arrow(length = unit(2, "mm")))
```

---

# Working with `lubridate`

<img src="figs/Lec5/lubridate-9-1.png" width="504" style="display: block; margin: auto;" />

---
class: left

# Today's agenda

<br>
.large[
1. the basics of line graphs
2. using multiple groups, avoiding "spaghetti graphs"
3. working with `lubridate`
4. **the line graph debate**
5. 
slope graphs
]

---

# When to use a line graph

<br>
.large[Your graph can be thought of as a function (e.g., temperature as a function of time)]<br>
.medium[the line can be reliably used to estimate data between points]

---

# When to use a line graph

<img src="figs/Lec5/fever-2-1.png" width="504" style="display: block; margin: auto;" />

---

# When to use a line graph

<br>
.large[Your graph can be thought of as a function (e.g., temperature as a function of time)]<br>
.medium[the line can be reliably used to estimate data between points]<br> <br> <br>
.large[You're at the most discrete level of data and are looking for a trend]<br>
.medium[the line is mostly a guide for the eye]<br>
.medium[there isn't really a clearer way to display this]

---

# When to use a line graph

<img src="figs/Lec5/multi-10-1.png" width="504" style="display: block; margin: auto;" />

---

# When to use a line graph

<img src="figs/Lec5/lubridate-9-1.png" width="504" style="display: block; margin: auto;" />

---

# What about this? 
<img src="figs/Lec5/unnamed-chunk-2-1.png" width="504" style="display: block; margin: auto;" /> --- # The line graph debate **how to view the number of total points for two constructors, 2015 to 2020?** ```r constructor_pts %>% filter(name %in% c("Mercedes","Red Bull") & year != 2021) %>% group_by(year, name) %>% summarize(points = max(points), rounds = max(round)) ``` -- <table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> year </th> <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> name </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> points </th> <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> rounds </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 2015 </td> <td style="text-align:left;"> Mercedes </td> <td style="text-align:right;"> 703 </td> <td style="text-align:right;"> 19 </td> </tr> <tr> <td style="text-align:right;"> 2015 </td> <td style="text-align:left;"> Red Bull </td> <td style="text-align:right;"> 187 </td> <td style="text-align:right;"> 19 </td> </tr> <tr> <td style="text-align:right;"> 2016 </td> <td style="text-align:left;"> Mercedes </td> <td style="text-align:right;"> 765 </td> <td style="text-align:right;"> 21 </td> </tr> <tr> <td style="text-align:right;"> 2016 </td> <td style="text-align:left;"> Red Bull </td> <td style="text-align:right;"> 468 </td> <td style="text-align:right;"> 21 </td> </tr> <tr> <td style="text-align:right;"> 2017 </td> <td style="text-align:left;"> Mercedes </td> <td style="text-align:right;"> 668 </td> <td style="text-align:right;"> 20 </td> </tr> <tr> <td style="text-align:right;"> 2017 </td> <td style="text-align:left;"> Red Bull </td> <td 
style="text-align:right;"> 368 </td> <td style="text-align:right;"> 20 </td> </tr> </tbody> </table> --- # The line graph debate **how to view the number of total points for two constructors, 2015 to 2020?** ```r constructor_pts %>% filter(name %in% c("Mercedes","Red Bull") & year != 2021) %>% group_by(year, name) %>% summarize(points = max(points), rounds = max(round)) %>% * mutate(axis = paste0(year, "\n", rounds, " races"), * label = ifelse(year == 2015, name, NA_character_)) %>% ggplot(aes(x = axis, y = points, color = name, group = name)) + geom_line() + scale_color_manual(values = c("#00D2BE","#1E41FF")) + * geom_label_repel(aes(label = label)) + theme(legend.position = "none") + labs(x = "", y = "Accumulated Points", title = "Total points for Mercedes & Red Bull, 2015 to 2020") ``` --- # The line graph debate <img src="figs/Lec5/debate-3-1.png" width="504" style="display: block; margin: auto;" /> --- # The line graph debate <br> .large[A line between two points implies the presence of data—is this a problem?]<br> .medium[Does the data need to be measured nearly continuously for the lines to be logical interpretations?]<br> .medium[Or are we used to lines representing time?]<br> <br> -- <br> .large[My view (only one opinion!)]<br> .medium[Be very careful as to what your lines are representing]<br> .medium[Consider the time between measurements and the total # of measurements]<br> .medium[If each value "fills up from zero" at each interval, consider a bar chart instead]<br> --- # The line graph debate **how to view the number of total points for two constructors, 2015 to 2020?** ```r constructor_pts %>% filter(name %in% c("Mercedes","Red Bull") & year != 2021) %>% group_by(year, name) %>% summarize(points = max(points), rounds = max(round)) %>% mutate(axis = paste0(year, "\n", rounds, " races"), label = ifelse(year == 2015, name, NA_character_)) %>% ggplot(aes(x = axis, y = points, fill = name)) + * geom_bar(position = "dodge", stat = "identity") + 
  scale_fill_manual(values = c("#00D2BE","#1E41FF")) +
  geom_label_repel(aes(label = label)) +
  theme(legend.position = "none") +
  labs(x = "", y = "Accumulated Points")
```

---

# The line graph debate

<img src="figs/Lec5/debate-4-1.png" width="504" style="display: block; margin: auto;" />

---
class: left

# Today's agenda

<br>

.large[
1. the basics of line graphs
2. using multiple groups, avoiding "spaghetti graphs"
3. working with `lubridate`
4. the line graph debate
5. **slope graphs**
]

---

# Slope graphs

<br>
<br>

.large[Another option to compare groups over time]<br>
.medium[(only two points in time, though)]

---

# Today's data

<br>

.center[`driver_pts`]<br>
.center[**how did points change for drivers who switched teams from 2018 to 2019?**]

<table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> surname </th>
   <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> date </th>
   <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> round </th>
   <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> points </th>
  </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:left;"> Leclerc </td> <td style="text-align:left;"> 2019-03-17 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 10 </td> </tr>
  <tr> <td style="text-align:left;"> Leclerc </td> <td style="text-align:left;"> 2019-03-31 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 26 </td> </tr>
  <tr> <td style="text-align:left;"> Leclerc </td> <td style="text-align:left;"> 2019-04-14 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 36 </td> </tr>
  <tr> <td style="text-align:left;"> Leclerc </td> <td
style="text-align:left;"> 2019-04-28 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 47 </td> </tr>
  <tr> <td style="text-align:left;"> Leclerc </td> <td style="text-align:left;"> 2019-05-12 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 57 </td> </tr>
  <tr> <td style="text-align:left;"> Leclerc </td> <td style="text-align:left;"> 2019-05-26 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 57 </td> </tr>
  <tr> <td style="text-align:left;"> Leclerc </td> <td style="text-align:left;"> 2019-06-09 </td> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 72 </td> </tr>
  <tr> <td style="text-align:left;"> Leclerc </td> <td style="text-align:left;"> 2019-06-23 </td> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 87 </td> </tr>
  <tr> <td style="text-align:left;"> Leclerc </td> <td style="text-align:left;"> 2019-06-30 </td> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 105 </td> </tr>
  <tr> <td style="text-align:left;"> Leclerc </td> <td style="text-align:left;"> 2019-07-14 </td> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 120 </td> </tr>
</tbody>
</table>

---

# Date to year

```r
driver_pts %>%
  mutate(year = lubridate::year(date))
```

--

<table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> surname </th>
   <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> date </th>
   <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> year </th>
   <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> round </th>
   <th style="text-align:right;font-weight: bold;color: white
!important;background-color: #bb0000 !important;"> points </th>
  </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:left;"> Leclerc </td> <td style="text-align:left;"> 2019-03-17 </td> <td style="text-align:right;"> 2019 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 10 </td> </tr>
  <tr> <td style="text-align:left;"> Leclerc </td> <td style="text-align:left;"> 2019-03-31 </td> <td style="text-align:right;"> 2019 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 26 </td> </tr>
  <tr> <td style="text-align:left;"> Leclerc </td> <td style="text-align:left;"> 2019-04-14 </td> <td style="text-align:right;"> 2019 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 36 </td> </tr>
  <tr> <td style="text-align:left;"> Leclerc </td> <td style="text-align:left;"> 2019-04-28 </td> <td style="text-align:right;"> 2019 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 47 </td> </tr>
  <tr> <td style="text-align:left;"> Leclerc </td> <td style="text-align:left;"> 2019-05-12 </td> <td style="text-align:right;"> 2019 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 57 </td> </tr>
  <tr> <td style="text-align:left;"> Leclerc </td> <td style="text-align:left;"> 2019-05-26 </td> <td style="text-align:right;"> 2019 </td> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 57 </td> </tr>
  <tr> <td style="text-align:left;"> Leclerc </td> <td style="text-align:left;"> 2019-06-09 </td> <td style="text-align:right;"> 2019 </td> <td style="text-align:right;"> 7 </td> <td style="text-align:right;"> 72 </td> </tr>
  <tr> <td style="text-align:left;"> Leclerc </td> <td style="text-align:left;"> 2019-06-23 </td> <td style="text-align:right;"> 2019 </td> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 87 </td> </tr>
  <tr> <td style="text-align:left;"> Leclerc </td> <td style="text-align:left;"> 2019-06-30 </td> <td style="text-align:right;">
2019 </td> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 105 </td> </tr>
  <tr> <td style="text-align:left;"> Leclerc </td> <td style="text-align:left;"> 2019-07-14 </td> <td style="text-align:right;"> 2019 </td> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 120 </td> </tr>
</tbody>
</table>

---

# Slope graphs

**how did points change for drivers who switched teams?**

```r
driver_pts %>%
  mutate(year = lubridate::year(date)) %>%
  filter(surname %in% c("Sainz","Gasly","Leclerc","Ricciardo")) %>%
  group_by(year, surname) %>%
  summarize(points = max(points)) %>%
* ggplot(aes(x = year, y = points, group = surname)) +
* geom_line(aes(color = surname), size = 2) +
* geom_point(aes(color = surname), size = 4)
```

---

# Slope graphs

<img src="figs/Lec5/slope-1-1.png" width="504" style="display: block; margin: auto;" />

---

# Slope graphs

**how did points change for drivers who switched teams?**

```r
driver_pts %>%
  mutate(year = lubridate::year(date)) %>%
  filter(surname %in% c("Sainz","Gasly","Leclerc","Ricciardo")) %>%
  group_by(year, surname) %>%
  summarize(points = max(points)) %>%
* ggplot(aes(x = as.character(year), y = points, group = surname)) +
  geom_line(aes(color = surname), size = 2) +
  geom_point(aes(color = surname), size = 4) +
* scale_y_continuous(breaks = seq(25, 300, 25))
```

---

# Slope graphs

<img src="figs/Lec5/slope-2-1.png" width="504" style="display: block; margin: auto;" />

---

# Slope graphs

**how did points change for drivers who switched teams?**

```r
driver_pts %>%
  mutate(year = lubridate::year(date)) %>%
  filter(surname %in% c("Sainz","Gasly","Leclerc","Ricciardo")) %>%
  group_by(year, surname) %>%
  summarize(points = max(points)) %>%
* mutate(label = ifelse(year == 2019, surname, NA_character_)) %>%
  ggplot(aes(x = as.character(year), y = points, group = surname)) +
  geom_line(aes(color = surname), size = 2) +
  geom_point(aes(color = surname), size = 4) +
  scale_y_continuous(breaks = seq(25, 300, 25)) +
* geom_label_repel(aes(label = label), size = 4) +
* theme(legend.position = "none")
```

---

# Slope graphs

<img src="figs/Lec5/slope-3-1.png" width="504" style="display: block; margin: auto;" />

---

# Slope graphs

**how did points change for drivers who switched teams?**

```r
driver_pts %>%
  mutate(year = lubridate::year(date)) %>%
  filter(surname %in% c("Sainz","Gasly","Leclerc","Ricciardo")) %>%
  group_by(year, surname) %>%
  summarize(points = max(points)) %>%
  mutate(label = ifelse(year == 2019, surname, NA_character_)) %>%
  ggplot(aes(x = as.character(year), y = points, group = surname)) +
  geom_line(aes(color = surname), size = 2) +
  geom_point(aes(color = surname), size = 4) +
  scale_y_continuous(breaks = seq(25, 300, 25)) +
  geom_label_repel(aes(label = label), size = 4) +
  theme(legend.position = "none") +
  labs(y = "Total Points", x = "",
       title = "Change in points, drivers who switched teams in 2019")
```

---

# Slope graphs

<img src="figs/Lec5/slope-4-1.png" width="504" style="display: block; margin: auto;" />

---

# Upcoming

<br>

.large[Graphic critique due before midterm] <br>
.medium[Details on syllabus] <br>
.large[Lab 4 on Thursday June 3] <br>
.large[Lecture 6 on Friday June 4]
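
---

# Appendix: a self-contained slope graph sketch

The slope graph slides assume `driver_pts` is already loaded. The sketch below is a minimal stand-in that should run on its own. `toy_pts`, its two drivers, and every point value are made up for illustration (they are not the course data), and the legend is kept instead of `geom_label_repel()` so that only `dplyr`, `ggplot2`, and `lubridate` are needed.

```r
library(dplyr)
library(ggplot2)

# Hypothetical cumulative points at two race dates per season
# (made-up values; NOT the real driver_pts data)
toy_pts <- data.frame(
  surname = rep(c("Driver A", "Driver B"), each = 4),
  date    = as.Date(rep(c("2018-03-25", "2018-11-25",
                          "2019-03-17", "2019-12-01"), times = 2)),
  points  = c(10, 60, 8, 95, 12, 120, 15, 40)
)

# One row per driver per season: points accumulate within a season,
# so the maximum is the season total
slope_data <- toy_pts %>%
  mutate(year = lubridate::year(date)) %>%
  group_by(year, surname) %>%
  summarize(points = max(points), .groups = "drop")

# Treat year as a category so the x-axis shows only the two seasons
p <- ggplot(slope_data,
            aes(x = as.character(year), y = points, group = surname)) +
  geom_line(aes(color = surname)) +
  geom_point(aes(color = surname), size = 4) +
  labs(x = "", y = "Total Points")

p
```

With real data, the same three steps apply: extract the year from the date, reduce to one value per group per year, then draw one line per group across the two years.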