class: inverse, center, middle

# 36-315: Statistical Graphics and Visualization

## Lab 4

Meghan Hall <br> Department of Statistics & Data Science <br> Carnegie Mellon University <br> June 3, 2021

---

layout: true

<div class="my-footer"><span>cmu-36315.netlify.app</span></div>

---

# Today

<br>

.large[Creating line graphs]

<br>

.medium[pivoting data]

<br>

.medium[labeling with `ggrepel`]

<br>

.medium[using functions from `lubridate`]

<br>
<br>

.large[Instructions]

<br>

.medium[HW: describe the graphs]

<br>

.medium[HW/labs: add plot title and relevant axis titles (include units!)]

---

# `pivot_longer`

**need to pivot when your data is at a different observation level than necessary for analysis/visualization**<br>
data here is by track, we need observations per week

<table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> artist </th>
   <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> track </th>
   <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> wk1 </th>
   <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> wk2 </th>
   <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> wk3 </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Dixie Chicks, The </td>
   <td style="text-align:left;"> Cold Day In July </td>
   <td style="text-align:right;"> 80 </td>
   <td style="text-align:right;"> 79 </td>
   <td style="text-align:right;"> 76 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Dixie Chicks, The </td>
   <td style="text-align:left;"> Cowboy Take Me Away </td>
   <td style="text-align:right;"> 79 </td>
   <td style="text-align:right;"> 72 </td>
   <td style="text-align:right;"> 70 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Dixie Chicks, The </td>
   <td style="text-align:left;"> Goodbye Earl </td>
   <td style="text-align:right;"> 40 </td>
   <td style="text-align:right;"> 29 </td>
   <td style="text-align:right;"> 24 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Dixie Chicks, The </td>
   <td style="text-align:left;"> Without You </td>
   <td style="text-align:right;"> 80 </td>
   <td style="text-align:right;"> 70 </td>
   <td style="text-align:right;"> 63 </td>
  </tr>
</tbody>
</table>

---

# `pivot_longer`

**need to pivot when your data is at a different observation level than necessary for analysis/visualization**<br>
data here is by track, we need observations per week

```r
billboard %>%
  filter(artist == "Dixie Chicks, The") %>%
  select(artist, track, wk1:wk3) %>%
* pivot_longer(wk1:wk3, names_to = "week", values_to = "ranking")
```

--

<table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> artist </th>
   <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> track </th>
   <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> week </th>
   <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;"> ranking </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Dixie Chicks, The </td>
   <td style="text-align:left;"> Cold Day In July </td>
   <td style="text-align:left;"> wk1 </td>
   <td style="text-align:right;"> 80 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Dixie Chicks, The </td>
   <td style="text-align:left;"> Cold Day In July </td>
   <td style="text-align:left;"> wk2 </td>
   <td style="text-align:right;"> 79 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Dixie Chicks, The </td>
   <td style="text-align:left;"> Cold Day In July </td>
   <td
style="text-align:left;"> wk3 </td>
   <td style="text-align:right;"> 76 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Dixie Chicks, The </td>
   <td style="text-align:left;"> Cowboy Take Me Away </td>
   <td style="text-align:left;"> wk1 </td>
   <td style="text-align:right;"> 79 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Dixie Chicks, The </td>
   <td style="text-align:left;"> Cowboy Take Me Away </td>
   <td style="text-align:left;"> wk2 </td>
   <td style="text-align:right;"> 72 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Dixie Chicks, The </td>
   <td style="text-align:left;"> Cowboy Take Me Away </td>
   <td style="text-align:left;"> wk3 </td>
   <td style="text-align:right;"> 70 </td>
  </tr>
</tbody>
</table>

---

# Options for relabeling

**`fct_recode` is useful when you have a factor and ordering is important**

```r
# from Wednesday's lecture
mutate(qround = fct_recode(qround,
                           "Qualifying 1" = "q1",
                           "Qualifying 2" = "q2",
                           "Qualifying 3" = "q3"))
```

--

**`case_when` is generally easier when you have a standard categorical variable**

```r
# example from mpg data set
mutate(drv = case_when(drv == "f" ~ "front-wheel drive",
                       drv == "r" ~ "rear-wheel drive",
                       drv == "4" ~ "four-wheel drive"))
```

---

# Dealing with dates

**functions from `lubridate`**

```r
# from Wednesday's lecture
mutate(time_format = ms(time),
       seconds = seconds(time_format))

# from today's lab
mutate(date = ym(paste(year, month)))
```

--

**special scale when axis is a date**

```r
scale_x_date(date_breaks = "1 year", date_labels = "%m-%Y")
```

--

<img src="figs/Lab1/date-3-1.png" width="504" style="display: block; margin: auto;" />

---

# Upcoming

<br>

.large[Lab assignment due 11:30am EDT Friday!]

<br>

.medium[Ask questions on Piazza if they don't get answered here]

<br>

.large[Lecture 6 on Friday, June 4]

<br>

.medium[Scatter plots]

<br>

.large[Homework 2 due 11:30am EDT Tuesday]

<br>

.medium[Posted now]
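
---

# Extra: putting the pieces together

A minimal sketch that chains the steps from the earlier slides into one line graph, assuming the `billboard` data that ships with `tidyr`. The column names `week_num` and `chart_date` are illustrative, and treating the chart date as the entry date plus elapsed weeks is an assumption made for this example, not something taken from the lab.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(lubridate)

dixie <- billboard %>%
  filter(artist == "Dixie Chicks, The") %>%
  select(artist, track, date.entered, wk1:wk3) %>%
  # one row per track per week instead of one row per track
  pivot_longer(wk1:wk3, names_to = "week", values_to = "ranking") %>%
  mutate(
    # "wk1" -> 1, "wk2" -> 2, ...
    week_num = as.integer(gsub("wk", "", week)),
    # assumption: week 1 falls on the date the track entered the chart
    chart_date = date.entered + weeks(week_num - 1)
  )

p <- ggplot(dixie, aes(x = chart_date, y = ranking, color = track)) +
  geom_line() +
  # date-aware axis, as on the "Dealing with dates" slide
  scale_x_date(date_breaks = "1 month", date_labels = "%m-%Y") +
  labs(title = "Dixie Chicks tracks, first three weeks on the chart",
       x = "Chart date (month-year)",
       y = "Billboard ranking (position)",
       color = "Track")

p
```

Keeping `week` as text and adding a numeric `week_num` alongside it leaves the pivoted column untouched while still giving the date arithmetic a number to work with.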