class: inverse, center, middle

# 36-315: Statistical Graphics and Visualization

## Lecture 1

Meghan Hall <br> Department of Statistics & Data Science <br> Carnegie Mellon University <br> May 21, 2021

---
layout: true

<div class="my-footer"><span>cmu-36315.netlify.app</span></div>

---

# Teaching team

.large[Instructor: Meghan Hall]

<br>

.medium[Grad TA: Galen Vincent]

.medium[Undergrad TAs]

<br> <br>

Office hours TBD

---

# Course objectives

1. Create statistical graphics. <br> <br>

--

2. Understand the fundamentals of data and reproducible data analysis. <br> <br>

--

3. Write about statistical graphics. <br> <br>

--

4. Speak about statistical graphics and data analyses. <br> <br>

--

5. Assess and critique statistical graphics.

---

# Course tools

<br> <br>

.medium[R & RStudio]

<br> <br>

--

.medium[`ggplot2` and related packages]

<br> <br>

--

.medium[R Markdown]

<br> <br>

---

# Course components

<br> <br> <br>

.center[![syllabus snippet](figs/Lec1/syllabus_snip.png)]

---

# Course components

<br> <br>

Lectures

--

Labs

--

Homework

--

Code style

---

# Code style

<br> <br>

.large[Code must follow the [tidyverse style guide](https://style.tidyverse.org/)]

<br> <br>

.medium[Ignore section II; focus on I.2, I.4, and I.5]

.medium[It will match lecture notes, lab notes, etc.]

.medium[You can use the R package [styler](https://styler.r-lib.org/) if you want]

---

# Course components

<br> <br>

Lectures

Labs

Homework

Code style

--

Graphics discussion

--

Midterm

--

Group project

---

# Course components

<img src="figs/Lec1/component-graph-1.png" width="504" style="display: block; margin: auto;" />

---

# Various logistics

<br> <br>

Course website(s)

--

Piazza

--

Communication

--

Office hours

--

Extensions

--

Regrades

--

Integrity

---

# Why do we visualize data?
<br> <br>

<table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;text-align: center;"> x </th>
   <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;text-align: center;"> y </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 55.3846 </td>
   <td style="text-align:right;"> 97.1795 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 51.5385 </td>
   <td style="text-align:right;"> 96.0256 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 46.1538 </td>
   <td style="text-align:right;"> 94.4872 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 42.8205 </td>
   <td style="text-align:right;"> 91.4103 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 40.7692 </td>
   <td style="text-align:right;"> 88.3333 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 38.7179 </td>
   <td style="text-align:right;"> 84.8718 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 35.6410 </td>
   <td style="text-align:right;"> 79.8718 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 33.0769 </td>
   <td style="text-align:right;"> 77.5641 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 28.9744 </td>
   <td style="text-align:right;"> 74.4872 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 26.1538 </td>
   <td style="text-align:right;"> 71.4103 </td>
  </tr>
</tbody>
</table>

---

# Why do we visualize data?
<br>

<table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;text-align: center;"> Mean of x </th>
   <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;text-align: center;"> Mean of y </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 54.26327 </td>
   <td style="text-align:right;"> 47.83225 </td>
  </tr>
</tbody>
</table>

--

<br>

<table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;text-align: center;"> SD of x </th>
   <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;text-align: center;"> SD of y </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 16.76514 </td>
   <td style="text-align:right;"> 26.9354 </td>
  </tr>
</tbody>
</table>

--

<br>

<table class="table" style="font-size: 16px; width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;font-weight: bold;color: white !important;background-color: #bb0000 !important;text-align: center;"> Variable </th>
   <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;text-align: center;"> Min </th>
   <th style="text-align:right;font-weight: bold;color: white !important;background-color: #bb0000 !important;text-align: center;"> Max </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> x </td>
   <td style="text-align:right;"> 22.3077 </td>
   <td style="text-align:right;"> 98.2051 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> y </td>
   <td style="text-align:right;"> 2.9487 </td>
   <td style="text-align:right;"> 99.4872 </td>
  </tr>
</tbody>
</table>

---

# Why do we visualize data?

<img src="figs/Lec1/dino-graph-1.png" width="504" style="display: block; margin: auto;" />

---

# Why do we visualize data?

<br> <br> <br>

.large[Explore]

<br> <br> <br>

.large[Diagnose]

<br> <br> <br>

.large[Explain]

---

# By the end of the class

<br> <br>

.large[You can...]

<br> <br>

.medium[Ask relevant questions of data]

<br> <br>

.medium[Know which types of visualizations are appropriate for your data]

<br> <br>

.medium[Know which types of visualizations are appropriate for your *audience*]

<br> <br>

.medium[Create plots that are]

<br> <br>

Effective in their properties

<br> <br>

Elegant & aesthetically pleasing

---

# Some golden rules of graphs

<br> <br>

Don’t add complexity without a good reason.

--

Everything (everything!) must be readable.

--

Don’t distort data, intentionally or not.

--

Be mindful of the data-to-ink ratio.

--

All axes, labels, etc. should have real titles, not code variable names.

--

Always strive for clarity.

--

Titles, subtitles, and captions should add information.

---

# Upcoming

<br>

.large[Lecture 2 on Monday, May 24]

<br>

.medium[grammar of graphics and tidyverse principles]

<br>

.large[Lab 1 on Tuesday, May 25]

<br>

.medium[be on the lookout for a survey about times]