Chapter 10 Synthesis

10.1 Summary

In this session, we’ll pull together the skills that we’ve learned so far. We’ll create a new GitHub repo and R project, wrangle and visualize data from spreadsheets in R Markdown, communicate between RStudio (locally) and GitHub (remotely) to keep our updates safe, then share our outputs in a nicely formatted GitHub ReadMe. And we’ll learn a few new things along the way!

Grolemund & Wickham R4DS Illustration

10.2 Objectives

  • Create a new repo on GitHub
  • Start a new R project, connected to the repo
  • Create a new R Markdown document
  • Attach necessary packages (tidyverse, here, janitor)
  • Use here::here() for simpler (and safer) file paths
  • Read in data from .csv files with readr::read_csv()
  • Basic data wrangling (dplyr, tidyr, etc.)
  • Data visualization (ggplot2)
  • Publish with a useful ReadMe to share

10.3 Resources

10.4 Lesson

10.4.1 Set-up:

  • Log in to your GitHub account and create a new repository called sea-creature-synthesis
  • Clone the repo to create a version-controlled R project (remember to copy & paste the URL from the GitHub Clone / Download button)
  • In the local project folder, create a subfolder called ‘data’
  • Copy the fish_counts_curated.csv and lobster_counts.csv files into the ‘data’ subfolder
  • Create a new R Markdown document within your sea-creature-synthesis project
  • Knit your .Rmd to HTML, saving it as sb_sea_creatures.Rmd

10.4.2 Attach packages and read in the data

Attach (load) packages with library():
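A minimal sketch of that chunk. The package list is an assumption based on the functions this lesson calls (read_csv() from the tidyverse, here(), and janitor::clean_names()); all three need to be installed first.

```r
# Attach the packages this lesson relies on
# (assumed list: adjust it if your document uses other packages)
library(tidyverse) # readr, dplyr, tidyr, ggplot2 and friends
library(here)      # file paths built from the project root
library(janitor)   # clean_names() for tidier column names
```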

Now we’ll read in our files with readr::read_csv(), but our files aren’t in our project root. They’re in the data subfolder.

Use here::here() to direct R where to look for files, if they’re not in the project root. Not sure where that is? Type here() in the Console, and it will tell you!
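For example, assuming the here package is installed, running the following in the Console prints the folder that here treats as your project root. The exact path will differ on every computer.

```r
library(here)

# With no arguments, here() returns the project root as a character string
here()
```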

"/returns/your/project/root/"

Go ahead, find your project root!

Then use here::here() again to easily locate a file somewhere outside of the exact project root. In our case, the files we want to read in are in the data subfolder - so we have to tell R how to get there from the root:
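One way this step might look, assuming both CSVs were copied into the data subfolder during set-up. The object names fish_path and lobster_path are just illustrative helpers, and the file.exists() check is an optional safeguard rather than something the lesson requires.

```r
library(tidyverse)
library(here)

# Each argument to here() is one step down from the project root
fish_path <- here("data", "fish_counts_curated.csv")
lobster_path <- here("data", "lobster_counts.csv")

# Read the files only if they are where we expect them to be
if (file.exists(fish_path) && file.exists(lobster_path)) {
  fish_counts <- read_csv(fish_path)
  lobster_counts <- read_csv(lobster_path)
} else {
  message("Data files not found: check that both CSVs are in the data subfolder.")
}
```

Building the path with here("data", "file.csv") rather than typing "data/file.csv" keeps the code working no matter which folder R happens to treat as the working directory.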

Check out the two data frames (fish_counts and lobster_counts).

The fish_counts data frame is in pretty good shape. But the lobster_counts data frame could use some love: it contains “-99999” entries that stand for missing values (NA), and its column names would be awkward to type in code.

When reading in the lobster data, let’s:

  • convert every “-99999” to an NA
  • get the column names into lower snake case using janitor::clean_names()
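The two steps above can be sketched on a tiny stand-in file. Everything about the toy file (its column names and values) is made up for illustration and is not the real lobster data; only the na argument of read_csv() and janitor::clean_names() are the point here.

```r
library(tidyverse)
library(janitor)

# Hypothetical stand-in for lobster_counts.csv (made-up columns and values)
toy_path <- tempfile(fileext = ".csv")
write_lines(
  c("SITE,Survey Year,COUNT",
    "A,2018,12",
    "B,2018,-99999",
    "A,2019,-99999"),
  toy_path
)

# na = "-99999" turns every -99999 into NA while the file is read;
# clean_names() then converts the column names to lower snake case
toy_lobsters <- read_csv(toy_path, na = "-99999") %>%
  clean_names()

names(toy_lobsters)
sum(is.na(toy_lobsters$count))
```

In the lesson project, the same na argument and clean_names() call would presumably be applied when reading here("data", "lobster_counts.csv").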

Look at it again to check (always look at your data) - now both data frames seem pretty coder-friendly to work with.

10.4.3 Data wrangling

  • join?
  • filter?
  • unite/separate

  • Read in lobster data
  • Join with another existing data frame (or 2?)
  • Pivoting
  • Transforming / subsetting
  • Grouping & summarizing (for means, sd, count)
  • Make a table
  • Make a graph
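As a rough sketch of the grouping and summarizing step listed above, here is one way to get a mean, standard deviation and count per group. The toy_counts data frame and its columns (site, year, count) are invented for illustration and are not the lesson's data.

```r
library(tidyverse)

# Made-up observations: several counts per site and year, one missing
toy_counts <- tribble(
  ~site, ~year, ~count,
  "A",   2018,  10,
  "A",   2018,  14,
  "A",   2019,  20,
  "B",   2018,  3,
  "B",   2018,  NA,
  "B",   2019,  7
)

# One summary row per site-year combination
toy_summary <- toy_counts %>%
  group_by(site, year) %>%
  summarize(
    mean_count = mean(count, na.rm = TRUE), # ignore missing values
    sd_count = sd(count, na.rm = TRUE),     # NA when only one value remains
    n_obs = n(),                            # rows in the group, NAs included
    .groups = "drop"
  )

toy_summary
```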

Possible new things: complete()
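For reference, a small sketch of what tidyr::complete() does: it adds rows for combinations of values that are missing from the data and fills the other columns with NA. The toy_obs data frame is invented for illustration.

```r
library(tidyverse)

# Made-up data with no row for site B in 2019
toy_obs <- tribble(
  ~site, ~year, ~count,
  "A",   2018,  12,
  "A",   2019,  20,
  "B",   2018,  3
)

# complete() adds the missing site-year combination, with count set to NA
toy_complete <- toy_obs %>%
  complete(site, year)

toy_complete
```

If a missing combination should mean "zero observed" rather than "unknown", complete() also accepts a fill argument, for example fill = list(count = 0).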

10.5 Fun facts (quirky things) - making a note of these wherever possible for interest (little “Did you know?” sections)

10.6 Interludes (deep thoughts/openscapes)

10.7 Efficiency Tips