Woods Hole Oceanographic Institution

Instructors and helpers


Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in the mundane labor of collecting and preparing data, before it can be explored for useful information. - NYTimes (2014)

What to expect

This is going to be a fun workshop.

It will introduce you to open data science so you can work with data in an open, reproducible, and collaborative way. The plan is to expose you to a lot of great tools that you can use with confidence in your research. You’ll be working hands-on, doing the same things on your own computer as we do live up on the screen. We’re going to cover a lot in these two days, and it is hard to remember it all at once, but you’ll know you can do it and know where to look for help as you go forward with your analyses. Googling is a big part of coding!

In this workshop we’ll be talking about:

Workshop materials

Data science workflow

The tidy data workflow will help you think deliberately about data and your analyses. In our workshop we will be focusing on Tidy, Transform, and Visualise.

This graphic is from Wickham & Grolemund’s R for Data Science, which is a must-read (read it for free online or order a hardcopy from Amazon). It offers a way to think deliberately and reproducibly about the way you work with data.

By the end of the course

You’ll have hands-on experience with a reproducible, collaborative workflow for data wrangling and visualization in R. You’ll see how Git and GitHub facilitate collaboration with your future self and others (no more ‘my_script_v2_Aug_17.R’) and how you can publish dynamic documents online through your GitHub account. It’s going to be great!