- Definitions and tools
October 21, 2016, Hopkins Marine Station, Stanford University
Definitions and tools
My programming origin story
Data science:
"an exciting discipline that allows you to turn raw data into understanding, insight, and knowledge" (Grolemund & Wickham 2016)
Open science:
"the concept of transparency at all stages of the research process, coupled with free and open access to data, code, and papers" (Hampton et al. 2014)
this talk: jules32.github.io/opensci-talk
Photo credit: Greg Auger
Science:
Data science:
* except for wonderful programming mentors:
Steve Haddock, Dave Foley, Ashley Booth
method to categorize benefits that oceans provide to people
scores are modeled using existing data; data intensive
method can be tailored to different geographies
can help inform policy decisions, especially when repeated
2012: OHI method and first global assessment (Halpern et al. 2012)
2013: second annual global assessment
We expected to easily reproduce our previous work. We had planned ahead:
We struggled to reproduce our work using standard approaches
…mainly due to our approaches to data preparation (data science)
Additional challenge of managing multiple years
Lowndes et al. Improving reproducibility, collaboration, and communication in environmental science using open science tools, in prep
"Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in the mundane labor of collecting and preparing data, before it can be explored for useful information." - NYTimes (2014)
Before
After
Philosophy of data wrangling
Grolemund & Wickham 2016: R for Data Science
"For scientist coders, [Git] works like a laboratory notebook for scientific computing…it keeps a lasting record of events." - Nature 2016
Before: final.csv and final_JL-2016-08-05.csv
After: git
Demo: link
Before
After
Demo link (private)
Before
After
Demo: github.com/OHI-Science
Demo: OHI-Science.org
These tools and this workflow make our science possible.
All on ohi-science.org
1. Learn to code
   - in R
   - with RStudio
2. Use version control
   - git
   - with GitHub
   - through RStudio
Introduce these concepts incrementally
Books, trainings, and webinars:
Recent academic publications:
Join existing and create new communities - locally and online
THANK YOU
to my team, colleagues, #rstats community
email: lowndes@nceas.ucsb.edu
twitter: @juliesquid
website: https://jules32.github.io
talk url: https://jules32.github.io/opensci-talk
15-minute version of this talk at WSN:
Friday, Nov 11, Session 7, 3pm
NCEAS is hiring: nceas.ucsb.edu/positionsopen