- Practice using common cleaning and wrangling functions
- Practice creating plots using common visualization functions in ggplot
- Practice saving and sharing data visualizations
- Practice Git and GitHub workflow and collaborating with a colleague
These exercises are adapted from Allison Horst’s EDS 221: Scientific Programming Essentials Course for the Bren School’s Master of Environmental Data Science program.
About the data
These exercises use data on the abundance, size, and trap counts (fishing pressure) of California spiny lobster (Panulirus interruptus). The data were collected along the mainland coast of the Santa Barbara Channel by Santa Barbara Coastal LTER researchers [CITATION].
Your task: Collaborate on an analysis and create a report to publish using GitHub Pages.
- Step 1: Create a new repository with a partner
- Determine who is the Owner and who is the Collaborator
- The Owner creates a repository on GitHub titled with both your names (e.g., if Casey and Camila were partners and Casey is the Owner, she would create a repo called casey-camila)
  - When creating the repository, add a brief description (e.g., R Practice Session: Collaborating on, Wrangling & Visualizing Data), keep the repo Public, and initialize the repo with a README file and an R .gitignore template.
- The Owner adds the Collaborator to the repo
- Both the Collaborator and the Owner clone the repo into their RStudio
Step 2 and Step 3 are meant to be completed at the same time.
- Collaborator completes Step 2
- Owner completes Step 3
- Step 2: Collaborator creates new files for the exercise
- The Collaborator creates the following directory (folder): analysis
- After creating the directory, create the following Quarto Documents and store them in the analysis folder:
  - Title it “Owner Analysis” and save it as owner-analysis.qmd
  - Title it “Collaborator Analysis” and save it as collaborator-analysis.qmd
  - Title it “Lobster Report” and save it as lobster-report.qmd
- After creating the files, the Collaborator will stage (add), commit, write a commit message, pull, and push the files to the remote repository (on GitHub)
- The Owner pulls the changes and Quarto Documents into their local repository (their workspace)
- Step 3: Owner downloads data from the EDI Data Portal: SBC LTER: Reef: Abundance, size and fishing effort for California Spiny Lobster (Panulirus interruptus), ongoing since 2012.
- Create two new directories, one called data and one called figs
  - Note: Git does not track empty directories, so you won’t see figs when you push to GitHub
- Download the following data and upload them to the data folder:
  - Time-series of lobster abundance and size
  - Time-series of lobster trap buoy counts
- After creating the data folder and adding the data, the Owner will stage (add), commit, write a commit message, pull, and push the files to the remote repository (on GitHub)
- The Collaborator pulls the changes and data into their local repository (their workspace)
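The two folders can be created in the RStudio Files pane, but they can also be created from R. The block below is a minimal sketch of that idea. The project_root path is a hypothetical stand-in that points at a temporary directory so the code can run anywhere; in the real project you would create the folders inside your cloned repository.

```r
# Hypothetical project root: a temporary folder stands in for the cloned repo
project_root <- file.path(tempdir(), "casey-camila")
dir.create(project_root, recursive = TRUE, showWarnings = FALSE)

# Create the data and figs folders inside the project root
for (folder in c("data", "figs")) {
  dir.create(file.path(project_root, folder), showWarnings = FALSE)
}

# Check that both folders now exist
created <- dir.exists(file.path(project_root, c("data", "figs")))
print(created)
```

Because Git only tracks files, the figs folder should appear on GitHub once a figure has been saved into it and committed.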
1 Explore, clean and wrangle data
For this portion of the exercise:
- The Owner will be working with the lobster abundance and size data
- The Collaborator will be working with the lobster trap buoy counts data
For Questions 1-3 you will work independently, since you’re working with different data frames, but you’re welcome to check in with each other.
- Open the Quarto Document owner-analysis.qmd
  - Check the YAML and add your name to the author field
  - Create a new section with a level 2 header and title it “Exercise: Explore, Clean, and Wrangle Data”
- Load the following libraries at the top of your Quarto Document
library(readr)
library(dplyr)
library(ggplot2)
library(tidyr)
library(here)
- Read in the data and store the data frame as lobster_abundance
lobster_abundance <- read_csv(here::here("data/Lobster_Abundance_All_Years_20220829.csv"))
- Look at your data. Take a minute to explore what your data structure looks like, what data types are in the data frame, or use a function to get a high-level summary of the data you’re working with.
- Use the Git workflow: Stage (add) -> Commit -> Pull -> Push
  - Note: You also want to Pull when you first open a project
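As one possible way to "look at your data", the sketch below runs a few common exploration functions on a tiny stand-in data frame. The toy_lobsters object and its values are invented for illustration, and the YEAR column is an assumption; only SITE and SIZE_MM appear elsewhere in these exercises. In your own document you would call the same functions on the data frame you read in.

```r
library(dplyr)

# Hypothetical stand-in data; the values are invented for illustration
toy_lobsters <- tibble(
  YEAR = c(2012, 2012, 2013, 2013),
  SITE = c("IVEE", "NAPL", "AQUE", "CARP"),
  SIZE_MM = c(72, -99999, 65, 80)
)

# Column names, data types, and the first few values of each column
glimpse(toy_lobsters)

# High-level summary (min, median, max, ...) of each column
summary(toy_lobsters)

# Number of rows and columns
dim(toy_lobsters)
```

A summary like this is often where placeholder values stand out: here the minimum of SIZE_MM is -99999, which is not a plausible lobster size.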
1.0.1 Convert missing values using mutate() and na_if()
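The pattern named in this heading can be sketched on a small example. The data frame below is a hypothetical stand-in with invented values; the -99999 placeholder is the same one converted in the test chunk later in these exercises.

```r
library(dplyr)

# Hypothetical stand-in data with one placeholder value (-99999)
toy_lobsters <- tibble(
  SITE = c("IVEE", "NAPL", "AQUE"),
  SIZE_MM = c(72, -99999, 65)
)

# na_if() returns NA wherever SIZE_MM equals -99999;
# mutate() overwrites the column with the converted values
toy_clean <- toy_lobsters %>%
  mutate(SIZE_MM = na_if(SIZE_MM, -99999))

toy_clean

# With the placeholder converted, summaries can skip it using na.rm = TRUE
mean(toy_clean$SIZE_MM, na.rm = TRUE)
```

Converting the placeholder before summarizing matters because a value like -99999 would otherwise be treated as a real measurement and distort means and plots.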
1.0.2 filter() practice
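A few common filter() patterns are sketched below on a hypothetical stand-in data frame. The values and the thresholds are invented for illustration and are not the exercise questions themselves.

```r
library(dplyr)

# Hypothetical stand-in data; values are invented for illustration
toy_lobsters <- tibble(
  SITE = c("IVEE", "NAPL", "AQUE", "CARP", "NAPL"),
  SIZE_MM = c(72, 90, 65, 80, NA)
)

# Keep rows that match a single value
napl_only <- toy_lobsters %>%
  filter(SITE == "NAPL")

# Keep rows that match any value in a set
two_sites <- toy_lobsters %>%
  filter(SITE %in% c("AQUE", "CARP"))

# Keep rows above a threshold (rows where SIZE_MM is NA are dropped)
large <- toy_lobsters %>%
  filter(SIZE_MM > 70)

# Combine conditions: a comma between conditions means "and"
not_napl_large <- toy_lobsters %>%
  filter(SITE != "NAPL", SIZE_MM >= 70)
```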
After you’ve completed the exercises or reached a significant stopping point, use the workflow: Stage (add) -> Commit -> Pull -> Push
- Open the Quarto Document collaborator-analysis.qmd
  - Check the YAML and add your name to the author field
  - Create a new section with a level 2 header and title it “Exercise: Explore, Clean, and Wrangle Data”
- Load the following libraries at the top of your Quarto Document
library(readr)
library(dplyr)
library(ggplot2)
library(tidyr)
library(here)
- Read in the data and store the data frame as lobster_traps
lobster_traps <- read_csv(here::here("data/Lobster_Trap_Counts_All_Years_20210519.csv"))
- Look at your data. Take a minute to explore what your data structure looks like, what data types are in the data frame, or use a function to get a high-level summary of the data you’re working with.
- Use the Git workflow: Stage (add) -> Commit -> Pull -> Push
  - Note: You also want to Pull when you first open a project
1.0.3 Convert missing values using mutate() and na_if()
1.0.4 filter() practice
After you’ve completed the exercises or reached a significant stopping point, use the workflow: Stage (add) -> Commit -> Pull -> Push
2 Create visually appealing and informative data visualization
- Stay in the Quarto Document owner-analysis.qmd and create a new section with a level 2 header and title it “Exercise: Data Visualization”
Structure of the data visualization exercises:
- First, you will create the necessary subsets for the data visualizations, as well as the basic code to create each visualization.
- Then, you will return to the data visualization code you’ve written and add styling code to it. For this exercise, only add styling code to the visualization you want to include in lobster-report.qmd (start with just one plot and, if there’s time, add styling code to another plot).
- Lastly, save the final visualizations to the figs folder before collaborating on lobster-report.qmd.
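The three stages described above (subset and basic plot, styling, saving) might look like the sketch below. Everything in it is illustrative: the toy data, the object names, the plot choice, and the file name are assumptions, and the figure is written to a temporary folder so the code can run anywhere. In your project you would save to the figs folder instead, for example with here::here("figs", "mean-size-by-site.png").

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in data; values are invented for illustration
toy_lobsters <- tibble(
  SITE = c("IVEE", "IVEE", "NAPL", "NAPL", "AQUE", "AQUE"),
  SIZE_MM = c(72, 80, 90, NA, 65, 70)
)

# Stage 1: create the subset, then a basic plot
site_means <- toy_lobsters %>%
  group_by(SITE) %>%
  summarize(mean_size_mm = mean(SIZE_MM, na.rm = TRUE))

size_plot <- ggplot(site_means, aes(x = SITE, y = mean_size_mm)) +
  geom_col()

# Stage 2: return to the plot and add styling
size_plot_styled <- size_plot +
  labs(
    title = "Mean lobster size by site",
    x = "Site",
    y = "Mean carapace length (mm)"
  ) +
  theme_minimal()

# Stage 3: save the styled plot (temporary folder used here for illustration)
fig_path <- file.path(tempdir(), "mean-size-by-site.png")
ggsave(fig_path, plot = size_plot_styled, width = 6, height = 4, dpi = 300)
```

Keeping the basic plot and the styled plot as separate objects makes it easy to compare them and to style only the plots that go into the report.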
## Run this chunk to test exercises ##
library(readr)
library(dplyr)
library(ggplot2)
library(tidyr)
lobster_abundance <- read_csv("https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-sbc.77.8&entityid=f32823fba432f58f66c06b589b7efac6") %>%
  # Replace the -99999 placeholder with NA
  mutate(SIZE_MM = na_if(SIZE_MM, -99999))
After you’ve completed the exercises or reached a significant stopping point, use the workflow: Stage (add) -> Commit -> Pull -> Push
- Stay in the Quarto Document collaborator-analysis.qmd and create a new section with a level 2 header and title it “Exercise: Data Visualization”
Structure of the data visualization exercises:
- First you will create the necessary subsets to create the data visualizations, as well as the basic code to create a visualization.
- Then, you will return to the data visualization code you’ve written and add styling code to it. For this exercise, only add styling code to the visualization you want to include in lobster-report.qmd (start with just one plot and, if there’s time, add styling code to another plot).
- Lastly, save the final visualizations to the figs folder before collaborating on lobster-report.qmd.
## Run this chunk to test exercises ##
library(readr)
library(dplyr)
library(ggplot2)
library(tidyr)
lobster_traps <- read_csv("https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-sbc.77.8&entityid=66dd61c75bda17c23a3bce458c56ed84") %>%
  # Replace the -99999 placeholder with NA
  mutate(TRAPS = na_if(TRAPS, -99999))
After you’ve completed the exercises or reached a significant stopping point, use the workflow: Stage (add) -> Commit -> Pull -> Push
3 Collaborate on a report and publish using GitHub Pages
The final step! Time to work together again. Collaborate with your partner in lobster-report.qmd to create a report to publish to GitHub Pages.
As you’re working on the lobster-report.qmd you will be conducting two types of code reviews: (1) pair programming and (2) lightweight code review.
Pair programming is where two people develop code together at the same workstation. One person is the “driver” and one person is the “navigator”. The driver writes the code while the navigator observes the code being typed, points out any immediate quick fixes, and will also Google / troubleshoot if errors occur. Both the Owner and the Collaborator should experience both roles, so switch halfway through or at a meaningful stopping point.
A lightweight code review is brief: you give feedback on code readability and code logic as you add Owner and Collaborator code from your respective analysis .qmd files to lobster-report.qmd. Think of it as a walkthrough of the code for the data visualizations you plan to include in the report (both the code that creates the subset for the plot and the code that creates the plot), followed by quick feedback.
Make sure your Quarto Document is well organized and includes the following elements:
- citation of the data
- brief summary of the abstract (i.e. 1-2 sentences) from the EDI Portal
- Owner analysis and visualizations (you choose which plots you want to include)
- Try adding alternative text to your plots (See Quarto Documentation)
- Plots can be added either with the data visualization code or with Markdown syntax (calling a saved image) - it’s up to you if you want to include the code or not.
- Collaborator analysis and visualizations (you choose which plots you want to include)
- Try adding alternative text to your plots (See Quarto Documentation)
- plots can be added either with the data visualization code or with Markdown syntax (calling a saved image) - it’s up to you if you want to include the code or not.
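For a plot that is created by code in the report, alternative text can be supplied through chunk options. The sketch below shows the contents of an R code chunk that uses Quarto's fig-cap and fig-alt options; the label, caption, alt text, and toy data are all hypothetical.

```r
#| label: fig-mean-size
#| fig-cap: "Mean lobster size at two sites"
#| fig-alt: "Bar chart with one bar per site showing mean lobster size in millimeters."

library(ggplot2)

# Hypothetical stand-in summary data; values are invented for illustration
toy_means <- data.frame(
  SITE = c("IVEE", "NAPL"),
  mean_size_mm = c(76, 90)
)

alt_text_plot <- ggplot(toy_means, aes(x = SITE, y = mean_size_mm)) +
  geom_col()

# Print the plot so it appears in the rendered report
alt_text_plot
```

For a saved image added with Markdown syntax, the same attribute can be attached to the image, for example `![Mean lobster size at two sites](figs/mean-size-by-site.png){fig-alt="Bar chart with one bar per site."}`, where the file name is again hypothetical.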
Finally, publish on GitHub Pages (from the Owner’s repository). Refer back to Chapter 12 for the steps to publish using GitHub Pages.
4 Bonus: Add marine protected area (MPA) designation to the data
The sites IVEE and NAPL are marine protected areas (MPAs). Add this designation to your data set using a new function called case_when(). Then create some new plots using this new variable. Does it change how you think about the data? What new plots or analysis can you do with this new variable?
Use the object lobster_abundance and add a new column called DESIGNATION that contains “MPA” if the site is IVEE or NAPL, and “not MPA” for all other values.
lobster_mpa <- lobster_abundance %>%
  mutate(DESIGNATION = case_when(
    SITE %in% c("IVEE", "NAPL") ~ "MPA",
    SITE %in% c("AQUE", "CARP", "MOHK") ~ "not MPA"
  ))
Use the object lobster_traps and add a new column called DESIGNATION that contains “MPA” if the site is IVEE or NAPL, and “not MPA” for all other values.
lobster_traps_mpa <- lobster_traps %>%
  mutate(DESIGNATION = case_when(
    SITE %in% c("IVEE", "NAPL") ~ "MPA",
    SITE %in% c("AQUE", "CARP", "MOHK") ~ "not MPA"
  ))
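The solutions above list every non-MPA site explicitly, so any site that is not named in either condition would get NA. An alternative that follows the "all other values" wording more literally is a catch-all final condition. The sketch below shows that variant on a hypothetical stand-in data frame; it is one possible approach, not the exercise's official answer.

```r
library(dplyr)

# Hypothetical stand-in data containing only the SITE column
toy_sites <- tibble(
  SITE = c("IVEE", "NAPL", "AQUE", "CARP", "MOHK")
)

toy_mpa <- toy_sites %>%
  mutate(DESIGNATION = case_when(
    SITE %in% c("IVEE", "NAPL") ~ "MPA",
    # TRUE matches every row not caught by an earlier condition
    TRUE ~ "not MPA"
  ))

# Count rows per designation to check the new column
toy_mpa %>%
  count(DESIGNATION)
```

The trade-off is that a catch-all also labels missing or misspelled site names as "not MPA", whereas the explicit list leaves them as NA, which can make data problems easier to spot. Recent versions of dplyr (1.1.0 and later) also accept a .default argument in case_when() for the same purpose.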







