2 R Project Setup

Tip: Learning Objectives
  • Practice creating an R Project
  • Organize an R Project for effective project management
  • Understand how to move in an R Project using paths and working directories

1 Create an R Project

An R project is tied to a directory on your local computer, and makes organizing your work and collaborating with others easier.

The Big Idea: using an R project is a reproducible research best practice because it bundles all your work within a working directory. Consider your current data analysis workflow. Where do you import your data? Where do you clean and wrangle it? Where do you create graphs and, ultimately, a final report? Are you going back and forth between multiple software tools like Microsoft Excel, R, and Google Docs? Using an R project (and the tools in R) will consolidate this process because it can all be done (and updated) using one software tool, RStudio, and within one R project.

Tip: R Project Setup
  1. In the “File” menu, select “New Project”
  2. Click “New Directory”
  3. Click “New Project”
  4. Under “Directory name” type: training_{USERNAME} (e.g., training_lopazanski)
  5. Leave “Create Project as subdirectory of:” set to the location where you want to save the project on your computer.
  6. Click “Create Project”

RStudio should open your new project automatically after creating it. One way to check this is to look for the project name in the top right corner of the RStudio window.
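If you prefer to check from the console, the following is a minimal sketch using only base R. It assumes that the project's .Rproj file sits in the current working directory, which is where RStudio normally places it when a project is open. The variable names are illustrative.

```r
# Where is R currently working? In an open project this should be
# the project directory, e.g. .../training_{USERNAME}
project_dir <- getwd()
print(project_dir)

# Look for a .Rproj file in the working directory
rproj_files <- list.files(project_dir, pattern = "\\.Rproj$")

if (length(rproj_files) > 0) {
  message("Found project file: ", rproj_files[1])
} else {
  message("No .Rproj file here; the project may not be open.")
}
```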

2 Organizing an R Project

The next step is to populate that project with relevant directories. There are many tools that can do this automatically (e.g., rrtools or usethis::create_package()). The goal is to organize your project so that it is a compendium of your research. This means that the project has all of the digital parts needed to replicate your analysis, like code, figures, the manuscript, and data.

Some common directories are:

  • data: where we store data (often contains subdirectories for raw data, processed data, and metadata)
  • scripts: all the scripts you use to clean and wrangle data and run your analysis
  • plots or figs: generated plots, graphs, and figures
  • docs: summaries or reports of analysis or other relevant project information

Directory organization varies from project to project, but the goal is to create a well-organized project for both reproducibility and collaboration.

Tip: Project Sub-directories

For this series we will create 3 folders (directories) in our training_{USERNAME} R project.

In the files pane in RStudio (bottom right), click on the New Folder button (with a green circle and plus sign) and create 3 new folders: data, plots, scripts.
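The same three folders can also be created from the R console. This is a minimal sketch that assumes your working directory is the project directory (which is the case when the project is open in RStudio). The variable name project_dirs is illustrative.

```r
# Folder names used in this series
project_dirs <- c("data", "plots", "scripts")

# Create each folder only if it does not already exist
for (d in project_dirs) {
  if (!dir.exists(d)) {
    dir.create(d)
  }
}

# Confirm that all three folders are now present
dir.exists(project_dirs)
```

Because the folder names are relative paths, they are created inside whatever directory R is currently working in.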

3 Moving in an R Project using Paths & Working Directories

Artwork by Allison Horst. A cartoon of a cracked glass cube looking frustrated with casts on its arm and leg, with bandaids on it, containing “setwd,” looks on at a metal riveted cube labeled “R Proj” holding a skateboard looking sympathetic, and a smaller cube with a helmet on labeled “here” doing a trick on a skateboard.

Now that we have a project created (we know it’s an R Project because we see a .Rproj file in our Files pane), we can learn how to locate files within that project. We do this using paths.

There are two types of paths in computing: absolute paths and relative paths.

  • Absolute paths always start at the root of your file system and locate files from there. The absolute path to my project directory is: /home/lopazanski/github/training_lopazanski

  • Relative paths start from some location in your file system that is below the root. A relative path is combined with the path of that location to locate files on your system. R (and some other languages like MATLAB) refers to the location where the relative path starts as the working directory.
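To make the distinction concrete, here is a small sketch of how a relative path is combined with the working directory to form an absolute path. The file data/samples.csv is only an example name and does not need to exist for this code to run.

```r
# The working directory is the starting point for relative paths
working_dir <- getwd()

# A relative path: it does not start at the root of the file system
relative_path <- file.path("data", "samples.csv")

# Joining the two gives the absolute path that R would actually look up
absolute_path <- file.path(working_dir, relative_path)

print(relative_path)
print(absolute_path)
```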

RStudio projects automatically set the working directory to the directory of the project. This means that you can reference files from within the project without worrying about where the project directory itself is. If I want to read in a file from the data directory within my project, I can simply type read.csv("data/samples.csv") as opposed to read.csv("/home/lopazanski/github/training_lopazanski/data/samples.csv").
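As a hedged sketch of the relative-path approach, the code below reads data/samples.csv only if it exists, so it runs safely even before you have added any data. The file name comes from the example above; the existence check is an addition for illustration.

```r
# Build the relative path from the project directory
samples_path <- file.path("data", "samples.csv")

if (file.exists(samples_path)) {
  # Read the file relative to the working directory
  samples <- read.csv(samples_path)
  print(head(samples))
} else {
  # Show where R looked, which helps diagnose a wrong working directory
  message("No file found at: ", file.path(getwd(), samples_path))
}
```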

This is convenient not only for you but also for your collaborators. We will talk more about this later, but if Joe makes a copy of the R project that I have published on GitHub, and I have used relative paths, he can run my code exactly as I have written it, without going back and changing /home/lopazanski/github/training_lopazanski/data/samples.csv to /home/morton/github/training_morton/data/samples.csv.

Note that once you start working in projects, you should basically never need to run the setwd() command. If you are in the habit of doing this, stop and take a look at where and why you do it. Leveraging the working directory concept of R projects could likely eliminate this need.

Similarly, think about how you work with absolute paths. You could likely leverage the working directory of your R project to replace these with relative paths and make your code more portable.
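One way to start auditing your own code is to flag paths that look absolute. The helper below is a hypothetical sketch, not part of any package: it treats a path as absolute if it starts with /, with ~, or with a Windows drive letter such as C:.

```r
# Hypothetical helper: TRUE for paths that look absolute
is_absolute_path <- function(path) {
  grepl("^(/|~|[A-Za-z]:)", path)
}

# Example paths (the first is taken from the text above)
paths <- c(
  "/home/lopazanski/github/training_lopazanski/data/samples.csv",
  "data/samples.csv"
)

is_absolute_path(paths)
# TRUE for the first path, FALSE for the second
```

Any path flagged TRUE is a candidate for rewriting relative to the project directory.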