Creating R Packages

Learning Objectives

After completing this session, you will be able to:

  • Discuss the advantages of creating your own R package to organize code
  • Identify simple techniques for creating an R package
  • Create CRAN-compliant documentation for functions in a package
  • Write unit tests with syntax from the testthat package
  • Discuss the (dis)advantages of hosting a package in GitHub, CRAN, or R-Universe

1 Overview

Most R users are familiar with loading and using packages in their work; we’ve worked with a number of well-known packages in this course thus far! Even users who mostly stick to a core set of packages understand the breadth of packages available on the Comprehensive R Archive Network (a.k.a. “CRAN”; the R package authority) for handling virtually any conceivable need. However, most R users have never created a package for their own work, often out of a fear that the process is too complicated. In reality, it’s fairly straightforward and super useful in your personal work. Creating packages serves two main use-cases:

  • Distribute reusable code among projects (even if just for yourself)
  • Reproducibly document analysis and models and their results

Even if you don’t plan on writing a package with appeal as broad as, say, ggplot2 or dplyr, you still might consider creating a package to contain:

  • Useful utility functions you write (i.e., a Personal Package)
    • Having a place to put these functions makes it much easier to find and use them later.
  • A set of shared routines for your lab or research group, making it easier to remain consistent within your team and also to save time.
  • The analysis accompanying a thesis or manuscript, making it all that much easier for others to reproduce your results.

2 Get Ready

Before we get started with the tutorial, go ahead and make a new folder/project for the R package we’ll start today. Name that project/folder like so: {lastname}tools (e.g., oharatools or lyontools).

Warning: Packages & GitHub

Make sure that your package is also tracked by Git and linked to a remote repository of the same name in GitHub! Include a “.gitignore” as you normally would for any other such project.

Users–including you–can install your package from GitHub even if you don’t go all the way to getting it accepted by CRAN.

2.1 Install Packages for Maintaining Packages

The usethis, devtools, roxygen2, and testthat packages greatly streamline the process of creating and maintaining a package. You should install all of these now if you have not already done so.

# install.packages("librarian")
librarian::shelf(usethis, devtools, roxygen2, testthat)

3 Create Core Package Skeleton

Fundamentally, an R package is a bundle of scripts that can be installed on any computer in order to grant access to those scripts. The core package architecture is intended to facilitate the installation step (so that install.packages works as expected, for example) and lets people who install the package use the functions as they would functions from any other package.

For an R package to work (i.e., be install-able), you need a few core components. Work through the steps below–in order–to create the fundamental structure for your R package.

Make a new file named exactly “DESCRIPTION” (all caps, no file extension). This file is the central pillar of any R package as it tells R that this folder is a package along with a number of other necessary pieces of information (e.g., what other packages need to be installed, what R version is required, the license).

Once you have this file, click the collapsed menu below and copy/paste all of its contents into your empty DESCRIPTION file.

DESCRIPTION Contents

Change the part of the first line after “Package:” to match the name of your folder.

Package: mytools
Title: What the Package Does in One Line and Title Case
Version: 0.0.0.9000
Authors@R: 
    person("First", "Last", , "first.last@example.com", role = c("aut", "cre"))
Description: What the package does (one paragraph).
License: `use_mit_license()`, `use_gpl3_license()` or friends to pick a
    license
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.3

Once you have that file, edit the Title, Authors@R, and Description fields.

Make a new file named exactly “.Rbuildignore” and ignore any warnings about creating a file that starts with a period. This file is to an R package as the .gitignore is to a Git repository; it tells the package build which files in your folder should not be included in the R package. This is typically used to exclude things that would break your package, or content that is necessary on your (i.e., the maintainer’s) end but not useful to the end user.

Once you have this file, click the collapsed menu below and copy/paste all of its contents into your empty .Rbuildignore.

.Rbuildignore Contents

Technically you only need these two entries if you’re working in RStudio, but it doesn’t hurt to include them at this point and it is a good excuse to make a necessary file.

^.*\.Rproj$
^\.Rproj\.user$
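Each line of the .Rbuildignore is a regular expression that is matched, case-insensitively, against file and directory names relative to the top of the package. As a rough sketch of that matching (the entries in `paths` are hypothetical, and `grepl()` is only standing in for what the build step does internally):

```r
# Patterns copied from the .Rbuildignore contents above
patterns <- c("^.*\\.Rproj$", "^\\.Rproj\\.user$")

# Hypothetical top-level entries in a package folder
paths <- c("mytools.Rproj", ".Rproj.user", "DESCRIPTION", "R")

# A path is ignored if it matches at least one pattern
ignored <- vapply(
  paths,
  function(p) {
    any(vapply(patterns, grepl, logical(1), x = p, perl = TRUE, ignore.case = TRUE))
  },
  logical(1)
)

# Named logical vector: TRUE means "left out of the built package"
ignored
```

Only the RStudio project file and its hidden folder match, so DESCRIPTION and the R folder would still be bundled.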

Within your package folder, make a new folder called “R”. All of the functions in your package must live in this folder but for now we can leave it empty.
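If it helps to see the whole skeleton in one place, the sketch below builds the same three pieces in a temporary directory using only base R. It is an illustration rather than a replacement for the steps above: the DESCRIPTION is abbreviated, all field values are placeholders, and `pkg_dir` is a hypothetical location.

```r
# Build the skeleton in a temporary directory so nothing real is overwritten
pkg_dir <- file.path(tempdir(), "mytools")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)

# Abbreviated DESCRIPTION with placeholder values
writeLines(
  c(
    "Package: mytools",
    "Title: What the Package Does in One Line and Title Case",
    "Description: What the package does (one paragraph).",
    "Encoding: UTF-8"
  ),
  file.path(pkg_dir, "DESCRIPTION")
)

# .Rbuildignore with the two RStudio-related patterns
writeLines(
  c("^.*\\.Rproj$", "^\\.Rproj\\.user$"),
  file.path(pkg_dir, ".Rbuildignore")
)

# DESCRIPTION uses the Debian control file format, which base R can read
desc <- read.dcf(file.path(pkg_dir, "DESCRIPTION"))
desc[1, "Package"]

# List everything that was created, including hidden files
list.files(pkg_dir, all.files = TRUE, no.. = TRUE)
```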

4 Add a License

While R packages tend to be publicly available, you still need to specify how you want people to be able to use the package and how to credit their use of your package! A “LICENSE” file is how this is specified. Information about choosing a LICENSE is provided in the R Packages (2e) book Chapter 12: Licensing.

We’ll use the Apache 2.0 license for now and use one of the license-specific functions from the usethis package to quickly build it into our nascent R package. Copy and run the code from the following chunk.

usethis::use_apache_license()
✔ Adding "Apache License (>= 2)" to License.
✔ Writing LICENSE.md.
✔ Adding "^LICENSE\\.md$" to .Rbuildignore.

This will do three necessary things:

  1. Create a LICENSE.md file with the legal text of the license
  2. Add the name and version of the license to the relevant line of your DESCRIPTION file
  3. Add LICENSE.md to your .Rbuildignore

5 Add Code

We are now ready to actually add a function to our package! To do this, do the following:

  1. Make a new R script in the “R” folder
    • You can do this either manually or with the usethis::use_r() function
  2. Save that script as “cv.R”
  3. Click the collapsed menu below and copy/paste all of its contents into your new script
cv() Function - Version 1
cv <- function(x = NULL, na_rm = TRUE){

  # Check input(s)
  if(is.null(x) || is.numeric(x) != TRUE)
    stop("'x' must be numeric")

  if(is.logical(na_rm) != TRUE){
    warning("'na_rm' must be a logical. Coercing to TRUE.")
    na_rm <- TRUE 
  }

  # Calculate SD and mean
  sd_x <- sd(x = x, na.rm = na_rm)
  avg_x <- mean(x = x, na.rm = na_rm)

  # Calculate coefficient of variation
  coef_var <- sd_x / avg_x

  # Return that 
  return(coef_var)
}
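To make the arithmetic inside cv() concrete, here is a small worked example that repeats the same calculation by hand with base R. The vectors and the variable names (`cv_manual`, `cv_drop_na`, `cv_keep_na`) are made up for illustration.

```r
# Three numbers with a standard deviation of 1 and a mean of 5
x <- c(4, 5, 6)
cv_manual <- sd(x) / mean(x)
cv_manual   # 0.2

# The same numbers plus a missing value
x_na <- c(4, 5, 6, NA)

# Dropping the NA (what na_rm = TRUE asks for) gives the same answer
cv_drop_na <- sd(x_na, na.rm = TRUE) / mean(x_na, na.rm = TRUE)
cv_drop_na  # 0.2

# Keeping the NA propagates it to the result
cv_keep_na <- sd(x_na, na.rm = FALSE) / mean(x_na, na.rm = FALSE)
cv_keep_na  # NA
```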

6 Add Documentation

Documentation is vital for functions, particularly those shared as part of a package, because users will encounter errors and will want to refer back to some sort of help document to try to figure out where they went wrong. The preferred method of documenting a function is to use roxygen2 syntax.

Work through the steps below to create documentation in your function and get your package to recognize it as such.

Add the following lines to the top of your cv() function in your “cv.R” file (i.e., push down the ‘actual’ function code so that all documentation can be above the function itself).

#' @title Calculate CV
#'
#' @description Calculates the Coefficient of Variation ("CV") of a vector of numbers.
#' 
#' @param x Vector of numbers
#' @param na_rm Whether `NA`s should be removed. Defaults to `TRUE`
#'
#' @returns Coefficient of variation of the numbers provided to 'x'
#' @export
#' 
#' @examples
#' # Calculate CV
#' cv(x = c(4, 5, 6), na_rm = TRUE)
#' 

Double check that you’re on the right track by confirming that your “cv.R” file looks like the code chunk in the collapsed menu below.

cv() Function - Version 2
#' @title Calculate CV
#'
#' @description Calculates the Coefficient of Variation ("CV") of a vector of numbers.
#' 
#' @param x Vector of numbers
#' @param na_rm Whether `NA`s should be removed. Defaults to `TRUE`
#'
#' @returns Coefficient of variation of the numbers provided to 'x'
#' @export
#' 
#' @examples
#' # Calculate CV
#' cv(x = c(4, 5, 6), na_rm = TRUE)
#' 
cv <- function(x = NULL, na_rm = TRUE){

  # Check input(s)
  if(is.null(x) || is.numeric(x) != TRUE)
    stop("'x' must be numeric")

  if(is.logical(na_rm) != TRUE){
    warning("'na_rm' must be a logical. Coercing to TRUE.")
    na_rm <- TRUE 
  }

  # Calculate SD and mean
  sd_x <- sd(x = x, na.rm = na_rm)
  avg_x <- mean(x = x, na.rm = na_rm)

  # Calculate coefficient of variation
  coef_var <- sd_x / avg_x

  # Return that 
  return(coef_var)
}

Once your cv() function has roxygen2-style comments, we can document the package. Fortunately, the devtools package has a nice function for doing exactly this! Run the following code chunk in your Console.

devtools::document()

This will create a new folder (named “man”, short for “manuals”) and fill it with specially-formatted “.Rd” files. You should never edit these files manually! If you update your function documentation and need to update the respective .Rd files, re-run devtools::document().

Congratulations, you’re now the proud owner of your very own R package! You can now install this package (from GitHub) as well as run automatic checks on it, and even send it to CRAN.

A smaller, but only slightly less significant, win is that you can now check the help file for your cv() function! Run the following code chunk in your Console.

?cv

This should open up the help pane of your IDE and present the documentation that you just added (but processed so that it looks almost like a Quarto/RMarkdown document).

7 Testing

7.1 Unit Tests

Now that our package has a function, we can start creating “unit tests” to make sure our function(s) behave as expected/as they should. The testthat package provides some really nice tools for doing this reproducibly and to a high standard so we’ll use that structure.

Set up your package to use the testthat package by running the following code chunk in the Console.

usethis::use_testthat()

This will create a “tests” folder that contains both an R script called “testthat.R”–which you can ignore–and a folder called “testthat” containing nothing (yet).

That done, we can write specific unit tests for the cv() function that use functions from testthat. The syntax of a unit test with testthat can be a little alien until you get familiar with it so for now, do the following:

  1. Make a new script called “test-cv.R” in the “testthat” folder inside of the “tests” folder
    • So the file path will be tests / testthat / test-cv.R
  2. Click the collapsed menu below and copy/paste all of its contents into the “test-cv.R” file
cv() Unit Test
# Run all tests in this script:
## testthat::test_file(file.path("tests", "testthat", "test-cv.R"))

# Error testing
test_that("Errors work as desired", {
  expect_error(cv(x = NULL, na_rm = TRUE))
  expect_error(cv(x = "not a number", na_rm = TRUE))
})

# Warning testing
test_that("Warnings work as desired", {
  expect_warning(cv(x = c(1, 2, 3), na_rm = "false"))
})

# Message testing
# test_that("Messages work as desired", {
#   # None in this function (yet)
#   expect_message()
# })

# Output testing
test_that("Outputs are as expected", {
  
  # Make a testing vector
  test_vec <- c(4, 5, 6)
  
  # Use the function to calculate CV and calculate it by hand
  fxn_cv <- cv(x = test_vec, na_rm = FALSE)
  test_sd <- sd(x = test_vec, na.rm = FALSE)
  test_avg <- mean(x = test_vec, na.rm = FALSE)
  test_cv <- (test_sd / test_avg)

  # Check certain aspects of output
  expect_equal(fxn_cv, test_cv)
  expect_true(class(fxn_cv) == "numeric")
})

Now we can run all of the unit tests in our package–currently just the one–by running the following code chunk in the Console.

devtools::test()

This will return a handy little report–in the Console–summarizing the results of your unit tests.
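If the testthat syntax still feels alien, it may help to see its anatomy outside of a package: each `test_that()` call takes a description and a block of code, and each `expect_*()` call inside that block is one expectation. The sketch below assumes testthat is installed and uses only base R calculations plus a hypothetical helper, `check_numeric()`, that mimics the input check in cv().

```r
library(testthat)

# Hypothetical stand-in for the input check inside cv()
check_numeric <- function(x) {
  if (is.null(x) || !is.numeric(x)) stop("'x' must be numeric")
  invisible(TRUE)
}

# One test can hold several related expectations
test_that("CV of 4, 5, 6 is 0.2", {
  values <- c(4, 5, 6)
  by_hand <- sd(values) / mean(values)
  expect_equal(by_hand, 0.2)       # compares within a small numeric tolerance
  expect_type(by_hand, "double")   # checks the underlying type
})

# expect_error() can also check that the message matches a pattern
test_that("non-numeric input is rejected", {
  expect_error(check_numeric("not a number"), "must be numeric")
  expect_error(check_numeric(NULL), "must be numeric")
})
```

Running the block interactively reports each test as it finishes, which is a quick way to experiment before moving expectations into “test-cv.R”.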

7.2 Package-Level Checks

In addition to function-specific unit tests, the devtools package also includes some tools for package-level checks. Copy and run the following code in the Console.

devtools::check()

The check() function will both (1) run your unit tests and (2) check that the core package structure is compliant with CRAN standards. Even if you don’t plan on submitting to CRAN, you should still run check() because it will help make sure your package is working as desired.

7.3 Checking By Doing

Beyond automated checks, sometimes the best way to test your functions is simply to load the package and use the functions! You can load your functions (in order to test them) by copying the contents of the following code chunk and running it.

devtools::load_all()

8 Sharing & Releasing

Once your package is set up, you have a number of options for places to share that package and allow others to install it. These platforms have different requirements for you so consider those standards when choosing where to host your package.

The simplest way of sharing your package is simply to put it in a GitHub repository. If you’ve been following along with this lesson, you’re already nicely set up for this! Once your package is on GitHub, anyone who can find that repository can install it directly from GitHub like so:

# Option 1: `devtools`
devtools::install_github("github-username/repository-name")

# Option 2: `librarian`
librarian::shelf(github-username/repository-name)

Prerequisites:

  • GitHub has no prerequisites for an R package! Even packages that are malformed / uninstallable are allowed to be in a GitHub repository because GitHub doesn’t inherently support special cases of repositories
  • You can add a GitHub Action (GHA) to automate tests of your package each time you push but the success/failure of the GHA–or whether you have one at all–doesn’t affect whether you’re ‘allowed’ to have it on GitHub

If you’re willing to do additional legwork, you can get your package released onto CRAN. This would let anyone install it using the standard install.packages function. Note this is typically done in addition to hosting it on GitHub. In order to release your package on CRAN, you’ll need to run devtools::release() and work through the prompts it will put in the Console asking you whether you’ve done a set of necessary steps for CRAN-compliance (some of which were covered above).

If you are considering releasing a package more broadly, you may find that the supportive community at rOpenSci provides incredible help and valuable feedback through their onboarding process.

Prerequisites:

  • Releasing to CRAN requires a non-trivial amount of work beyond what’s necessary for GitHub
  • We don’t have time today to go into the specifics here, but if this is of interest, check out the R Packages (2e) book Chapter 20: Releasing to CRAN

A newer approach is to link your package release to R-Universe, which makes it easy to test and maintain packages so that many people can install them using the familiar install.packages() function in R. In R-Universe, people and organizations can create their own universe of packages, which represents a collection of packages that appear as a CRAN-compatible repository in R.

For example, DataONE maintains the DataONE R-Universe, which lists the packages they actively maintain as an organization. So, any R user who wants to install these packages can do so by adding that universe to their list of repositories, and then installing packages as normal. For example, to install the codyn package, one could use:

install.packages('codyn', repos = c('https://dataoneorg.r-universe.dev', 
  'https://cloud.r-project.org'))
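An alternative to passing `repos` on every call is to add the universe to the session’s repository list once. The sketch below shows that pattern with base R’s `options()`; the repository label `dataone` is an arbitrary name chosen for illustration, and the install call is left commented out so that nothing is downloaded.

```r
# Remember the current setting so it can be restored afterwards
old_repos <- getOption("repos")

# Put the universe first and keep a CRAN mirror as a fallback
options(repos = c(
  dataone = "https://dataoneorg.r-universe.dev",
  CRAN    = "https://cloud.r-project.org"
))

# Confirm what install.packages() would now search
new_repos <- getOption("repos")
new_repos

# install.packages("codyn")  # would now look in both repositories

# Restore the original repositories
options(repos = old_repos)
```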

Prerequisites:

  • The requirements here are like a less-stringent version of what’s necessary for CRAN
  • So, you’d need to do more than you would for GitHub but less than you’d need to do for CRAN

9 Looking Ahead

What we’ve outlined above is the core content of creating an R package but if this is of interest, the R Packages (2e) book is actually written in a very accessible way while still containing a ton of useful, practical information. If you decide to fully embrace making/maintaining an R package, it’s worth reading through some of this book!

There are a handful of extra components that are worth surfacing here though.

9.1 Dependencies

If your R code depends on functions from another package, you must declare it as a “dependency” (i.e., a package on which your package depends). This will make that package also be installed when someone installs your package. As you might expect, dependencies must be declared in the DESCRIPTION file. You can add a dependency either manually or with the usethis package.

In order to add a dependency manually, add the following lines to the end of your DESCRIPTION file. Be sure the package name is indented relative to the “Imports” field name.

Imports:
    ggplot2

In order to add a dependency with usethis, copy and run the code from the following chunk. This will add a new line to the end of the DESCRIPTION labeled “Imports” and then another line indented below that with the ggplot2 package named.

usethis::use_package("ggplot2")

Note that you will get a note from devtools::check() if you declare a dependency that isn’t used in your package.
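One common way to call a declared dependency from inside package code is the namespace-qualified form `pkg::fun()`. The sketch below uses `stats::sd()` only because the stats package ships with R, so the block runs without installing anything; `cv_qualified()` is a hypothetical name, and the same pattern would apply to a call such as `ggplot2::ggplot()` once ggplot2 is listed under Imports.

```r
# Hypothetical variant of cv() that names the package providing sd()
cv_qualified <- function(x, na_rm = TRUE) {
  # pkg::fun() makes it explicit where each function comes from
  stats::sd(x, na.rm = na_rm) / mean(x, na.rm = na_rm)
}

cv_qualified(c(4, 5, 6))  # 0.2
```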

9.2 Version Numbers

A key point in the default DESCRIPTION that we did not cover was the “Version” line. Version information is located in the DESCRIPTION file and when you first create a package (if you follow our instructions above) the version is 0.0.0.9000. When you update your package, you need to “increment” that version number so that users can know–and cite!–which version of your package they used.

Each part of the version number implies a different severity of change, where the earlier numbers represent more significant version changes and the later numbers represent more minor ones. The R package version number follows the format major.minor.patch.dev. Note that when you increment one part of the version number, you should reset all numbers to the right of the changed number to zero. See below for an in-depth explanation of each component of a version number.

Major: 3.2.9 → 4.0.0

A significant change to the package that would be expected to break users’ code. This is updated very rarely, when the package has been redesigned in some way.

Minor: 1.1.8 → 1.2.0

A minor version update means that new functionality has been added to the package. It might be new functions or improvements to existing functions that are compatible with most existing code.

Patch: 1.1.2 → 1.1.3

Patch updates are bug fixes. They solve existing issues but don’t do anything new.

Dev: 4.0.0 → 4.0.0.9000

Dev versions are used during development and this part is missing from release versions. For example, you might use a dev version when you give someone a beta version to test. A package with a dev version can be expected to change rapidly or have undiscovered issues.

Typically the dev version will just be the most recent release version with “.9000” added to the end.
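The increment-and-reset rule can be sketched in a few lines of base R. `bump_version()` below is a hypothetical helper written for illustration (it assumes a version string with at least three components and drops any dev component), and `package_version()` is base R’s way of comparing version numbers.

```r
# Hypothetical helper: increment one component and zero everything to its right
bump_version <- function(version, which = c("major", "minor", "patch")) {
  which <- match.arg(which)
  # Keep only major.minor.patch; any dev component is dropped
  parts <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])[1:3]
  idx <- match(which, c("major", "minor", "patch"))
  parts[idx] <- parts[idx] + 1L
  if (idx < 3) parts[(idx + 1):3] <- 0L
  paste(parts, collapse = ".")
}

bump_version("3.2.9", "major")  # "4.0.0"
bump_version("1.1.8", "minor")  # "1.2.0"
bump_version("1.1.2", "patch")  # "1.1.3"

# Versions compare component by component, so a dev version sorts
# after the release it was built on
package_version("4.0.0.9000") > package_version("4.0.0")  # TRUE
package_version("1.2.0") > package_version("1.1.8")       # TRUE
```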

It is common convention to create a “NEWS.md” file in the top level of your R package and note changes among versions there, so that there’s a quick, strategic summary of how your package evolves from version to version. It’s good practice to create and curate this file as you make edits so that you don’t depend on your memory for what changes you’ve made to your package since its last release version.

Check out the NEWS.md file from the supportR package as an example for what this might look like after a few years.

10 Additional Resources