# install.packages("librarian")
librarian::shelf(usethis, devtools, roxygen2, testthat)
1 Overview
Most R users are familiar with loading and using packages in their work; we’ve worked with a number of well-known packages in this course thus far! Even if they mostly stick to a core set of packages, they understand the breadth of packages available on the Comprehensive R Archive Network (a.k.a. “CRAN”; the R package authority) for handling virtually any conceivable need. However, most R users have never created a package for their own work, often out of a fear that the process is too complicated. In reality, it’s fairly straightforward and very useful in your personal work. Creating packages serves two main use cases:
- Distribute reusable code among projects (even if just for yourself)
- Reproducibly document analysis and models and their results
Even if you don’t plan on writing a package with broad appeal such as, say, ggplot2 or dplyr, you still might consider creating a package to contain:
- Useful utility functions you write (i.e., a Personal Package)
- Having a place to put these functions makes it much easier to find and use them later.
- A set of shared routines for your lab or research group, making it easier to remain consistent within your team and also to save time.
- The analysis accompanying a thesis or manuscript, making it all that much easier for others to reproduce your results.
2 Get Ready
Before we get started with the tutorial, go ahead and make a new folder/project for the R package we’ll start today. Name that project/folder like so: {lastname}tools (e.g., oharatools or lyontools).
2.1 Install Packages for Maintaining Packages
The usethis, devtools, roxygen2, and testthat packages greatly streamline the process of creating and maintaining a package. You should install all of these now if you have not already done so.
3 Create Core Package Skeleton
Fundamentally, an R package is a bundle of scripts that can be installed on any computer in order to grant access to those scripts. The core package architecture is intended to facilitate the installation step (so that install.packages works as expected, for example) and lets people who install the package use the functions as they would functions from any other package.
For an R package to work (i.e., be install-able), you need a few core components. Click through the tabs below–in order–to create the fundamental structure for your R package.
Make a new file named exactly “DESCRIPTION” (all caps, no file extension). This file is the central pillar of any R package as it tells R that this folder is a package along with a number of other necessary pieces of information (e.g., what other packages need to be installed, what R version is required, the license).
Once you have this file, click the collapsed menu below and copy/paste all of its contents into your empty DESCRIPTION file.
DESCRIPTION Contents
Change the part of the first line after “Package:” to match the name of your folder.
Package: mytools
Title: What the Package Does in One Line and Title Case
Version: 0.0.0.9000
Authors@R:
    person("First", "Last", , "first.last@example.com", role = c("aut", "cre"))
Description: What the package does (one paragraph).
License: `use_mit_license()`, `use_gpl3_license()` or friends to pick a
    license
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.3
Once you have that file, edit the Title, Authors@R, and Description fields.
Make a new file named exactly “.Rbuildignore” and ignore any warnings about creating a file that starts with a period. This file is to an R package as the .gitignore is to a Git repository; it tells the package build which files in your folder should not be included in the R package. This is typically used to leave out things that would break your package, or content that you (i.e., the maintainer) need on your end to build the package but that is not useful to the end user.
Once you have this file, click the collapsed menu below and copy/paste all of its contents into your empty .Rbuildignore.
.Rbuildignore Contents
Technically you only need these two files listed if you’re doing this in RStudio but it doesn’t hurt to include them at this point and is a good excuse to make a necessary file.
^.*\.Rproj$
^\.Rproj\.user$
Within your package folder, make a new folder called “R”. All of the functions in your package must live in this folder but for now we can leave it empty.
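The three steps above can also be scripted. What follows is a minimal sketch in base R, not a replacement for doing it by hand: the package name "mytools", the temporary folder, and the trimmed-down DESCRIPTION fields are placeholders chosen for illustration. It also assumes that .Rbuildignore entries behave as Perl-style regular expressions matched against file names, which is how the two patterns above are written.

```r
# Hypothetical package name; a temporary folder keeps this sketch away from
# any real project on your machine
pkg_name <- "mytools"
pkg_dir <- file.path(tempdir(), pkg_name)
dir.create(pkg_dir, recursive = TRUE, showWarnings = FALSE)

# 1. DESCRIPTION (only a few of the fields shown above, for brevity)
description <- c(
  paste("Package:", pkg_name),
  "Title: What the Package Does in One Line and Title Case",
  "Version: 0.0.0.9000",
  "Description: What the package does (one paragraph).",
  "Encoding: UTF-8"
)
writeLines(description, file.path(pkg_dir, "DESCRIPTION"))

# 2. .Rbuildignore: one regular expression per line
ignore_patterns <- c("^.*\\.Rproj$", "^\\.Rproj\\.user$")
writeLines(ignore_patterns, file.path(pkg_dir, ".Rbuildignore"))

# 3. Empty "R" folder that will hold the function scripts
dir.create(file.path(pkg_dir, "R"), showWarnings = FALSE)

# Confirm the DESCRIPTION parses as expected
desc <- read.dcf(file.path(pkg_dir, "DESCRIPTION"))
desc[, "Package"]

# Check which example file names the ignore patterns would match
candidates <- c("mytools.Rproj", ".Rproj.user", "DESCRIPTION", "R")
ignored <- vapply(
  candidates,
  function(f) any(vapply(ignore_patterns, grepl, logical(1), x = f, perl = TRUE)),
  logical(1)
)
ignored
```

Only the two RStudio entries should come back as TRUE; "DESCRIPTION" and "R" are not matched and so stay in the package.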
4 Add a License
While R packages tend to be publicly available, you still need to specify how you want people to be able to use the package and how to credit their use of your package! A “LICENSE” file is how you specify this. Information about choosing a license is provided in the R Packages (2e) book, Chapter 12: Licensing.
We’ll use the Apache 2.0 license for now and use one of the license-specific functions from the usethis package to quickly build it into our nascent R package. Copy and run the code from the following chunk.
This will do three necessary things:
- Create a LICENSE.md file with the legal text of the license
- Add the name and version of the license to the relevant line of your DESCRIPTION file
- Add LICENSE.md to your .Rbuildignore
usethis::use_apache_license()
✔ Adding "Apache License (>= 2)" to License.
✔ Writing LICENSE.md.
✔ Adding "^LICENSE\\.md$" to .Rbuildignore.
5 Add Code
We are now ready to actually add a function to our package! To do this, do the following:
- Make a new R script in the “R” folder
  - You can do this either manually or with the usethis::use_r() function
- Save that script as “cv.R”
- Click the collapsed menu below and copy/paste all of its contents into your new script
cv() Function - Version 1
cv <- function(x = NULL, na_rm = TRUE){
  # Check input(s)
  if(is.null(x) || is.numeric(x) != TRUE)
    stop("'x' must be numeric")
  if(is.logical(na_rm) != TRUE){
    warning("'na_rm' must be a logical. Coercing to TRUE.")
    na_rm <- TRUE
  }
  # Calculate SD and mean
  sd_x <- sd(x = x, na.rm = na_rm)
  avg_x <- mean(x = x, na.rm = na_rm)
  # Calculate coefficient of variation
  coef_var <- sd_x / avg_x
  # Return that
  return(coef_var)
}
6 Add Documentation
Documentation is vital for functions, particularly those shared as part of a package, because users will encounter errors and will want to refer back to some sort of help document to try to figure out where they went wrong. The preferred method of documenting a function is to use roxygen2 syntax.
Work through the tabs below to create documentation in your function and get your package to recognize it as such.
Add the following lines to the top of your cv() function in your “cv.R” file (i.e., push down the ‘actual’ function code so that all documentation can be above the function itself).
#' @title Calculate CV
#'
#' @description Calculates the Coefficient of Variation ("CV") of a vector of numbers.
#'
#' @param x Vector of numbers
#' @param na_rm Whether `NA`s should be removed. Defaults to `TRUE`
#'
#' @returns Coefficient of variation of the numbers provided to 'x'
#' @export
#'
#' @examples
#' # Calculate CV
#' cv(x = c(4, 5, 6), na_rm = TRUE)
#'
Double check that you’re on the right track by confirming that your “cv.R” file looks like the code chunk in the collapsed menu below.
cv() Function - Version 2
#' @title Calculate CV
#'
#' @description Calculates the Coefficient of Variation ("CV") of a vector of numbers.
#'
#' @param x Vector of numbers
#' @param na_rm Whether `NA`s should be removed. Defaults to `TRUE`
#'
#' @returns Coefficient of variation of the numbers provided to 'x'
#' @export
#'
#' @examples
#' # Calculate CV
#' cv(x = c(4, 5, 6), na_rm = TRUE)
#'
cv <- function(x = NULL, na_rm = TRUE){
  # Check input(s)
  if(is.null(x) || is.numeric(x) != TRUE)
    stop("'x' must be numeric")
  if(is.logical(na_rm) != TRUE){
    warning("'na_rm' must be a logical. Coercing to TRUE.")
    na_rm <- TRUE
  }
  # Calculate SD and mean
  sd_x <- sd(x = x, na.rm = na_rm)
  avg_x <- mean(x = x, na.rm = na_rm)
  # Calculate coefficient of variation
  coef_var <- sd_x / avg_x
  # Return that
  return(coef_var)
}
Once your cv() function has roxygen2-style comments, we can document the package. Fortunately, the devtools package has a nice function for doing exactly this! Run the following code chunk in your Console.
devtools::document()
This will create a new folder (named “man”, short for “manuals”) and fill it with specially-formatted “.Rd” files. You will never edit these files manually! If you update your function documentation and need to update the respective .Rd files, re-run devtools::document()!
Congratulations, you’re now the proud owner of your very own R package! You can now install this package (from GitHub) as well as run automatic checks on it, and even send it to CRAN.
A smaller, but only slightly less significant, win is that you can now check the help file for your cv() function! Run the following code chunk in your Console.
?cv
This should open up the help pane of your IDE and present the documentation that you just added (but processed so that it looks almost like a Quarto/RMarkdown document).
7 Testing
7.1 Unit Tests
Now that our package has a function, we can start creating “unit tests” to make sure our function(s) behave as expected/as they should. The testthat package provides some really nice tools for doing this reproducibly and to a high standard so we’ll use that structure.
Set up your package to use the testthat package by running the following code chunk in the Console.
usethis::use_testthat()
This will create a “tests” folder that contains both an R script called “testthat.R” (which you can ignore) and a folder called “testthat” containing nothing (yet).
That done, we can write specific unit tests for the cv() function that use functions from testthat. The syntax of a unit test with testthat can be a little alien until you get familiar with it so for now, do the following:
- Make a new script called “test-cv.R” in the “testthat” folder inside of the “tests” folder
  - So the file path will be tests/testthat/test-cv.R
- Click the collapsed menu below and copy/paste all of its contents into the “test-cv.R” file
cv() Unit Test
# Run all tests in this script:
## testthat::test_file(file.path("tests", "testthat", "test-cv.R"))

# Error testing
test_that("Errors work as desired", {
  expect_error(cv(x = NULL, na_rm = TRUE))
  expect_error(cv(x = "not a number", na_rm = TRUE))
})

# Warning testing
test_that("Warnings work as desired", {
  expect_warning(cv(x = c(1, 2, 3), na_rm = "false"))
})

# Message testing
# test_that("Messages work as desired", {
#   # None in this function (yet)
#   expect_message()
# })

# Output testing
test_that("Outputs are as expected", {
  # Make a testing vector
  test_vec <- c(4, 5, 6)
  # Use the function to calculate CV and calculate it by hand
  fxn_cv <- cv(x = test_vec, na_rm = FALSE)
  test_sd <- sd(x = test_vec, na.rm = FALSE)
  test_avg <- mean(x = test_vec, na.rm = FALSE)
  test_cv <- (test_sd / test_avg)
  # Check certain aspects of output
  expect_equal(fxn_cv, test_cv)
  expect_true(class(fxn_cv) == "numeric")
})
Now we can run all of the unit tests in our package (currently just the one) by running the following code chunk in the Console.
devtools::test()
This will return a handy little report in the Console summarizing the results of your unit tests.
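To make the expectations in the unit tests concrete, here is a small worked sketch in base R only (no testthat required). It repeats a copy of cv() so that it runs on its own, then works through the same test vector by hand and uses tryCatch() as a stand-in for expect_error() and expect_warning(). The helper names got_error and got_warning are made up for this illustration.

```r
# Copy of the tutorial's cv() so this sketch is self-contained
cv <- function(x = NULL, na_rm = TRUE){
  if(is.null(x) || is.numeric(x) != TRUE)
    stop("'x' must be numeric")
  if(is.logical(na_rm) != TRUE){
    warning("'na_rm' must be a logical. Coercing to TRUE.")
    na_rm <- TRUE
  }
  sd(x = x, na.rm = na_rm) / mean(x = x, na.rm = na_rm)
}

# Worked example: mean is 5, standard deviation is 1, so CV is 1 / 5 = 0.2
test_vec <- c(4, 5, 6)
mean(test_vec)
sd(test_vec)
cv(x = test_vec)

# Non-numeric input should raise an error
got_error <- tryCatch(
  { cv(x = "not a number"); FALSE },
  error = function(e) TRUE
)

# A non-logical 'na_rm' should raise a warning
got_warning <- tryCatch(
  { cv(x = c(1, 2, 3), na_rm = "false"); FALSE },
  warning = function(w) TRUE
)

c(error = got_error, warning = got_warning)
```

Both flags should come back TRUE, which is the same behavior the "Errors work as desired" and "Warnings work as desired" tests check for.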
7.2 Package-Level Checks
In addition to function-specific unit tests, the devtools package also includes some tools for package-level checks. Copy and run the following code in the Console.
devtools::check()
The check() function will both (1) run devtools::test() and (2) check that the core package structure complies with CRAN standards. Even if you don’t plan on submitting to CRAN, you should still run check() because it will help make sure your package is working as desired.
7.3 Checking By Doing
Beyond automated checks, sometimes the best way to test your functions is simply to load the package and use the functions! You can load your functions (in order to test them) by copying the contents of the following code chunk and running it.
devtools::load_all()
9 Looking Ahead
What we’ve outlined above is the core content of creating an R package but if this is of interest, the R Packages (2e) book is actually written in a very accessible way while still containing a ton of useful, practical information. If you decide to fully embrace making/maintaining an R package, it’s worth reading through some of this book!
There are a handful of extra components that are worth surfacing here though.
9.1 Dependencies
If your R code depends on functions from another package, you must declare it as a “dependency” (i.e., a package on which your package depends). This will make that package also be installed when someone installs your package. As you might expect, dependencies must be declared in the DESCRIPTION file. You can add a dependency either manually or with the usethis package.
In order to add a dependency manually, add the following lines to the end of your DESCRIPTION file. Be sure the package name is indented relative to the “Imports” field name.
Imports:
    ggplot2
In order to add a dependency with usethis, copy and run the code from the following chunk. This will add a new line to the end of the DESCRIPTION labeled “Imports” and then another line indented below that with the ggplot2 package named.
usethis::use_package("ggplot2")
Note that devtools::check() will flag (with a NOTE) any dependency that you declare but don’t use in your package.
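As a rough check on the manual approach, the sketch below writes a throwaway DESCRIPTION (the package name and version are placeholders) to a temporary folder and reads the “Imports” field back with base R’s read.dcf(). The indented line is treated as a continuation of the “Imports” field, which is why the indentation matters.

```r
# Throwaway DESCRIPTION in a temporary folder (placeholder name and version)
desc_path <- file.path(tempdir(), "DESCRIPTION")
writeLines(
  c(
    "Package: mytools",
    "Version: 0.0.0.9000",
    "Imports:",
    "    ggplot2"
  ),
  desc_path
)

# Read back only the Imports field
desc <- read.dcf(desc_path, fields = "Imports")

# Split on commas and trim whitespace to get one package name per element
imports <- trimws(strsplit(desc[, "Imports"], ",")[[1]])
imports
```

Once a package is declared this way, a common convention is to call its functions with the package prefix inside your own functions (e.g., ggplot2::ggplot()).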
9.2 Version Numbers
A key point in the default DESCRIPTION that we did not cover was the “Version” line. When you first create a package (if you follow our instructions above) the version is 0.0.0.9000. When you update your package, you need to “increment” that version number so that users can know (and cite!) which version of your package they used.
Each part of the version number implies a different severity of change, where the earlier numbers represent more significant version changes and the later numbers represent more minor ones. The R package version number follows the format major.minor.patch.dev. Note that when you increment one part of the version number, you should reset all numbers to the right of the changed number to zero. See below for an in-depth explanation of each component of a version number.
- Major (e.g., 3.2.9 → 4.0.0): A significant change to the package that would be expected to break users’ code. This is updated very rarely, when the package has been redesigned in some way.
- Minor (e.g., 1.1.8 → 1.2.0): A minor version update means that new functionality has been added to the package. It might be new functions or improvements to existing functions that are compatible with most existing code.
- Patch (e.g., 1.1.2 → 1.1.3): Patch updates are bug fixes. They solve existing issues but don’t do anything new.
- Dev (e.g., 4.0.0 → 4.0.0.9000): Dev versions are used during development and this part is missing from release versions. For example, you might use a dev version when you give someone a beta version to test. A package with a dev version can be expected to change rapidly or have undiscovered issues.
Typically the dev version is just the most recent release version with “.9000” added to the end.
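The increment-and-reset rule can be sketched in a few lines of base R. The helper bump_version() below is hypothetical, written only for this illustration, and it assumes a plain major.minor.patch version with an optional dev component that is dropped on release. The usethis package also offers use_version() for updating the DESCRIPTION directly.

```r
# Hypothetical helper: increment one component and reset those to its right
bump_version <- function(version, which = c("major", "minor", "patch")) {
  which <- match.arg(which)
  # Keep major.minor.patch; any dev component (e.g., ".9000") is dropped
  parts <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])[1:3]
  idx <- match(which, c("major", "minor", "patch"))
  parts[idx] <- parts[idx] + 1L
  # Reset everything to the right of the changed number to zero
  if (idx < 3) parts[(idx + 1):3] <- 0L
  paste(parts, collapse = ".")
}

bump_version("3.2.9", "major")  # "4.0.0"
bump_version("1.1.8", "minor")  # "1.2.0"
bump_version("1.1.2", "patch")  # "1.1.3"

# Base R can compare versions; a dev version sorts after its release version
package_version("4.0.0.9000") > package_version("4.0.0")  # TRUE
```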
It is common convention to create a “NEWS.md” file in the top-level of your R package and note changes among versions in that so that there’s a quick, strategic summary of how your package evolves from version to version. It’s good practice to create and curate this file as you make edits so that you don’t depend on your memory for what changes you’ve made to your package since its last release version.
Check out the NEWS.md file from the supportR package as an example for what this might look like after a few years.
10 Additional Resources
- Hadley Wickham and Jenny Bryan’s awesome book: R Packages
- ROpenSci Blog Post: How to create your personal CRAN-like repository on R-universe
- Karl Broman’s: R package primer: a minimal tutorial on writing R packages
- Thomas Westlake’s Short Tutorial: Writing an R package from scratch (his post is an updated version of Hilary Parker’s blog post)