Data types & structures in R

An introduction to programming in R


NCEAS Learning Hub

Common data types in R

A single data value in R usually falls into one of a few basic classes:

  • logical: TRUE or FALSE
  • integer: whole numbers (e.g., 1, 2, -999); add an L suffix to specify the type explicitly (e.g., x <- 2L)
  • numeric (also called double): real numbers with decimals (e.g., 0.001, 1.0, -273.16)
  • character: text enclosed in quotes (e.g., "cat", "I like bananas", or "Everest is 8848.9 m tall"); character strings can be words, sentences, or paragraphs.
  • factor: categorical values with a defined set of levels, optionally ordered (e.g., low, medium, high might be converted to a factor by setting their order)

Any class can include NA to represent missing values.
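
As a minimal sketch of the classes listed above, the snippet below creates one value of each type and checks it with class(). The object names (is_sunny, n_dogs, and so on) are made up for illustration.

```r
### one value of each basic class (object names are illustrative only)
is_sunny <- TRUE        ### logical
n_dogs   <- 3L          ### integer (note the L suffix)
temp_c   <- -273.16     ### numeric (double)
pet      <- "cat"       ### character

### a factor with an explicit order of levels
size <- factor(c("low", "high", "medium"),
               levels = c("low", "medium", "high"))

class(is_sunny)   ### "logical"
class(n_dogs)     ### "integer"
class(temp_c)     ### "numeric"
class(pet)        ### "character"
class(size)       ### "factor"
levels(size)      ### "low" "medium" "high"

### NA marks a missing value without changing the class of the vector
weights <- c(35, NA, 50)
class(weights)    ### "numeric"
is.na(weights)    ### FALSE TRUE FALSE
```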

Quick Tip

Check the class of a given object using class(), or with a logical test such as: is.numeric(), is.character(), is.logical(), etc.

science_rocks <- "yes it does!"
class(science_rocks)
[1] "character"
is.numeric(science_rocks)
[1] FALSE
is.character(science_rocks)
[1] TRUE

Data structures in R

When you work with multiple values, you need structures that store data in an organized way. R uses several core structures for this purpose: vectors, matrices, arrays, data frames, and lists. Each holds collections of values in a defined shape and is suited to different types of operations.

Data structures in R: vectors

A vector contains one or more elements of the same data type.

  • The age_yrs object below is a vector of three elements, all of the numeric data type.
  • Create a vector using the c() function (“combine”):
age_yrs <- c(5, 3, 7)

Each vector can contain only one data type/class. If you try to combine classes, R will “coerce” all of the elements to the most flexible class present, following the order logical → integer → numeric → character.

chr_vec <- c("hello", "goodbye", "see you later"); class(chr_vec)
[1] "character"
numeric_vec <- c(5, 1.3, 10); class(numeric_vec)
[1] "numeric"
logical_vec <- c(TRUE, FALSE, TRUE); class(logical_vec)
[1] "logical"
### use c() to combine different types; what's the result?
combined_vec <- c(TRUE, 3.14, "puppies!"); class(combined_vec)
[1] "character"
combined_vec ### look, all elements are character (in quotes)
[1] "TRUE"     "3.14"     "puppies!"
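
To see the coercion order one step at a time, here is a small sketch that combines two classes per line. The object name lgl_int is made up for this example.

```r
### each pair is coerced to the more flexible of the two classes
class(c(TRUE, 2L))    ### logical + integer   -> "integer"
class(c(2L, 1.5))     ### integer + numeric   -> "numeric"
class(c(1.5, "a"))    ### numeric + character -> "character"

### logical values become 1 (TRUE) and 0 (FALSE) when coerced to numbers
lgl_int <- c(TRUE, FALSE, 2L)
lgl_int                       ### 1 0 2
sum(c(TRUE, FALSE, TRUE))     ### 2
```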

You can set or retrieve a value of an element in a vector using that element’s “index” or position.

A set of single square brackets [...] is used to select an element or set of elements for vectors.

numeric_vec <- c(5, 1.3, 10, 2.8, 17, -1)

numeric_vec[2]          ### retrieve the second element
[1] 1.3
numeric_vec[3:5]        ### retrieve the third through fifth elements
[1] 10.0  2.8 17.0
numeric_vec[c(1, 3, 6)] ### use a vector to retrieve elements
[1]  5 10 -1
numeric_vec[2] <- 3.14  ### set the value of the second element
numeric_vec             ### element 2 has been changed
[1]  5.00  3.14 10.00  2.80 17.00 -1.00

Data structures in R: matrix and array

Matrices and arrays are like vectors, but two-dimensional and N-dimensional, respectively.

  • A vector has only one dimension (length).
  • A matrix has two dimensions (rows/columns).
  • An array can have more than two dimensions (e.g., three for a cube, or as many as you need).

Like a vector, all elements of a matrix or array must be of the same type.

Tip

Matrices and arrays are useful for mathematical purposes, e.g., linear algebra, but are less commonly used for everyday data science purposes.
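
A minimal sketch of both structures, using the base R functions matrix() and array(). The object names m, a, and mixed are made up for this example.

```r
### a matrix: 2 rows x 3 columns, filled column by column (the default)
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)        ### 2 3
m[2, 3]       ### row 2, column 3: 6

### an array with three dimensions: 2 rows x 3 columns x 2 layers
a <- array(1:12, dim = c(2, 3, 2))
dim(a)        ### 2 3 2
a[1, 2, 2]    ### row 1, column 2, layer 2: 9

### like vectors, mixed types are coerced to a single type
mixed <- matrix(c(1, "a", TRUE, 4), nrow = 2)
typeof(mixed) ### "character"
```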

Data structures in R: lists

A list contains one or more elements, but unlike a vector, each element can be a different type.

We can create a list using the list() function.

### a list containing three numeric elements:
list_of_nums <- list(5, 3.14, -999)

### a list containing elements of different types
list_of_stuff <- list(5, 7:10, c("puppies", "kittens"), c(TRUE, FALSE))
list_of_stuff[[2]]
[1]  7  8  9 10

Like a vector, you can set or retrieve a value of an element in a list using that element’s “index” or position.

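Double square brackets [[...]] select a single element from a list and return the element itself. Single square brackets [...] also work on lists, but they return a new (shorter) list containing the selected elements.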

### a list containing elements of different types
list_of_stuff <- list(5, 7:10, c("puppies", "kittens"), c(TRUE, FALSE))

class(list_of_stuff[[2]]) ### integer vector: 7, 8, 9, 10 (7:10 creates integers)
[1] "integer"
class(list_of_stuff[[3]]) ### character vector: "puppies" and "kittens"
[1] "character"
list_of_stuff[[1]] <- list("bananas", c(1, 2, 4))
class(list_of_stuff[[1]]) ### was numeric, now it's a list!
[1] "list"
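
The difference between single and double brackets on a list is easy to mix up, so here is a short sketch using the same list_of_stuff as above (rebuilt here so the snippet runs on its own).

```r
list_of_stuff <- list(5, 7:10, c("puppies", "kittens"), c(TRUE, FALSE))

### double brackets pull out the element itself
list_of_stuff[[3]]            ### the character vector "puppies" "kittens"
class(list_of_stuff[[3]])     ### "character"

### single brackets return a list, even when you ask for one element
class(list_of_stuff[3])       ### "list"
length(list_of_stuff[2:3])    ### 2

### chain brackets to reach inside an element
list_of_stuff[[3]][2]         ### "kittens"
```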

Data structures in R: data frames

A data frame is a two-dimensional structure for tabular data, with columns that can differ in type.

You can create one with data.frame(), but in practice most data frames come from importing spreadsheets such as .xlsx files or comma-separated .csv files.
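
As a rough sketch of the import route, the snippet below writes a tiny .csv to a temporary file and reads it back with read.csv() from base R. The file contents and the names csv_path and site_df are invented for illustration; with real data you would point read.csv() at your own file. Reading .xlsx files requires an add-on package (readxl is one common choice), which is not shown here.

```r
### write a small CSV to a temporary file (a stand-in for a real data file)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("site,count",
             "A,12",
             "B,7"),
           csv_path)

### read.csv() returns a data frame, using the first line as column names
site_df <- read.csv(csv_path)
class(site_df)     ### "data.frame"
site_df$count      ### 12 7
```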

Data frames consist of named columns, each storing a variable, and rows that hold the observed values for those variables.

Let’s create a simple one representing three dogs, with ages (in human years) and weights (in pounds).

Create the data frame:

dog_df <- data.frame(
    name   = c('Waffle', 'Khora', 'Teddy'),
    weight = c(      35,      60,     50),
    age    = c(       5,       9,      7)
  )

Result:

dog_df
    name weight age
1 Waffle     35   5
2  Khora     60   9
3  Teddy     50   7

Code styling

Note that extra spaces, tabs, and line breaks inside an expression generally don’t affect how R code runs, but they can make it much easier to read! Use code styling to help communicate your intentions to others.

We can explore our data frame using the RStudio IDE.

Try these out to see how they work:

  • Find the dog_df data frame in the Environment pane and click on it
    • Interactive view, try sorting and searching!
  • Find the data frame in the Environment pane and click on the blue arrow next to it
  • In the Console, type head(dog_df) to see the first few rows
    • Note, our data frame is very short so you’ll see all the rows!
  • In the Console, type View(dog_df) (note capital V)
    • This is the same as clicking on it in the Environment pane.
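
If you prefer to stay in the Console, a few base R functions give a similar overview. This is only a sketch; the output of str() is roughly comparable to what the blue arrow in the Environment pane reveals. dog_df is rebuilt here so the snippet runs on its own.

```r
dog_df <- data.frame(
    name   = c('Waffle', 'Khora', 'Teddy'),
    weight = c(      35,      60,     50),
    age    = c(       5,       9,      7)
  )

str(dog_df)     ### compact summary: type and first values of each column
dim(dog_df)     ### 3 3 (rows, then columns)
nrow(dog_df)    ### 3
ncol(dog_df)    ### 3
```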

A data frame is just a named list of vectors!

To understand how to access information in a data frame, it helps to think of it as a list of vectors, where every vector has the same length and a handy name!

dog_df <- data.frame(
    name   = c('Waffle', 'Khora', 'Teddy'), ### char vec length 3
    weight = c(      35,      60,     50),  ### num vec length 3
    age    = c(       5,       9,      7)   ### num vec length 3
  )

Because a data frame is a list of vectors, our single [...] and double [[...]] square brackets can work, but there are other more elegant ways of retrieving and setting values in a data frame.
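
A quick way to convince yourself of the “list of vectors” idea is to treat dog_df like a list. The sketch below rebuilds dog_df so it runs on its own.

```r
dog_df <- data.frame(
    name   = c('Waffle', 'Khora', 'Teddy'),
    weight = c(      35,      60,     50),
    age    = c(       5,       9,      7)
  )

is.list(dog_df)     ### TRUE
length(dog_df)      ### 3: one list element per column
names(dog_df)       ### "name" "weight" "age"

dog_df[[2]]         ### double brackets: the weight vector 35 60 50
class(dog_df[2])    ### single brackets: still a "data.frame" (one column)
```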

Accessing elements of a data frame

You can extract elements from a data frame with two-dimensional indexing: df[rows, columns]. Entire columns can be pulled by name using df[['col']] or df$col. To access specific elements within a column, index the column directly, for example df$col[3].

Accessing values of elements of a data frame

Using the dog_df we created previously:

dog_df[1, 3]       ### first row, third column: Waffle's age
[1] 5
### leave row or column blank to choose all in that row/col:
dog_df[ , 1]       ### all rows, first column: name of all dogs 
[1] "Waffle" "Khora"  "Teddy" 
dog_df[['weight']] ### weight of all dogs, in order
[1] 35 60 50
dog_df$age         ### age of all dogs, in order
[1] 5 9 7
dog_df$name[2]     ### name of all dogs, then choose only element 2
[1] "Khora"

Changing values of elements of a data frame or adding new variables

Using the dog_df we created previously, let’s update Waffle’s age and weight, and then add a new column describing the color of each dog.

dog_df[1, 2] <- 34  ### assign new value to row 1, col 2
dog_df$age[1] <- 6  ### assign new value to age[1]

### create a new column using the $ operator, and assign values
dog_df$color <- c('tan', 'grey', 'brown') 

dog_df  ### inspect the updated data frame
    name weight age color
1 Waffle     34   6   tan
2  Khora     60   9  grey
3  Teddy     50   7 brown
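
New columns can also be calculated from existing ones. The sketch below starts from the original dog_df (rebuilt here so it runs on its own); the age_months column and the new weight for Teddy are made up for illustration.

```r
dog_df <- data.frame(
    name   = c('Waffle', 'Khora', 'Teddy'),
    weight = c(      35,      60,     50),
    age    = c(       5,       9,      7)
  )

### arithmetic on a column is applied to every row at once
dog_df$age_months <- dog_df$age * 12
dog_df$age_months     ### 60 108 84

### update one value by matching on another column instead of a row number
dog_df$weight[dog_df$name == 'Teddy'] <- 52
dog_df$weight         ### 35 60 52
```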

Accessing elements of a data frame

Note

These methods of accessing data within a data frame will always work in R, because they are part of base R, the core functionality of the R language.

Later we will introduce you to a different approach to coding in R, called the Tidyverse, that you may find easier and more elegant.