11  Data Structures (Part II)

11.1 Questions

  • List the data structures that can be used in R?

  • How do we define the various data structures?

  • What operations can be performed on the different data structures?

  • How do these data structures differ from one another in terms of storage and functionality?

11.2 Learning Objectives

  • Learn about data structures in R, specifically matrices, arrays, and factors.

  • Define and manipulate matrices, arrays, and factors.

  • Perform common operations on each data structure.

  • Identify and differentiate between the diverse data structures offered by R.

  • Choose the appropriate data structure based on the type and organization of your data.

  • Understand the strengths and limitations of each structure for efficient analysis.

11.3 Lesson Content

11.3.1 Introduction

Data structures in R are the formats used to organize, process, retrieve, and store data. They help to organize stored data in a way that the data can be used more effectively. Data structures vary according to the number of dimensions and the data types (heterogeneous or homogeneous) contained.

The primary data structures are:

  • Vectors

  • Lists

  • Data frames

  • Matrices

  • Arrays

  • Factors

In this lesson, we will review the 2nd set of data structures: matrices, arrays, and factors.

11.3.2 Matrices

A matrix is a rectangular two-dimensional (2D) homogeneous data set containing rows and columns. It contains real numbers that are arranged in a fixed number of rows and columns. Matrices are generally used for various mathematical and statistical applications.

  1. Creation of matrices

Using the matrix() function

Format: matrix(range, nrow = _, ncol = _)

m1 <- matrix(1:9, nrow = 3, ncol = 3)
m2 <- matrix(21:29, nrow = 3, ncol = 3)
m3 <- matrix(1:12, nrow = 2, ncol = 6)
m1
m2
m3

Using cbind() and rbind()

  1. Obtain the dimensions of the matrices m1 and m3
nrow(m1)
ncol(m1)
dim(m1)
nrow(m3)
ncol(m3)
dim(m3)
  1. Arithmetic with matrices
m1 + m2
m1 - m2
m1 * m2
m1 / m2
m1 == m2
  1. Matrix multiplication
m5 <- matrix(1:10, nrow = 5)
m6 <- matrix(43:34, nrow = 5)
m5
m6
m5 * m6

Knowledge of dimensions is very important when working with matrices. Here, we observe that we can’t multiply m5 by m6 because of the matrix dimensions

dim(m5)
dim(m6)
# m5 %*% m6 This will not execute because the dimensions are not appropriate

The vector m6 needs to be transposed before multiplication.

Transpose

dim(m5)
dim(t(m6))

Now we have a 5 by 2 matrix that can be multiplied by a 2 by 5 matrix.

m5%*%t(m6)
  1. Generate an identity matrix
diag(5)
  1. Column and row names

Dimensions of m5

dim(m5)

Current column names of m5

Set column names

colnames(m5) <- c("AA", "BB")

Display the matrix m5 and new column names

m5
colnames(m5)

Set row names

Get the dimensions of m6

dim(m6)

Display the current row names of m6

Set the row names of m6

rownames(m6) <- c("ABC", "BCA", "CAB", "ACB", "BAC")
m6
rownames(m6)
  1. Arithmetic within matrices
  1. Subsetting matrices
subset_m5 <- m5[1:4, 1:2]
subset_m5
subset_m6 <- m6[3:5, 1:2]
subset_m6
  1. Matrix division

When you divide a matrix by a vector, the operation is row-wise.

m5 / m6

11.3.3 Arrays

An array is a multidimensional vector that stores homogeneous data. It can be thought of as a stacked matrix and stores data in more than 2 dimensions (n-dimensional). An array is composed of rows by columns by dimensions.

Example: an array with dimensions, dim = c(2,3,3), has 2 rows, 3 columns, and 3 matrices.

  1. Creating arrays
arr_1 <- array(1:12, dim = c(2,3,2))
arr_1
  1. Filter array by index
arr_1[1 ,  , ]
arr_1[1, ,1]
arr_1[, , 1]

11.3.4 Factors

Factors are used to store integers or strings which are categorical. They categorize data and store the data in different levels. This form of data storage is useful for statistical modelling. Examples include TRUE or FALSE and male or female. Useful for handling qualitative datasets.

NOTE: R has a more efficient method for handling categorical variables using the forcatspackage. This will be discussed in a subsequent series focusing on the tidyverse.

vector <- c("Male", "Female")
factor_1 <- factor(vector) 
factor_1

OR

factor_2 <- as.factor(vector) 
factor_2
as.numeric(factor_2)

NOTE: To view the internal structure of various data types described above, the learner can use the str() function.

Example

str(m5)
str(arr_1)
str(factor_1)

11.4 Exercises

  • How do you create a matrix in R? Give an example.

  • What are the column and row dimensions of a matrix called in R?

  • How do you access elements in a matrix?

  • Generate a 3x3 identity matrix using matrix().

  • Write code to create a 3x3 matrix with sequential numeric values.

  • Convert a character vector to a factor with 3 levels.

  • Describe the purpose of factors in R.

  • How do you create a factor variable with specified levels?

  • What is an array, and how does it differ from a matrix in R?

  • Provide an example of creating a three-dimensional array.

11.5 Summary

In this chapter, we have completed our review of data structures. These data structures have different properties that influence how they are used in various computational tasks. Additionally, we have looked at the strengths and limitations of each structure for data analysis. However, we haven’t discussed an important aspect of data structures: what do we do with missing data? In the final chapter, we will tackle this issue and provide solutions.