11 Data Structures (Part II)
11.1 Questions
Which data structures can be used in R?
How do we define the various data structures?
What operations can be performed on the different data structures?
How do these data structures differ from one another in terms of storage and functionality?
11.2 Learning Objectives
Learn about data structures in R, specifically matrices, arrays, and factors.
Define and manipulate matrices, arrays, and factors.
Perform common operations on each data structure.
Identify and differentiate between the diverse data structures offered by R.
Choose the appropriate data structure based on the type and organization of your data.
Understand the strengths and limitations of each structure for efficient analysis.
11.3 Lesson Content
11.3.1 Introduction
Data structures in R are the formats used to organize, process, retrieve, and store data. Choosing a suitable structure makes stored data easier to use effectively. Data structures differ in the number of dimensions they have and in whether their contents are homogeneous (a single data type) or heterogeneous (mixed data types).
The primary data structures are:
Vectors
Lists
Data frames
Matrices
Arrays
Factors
In this lesson, we will review the second set of data structures: matrices, arrays, and factors.
11.3.2 Matrices
A matrix is a rectangular, two-dimensional (2D), homogeneous data structure made up of rows and columns. All of its elements must be of the same type (most often numeric, although character and logical matrices are also possible), and they are arranged in a fixed number of rows and columns. Matrices are widely used in mathematical and statistical applications.
- Creation of matrices
Using the matrix() function
Format: matrix(range, nrow = _, ncol = _)
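The format above can be sketched as follows. The names m1, m2, and m3 and all values are assumptions chosen for illustration; they are not necessarily the definitions used elsewhere in this lesson.

```r
# matrix() fills column by column by default
m1 <- matrix(1:6, nrow = 2, ncol = 3)

# Same shape as m1, different values
m2 <- matrix(7:12, nrow = 2, ncol = 3)

# byrow = TRUE fills row by row instead
m3 <- matrix(1:6, nrow = 3, ncol = 2, byrow = TRUE)

m1
m2
m3
```

If only one of nrow or ncol is supplied, R infers the other from the length of the data.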
- Obtain the dimensions of the matrices m1 and m3
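A short sketch of how dimensions can be inspected with dim(), nrow(), and ncol(). The matrices here are illustrative stand-ins, assumed only for this example.

```r
# Illustrative matrices (values are assumptions)
m1 <- matrix(1:6, nrow = 2, ncol = 3)
m3 <- matrix(1:6, nrow = 3, ncol = 2)

dim(m1)    # number of rows, then number of columns
dim(m3)
nrow(m1)   # rows only
ncol(m1)   # columns only
```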
- Arithmetic with matrices
m1 + m2
m1 - m2
m1 * m2
m1 / m2
m1 == m2
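The arithmetic operators above work element by element on matrices of the same dimensions. A minimal sketch, with values assumed purely for illustration:

```r
# Two 2 x 2 matrices (illustrative values)
m1 <- matrix(1:4, nrow = 2)
m2 <- matrix(c(1, 2, 6, 8), nrow = 2)

m1 + m2    # element-wise sum
m1 - m2    # element-wise difference
m1 * m2    # element-wise product, not the matrix product
m1 / m2    # element-wise quotient
m1 == m2   # logical matrix: TRUE where the elements match
```

Each result has the same dimensions as the inputs.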
- Matrix multiplication
m5 * m6
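Knowing the dimensions is essential when working with matrices. The * operator multiplies element by element; matrix multiplication uses the %*% operator, which requires the number of columns of the first matrix to equal the number of rows of the second. Here, m5 cannot be matrix-multiplied by m6 because the dimensions do not conform.
# m5 %*% m6  # This will not execute because the dimensions are non-conformable
The matrix m6 needs to be transposed before multiplication.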
Transpose
Now we have a 5 by 2 matrix that can be multiplied by a 2 by 5 matrix.
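The transpose-then-multiply step can be sketched as below. The contents of m5 and m6 are assumptions for illustration; only their 5 by 2 shape follows from the text.

```r
# Two 5 x 2 matrices (values are illustrative)
m5 <- matrix(1:10, nrow = 5, ncol = 2)
m6 <- matrix(11:20, nrow = 5, ncol = 2)

# t() swaps rows and columns, giving a 2 x 5 matrix
m6_t <- t(m6)
dim(m6_t)

# (5 x 2) %*% (2 x 5) yields a 5 x 5 matrix
product <- m5 %*% m6_t
dim(product)
product
```

The inner dimensions (2 and 2) must match; the outer dimensions (5 and 5) give the shape of the result.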
- Generate an identity matrix
diag(5)
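As a brief illustration of why the identity matrix matters, multiplying any conformable matrix by it leaves that matrix unchanged. The matrix a below is a hypothetical example.

```r
# 3 x 3 identity matrix: ones on the diagonal, zeros elsewhere
identity_3 <- diag(3)
identity_3

# A hypothetical 3 x 3 matrix
a <- matrix(1:9, nrow = 3)

# Multiplying by the identity returns the original values
all(a %*% identity_3 == a)
```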
- Column and row names
Dimensions of m5
dim(m5)
Current column names of m5
colnames(m5)
Set column names
Display the matrix m5 and its new column names
m5
colnames(m5)
Set row names
Get the dimensions of m6
dim(m6)
Display the current row names of m6
rownames(m6)
Set the row names of m6
m6
rownames(m6)
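The naming steps above might look like the following sketch. The matrix values and the names themselves are assumptions chosen for illustration.

```r
# Illustrative 5 x 2 matrices
m5 <- matrix(1:10, nrow = 5, ncol = 2)
m6 <- matrix(11:20, nrow = 5, ncol = 2)

colnames(m5)                          # NULL until names are assigned
colnames(m5) <- c("col_a", "col_b")   # one name per column
colnames(m5)

rownames(m6) <- paste0("row_", 1:5)   # one name per row
rownames(m6)

# Names can then be used for indexing
m5[2, "col_b"]
m6["row_3", ]
```

The number of names supplied must match the corresponding dimension, otherwise R raises an error.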
- Arithmetic within matrices
colSums(m5)
rowSums(m6)
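A small sketch of column and row totals, using assumed values so the results can be checked by hand:

```r
# Illustrative 5 x 2 matrices
m5 <- matrix(1:10, nrow = 5, ncol = 2)
m6 <- matrix(11:20, nrow = 5, ncol = 2)

colSums(m5)    # one total per column: 15 40
rowSums(m6)    # one total per row: 27 29 31 33 35

# colMeans() and rowMeans() work the same way
colMeans(m5)
```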
- Subsetting matrices
subset_m5 <- m5[1:4, 1:2]
subset_m5
subset_m6 <- m6[3:5, 1:2]
subset_m6
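Subsetting uses the form matrix[rows, columns]; leaving a position empty selects everything along that dimension. A sketch with an assumed 5 by 2 matrix:

```r
# Illustrative 5 x 2 matrix
m5 <- matrix(1:10, nrow = 5, ncol = 2)

m5[2, 1]                # single element: row 2, column 1
m5[2, ]                 # all of row 2, returned as a vector
m5[, 2]                 # all of column 2, returned as a vector
m5[1:4, 1:2]            # rows 1 to 4, columns 1 to 2

# drop = FALSE keeps the result as a matrix
m5[2, , drop = FALSE]
```

Without drop = FALSE, selecting a single row or column simplifies the result to a plain vector.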
- Matrix division
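When you divide a matrix by a vector, R recycles the vector down the columns, so when the vector's length equals the number of rows, every element in row i is divided by the i-th element of the vector. When both operands are matrices of the same dimensions, the division is element-wise.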
m5 / m6
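To make the matrix-by-vector case concrete, here is a minimal sketch with hypothetical values m and v, chosen so the result is easy to verify:

```r
# 2 x 3 matrix and a vector with one entry per row (illustrative values)
m <- matrix(c(10, 20, 30, 40, 50, 60), nrow = 2)
v <- c(10, 20)

m
# Row 1 is divided by 10, row 2 by 20
m / v
```

If the vector's length does not evenly divide the number of elements in the matrix, R recycles it anyway and issues a warning, so it is worth checking lengths first.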
11.3.3 Arrays
An array is a multidimensional vector that stores homogeneous data. It can be thought of as a stack of matrices and can store data in more than two dimensions (n-dimensional). A three-dimensional array is described by its number of rows, its number of columns, and its number of layers (matrices).
Example: an array with dim = c(2, 3, 3) has 2 rows, 3 columns, and 3 matrices.
- Creating arrays
arr_1
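An array such as the one described above can be created with array(). This is a sketch; the values 1:18 are an assumption chosen so that each of the 2 x 3 x 3 = 18 cells holds a distinct number.

```r
# 2 rows, 3 columns, 3 layers, filled column by column within each layer
arr_1 <- array(1:18, dim = c(2, 3, 3))

dim(arr_1)
arr_1
```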
- Filter array by index
arr_1[1 , , ]
arr_1[1, ,1]
arr_1[, , 1]
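Array indexing follows the pattern array[row, column, layer]. The sketch below recreates an illustrative arr_1 (values assumed) and shows what each of the three index forms returns.

```r
# Illustrative 2 x 3 x 3 array
arr_1 <- array(1:18, dim = c(2, 3, 3))

# First row of every layer: a 3 x 3 matrix (columns by layers)
arr_1[1, , ]

# First row of the first layer: a vector of length 3
arr_1[1, , 1]

# The whole first layer: a 2 x 3 matrix
arr_1[, , 1]
```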
11.3.4 Factors
Factors store categorical data, whether the underlying values are integers or strings. Each distinct category is stored as a level. This form of storage is useful for statistical modelling and for handling qualitative data. Examples include TRUE or FALSE and male or female.
NOTE: The forcats package provides additional, more convenient tools for handling categorical variables. This will be discussed in a subsequent series focusing on the tidyverse.
vector <- c("Male", "Female")
factor_1 <- factor(vector)
factor_1
OR
factor_2 <- as.factor(vector)
factor_2
as.numeric(factor_2)
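By default, factor() orders levels alphabetically, and as.numeric() returns the integer code of each level rather than the original text. The sketch below also shows how an explicit order can be set with the levels argument; the variable names and the size categories are hypothetical.

```r
# Default: levels are sorted alphabetically
sex <- factor(c("Male", "Female"))
levels(sex)        # "Female" "Male"
as.numeric(sex)    # 2 1

# Explicit level order (hypothetical categories)
sizes <- c("medium", "small", "large", "small")
size_factor <- factor(sizes, levels = c("small", "medium", "large"))
levels(size_factor)
as.numeric(size_factor)   # 2 1 3 1
table(size_factor)        # counts per level
```

Setting levels explicitly is useful when the categories have a natural order that differs from alphabetical order.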
NOTE: To view the internal structure of the data structures described above, use the str() function.
Example
str(m5)
str(arr_1)
str(factor_1)
11.4 Exercises
How do you create a matrix in R? Give an example.
What are the column and row dimensions of a matrix called in R?
How do you access elements in a matrix?
Generate a 3x3 identity matrix using matrix().
Write code to create a 3x3 matrix with sequential numeric values.
Convert a character vector to a factor with 3 levels.
Describe the purpose of factors in R.
How do you create a factor variable with specified levels?
What is an array, and how does it differ from a matrix in R?
Provide an example of creating a three-dimensional array.
11.5 Summary
In this chapter, we completed our review of data structures. These data structures have different properties that influence how they are used in various computational tasks. We also looked at the strengths and limitations of each structure for data analysis. However, we have not yet discussed one important aspect of working with data structures: how to handle missing data. In the final chapter, we will tackle this issue and provide solutions.