4 Managing your files and data
4.1 Questions
Why is good file and data management important in R?
What is the concept of a project in R, and how does it help in organizing your work?
What is the purpose of creating R project folders and scripts?
What is the significance of a working directory in R, and how does it affect your workflow?
How do you create, save, and run R scripts for efficient code management?
4.2 Learning Objectives
Understand the importance of properly organized files and datasets for efficient R projects.
Understand the concept of R projects and their role in organizing your work.
Write and run R code in scripts for better code organization and reproducibility.
Master the concept of working directories and navigate your project files effectively.
4.3 Lesson Content
4.3.1 Introduction
Good file and data management makes code easier to debug, reduces the number of coding errors, and improves data quality. By following best practices, the learner can also share their work easily, keep projects organized, and develop a reproducible workflow.
4.3.2 Projects
An R project enables the learner to keep all their work in one folder. Scripts, data files, figures, tables, history, and output can all be stored in sub-folders within the project folder. When the project is opened in RStudio, its root folder becomes the working directory (described in the section below).
To create a new project folder:
File –> New Project
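Sub-folders can also be created from the console. The following is a minimal sketch, assuming a hypothetical project called my_project with sub-folders named data, scripts, figures, and output (these names are illustrative only and are not prescribed by R or by this lesson). A temporary folder stands in for the project root so that nothing is written to your own files.

```r
# Hypothetical project root inside R's temporary folder (an assumption for
# illustration; in practice this would be your own project folder).
project_root <- file.path(tempdir(), "my_project")
dir.create(project_root, showWarnings = FALSE)

# Hypothetical sub-folder names; choose whatever suits your project.
subfolders <- c("data", "scripts", "figures", "output")
for (folder in subfolders) {
  dir.create(file.path(project_root, folder), showWarnings = FALSE)
}

# List the contents of the project root to confirm the folders exist.
list.files(project_root)
```

Keeping each kind of file in its own sub-folder makes it easier to find inputs and outputs later.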
4.3.3 Scripts
Code typed directly into the console is not saved. To keep a record of their code, the learner should write it in a script and save the file. Otherwise, the code has to be retyped in the console every time it needs to be executed. Code saved in a script file can be run as many times as needed and edited when necessary.
To create a new script file:
File –> New File –> R Script
Save the file in your project folder
Ensure that file names are readable for both the R software and collaborators: avoid spaces and special characters, and choose a name that describes what the script does.
Examples include:
- “my_model_1.R”
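A saved script can be run in full with source(). The sketch below is only an illustration: the script contents are made up, and the file is written to a temporary folder with writeLines() so that the example is self-contained. In practice you would type the code in the script editor and save it in your project folder.

```r
# Hypothetical script file in R's temporary folder (an assumption for
# illustration; normally the script lives in your project folder).
script_path <- file.path(tempdir(), "my_model_1.R")

# Write two lines of example code into the script file.
writeLines(
  c("x <- c(2, 4, 6)",
    "x_mean <- mean(x)"),
  script_path
)

# Run every line of the saved script; this can be repeated at any time.
source(script_path)

x_mean
# [1] 4
```

Because the code lives in a file, it can be rerun after restarting R without retyping anything.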
4.3.4 Working Directory
The working directory is a file path or location on your computer that R uses for reading and writing files. The function getwd() prints the current working directory, and setwd() allows the user to set the working directory. The working directory is where R looks for files that you ask it to load, and where it puts any files that you ask it to save.
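As a short sketch of these two functions, the example below checks the working directory, changes it, and then changes it back. The use of tempdir() as the new location is an assumption made only so the example runs on any computer; you would normally point to your own project folder.

```r
# Remember the current working directory so it can be restored later.
original_wd <- getwd()
original_wd

# Stand-in for a project folder (an assumption for illustration).
new_wd <- tempdir()

# Change the working directory and confirm the change.
setwd(new_wd)
getwd()

# Restore the original working directory.
setwd(original_wd)
```

When working inside an RStudio project, calling setwd() is rarely needed, because opening the project already sets the working directory to the project root.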
The paths to files or directories can be defined using relative or absolute paths. A relative path gives the location of a file starting from the current working directory, whereas an absolute path gives the full location starting from the root directory of the file system.
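The difference can be sketched with file.path(), which joins folder and file names into a path. The folder data and the file survey.csv are hypothetical names used only for illustration; the file does not need to exist for the paths to be built.

```r
# Relative path: starts from the current working directory.
# "data" and "survey.csv" are hypothetical names.
relative_path <- file.path("data", "survey.csv")
relative_path
# [1] "data/survey.csv"

# Absolute path: the same file, described from the root of the file system.
absolute_path <- file.path(getwd(), relative_path)
absolute_path

# file.exists() reports whether a file is actually present at a path.
file.exists(relative_path)
```

Relative paths keep working when the whole project folder is moved or shared, whereas an absolute path is usually specific to one computer.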
4.3.5 Finding help when using packages, functions, and inbuilt datasets
There is a built-in help system that can be accessed via the “Help” menu, or one can use the ? symbol before the function, package, or dataset to get a brief description and instructions for use. Online R documentation is also available to allow the user to learn the basics of R.
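For instance, the following calls open help pages for a function, a built-in dataset, and a package. The choice of mean(), cars, and stats is arbitrary; they are used here only because they ship with a standard R installation.

```r
# Help page for a function.
?mean

# The same page, requested with the help() function.
help("mean")

# Help page for a built-in dataset.
?cars

# Overview of the help pages available in a package.
help(package = "stats")
```

In RStudio the pages appear in the Help pane; in a plain R console they are shown as text.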
4.4 Exercises
Create a new RStudio project called “novice_programmer_guide”.
Within the project folder, create a new R Script called “introduction.R” and save it.
Set your working directory in the console to the root of this new project, and practice loading data from a file within it.
Explain the significance of using relative and absolute file paths in R, and experiment with using relative paths to access different data files within your project directory.
Explain the role of the functions file.path(), list.files(), and dir.create() in R.
Describe the purpose of the .Rproj file in an R project.
How can you clear the R workspace? Explore functions like rm() and rm(list=ls()) to remove unwanted objects.
4.5 Summary
Hopefully, this lesson has demonstrated to the learner the importance of good file and data management. We have reviewed the concepts of projects, scripts, and working directories and shown how they help with organizing files, saving and reusing code, and ultimately improving project reproducibility. In the next lesson, we will see why this matters when importing data and saving analysis outputs.