Backup, Clean-up, and Soup-up R

I tend to make a lot of mistakes when running R. The two most common sources of error are typos and mistakes in the syntax of a command. Sometimes the command works, but the new variable has the wrong name or the data in it are bad. Here I have collected a series of operations to help correct errors and back up data. At the bottom of the page I also explain how to add packages to R that will make it more useful to us.

I. Backing up your data.

Backing up your data, step 1: the Workspace

First of all, set your working directory (Session > Set Working Directory > Choose…).
Second, check what R understands to be its current working directory:

getwd()

Now you know where your saved data will go.
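If you prefer to set the working directory from code instead of the menu, base R's setwd() does the same job. This is a minimal sketch: tempdir() is only a placeholder for your own project folder, so the example runs on any machine.

```r
# Placeholder folder: replace tempdir() with the path to your own project folder
project_dir <- tempdir()

# setwd() changes the working directory and invisibly returns the previous one
old_dir <- setwd(project_dir)

# Confirm where saved files will now go
getwd()
```

Keeping old_dir lets you return to the previous folder later with setwd(old_dir).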

In the Workspace Pane, click the little blue floppy-disk icon:

Save your Workspace

This will open a new dialog box where you can choose the path (directory-location) and name of your file. Note my geeky protocol: I name my file with the convention “WrkSpace_YYMMDD” and by default the software adds “.RData” as the suffix.
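You can also do the same date-stamped save from the Console with save.image(). This sketch assumes you want the file in the current working directory. The variable name ws_file is hypothetical.

```r
# Build the "WrkSpace_YYMMDD.RData" name from today's date (%y = 2-digit year)
ws_file <- paste0("WrkSpace_", format(Sys.Date(), "%y%m%d"), ".RData")

# save.image() writes every object in the global environment to that file
save.image(file = ws_file)

# Check that the file landed in the working directory
file.exists(ws_file)
```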

Naming the saved Workspace

Next time you start R-Studio, do it by clicking on this file. Then R-Studio will load the whole workspace, including the dataframes, variables, and other objects you have imported and created.
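If R-Studio is already open, load() restores a saved workspace into the current session. So that the sketch runs anywhere, it first writes a tiny hypothetical file (WrkSpace_demo.RData in a temporary folder, holding a single object x_demo).

```r
# Hypothetical demo file holding one object
demo_file <- file.path(tempdir(), "WrkSpace_demo.RData")
x_demo <- 1:3
save(x_demo, file = demo_file)
rm(x_demo)  # simulate a fresh session without the object

# load() restores the saved objects and returns their names
loaded <- load(demo_file)
loaded   # "x_demo"
x_demo   # 1 2 3 is back
```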

Backing up your data, step 2: export your dataframe as a CSV file.

If you follow the exercises in these web-pages, you will add quite a bit of data to your main dataframe. You can back it up by using the write.csv() function to export the whole dataframe as a CSV file. This is also useful because you can open that CSV with a spreadsheet program, copy selected data from it, and paste it as a table into a word-processing document, such as your final report.

# Save a CSV of the updated dataframe "CO":
write.csv(CO, "CO_141123.csv")

R-Studio will save this file to your current working directory. If you are not sure which is your current working directory, use getwd() to find out, and that way you can locate your exported CSV file.
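Here is a sketch of a date-stamped export that uses a small hypothetical dataframe, CO_demo, in place of CO. It writes to a temporary folder so it runs anywhere. Setting row.names = FALSE drops the unnamed row-number column that write.csv() adds by default. Reading the file back with read.csv() confirms that the export worked.

```r
# Hypothetical stand-in for the real dataframe "CO"
CO_demo <- data.frame(tract = c("A", "B"), pop = c(1200, 850))

# Date-stamped file name, e.g. "CO_141123.csv" on 23 Nov 2014
csv_file <- file.path(tempdir(), paste0("CO_", format(Sys.Date(), "%y%m%d"), ".csv"))

# row.names = FALSE: do not write the extra row-number column
write.csv(CO_demo, csv_file, row.names = FALSE)

# Read it back to check the backup
check <- read.csv(csv_file)
str(check)
```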

Backing up your data, step 3: save your History

I normally call the upper-right pane in R-Studio the Workspace Pane, but it also has a History tab. If you click on that tab, you will see a log of all the commands you have entered into R. The Console Pane (lower left) also keeps this history: with the cursor active in the Console Pane, press the Up arrow on your keyboard to step back through your previously entered commands. (I am digressing a bit, but this history-keeping function is really useful if you want to re-run a command, or re-run a slightly edited version of it.) It is worth backing up this complete record of all your entered commands:

(notice I have clicked on the History tab, and now it is "in front of" the Workspace tab)


The History file will be saved to the current working directory, with an automatic “.Rhistory” suffix. It is a plain text file. I suggest naming it with the date in YYMMDD form, so on this day it would be named 141123.Rhistory
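If you would rather save the history from the Console, savehistory() writes the same log. This sketch assumes the YYMMDD naming above. savehistory() only works in an interactive session such as R-Studio's Console, so the call is guarded by interactive().

```r
# Date-stamped history file name, e.g. "141123.Rhistory"
hist_file <- paste0(format(Sys.Date(), "%y%m%d"), ".Rhistory")

# Only attempt the save in an interactive session (R-Studio's Console)
if (interactive()) {
  savehistory(file = hist_file)
}
```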

Backing up your data, step 4: Save your Script Sheet

You can also save scripts of commands in the upper-left window. Unlike the History, which is a verbatim record of every entered command (including view-refresh commands), the Script is your own page where you can keep and annotate the commands that have worked for you.

Save your Script Sheet

In this case, I have a sheet that I have been working on for several weeks, and R-Studio simply overwrites the file of the same name. R-Studio automatically appends the “.R” suffix to this file.
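Because a saved script is just a text file of commands, source() can re-run the whole thing in one step. This sketch writes a hypothetical two-line script to a temporary folder and then runs it.

```r
# Hypothetical script file containing an annotated command that worked
script_file <- file.path(tempdir(), "demo_script.R")
writeLines(c("# square root of the sample size", "z_demo <- sqrt(16)"), script_file)

# source() runs every command in the script, top to bottom
source(script_file)
z_demo   # 4
```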

Backing up your data, final note: R-Studio’s own “.RData” backup

When you quit R-Studio, it also asks whether you want to save the workspace. If you say yes, it writes a hidden file named .RData to the current working directory (by default, usually your home directory), and R reloads that file the next time it starts in the same directory. This file holds the objects in your workspace, not your R-Studio preferences. Because it always has the same name and is silently overwritten, I would not rely on it for saving all your work.

With all your stuff backed up, now you can move on to…

II. Cleaning up your Workspace

Removing an object from your R Workspace

Now that you have backed up your data, you can remove things that were made by mistake, like a mis-named variable. The basic syntax is rm(object). If you name specific objects each time you run this command, you will not accidentally erase all the data in your workspace.
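Here is a sketch with two hypothetical objects, showing that rm() removes only the object you name. By contrast, rm(list = ls()) removes every object in the workspace, which is exactly the accident to avoid.

```r
# Two hypothetical objects: one we want, one created by mistake
a_demo <- 1
typo_demo <- 2

# Remove only the mistaken object
rm(typo_demo)

exists("typo_demo")  # FALSE: it is gone
exists("a_demo")     # TRUE: the other object is untouched
```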

Deleting a specific column of data in a dataframe

Sometimes you make a mistake when adding a column of data (also called a variable) to a dataframe. How do you selectively delete data within a dataframe? Assign NULL to that column.

# Remove mistaken variable "HiDens" from dataframe CO:
 CO$HiDens = NULL
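This sketch uses a small hypothetical dataframe, CO_demo, to show the column disappearing.

```r
# Hypothetical dataframe with a mistaken column "HiDens"
CO_demo <- data.frame(tract = c("A", "B"),
                      HiDens = c(TRUE, FALSE),
                      pop = c(1200, 850))

# Assigning NULL drops the whole column
CO_demo$HiDens <- NULL

names(CO_demo)   # "tract" "pop"
```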

Renaming a specific column within a dataframe

Sometimes you like the data you created, but not the name you gave to the column.

# Rename the variable "ThrLat" within dataframe "CO" to become "hiloLatTr":
 names(CO)[names(CO) == "ThrLat"] <- "hiloLatTr"

The syntax of this command is pretty convoluted. The basic form is:
names(DF)[n] <- "newcolumnname"
…where n is the number of the column. If you know the number of the column you are going to rename, you can just put the number in; but we are using a dataframe with more than 40 columns. Rather than try to find the column number, we can insert the following subcommand into the middle of the main command:
names(DF) == "oldcolumnname"
This subcommand returns TRUE for the column whose name is “oldcolumnname” and FALSE for every other column of the dataframe “DF”, so only that one name is replaced.
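To see the pieces separately, here is a sketch on a small hypothetical dataframe, CO_demo. The comparison gives one TRUE or FALSE per column name, and which() converts that into the column number if you ever need it.

```r
# Hypothetical dataframe with a column to rename
CO_demo <- data.frame(tract = c("A", "B"), ThrLat = c(0.2, 0.7))

names(CO_demo) == "ThrLat"          # FALSE TRUE
which(names(CO_demo) == "ThrLat")   # 2, the column number

# Replace only the name where the comparison is TRUE
names(CO_demo)[names(CO_demo) == "ThrLat"] <- "hiloLatTr"
names(CO_demo)                      # "tract" "hiloLatTr"
```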

There is an easier command to rename columns within a dataframe:
DF <- rename.vars(DF, from="oldcolumnname", to="newcolumnname")
…which has a much more intuitive syntax. (It returns a renamed copy of the dataframe, so assign the result back to DF, as shown.) However, it is not available in the basic default installation of R and R-Studio; it is in the add-on package called gdata. This is an appropriate moment to point out that you can…

III. Soup-up R: add packages to add functionality

Since R is an open-source project, many people create packages that add functions to R. In May of 2014, for example, Seong-Yun Hong published (i.e., uploaded) the R package “seg”, which includes the five equations Massey & Denton (1988) described as the various dimensions of segregation. Make sure your system has an active internet connection, and then type the following command in the Console:

install.packages("seg")

Also, as mentioned above, there is an easier command for renaming variables, available in the gdata package. Install it (with an internet connection) by entering:

install.packages("gdata")

In the install feedback, I noticed that this also enabled a very useful command, read.xls(), which can read both .xls and .xlsx files, so you could import straight from an Excel spreadsheet. (read.xls() relies on Perl being installed on your system.)

I like to activate (load) these and several more built-in packages in R-Studio. In the lower-right View Pane, switch to the Packages tab and tick the checkboxes shown; ticking a box runs library() for that package:

Screenshot of the lower right-hand "View" pane in R-Studio, with the "Packages" tab brought forward.


Before you install the gdata package, if you try to use the rename.vars() function, R will respond with a digital shoulder-shrug:

> rename.vars(CO, from="ThrDens", to="hiloDnsTr")
Error: could not find function "rename.vars"

With the gdata package installed and loaded (checkbox ticked, or library(gdata) entered in the Console), it runs without complaint.
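A script can also install a package only when it is missing and then load it, so you do not get the shoulder-shrug above. ensure_package() is a hypothetical helper name, not a function from R or gdata. The calls inside it, requireNamespace(), install.packages(), and library(), are standard R.

```r
# Hypothetical helper: install a package only if it is missing, then load it
ensure_package <- function(pkg) {
  # requireNamespace() returns FALSE (instead of an error) when pkg is absent
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # needs an internet connection
  }
  # character.only = TRUE lets library() accept the name stored in pkg
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# Usage (downloads gdata the first time, only loads it afterwards):
# ensure_package("gdata")
```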