Getting our heads around R

If you ask, “What is LibreOffice Calc?” I would answer, “It is a spreadsheet program.”
If you ask, “What is SPSS?” I would answer, “It is a statistical analysis program.”
If you ask, “What is R?” I would answer, “It is a data-analysis language.” I know, that is a weird answer. To explain it, I want to briefly describe some aspects of computers that social scientists usually don’t learn.

Most of the time we open up a program (a.k.a. an app), load some data into it, change the data, then save the changed data file. For example, we start Word, open (load) a document into it, edit the document, then click Save. The changed file is recorded on our hard drive, USB stick, or wherever we choose to store it. This is one of the most common ways we use computers.

The programs we use are pre-compiled sets of instructions that do all sorts of things to make our lives easier. When we run most programs they open up a graphic window, with a menu of commands at the top organized under File, Edit, View, etc. Because the programs are compiled into machine language, we cannot read their instructions in human-readable form; they are just strings of binary numbers. Therefore, computer jocks often call pre-compiled programs “binaries”. They write instructions as source code in human-readable form (well, so long as we assume that programmers are humans), and then they compile that code into machine-readable digital instructions.

However, there is an intermediate way of using computers, where you use a program to run a script of commands. Here I mean script in the same sense as in a stage-play or a screen-play: there are directions (Enter Romeo:) and also content that the viewer will hear (Hail, Mercutio!). There might be twenty different commands (stage-directions) in dramatic scripts, and the people producing the play follow those commands.

In a computer, there are some programs that will also follow human-readable scripts of commands. The most common one is the web browser. Web browsers read a text file written in a specific language, called HyperText Markup Language (HTML). When you “point” your browser at a specific address, it reads the HTML text file at that address. It then follows the instructions in that text file to actually create the web-page that you see. If you want to see what the HTML text file of a web-page looks like, you can right-click on the page and select “View Page Source” (at least in Firefox; your mileage may vary with other browsers). So HTML is actually a language and web-pages are written as a script of commands in that language. Web-browsers are actually interpreters of that language, which follow those scripts. The scripts themselves are written as text files.

In this sense, R is also a language that includes specific commands and a specific syntax. The commands are designed to manage data, perform statistical operations on the data, and generate text reports and graphics from that data. A lot of the commands in R are simply mathematical functions, like sum. A command to sum together a particular set of data is called a function call. If you want to do a whole series of operations, it is easiest to write them as a series of lines in a text editor and then direct R to run that whole series of operations. That series of lines is called a script in R. It is actually a text file, but you put the dot-R suffix (.R) at the end of its name to let your computer know that the particular language of that text file is R. Likewise, if you hand-coded a web page, you would save the text file with the dot-html suffix (.html) to let computers know that it can be read by a web browser. Since we are working with Census data, you should also know that CSV files are a kind of text file too. In CSV files the text is organized in a very specific syntax so that various computer programs (like Calc or Excel) will try to read it as a table.
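To make that concrete, here is a minimal sketch of what a short R script might look like. Everything in it is made up for illustration: the file name census_demo.R, the variable names, and the numbers are my assumptions, not real Census data. It shows a function call to sum, and it shows that CSV text really is just lines of words and numbers separated by commas.

```r
# census_demo.R (a hypothetical file name; all numbers below are invented)

# A function call: sum() adds together a set of numbers.
household_sizes <- c(2, 4, 1, 3)
total_people <- sum(household_sizes)
print(total_people)

# A CSV file is plain text: one row per line, values separated by commas.
# Here the text is typed directly into the script instead of loaded from a file.
csv_text <- "district,population
North,1200
South,950"

# read.csv() reads that text as a table (R calls it a data frame).
districts <- read.csv(text = csv_text)
print(districts)

# The same function works on a column of the table.
print(sum(districts$population))
```

If you saved those lines in a text file called census_demo.R, you could run the whole series at once by typing source("census_demo.R") in R, or Rscript census_demo.R in a terminal. With a real CSV file on your drive, you would give read.csv() the file's name instead of the text argument.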

Please let me know if this explanation is helpful. I find that it can be really disorienting to learn something, like using R, without having a visceral feeling for what the software is. When you use CSV files, spreadsheets, and R for the first time, this may be your first time using a computer in the “old school” style. Nearly all of the time we use only the graphical-user-interface programs (apps) that are so prevalent on laptops, smartphones, and tablets. So I am hoping that this gives you a very quick sense of the “under-the-hood” nature of computing.
