R: Pie Chart output as lesson

Kawai and Uri ask: how do I make a Pie Chart in R-Studio? Good question. I will go through a somewhat excruciating way of doing this in order to teach some concepts in R.

My immediate thought is that the easiest thing to show in a Pie Chart is the ethnic makeup of your county. We already created scalar variables (named single values) for the proportion of each race in your county: prp.Wht.Co, prp.Blk.Co, prp.AIAN.Co, etc. So couldn’t we use these to create a Pie Chart?

The problem is that these are separate variables. The command for making a pie chart in R is simple: it is pie(), and you need to specify what gets included in the chart as the ‘argument’ within the parentheses. We could try placing all of the individual scalars within the parentheses as the argument, like this:

pie(prp.Wht.Co,prp.Blk.Co,prp.AIAN.Co,prp.Asn.Co,prp.NHPI.Co,prp.Oth.Co,prp.Mult.Co,prp.Latin.Co)

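However, pie() expects all of the values to arrive together as its first argument. Given separate scalars, R treats only the first one as data and matches the rest, by position, to other options such as labels, edges, and radius, so it does not generate a meaningful graphic.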
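To see where those extra scalars end up, you can list the order of pie()'s arguments. This is a small sketch using formals(), a base R function that this lesson does not otherwise use; it only inspects pie() and draws nothing. The name arg.order is made up for illustration.

```r
# pie() matches unnamed arguments by position, so only the first one
# is treated as the data to chart.
arg.order = names(formals(pie))

# Show the first four argument names, in order.
print(arg.order[1:4])

# The 2nd, 3rd and 4th scalars in the call above are therefore taken
# as labels, edges and radius rather than as additional slices.
```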

Concatenate: a very useful general function

Instead, we can use the concatenate function, c(), to group all of these separate numbers into a single object (a vector) and then ask R to generate a pie chart from it. The syntax for concatenate is pretty basic: NewVector = c(x,y,z…) . As I showed in class, I build this command through copy-and-paste, rather than retyping from scratch and running the risk of creating a typo:

> Prop.Group.Co = c(prp.Wht.Co,prp.Blk.Co,prp.AIAN.Co,prp.Asn.Co,prp.NHPI.Co,prp.Oth.Co,prp.Mult.Co,prp.Latin.Co)

The newly created vector appears in my Workspace, and I can see that it is a series of numbers; but if I want to see its full contents I type the vector’s name into the Console Pane and hit enter. R then “returns” the contents of the variable:

> Prop.Group.Co
0.477512929 0.089229523 0.002844546 0.141923214 0.004177212 
0.002976097 0.037719787 0.243616692

…and then I can generate a Pie Chart:

> pie(Prop.Group.Co)

This generates a basic Pie Chart. But it is ugly and not very usable: the labels are just the numbers 1 through 8 and it starts at the “3 o’clock” position and wraps counterclockwise…pretty meager.
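For anyone who wants to reproduce these steps without the class data, here is a self-contained sketch. It assumes the eight scalars hold the values printed above (typed in by hand here; your county's numbers will differ), and it adds a length() and sum() check that was not part of the steps in class.

```r
# Stand-in scalars, copied from the output shown above.
prp.Wht.Co   = 0.477512929
prp.Blk.Co   = 0.089229523
prp.AIAN.Co  = 0.002844546
prp.Asn.Co   = 0.141923214
prp.NHPI.Co  = 0.004177212
prp.Oth.Co   = 0.002976097
prp.Mult.Co  = 0.037719787
prp.Latin.Co = 0.243616692

# Concatenate the scalars into one numeric vector.
Prop.Group.Co = c(prp.Wht.Co, prp.Blk.Co, prp.AIAN.Co, prp.Asn.Co,
                  prp.NHPI.Co, prp.Oth.Co, prp.Mult.Co, prp.Latin.Co)

length(Prop.Group.Co)   # number of slices the chart will have
sum(Prop.Group.Co)      # proportions of a whole should add up to about 1

# Draw the basic, unlabeled chart.
pie(Prop.Group.Co)
```

If the sum is far from 1, a scalar was probably left out or pasted twice while building the c() command.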

Assign names to the data in a vector

So the first thing I am going to do is assign names to the values I included in the Prop.Group.Co variable. I need to assign those names in the same order I used to create the variable in the first place. The names should be whatever I want to display in the Pie Chart:

names(Prop.Group.Co) = c("White","Black","AIAN","Asian","NHPI","Other","Multi","Latino")

If I keep an eye on the Workspace display of that variable, it changes from just being called “num [1:8]…” to “Named num [1:8]…” and when I retype the variable name by itself in the Console Pane, R returns:

> Prop.Group.Co
      White       Black        AIAN       Asian        NHPI 
0.477512929 0.089229523 0.002844546 0.141923214 0.004177212 
      Other       Multi      Latino 
0.002976097 0.037719787 0.243616692

Now, when I repeat the command pie(Prop.Group.Co), the Pie Chart shows the names I added to the variable.
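As an aside, c() also accepts names at creation time, which avoids having to keep two lists in the same order. This is a sketch, not what we did in class: the values are rounded stand-ins for the output above, and the variable name Prop.Named is made up for illustration.

```r
# Name each value as the vector is created (rounded example values).
Prop.Named = c(White = 0.478, Black = 0.089, AIAN = 0.003, Asian = 0.142,
               NHPI = 0.004, Other = 0.003, Multi = 0.038, Latino = 0.244)

names(Prop.Named)       # the labels pie() will use by default
Prop.Named["Latino"]    # a named vector can also be indexed by name

# The chart picks up the names without a separate names() step.
pie(Prop.Named)
```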

Subarguments within R Commands

The next problem is that the wedges start at the “3 o’clock” position, not noon; and it wraps counterclockwise. If I look up the options, the default settings for this command are:
pie(x, labels = names(x), edges = 200, radius = 0.8,
    clockwise = FALSE, init.angle = if(clockwise) 90 else 0,
    density = NULL, angle = 45, col = NULL, border = NULL,
    lty = NULL, main = NULL, …)
So, if I set clockwise = TRUE, the chart should not only wrap clockwise like most familiar Pie Charts, but the initial angle should also default to 90 degrees, which is the top of the circle.

> pie(Prop.Group.Co, clockwise = TRUE)

…does indeed generate the Pie Chart that I want.
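The same pattern works for the other options in that list. The sketch below also sets labels and main; the rounded stand-in values, the name pct.labels, and the chart title are my own placeholders rather than something we did in class.

```r
# Rounded example values standing in for the county proportions.
Prop.Group.Co = c(White = 0.478, Black = 0.089, AIAN = 0.003, Asian = 0.142,
                  NHPI = 0.004, Other = 0.003, Multi = 0.038, Latino = 0.244)

# Build labels such as "White 47.8%" from the names and the values.
pct.labels = paste0(names(Prop.Group.Co), " ",
                    round(100 * Prop.Group.Co, 1), "%")

# Options are set by name, so their order in the call does not matter.
pie(Prop.Group.Co, labels = pct.labels, clockwise = TRUE,
    main = "Population by group (example values)")
```

Naming each option (labels = , main = ) is what keeps R from matching values to the wrong argument by position.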

Getting Help with R Commands

You might ask: How did I find that whole list of arguments and subarguments? There are three ways to tackle problems in R:
1) Google your problem: “R Studio pie() options”, for example.
2) Youtube. Same search term; start poking around among the videos.
3) Invoke the help-system on R:

?pie

The help for this command appears on the lower-right View Pane. Whenever you are going to use a command, it might be worth a read through the syntax of arguments and options (subarguments) for the command.
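A few related base R commands reach the same information from the Console. This is a short sketch; the name pie.args is a placeholder, and how the help page is displayed depends on whether you run this in RStudio or a plain R session.

```r
# Same help page as ?pie, written as an ordinary function call.
help("pie")

# Just the argument list and defaults, without the full help page.
pie.args = args(pie)
print(pie.args)

# Run the examples that appear at the bottom of the help page.
example(pie)
```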

Other approaches to the issue

Towards the end of class today we discussed how you could improve output, now that we have gone through the hassle of creating this new variable. The first suggestion came from the Help page of the pie() command itself: the authors of this command disparage Pie Charts, and recommend dot-charts as being superior, based on peer-reviewed research.

> dotchart(Prop.Group.Co)

This does indeed produce a more readable graphic. Lisa commented that it was still hard to read, though, because the groups were displayed on the chart in the same order in which I had created the variable. So we looked at the sort() command and found it useful:

> sort(Prop.Group.Co, decreasing = TRUE)
      White      Latino       Asian       Black 
0.477512929 0.243616692 0.141923214 0.089229523 
      Multi        NHPI       Other        AIAN 
0.037719787 0.004177212 0.002976097 0.002844546

…but when I then generated a Dot Chart, the sequence reverted to the original. That is because sort() only returns a sorted copy; it does not change Prop.Group.Co itself. One solution is to assign the sorted data to a new named variable, which I showed in class. Samantha came up with a better option (by looking for it in Google): “nest” two functions together to sort-then-display the data in one command. The class also generally agreed that Barplots are easier to read than Dot Charts, so we used the barplot() command but modified it the same way:

barplot(sort(Prop.Group.Co, decreasing = TRUE))

And that worked. The results were very readable.
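Both fixes can be sketched side by side. The values are rounded stand-ins for the county proportions, and Prop.Sorted is a placeholder for whatever name you gave the new variable in class.

```r
# Rounded example values standing in for the county proportions.
Prop.Group.Co = c(White = 0.478, Black = 0.089, AIAN = 0.003, Asian = 0.142,
                  NHPI = 0.004, Other = 0.003, Multi = 0.038, Latino = 0.244)

# Option 1: assign the sorted copy to a new variable, then plot that.
Prop.Sorted = sort(Prop.Group.Co, decreasing = TRUE)
names(Prop.Group.Co)[1:3]   # the original keeps its creation order
names(Prop.Sorted)[1:3]     # the copy has the largest groups first
dotchart(Prop.Sorted)

# Option 2: nest the calls so sorting and plotting happen in one command.
barplot(sort(Prop.Group.Co, decreasing = TRUE))
```

Option 1 is handy when several charts need the same ordering; Option 2 avoids cluttering the Workspace with extra variables.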

The bigger picture

I tried to teach class today in a specific way: to show you how to approach R, rather than give you specific instructions about how to accomplish a specific task. To generate a single Pie Chart, just once, with no modifications, it is easier and faster to use a spreadsheet program. For purposes of completing your term paper, you may choose that option. I respect the fact that your time is constrained and that getting results is the key thing. So maybe the most valuable outcome of this class is that you will be able to do basic descriptive and inferential statistics using a spreadsheet program. That, in itself, is extremely valuable.

With this particular lesson, however, I wanted to teach a few more things. First of all, you get to see how a few more commands work. Concatenate is especially useful. Second, I wanted you to see me problem-solve. That worked: Lisa and Samantha came up with better solutions than I was showing, and we could implement those immediately. Third, I wanted to show that once you get the data organized in R, you can work with it and revise your outputs fairly quickly. So if you have to produce a LOT of charts or data-outputs, or if you have to keep revising and tweaking your outputs, R might be more efficient to use than a spreadsheet.

One more point I wanted to emphasize with this approach: the point is to know what we are doing with the data, and then figure out which program is the best one to use for our analysis. In the past, when the software was expensive, researchers tended to get locked in to using one particular program. There is an expression that ‘If you keep wielding a hammer, every problem starts to look like a nail.’ But a hammer is not a very good tool to use to drive a bolt. We should switch tools as needed. When the tools are free, we should learn to let go of the “lock-in” tendency and focus on the best way to solve the actual data-problem.