Quantifying Segregation

December 21, 2014: variable-names on this page now conform to my naming conventions.

On this page we will calculate the Indices of Dissimilarity (D) and Isolation (I), which are two of the primary measures of racial residential segregation described by Massey & Denton (1988).

To begin with, calculate the total population of each group in the county. These will become freestanding named variables, because they are whole-county summaries that do not belong among the tract-level data back in the dataframe “CO”.

# Calculate the whole-county populations of each group:
TotCo <- sum(CO$TotTr)
WhtCo <- sum(CO$WhtTr)
BlkCo <- sum(CO$BlkTr)
NtvCo <- sum(CO$NtvTr)
AsnCo <- sum(CO$AsnTr)
ApiCo <- sum(CO$ApiTr)
OthCo <- sum(CO$OthTr)
MltCo <- sum(CO$MltTr)
LatCo <- sum(CO$LatTr)
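If you would rather not type one sum() per group, the same totals can be gathered in a single step. This is only a sketch: the toy "CO" below has made-up numbers and just four of the columns, and the named vector "CoTotals" is my own illustrative name, not something used elsewhere on this page.

```r
# Toy tract-level data standing in for "CO" (values are made up)
CO <- data.frame(
  TotTr = c(100, 200, 300),
  WhtTr = c(60, 50, 150),
  BlkTr = c(10, 100, 30),
  LatTr = c(30, 50, 120)
)

# Sum every tract-level column at once; the result is a named numeric vector
CoTotals <- colSums(CO[, c("TotTr", "WhtTr", "BlkTr", "LatTr")])

# Rename "WhtTr" to "WhtCo" and so on, following the county-level convention
names(CoTotals) <- sub("Tr$", "Co", names(CoTotals))

CoTotals[["WhtCo"]]   # same value as sum(CO$WhtTr)
```

The freestanding variables above are easier to read inside the formulas that follow; a named vector is mostly useful when you want to loop over groups.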

Calculate the Index of Dissimilarity (D)

Now that we have the total populations for each group, we can calculate the Indices of Dissimilarity (D). D is measured between each pair of groups, so with eight groups there are 28 different pairings. Many of these will not be meaningful if the proportion or absolute population of a group is tiny. See Blanchard, T. (2007), “Conservative Protestant Congregations and Racial Residential Segregation” for further explanation of this limitation.

# Create scalars of Dissimilarity scores (D), all combinations
D_Wht_Blk = .5*sum(abs(CO$WhtTr / WhtCo - CO$BlkTr / BlkCo))
D_Wht_Ntv = .5*sum(abs(CO$WhtTr / WhtCo - CO$NtvTr / NtvCo))
D_Wht_Asn = .5*sum(abs(CO$WhtTr / WhtCo - CO$AsnTr / AsnCo))
D_Wht_Api = .5*sum(abs(CO$WhtTr / WhtCo - CO$ApiTr / ApiCo))
D_Wht_Oth = .5*sum(abs(CO$WhtTr / WhtCo - CO$OthTr / OthCo))
D_Wht_Mlt = .5*sum(abs(CO$WhtTr / WhtCo - CO$MltTr / MltCo))
D_Wht_Lat = .5*sum(abs(CO$WhtTr / WhtCo - CO$LatTr / LatCo))
D_Blk_Ntv = .5*sum(abs(CO$BlkTr / BlkCo - CO$NtvTr / NtvCo))
D_Blk_Asn = .5*sum(abs(CO$BlkTr / BlkCo - CO$AsnTr / AsnCo))
D_Blk_Api = .5*sum(abs(CO$BlkTr / BlkCo - CO$ApiTr / ApiCo))
D_Blk_Oth = .5*sum(abs(CO$BlkTr / BlkCo - CO$OthTr / OthCo))
D_Blk_Mlt = .5*sum(abs(CO$BlkTr / BlkCo - CO$MltTr / MltCo))
D_Blk_Lat = .5*sum(abs(CO$BlkTr / BlkCo - CO$LatTr / LatCo))
D_Ntv_Asn = .5*sum(abs(CO$NtvTr / NtvCo - CO$AsnTr / AsnCo))
D_Ntv_Api = .5*sum(abs(CO$NtvTr / NtvCo - CO$ApiTr / ApiCo))
D_Ntv_Oth = .5*sum(abs(CO$NtvTr / NtvCo - CO$OthTr / OthCo))
D_Ntv_Mlt = .5*sum(abs(CO$NtvTr / NtvCo - CO$MltTr / MltCo))
D_Ntv_Lat = .5*sum(abs(CO$NtvTr / NtvCo - CO$LatTr / LatCo))
D_Asn_Api = .5*sum(abs(CO$AsnTr / AsnCo - CO$ApiTr / ApiCo))
D_Asn_Oth = .5*sum(abs(CO$AsnTr / AsnCo - CO$OthTr / OthCo))
D_Asn_Mlt = .5*sum(abs(CO$AsnTr / AsnCo - CO$MltTr / MltCo))
D_Asn_Lat = .5*sum(abs(CO$AsnTr / AsnCo - CO$LatTr / LatCo))
D_Api_Oth = .5*sum(abs(CO$ApiTr / ApiCo - CO$OthTr / OthCo))
D_Api_Mlt = .5*sum(abs(CO$ApiTr / ApiCo - CO$MltTr / MltCo))
D_Api_Lat = .5*sum(abs(CO$ApiTr / ApiCo - CO$LatTr / LatCo))
D_Oth_Mlt = .5*sum(abs(CO$OthTr / OthCo - CO$MltTr / MltCo))
D_Oth_Lat = .5*sum(abs(CO$OthTr / OthCo - CO$LatTr / LatCo))
D_Mlt_Lat = .5*sum(abs(CO$MltTr / MltCo - CO$LatTr / LatCo))
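The 28 lines above all follow one pattern, so they can also be generated rather than typed. The sketch below assumes a toy "CO" with three made-up group columns; the helper "dissim" and the vectors "groups", "pairs" and "Dscores" are hypothetical names introduced only for this illustration.

```r
# Toy tract-level counts (made up); the real "CO" has eight group columns
CO <- data.frame(
  WhtTr = c(60, 50, 150),
  BlkTr = c(10, 100, 30),
  LatTr = c(30, 50, 120)
)

# Dissimilarity between two vectors of tract counts:
# half the summed absolute difference in each tract's share of its group
dissim <- function(a, b) 0.5 * sum(abs(a / sum(a) - b / sum(b)))

groups <- c("Wht", "Blk", "Lat")
pairs  <- combn(groups, 2)   # character matrix, one column per unordered pair

# Apply the formula to every pair of group columns
Dscores <- apply(pairs, 2, function(p) {
  dissim(CO[[paste0(p[1], "Tr")]], CO[[paste0(p[2], "Tr")]])
})
names(Dscores) <- paste("D", pairs[1, ], pairs[2, ], sep = "_")

Dscores
choose(8, 2)   # 28, the number of pairings for eight groups
```

With all eight group abbreviations in "groups", this would produce the same 28 scores in the same order as the explicit listing, already collected in one named vector.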

In Contra Costa in 2010, by far the highest D score is between non-Latino Whites and Blacks:

 D_Wht_Blk
[1] 0.6066957

According to Massey & Denton (1993), that would just barely qualify as a high level of segregation. But for a California county, what stands out is how much higher it is than the other inter-group dissimilarity scores. To make this easier to analyze, I am going to concatenate the scores into a single named vector:

# Use Concatenate c() to create a vector that includes all the D scores:
DscoreVal <- c(D_Wht_Blk,D_Wht_Ntv,D_Wht_Asn,D_Wht_Api,D_Wht_Oth,
               D_Wht_Mlt,D_Wht_Lat,D_Blk_Ntv,D_Blk_Asn,D_Blk_Api,
               D_Blk_Oth,D_Blk_Mlt,D_Blk_Lat,D_Ntv_Asn,D_Ntv_Api,
               D_Ntv_Oth,D_Ntv_Mlt,D_Ntv_Lat,D_Asn_Api,D_Asn_Oth,
               D_Asn_Mlt,D_Asn_Lat,D_Api_Oth,D_Api_Mlt,D_Api_Lat,
               D_Oth_Mlt,D_Oth_Lat,D_Mlt_Lat)

This command provides another valuable lesson in R syntax. I wanted to put 28 variables into one variable, and that means a lot of text in the command. How do you structure a command that would run more than 80 characters wide? One default tactic is to leave the expression visibly unfinished at the end of each line. In this case, I ended each line with a comma, and I did not close the parentheses until the very end. With this syntax, R will not run the first line as a complete command on its own; it will interpret all six lines as a single command.
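A small illustration of the same rule, using throwaway variables (x, y and z are not used anywhere else on this page). R decides whether to keep reading based on whether the line so far could stand as a complete expression:

```r
# Line 1 is already complete, so R stops there: x is 1,
# and the "+ 2" on the next line runs as a separate expression
x <- 1
  + 2

# Line 1 ends with an operator, so it is incomplete and R keeps reading: y is 3
y <- 1 +
  2

# An open parenthesis also keeps the command open until the matching ")"
z <- c(1, 2,
       3, 4)
```

This is why the break goes after the comma or operator, never before it.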

Now I can analyze all the Dissimilarity-scores as a set of values:

summary(DscoreVal)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.2228  0.2986  0.3560  0.3600  0.4042  0.6067 
sd(DscoreVal)
[1] 0.08757136

Analysis: What this tells me is how unusually high the White-Black D-score really is. The standard deviation is about 0.09, and the mean is 0.36; so a score of 0.61 is more than 2.5 standard deviations above the mean level of inter-group segregation in the county.
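The arithmetic behind that statement, as a quick check. The three numbers are copied from the output printed above (the mean is the rounded value shown by summary()); the variable names are only for this example.

```r
# Values copied from the console output above
d_wht_blk <- 0.6066957    # D_Wht_Blk
d_mean    <- 0.3600       # mean of all 28 scores, as rounded by summary()
d_sd      <- 0.08757136   # sd(DscoreVal)

# How many standard deviations above the mean is the White-Black score?
z <- (d_wht_blk - d_mean) / d_sd
z   # roughly 2.8

# With the full vector in memory, the equivalent would be:
# (max(DscoreVal) - mean(DscoreVal)) / sd(DscoreVal)
```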

…and now for a little more management of this data:

You can gather named vectors of data into a table, using the data.frame() command. Before doing that, I want to create another vector that includes the names of each score. To do this, I use the concatenate function in a slightly different way:

# concat a vector that includes the NAMES of all the D scores:
DscoreNom <- c("D_Wht_Blk","D_Wht_Ntv","D_Wht_Asn","D_Wht_Api",
               "D_Wht_Oth","D_Wht_Mlt","D_Wht_Lat","D_Blk_Ntv",
               "D_Blk_Asn","D_Blk_Api","D_Blk_Oth","D_Blk_Mlt",
               "D_Blk_Lat","D_Ntv_Asn","D_Ntv_Api","D_Ntv_Oth",
               "D_Ntv_Mlt","D_Ntv_Lat","D_Asn_Api","D_Asn_Oth",
               "D_Asn_Mlt","D_Asn_Lat","D_Api_Oth","D_Api_Mlt",
               "D_Api_Lat","D_Oth_Mlt","D_Oth_Lat","D_Mlt_Lat")

By putting double-quotes around each variable-name, I am asking R to compile a string of these names, rather than the values which each name represents.

Now we can create a table in which each of these variables represents a column:

Dtable <- data.frame(name=DscoreNom,score=DscoreVal)

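Notice that if I had just entered Dtable <- data.frame(DscoreNom,DscoreVal), the two vectors would still have become columns, but the columns would have been named after the vectors themselves (DscoreNom and DscoreVal). Writing name= and score= gives the columns shorter, clearer names.

Keep in mind that you can still analyse the data in this table, but with a slightly different syntax: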

summary(Dtable$score)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.2228  0.2986  0.3560  0.3600  0.4042  0.6067
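Once the scores sit in a table, ranking the pairings is a one-liner. A sketch with three made-up scores standing in for the 28 real rows ("Dranked" is just an illustrative name):

```r
# Three made-up scores standing in for the 28 real ones
Dtable <- data.frame(name  = c("D_Wht_Asn", "D_Wht_Blk", "D_Blk_Lat"),
                     score = c(0.36, 0.61, 0.40))

# order() returns the row positions that would sort the scores;
# indexing the table by those positions sorts the whole table
Dranked <- Dtable[order(Dtable$score, decreasing = TRUE), ]

Dranked$name[1]   # the pairing with the highest score in this toy table
```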

Finally, the advantage of gathering this data into a table is that it is relatively simple to export/backup the data by writing it out as a CSV file:

write.csv(Dtable,"Dscores_141222.csv")
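To confirm that a backup like this can be read back in, here is a round-trip sketch. It writes to a temporary file rather than the working directory, uses two made-up rows, and sets row.names = FALSE, which is my own choice here to keep R's row numbers out of the file; "csv_path" and "Dback" are illustrative names.

```r
# Two made-up rows standing in for the real table
Dtable <- data.frame(name  = c("D_Wht_Blk", "D_Wht_Asn"),
                     score = c(0.61, 0.36))

# Write to a temporary file so nothing is left behind in the working directory
csv_path <- tempfile(fileext = ".csv")
write.csv(Dtable, csv_path, row.names = FALSE)

# Read it back; stringsAsFactors = FALSE keeps the names as plain text
Dback <- read.csv(csv_path, stringsAsFactors = FALSE)
Dback
```

Without row.names = FALSE, the file gains an extra unnamed first column holding the row numbers, which comes back as a column called "X" when the file is re-read.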

Isolation (I) scores

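The Isolation score (Massey & Denton 1988) weights each tract's share of the group's county population (for example, WhtTr / WhtCo) by the group's share of that tract's population (WhtTr / TotTr). Note that the second denominator is the tract total, not the county total. A tract with no residents produces 0/0, which na.rm=TRUE drops from the sum:

# Isolation scores for each group
# (group's share of county population in the tract) * (group's share of the tract)
I_Wht = sum((CO$WhtTr / WhtCo) * (CO$WhtTr / CO$TotTr), na.rm=TRUE)
I_Blk = sum((CO$BlkTr / BlkCo) * (CO$BlkTr / CO$TotTr), na.rm=TRUE)
I_Ntv = sum((CO$NtvTr / NtvCo) * (CO$NtvTr / CO$TotTr), na.rm=TRUE)
I_Asn = sum((CO$AsnTr / AsnCo) * (CO$AsnTr / CO$TotTr), na.rm=TRUE)
I_Api = sum((CO$ApiTr / ApiCo) * (CO$ApiTr / CO$TotTr), na.rm=TRUE)
I_Oth = sum((CO$OthTr / OthCo) * (CO$OthTr / CO$TotTr), na.rm=TRUE)
I_Mlt = sum((CO$MltTr / MltCo) * (CO$MltTr / CO$TotTr), na.rm=TRUE)
I_Lat = sum((CO$LatTr / LatCo) * (CO$LatTr / CO$TotTr), na.rm=TRUE)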
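As a worked check of the isolation formula, here is a sketch on a toy county. It assumes the Massey & Denton (1988) definition, in which each tract's share of the group is weighted by the group's share of that tract's own population (TotTr). The numbers are made up, and the helper "isolation" is a hypothetical name introduced for this example.

```r
# Toy data: two populated tracts and one empty tract (values are made up)
CO <- data.frame(
  TotTr = c(100, 200, 0),
  WhtTr = c(60, 50, 0),
  BlkTr = c(10, 100, 0)
)

# Isolation: sum over tracts of
#   (tract's share of the group) * (group's share of the tract)
# The empty tract gives 0/0 = NaN, which na.rm = TRUE drops from the sum
isolation <- function(grp, tot) {
  sum((grp / sum(grp)) * (grp / tot), na.rm = TRUE)
}

I_Wht <- isolation(CO$WhtTr, CO$TotTr)
I_Blk <- isolation(CO$BlkTr, CO$TotTr)

c(I_Wht = I_Wht, I_Blk = I_Blk)
```

The score can be read as the group's share of the tract population experienced by an average member of that group, so it always falls between 0 and 1.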