USP514: Basic GIS – UrbanPolicy.net

Last week I briefly showed you what a first session in Quantum GIS looks like. Now I have revised the data to make it a little easier to start with. To begin, get to a place with a fast internet connection. Then download this 53-megabyte MV_GIS zipped archive. This time, I bundled all the shapefiles and a Google Earth image together as one Zip archive, so when you unzip it, all the files are organized together in one sensible arrangement. If you double-click on the MV_GIS.qgs file within the MV_GIS folder, QGIS will (probably) start automatically and begin complaining about the location of missing files. You will need to browse and find them again. Why? Because when QGIS is looking for the Mountain View city boundary shapefile, it is looking for this:

/home/pietro/Documents/chron/2015/USP514_SustainableDevt_Sp15/MV_GIS/bdy/MV_bdy.shp

However, you now have the file on your computer. The directory folder names, starting from the top, are almost certainly very different. They might be:

/killercomputer/Users/Joe/Documents/USP514/MV_GIS/bdy/MV_bdy.shp

In which case, only the last part of the path (MV_GIS/bdy/MV_bdy.shp) is the same; that is the part you unzipped. So when QGIS asks you to browse to find the missing files, it is looking for the same shapefile name. But you have to show QGIS where the file is on your own system. You will need to do that with every file, and then you will have Mountain View loaded in the same way I have it on my system. If all succeeds, it should look something like the following screenshot:

[Screenshot: QGIS with a series of Mountain View files loaded.]

If you are not able to ‘restore the paths’ to all the files, you can just start fresh and load them one-by-one on your own.
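If the path business still seems mysterious, here is a small Python sketch of the idea. It is only an illustration, not what QGIS does internally: the function name `relocate` is made up for this example, and the second folder path is the same hypothetical "Joe" path used above. The point is that everything after the MV_GIS folder stays the same, and only the part in front of it changes.

```python
from pathlib import PurePosixPath

def relocate(saved_path, project_folder_name, new_project_folder):
    """Rebuild a saved file path underneath a new copy of the project folder.

    Hypothetical helper for illustration only.
    """
    parts = PurePosixPath(saved_path).parts
    # Find the project folder (e.g. "MV_GIS") in the old path.
    idx = parts.index(project_folder_name)
    # Everything after that folder is the part that came out of the Zip archive.
    relative = PurePosixPath(*parts[idx + 1:])
    return PurePosixPath(new_project_folder) / relative

# The path stored in the project file (from my computer).
old = "/home/pietro/Documents/chron/2015/USP514_SustainableDevt_Sp15/MV_GIS/bdy/MV_bdy.shp"
# Where the MV_GIS folder might live on your computer (an assumed example).
new_root = "/killercomputer/Users/Joe/Documents/USP514/MV_GIS"

print(relocate(old, "MV_GIS", new_root))
```

When you browse to a missing file in QGIS, you are doing this same substitution by hand.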

What is a “Geographic Information System”?

For those of you beginning to work with QGIS for the first time, please read this closely. A GIS is any system that keeps data linked with spatial information. This can be done by hand with maps, but today we generally mean a system of computer software and digital data. That is still an approximate answer, however, because there are many types of computer-based GIS. For example, if you use a mapping or routefinding program on your phone, you are using a type of GIS. Google Earth is also a GIS. Most of the data that Google Earth stores is imagery: satellite maps.

However, the “classic” GIS is a program which reads geo-referenced data files on your computer, files which can be analyzed as spatial databases. For example: County Tax Assessors keep files of every parcel in their county. They used to keep them on paper, then in spreadsheets, then in databases. But in all those formats, the user needs to memorize where a location is, in order to think about what is adjacent and what is far from that address. However, if the database is linked to spatial coordinates for the address, then you can open up a map onscreen and see where that address is. This makes it much easier to visualize adjacent houses, roads, schools, etc.

Here are two common basic uses of GIS:
1. How many houses are within 300 feet of a flood-prone stream? You can specify a “buffer” area that extends out 300 feet on either side of the stream, and then select all the properties that overlap with that buffer. You can specify whether to include a parcel in that selection if it just partially overlaps the buffer zone, or you can limit the set of selected parcels to only those that fall completely within the buffer zone.
2. Which parts of Oakland are “food deserts”? You can load a point-file of all the fresh-food stores in Oakland, and then set a buffer around them with a half-mile radius. Why a half mile? Most people aren’t willing to walk further than that for basic shopping, and many poor people don’t have access to a car. Buffers around these points will create a pattern of overlapping circles, but places that are more than 1/2 mile from any grocery store will be immediately obvious. You can use that image as a presentation graphic by itself; it is pretty powerful. But you can also use the same method as in example #1 to select properties that fall outside of the overlapping buffers, out in the food desert.
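To make the buffer idea concrete, here is a minimal Python sketch of the food-desert test. The store and home coordinates are invented for illustration (they are not real Oakland data), and the sketch assumes the coordinates are in a projected system measured in meters, so that straight-line distances make sense. A real GIS does this with a buffer tool and a spatial selection; this just shows the logic behind it.

```python
import math

# Half a mile expressed in meters (1 mile = 1609.344 m).
HALF_MILE_M = 0.5 * 1609.344

def nearest_store_distance(home, stores):
    """Straight-line distance from one home to its closest store."""
    return min(math.hypot(home[0] - sx, home[1] - sy) for sx, sy in stores)

def in_food_desert(home, stores, radius=HALF_MILE_M):
    """True if the home lies outside every store's circular buffer."""
    return nearest_store_distance(home, stores) > radius

# Made-up coordinates in meters (x = easting, y = northing).
stores = [(0.0, 0.0), (1500.0, 0.0)]
homes = {"A": (100.0, 100.0), "B": (750.0, 900.0), "C": (1400.0, 300.0)}

deserts = [name for name, xy in homes.items() if in_food_desert(xy, stores)]
print(deserts)
```

Only home B ends up in the list, because it is more than half a mile from both stores. The flood-buffer question in example #1 works the same way, except that you measure distance to a line (the stream) and keep the parcels that fall inside the buffer rather than outside it.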

Here is a more sophisticated use:
1. Fire departments need to figure out how long it takes to get from their home station to every location in their city. A simple circular buffer around each fire station is inaccurate, because some streets are clogged with traffic, and some are pretty fast. So the department needs a network analysis of route-times. To do this, you need to use a street-network file where every block of every street is classified in terms of route-speed. You might even need to record multiple route-speeds for different times of day or other changing conditions. Then you can ask the software to plot the three fastest routes from the fire station to any given destination.
2. You might have noticed that the previous example sounds familiar, if you have ever used Google Maps or 511.org to check traffic conditions. Maybe the Bay Bridge is totally blocked, and it is worth going down to the San Mateo Bridge to cross the bay. Emergency-services were the first users of network-analysis, and they funded the development of this technology in the 1980s and 1990s. Now it is used every day by commuters.
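Here is a rough Python sketch of what network analysis boils down to. It uses Dijkstra's shortest-path algorithm, a standard textbook method, on a tiny made-up street network where each block has a travel time in minutes. I am not claiming this is what any particular GIS package or Google Maps actually runs, and it finds only the single fastest time to each intersection rather than the three fastest routes.

```python
import heapq

def fastest_times(graph, start):
    """Dijkstra's algorithm: fastest travel time from start to every node."""
    times = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        t, node = heapq.heappop(queue)
        if t > times.get(node, float("inf")):
            continue  # stale queue entry; a faster route was already found
        for neighbor, minutes in graph.get(node, []):
            new_t = t + minutes
            if new_t < times.get(neighbor, float("inf")):
                times[neighbor] = new_t
                heapq.heappush(queue, (new_t, neighbor))
    return times

# Hypothetical network: intersection -> [(neighboring intersection, minutes)].
streets = {
    "station": [("A", 2.0), ("B", 5.0)],
    "A": [("station", 2.0), ("B", 1.0), ("C", 6.0)],
    "B": [("station", 5.0), ("A", 1.0), ("C", 2.0)],
    "C": [("A", 6.0), ("B", 2.0)],
}

print(fastest_times(streets, "station"))
```

Notice that the fastest way to B is not the direct street (5 minutes) but the detour through A (3 minutes). That is exactly the Bay Bridge versus San Mateo Bridge situation: change the minutes on one block to model traffic, and the best route changes.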

Types of data in a GIS

Most of the data used in GIS today is vector data. It is precise, and the files are small, so you can email it or load it onto phones. There are three types of vector entities: points, lines, and polygons. Typically, small items (like fire hydrants) are mapped as point files. Linear items like streets, creeks, and power-lines are mapped as lines. Sometimes they are called polylines, because they are usually made up of multiple points, like one edge of a polygon. Things that take up land surface area are stored as polygons. These include boundaries of property, cities, counties, states, and countries. Also, lakes and oceans.
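To see why vector data is so compact, here is a small Python sketch of the three entity types as plain lists of coordinates. The hydrant, creek, and parcel are invented examples in arbitrary map units, and real shapefiles store more than this (attributes, for one thing), but the geometry really is just a handful of numbers.

```python
import math

# Point: a single (x, y) pair.
hydrant = (3.0, 4.0)

# Line (polyline): an ordered list of points.
creek = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]

# Polygon: a ring of points; the last point connects back to the first.
parcel = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]

def line_length(coords):
    """Total length of a polyline: sum of its straight segments."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))

def polygon_area(ring):
    """Area of a simple polygon using the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

print(line_length(creek))    # length of the creek in map units
print(polygon_area(parcel))  # area of the parcel in square map units
```

Length only makes sense for lines and area only for polygons, which is one reason a GIS keeps the three types in separate files.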

What about projections?

Ah, yes; what a pain in the neck. One reason I am re-posting the whole megillah of Mountain View shapefiles is that I had to re-project them so they would all line up and load properly. As you know, the Earth is a spherical thing and for purposes of mapping, we need to squish it flat onto a screen or a printout. There are many debates about the best projections for various uses. For our purposes, we are going to use only one projection this semester. This morning I learned the hard way that we will NOT be using the Web Mercator/pseudo-Mercator projection. No. We will be using the Universal Transverse Mercator (UTM) projection. Just so’s you know, the whole Bay Area lies within Zone 10 North in the UTM system.

But wait–there’s more! You actually need to specify TWO components in order to project a map. The first component is the projection type–in our case, UTM. The second component is called the datum (you will sometimes hear it loosely called the geoid; strictly speaking, a datum is built on a reference ellipsoid). What is the datum? It is a mathematical model of the shape of the earth. The earth is not a simple sphere; oh no. It is oblate, a bit more like a potato. So we are not just projecting a portion of a sphere onto a flat plane. Instead, we are projecting an oblate spheroid onto a flat plane. There have been various versions of the formula for the earth-potato over time, and you will occasionally find maps based on these different standards. One common version is the North American Datum of 1983, which is abbreviated as NAD83. However, the worldwide standard (for now) is the World Geodetic System 1984 datum, a.k.a. WGS84.
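If you are curious how "oblate" the earth-potato actually is, here is a short Python sketch using the published defining numbers for the WGS84 ellipsoid and for GRS80 (the ellipsoid underneath NAD83). The function name is my own; the constants are the standard ones. This is only the ellipsoid part of each datum, not the whole story of how the datums differ.

```python
# Equatorial radius shared by both ellipsoids, in meters.
SEMI_MAJOR_AXIS_M = 6378137.0

# Inverse flattening (1/f) for each ellipsoid.
WGS84_INV_F = 298.257223563
GRS80_INV_F = 298.257222101  # ellipsoid used by NAD83

def semi_minor_axis(a, inv_f):
    """Polar radius b = a * (1 - f), where f = 1 / inv_f."""
    return a * (1.0 - 1.0 / inv_f)

wgs84_b = semi_minor_axis(SEMI_MAJOR_AXIS_M, WGS84_INV_F)
grs80_b = semi_minor_axis(SEMI_MAJOR_AXIS_M, GRS80_INV_F)

# How much shorter the polar radius is than the equatorial radius, in km.
squash_km = (SEMI_MAJOR_AXIS_M - wgs84_b) / 1000.0
print(round(squash_km, 1))
print(abs(wgs84_b - grs80_b))  # difference between the two ellipsoids, in meters
```

The polar radius comes out roughly 21 kilometers shorter than the equatorial radius, and the two ellipsoids differ from each other by a fraction of a millimeter.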

So when you specify a Coordinate Reference System in a GIS, you need to specify both of these items. QGIS stores about a thousand different datum/projection combinations, and the one we will be using is: WGS 84 / UTM zone 10 N.

Another bit of information (I’m not sure if this will help, but here goes): the European Petroleum Survey Group started assigning simple number-codes to all the datum/projection combinations, and this set of numbers has been adopted as a standard known as EPSG codes. The EPSG code for the WGS 84 / UTM zone 10N projection is 32610. You might find it easier to use this number-string if you ever have to distinguish between different datum/projection combos.
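For the WGS 84 / UTM family, the EPSG codes follow a simple pattern, which this Python sketch shows. UTM zones are 6 degrees of longitude wide, counted eastward starting from 180° West, and the WGS 84 / UTM codes are 32600 plus the zone number in the northern hemisphere (32700 plus the zone number in the south). The function names are my own, the Mountain View coordinates are approximate, and the sketch ignores the handful of irregular zones around Norway and Svalbard.

```python
import math

def utm_zone(lon):
    """UTM zone number (1-60) for a longitude in decimal degrees."""
    zone = int(math.floor((lon + 180.0) / 6.0)) + 1
    return min(zone, 60)  # longitude of exactly 180 belongs to zone 60

def wgs84_utm_epsg(lon, lat):
    """EPSG code for the WGS 84 / UTM zone covering this location."""
    base = 32600 if lat >= 0 else 32700
    return base + utm_zone(lon)

# Approximate location of Mountain View, California.
mv_lon, mv_lat = -122.08, 37.39

print(utm_zone(mv_lon))
print(wgs84_utm_epsg(mv_lon, mv_lat))
```

For Mountain View this gives zone 10 and code 32610, matching the CRS we are using this semester.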