Jonathan Callahan received his Ph.D. in Physical Chemistry from the University of Washington in 1993. After two years as a post-doc in a magnetic resonance imaging laboratory, Jonathan joined NOAA's Pacific Marine Environmental Laboratory to work on analysis and visualization software for oceanographic climate and model data. Since 2007 Jonathan has worked as an independent consultant for NOAA, NASA and the US EPA. His areas of expertise include: data management; data visualization; statistical analysis using R; interface design; data mining; web services architecture. Jonathan writes occasional articles on data management at Working With Data. Jonathan is a DZone MVB and is not an employee of DZone.

Using R: Working with Geospatial Data (and ggplot2)

05.12.2014

This post was originally written by Bethany Yollin at the Working With Data blog.

This is a follow-up to an earlier introductory post by Steven Brey: Using R: Working with Geospatial Data. In this post, we’ll learn how to plot geospatial data in ggplot2. Why might we want to do this? Well, it’s really a matter of personal taste. Some people are willing to forfeit the fine-grained control of base graphics in exchange for the elegance of a ggplot. The choice is entirely yours.

To get started, we’ll need the ggplot2 package and some data! The dataset we’ll look at is a set of shapefiles defining watersheds in Washington State.

NOTE:  Check the Department of Ecology GIS data page if any of the links are unavailable.

Loading libraries and data


# load libraries
library(ggplot2)
library(sp)
library(rgdal)
library(rgeos)

# create a local directory for the data
localDir <- "R_GIS_data"
if (!file.exists(localDir)) {
  dir.create(localDir)
}

# download and unzip the data
url <- "ftp://www.ecy.wa.gov/gis_a/inlandWaters/wria.zip"
file <- paste(localDir, basename(url), sep='/')
if (!file.exists(file)) {
  download.file(url, file)
  unzip(file,exdir=localDir)
}

# create a layer name for the shapefiles (text before file extension)
layerName <- "WRIA_poly"

# read data into a SpatialPolygonsDataFrame object
dataProjected <- readOGR(dsn=localDir, layer=layerName)

Transforming the data

Thus far, we haven’t done anything radically different from before, but to prepare the data for plotting with ggplot2, we’ll have to make a couple of changes to the structure of the data. ggplot2 only works with data.frame objects, so our object of class SpatialPolygonsDataFrame cannot be plotted as is. Let’s write some code and then discuss why this kind of transformation is necessary.

# add to data a new column termed "id" composed of the rownames of data
dataProjected@data$id <- rownames(dataProjected@data)

# create a data.frame from our spatial object
watershedPoints <- fortify(dataProjected, region = "id")

# merge the "fortified" data with the data from our spatial object
watershedDF <- merge(watershedPoints, dataProjected@data, by = "id")

# NOTE : If we so choose, we could have loaded the plyr library to use the
#      : join() function. For those familiar with SQL, this may be a more
#      : intuitive way to understand the merging of two data.frames. An
#      : equivalent SQL statement might look something like this:
#      : SELECT *
#      : FROM dataProjected@data
#      : INNER JOIN watershedPoints
#      : ON dataProjected@data$id = watershedPoints$id

# library(plyr)
# watershedDF <- join(watershedPoints, dataProjected@data, by = "id")

What does all this code mean and why do we need it? Let’s go through this line by line.

dataProjected@data$id <- rownames(dataProjected@data)

Here we are appending to the data an extra column called “id”. This column will contain the rownames so that we define an explicit relationship between the data and the polygons associated with that data.
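To see this step in isolation, here is a minimal sketch using a toy data.frame as a stand-in for dataProjected@data. The name toyData, the second watershed name and its number are made up for illustration. Row names are character strings, so the new "id" column is a character vector as well.

```r
# Toy attribute table standing in for dataProjected@data (values are made up)
toyData <- data.frame(
  WRIA_NM = c("Pend Oreille", "Example Basin"),
  WRIA_NR = c(62, 99),
  row.names = c("0", "1")
)

# Copy the row names into an explicit "id" column, as in the post
toyData$id <- rownames(toyData)

# Inspect the type of the new column
str(toyData$id)
```

With the key stored as an ordinary column, it can be used later as the "by" argument of merge().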

watershedPoints <- fortify(dataProjected, region = "id")

Fortify? What does that even mean? A quick search on the internet will yield some helpful documentation. (See the fortify.sp documentation.) Basically, fortify takes two arguments: model, which is the SpatialPolygonsDataFrame object we wish to convert, and region, the name of the variable by which to split regions. If all goes according to plan, some magic happens and we get a data.frame, just like we wanted… well, not quite. If you inspect this data.frame, you’ll notice it holds only the polygon coordinates and bookkeeping columns; the attribute columns, such as the watershed names, are missing. Fret not! Using the relationship we created earlier, we can merge these two datasets with the following command.

watershedDF <- merge(watershedPoints, dataProjected@data, by = "id")

And voilà! Now that we’ve created a data.frame that ggplot2 likes, we can begin plotting. Before we get to plotting, let’s take a quick look at this new data.frame we’ve created.

head(watershedDF)
##   id    long     lat order  hole piece group WRIA_ID WRIA_NR WRIA_AREA_
## 1  0 2377934 1352106     1 FALSE     1   0.1       1      62     789790
## 2  0 2378018 1352109     2 FALSE     1   0.1       1      62     789790
## 3  0 2382417 1352265     3 FALSE     1   0.1       1      62     789790
## 4  0 2387199 1352434     4 FALSE     1   0.1       1      62     789790
## 5  0 2387693 1352452     5 FALSE     1   0.1       1      62     789790
## 6  0 2392524 1352623     6 FALSE     1   0.1       1      62     789790
##        WRIA_NM Shape_Leng Shape_Area
## 1 Pend Oreille     983140   3.44e+10
## 2 Pend Oreille     983140   3.44e+10
## 3 Pend Oreille     983140   3.44e+10
## 4 Pend Oreille     983140   3.44e+10
## 5 Pend Oreille     983140   3.44e+10
## 6 Pend Oreille     983140   3.44e+10
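Notice that the attribute values repeat on every row: each row is one polygon vertex, and the merge copies the matching watershed's attributes onto it. The sketch below reproduces that behaviour with two toy data.frames. The names toyPoints, toyAttrs and toyDF and all of their values are assumptions for illustration, not part of the watershed data.

```r
# Toy stand-in for the fortified points: one row per polygon vertex
toyPoints <- data.frame(
  id    = c("0", "0", "0", "1", "1", "1"),
  long  = c(0, 1, 1, 2, 3, 3),
  lat   = c(0, 0, 1, 0, 0, 1),
  order = 1:6
)

# Toy stand-in for the attribute table: one row per polygon
toyAttrs <- data.frame(
  id      = c("0", "1"),
  WRIA_NM = c("Pend Oreille", "Example Basin")
)

# Inner join on "id": every vertex row picks up its polygon's attributes
toyDF <- merge(toyPoints, toyAttrs, by = "id")

# merge() may reorder rows, so restore the vertex order before plotting
toyDF <- toyDF[order(toyDF$order), ]

head(toyDF)
```

Sorting by the order column after a merge is a cheap safeguard, since polygons are drawn by connecting vertices in row order.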

Your first ggplot

If you’re coming from base graphics, some of the syntax may appear intimidating, but it’s all part of the “grammar of graphics” after which ggplot2 is modeled. You’ll notice a graph is built layer by layer, beginning with the data and the mapping of data to “aesthetic attributes”. We’ll add “geoms”, or geometric objects, and perhaps we’ll compute some statistics. We may also want to adjust the scale or coordinate system. All of this can be added in a very modular fashion; this is one of the key advantages of using ggplot2. So, enough talk, let’s make a plot!
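As a rough illustration of that layering, here is a minimal sketch that draws two made-up squares instead of the watershed polygons. The data.frame toyDF and its column values are assumptions for illustration; only the column layout (long, lat, group) mirrors the fortified data shown above. This is not necessarily the plot the original post goes on to build.

```r
library(ggplot2)

# Toy data.frame shaped like the fortified data: two made-up squares
toyDF <- data.frame(
  long  = c(0, 1, 1, 0,  2, 3, 3, 2),
  lat   = c(0, 0, 1, 1,  0, 0, 1, 1),
  group = rep(c("0.1", "1.1"), each = 4),
  name  = rep(c("Basin A", "Basin B"), each = 4)
)

# Data and aesthetic mappings first, then a geom, then a coordinate system
p <- ggplot(toyDF, aes(x = long, y = lat, group = group, fill = name)) +
  geom_polygon(colour = "white") +
  coord_equal()

print(p)
```

The group aesthetic tells geom_polygon() which vertices belong to the same polygon, and coord_equal() keeps one unit on the x axis the same length as one unit on the y axis.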

Continue reading here.

 


Published at DZone with permission of Jonathan Callahan, author and DZone MVB. (source)
