Chapter 8 Spatial Data Visualization and Analysis
8.1 Overview
Recall, a data structure is a format for organizing and storing data. The structure is designed so that data can be accessed and worked with in specific ways. Statistical software and programming languages have methods (or functions) designed to operate on different kinds of data structures.
This chapter focuses on spatial data structures and some of the R functions that work with these data. Spatial data comprise values associated with locations, such as temperature data at a given latitude, longitude, and perhaps elevation. Spatial data are typically organized into vector or raster data types. (See Figure 8.1).
- Vector data represent features such as discrete points, lines, and polygons.
- Raster data represent surfaces as a rectangular matrix of square cells or pixels.
Whether or not you use vector or raster data depends on the type of problem, the type of maps you need, and the data source. Each data structure has strengths and weaknesses in terms of functionality and representation. As you gain more experience working with spatial data, you will be able to determine which structure to use for a particular application.
There is a large set of R packages available for working with spatial (and space-time) data. These packages are described in the Cran Task View: Analysis of Spatial Data. The CRAN task view attempts to organize the various packages into categories, such as Handling spatial data, Reading and writing spatial data, Visualization, and Disease mapping and areal data analysis, so users can quickly identify package options given their project needs.
Exploring the extent of the excellent spatial data tools available in R is beyond the scope of this book. Rather, we would point you to subject texts like Applied Spatial Data Analysis with R by Bivand, Pebesma, and Gomez-Rubio (2013) (available for free via the MSU library system), and numerous online tutorials on pretty much any aspect of spatial data analysis with R. These tools make R a full-blown Geographic Information System (GIS) capable of spatial data manipulation and analysis on par with commercial GIS systems such as ESRI’s ArcGIS.
8.1.1 Some Spatial Data Packages
This chapter will focus on a few R packages for manipulating and visualizing spatial data. Specifically we will touch on the following packages
sp
: spatial data structures and methodsrgdal
: interface to the C/C++
spatial data Geospatial Data Abstraction Libraryggmap
: extendsggplot2
language to handle spatial dataleaflet
: generates dynamic online maps
8.2 Motivating Data
We motivate the topics introduced in this chapter using some forestry data from the Penobscot Experimental Forest (PEF) located in Maine. The PEF is a long-term experimental forest that is used to understand the effects of silviculture (i.e., science of tending trees) treatments on forest growth and composition. The PEF is divided into non-overlapping management units that receive different harvesting treatments. Within each management unit is a series of observation point locations (called forest inventory plots) where forest variables have been measured. Ultimately, we want to summarize the inventory plots measurements by management unit and map the results.
8.3 Reading Spatial Data into R
Spatial data come in a variety of file formats. Examples of popular vector file formats for points, lines, and polygons, include ESRI’s shapefile and open standard GeoJSON. Common raster file formats include GeoTIFF and netCDF43.
The rgdal
function readOGR
will read a large variety of vector data file formats (there is also a writeOGR()
for writing vector data files). Raster data file formats can be read using the rgdal
function readGDAL
(yup, also a writeGDAL()
) or read functions in the raster
package. All of these functions automatically cast the data into an appropriate R spatial data object (i.e., data structure), which are defined in the sp
or raster
packages. Table 8.1 provides an abbreviated list of these R spatial objects44. The Without attributes column gives the sp
package’s spatial data object classes for points, lines, polygons, and raster pixels that do not have data associated with the spatial objects (i.e., without attributes in GIS speak). DataFrame
is appended to the object class name once data, in the form of variables, are added to the spatial object.
Without Attributes | With Attributes | |
---|---|---|
Polygons | SpatialPolygons | SpatialPolygonsDataFrame |
Points | SpatialPoints | SpatialPointsDataFrame |
Lines | SpatialLines | SpatialLinesDataFrame |
Raster | SpatialGrid | SpatialGridDataFrame |
Raster | SpatialPixels | SpatialPixelsDataFrame |
Raster | RasterLayer | |
Raster | RasterBrick | |
Raster | RasterStack |
You can create your own spatial data objects in R. Below, for example, we create a SpatialPoints
object consisting of four points. Then add some data to the points to make it a SpatialPointsDataFrame
.
library(sp)
library(dplyr)
c(3,2,5,6)
x <- c(2,5,6,7)
y <-
cbind(x, y)
coords <-
SpatialPoints(coords)
sp.pts <-
class(sp.pts)
## [1] "SpatialPoints"
## attr(,"package")
## [1] "sp"
data.frame(var.1 = c("a", "b", "c", "d"), var.2 = 1:4)
some.data <-
SpatialPointsDataFrame(sp.pts, some.data)
sp.pts.df <-
class(sp.pts.df)
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"
If, for example, you already have a data frame that includes the spatial coordinate columns and other variables, then you can promote it to a SpatialPointsDataFrame
by indicating which columns contain point coordinates. You can extract or access the data frame associated with the spatial object using @data
. You can also access individual variables directly from the spatial object using $
or by name or column number to the right of the comma in [,]
(analogues to accessing variables in a data frame).
data.frame(x = c(3,2,5,6),
df <-y=c(2,5,6,7),
var.1 = c("a", "b", "c", "d"),
var.2 = 1:4)
class(df)
## [1] "data.frame"
#promote to a SpatialPointsDataFrame
coordinates(df) <- ~x+y
class(df)
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"
#access entire data frame
@data df
## var.1 var.2
## 1 a 1
## 2 b 2
## 3 c 3
## 4 d 4
class(df@data)
## [1] "data.frame"
#access columns directly
$var.1 df
## [1] "a" "b" "c" "d"
c("var.1","var.2")] df[,
## coordinates var.1 var.2
## 1 (3, 2) a 1
## 2 (2, 5) b 2
## 3 (5, 6) c 3
## 4 (6, 7) d 4
2] df[,
## coordinates var.2
## 1 (3, 2) 1
## 2 (2, 5) 2
## 3 (5, 6) 3
## 4 (6, 7) 4
#get the bounding box
bbox(df)
## min max
## x 2 6
## y 2 7
Here, the data frame df
is promoted to a SpatialPointsDataFrame
by indicating the column names that hold the longitude and latitude (i.e., x
and y
respectively) using the coordinates
function. Here too, the @data
is used to retrieve the data frame associated with the points. We also illustrate how variables can be accessed directly from the spatial object. The bbox
function is used to return the bounding box that is defined by the spatial extent of the point coordinates. The other spatial objects noted in Table 8.1 can be created, and their data accessed, in a similar way45.
More than often we find ourselves reading existing spatial data files into R. The code below uses the downloader
package to download all of the PEF data we’ll use in this chapter. The data are compressed in a single zip file, which is then extracted into the working directory using the unzip
function. A look into the PEF directory using list.files
shows nine files46. Those named MU-bounds.*
comprise the shapefile that holds the PEF’s management unit boundaries in the form of polygons. Like other spatial data file formats, shapefiles are made up of several different files (with different file extensions) that are linked together to form a spatial data object. The plots.csv
file holds the spatial coordinates and other information about the PEF’s forest inventory plots. The roads.*
shapefile holds roads and trails in and around the PEF.
library(downloader)
download("https://www.finley-lab.com/files/data/PEF.zip",
destfile="./PEF.zip", mode="wb")
unzip("PEF.zip", exdir = ".")
list.files("PEF")
## [1] "MU-bounds.dbf" "MU-bounds.prj" "MU-bounds.qpj"
## [4] "MU-bounds.shp" "MU-bounds.shx" "plots.csv"
## [7] "plots.dbf" "plots.prj" "plots.qpj"
## [10] "plots.shp" "plots.shx" "roads.dbf"
## [13] "roads.prj" "roads.shp" "roads.shx"
Next we read the MU-bounds shapefile into R using readOGR()
47 and explore the resulting mu
object. Notice that when we read a shapefile into R, we do not include a file extension with the shapefile name because a shapefile is always composed of multiple files.
library(rgdal)
readOGR("PEF", "MU-bounds") mu <-
## OGR data source with driver: ESRI Shapefile
## Source: "/home/jeffdoser/Dropbox/teaching/for875/textbook/PEF", layer: "MU-bounds"
## with 40 features
## It has 1 fields
When called, the readOGR
function provides a bit of information about the object being read in. Here, we see that it read the MU-bounds shapefile from PEF directory and the shapefile had 40 features (i.e., polygons) and 1 field (i.e., field is synonymous with column or variable in the data frame).
You can think of the resulting mu
object as a data frame where each row corresponds to a polygon and each column holds information about the polygon48. More specifically, the mu
object is a SpatialPolygonsDataFrame
.
class(mu)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
As illustrated using the made-up point data in the example above, you can access the data frame associated with the polygons using @data
.
class(mu@data)
## [1] "data.frame"
dim(mu@data)
## [1] 40 1
head(mu@data)
## mu_id
## 0 C15
## 1 C17
## 2 C16
## 3 C27
## 4 U18
## 5 U31
Above, a call to class()
confirms we have accessed the data frame, dim()
shows there are 40 rows (one row for each polygon) and one column, and head()
shows the first six values of the column named mu_id
. The mu_id
values are unique identifiers for each management unit polygon across the PEF.
8.4 Coordinate Reference Systems
One of the more challenging aspects of working with spatial data is getting used to the idea of a coordinate reference system. A coordinate reference system (CRS) is a system that uses one or more numbers, or coordinates, to uniquely determine the position of a point or other geometric element (e.g., line, polygon, raster). For spatial data, there are two common coordinate systems:
- Spherical coordinate system, such as latitude-longitude, often referred to as a geographic coordinate system.
- Projected coordinate system based on a map projection, which is a systematic transformation of the latitudes and longitudes that aims to minimize distortion occurring from projecting maps of the earth’s spherical surface onto a two-dimensional Cartesian coordinate plane. Projected coordinate systems are sometimes referred to as map projections.
There are numerous map projections49. One of the more frustrating parts of working with spatial data is that it seems like each data source you find offers its data in a different map projection and hence you spend a great deal of time reprojecting (i.e., transforming from one CRS to another) data into a common CRS such that they overlay correctly. Reprojecting is accomplished using the sp
package’s spTransform
function as demonstrated in Section 8.5.
In R, a spatial object’s CRS is accessed via the sp
package proj4string
function. The code below shows the current projection of mu
.
proj4string(mu)
## [1] "+proj=utm +zone=19 +datum=NAD83 +units=m +no_defs"
The cryptic looking string returned by proj4string()
is a set of directives understood by the proj.4 C library, which is part of sp
, and used to map geographic longitude and latitude coordinates into the projected Cartesian coordinates. Note that if you see a warning message you can safely ignore it. This CRS tells us the mu
object is in Universal Transverse Mercator (UTM) zone 19 coordinate system.50
8.5 Illustration using ggmap
Let’s begin by making a map of PEF management unit boundaries over top of a satellite image using the ggmap
package. Given an address, location, or bounding box, the ggmap
package’s get_map
function will query Google Maps, OpenStreetMap, Stamen Maps, or Naver Map servers for a user-specified map type. The get_map
function requires the location or bounding box coordinates be in a geographic coordinate system (i.e., latitude-longitude). This means we need to reproject mu
from UTM zone 19 to latitude-longitude geographic coordinates, which is defined by the '"proj=longlat +datum=WGS84"'
proj.4 string. As seen below, the first argument in spTransform
function is the spatial object to reproject and the second argument is a CRS object created by passing a proj.4 string into the CRS
function.
spTransform(mu, CRS("+proj=longlat +datum=WGS84"))
mu <-proj4string(mu)
## [1] "+proj=longlat +datum=WGS84 +no_defs"
Unfortunately, we cannot just feed the SpatialPolygonsDataFrame
mu
into ggplot
(perhaps some day soon this will possible). Rather, we need to first convert the SpatialPolygonsDataFrame
into a specially formatted data frame using the fortify
function that is part of the ggplot2
package51. The fortify
function will also need a unique identifier for each polygon specified using the region
argument, which for mu
is the mu_id
.
library(ggmap)
fortify(mu, region="mu_id")
mu.f <-head(mu.f)
## long lat order hole piece id group
## 1 -68.62 44.86 1 FALSE 1 C12 C12.1
## 2 -68.62 44.86 2 FALSE 1 C12 C12.1
## 3 -68.62 44.86 3 FALSE 1 C12 C12.1
## 4 -68.62 44.86 4 FALSE 1 C12 C12.1
## 5 -68.62 44.86 5 FALSE 1 C12 C12.1
## 6 -68.62 44.86 6 FALSE 1 C12 C12.1
Notice the id
column in the fortified version of mu
holds each polygon’s mu_id
value (this will be important later when we link data to the polygons).
Next, we query the satellite imagery used to underlay the management units (we’ll generally refer to this underlying map as the basemap). As of October 2018, Google requires you to set up a Google API account to download imagery. This is free, but it does require a credit card to obtain the API Key that is required to make the ggmap
package work. Because of the potential security risk of sharing keys, we do not show the key necessary to run the following code. Instead, you will need to create your own API key to run all the code in this chapter. See the page here for learning about how to use Google maps web services.
register_google(key = "YOUR-KEY-HERE")
bbox(mu)
mu.bbox <- apply(mu.bbox, 1, mean)
mu.centroid <-
get_map(location=mu.centroid, zoom=14, maptype = "satellite", source = "google")
basemap <-
ggmap(basemap) +
geom_polygon(data=mu.f, aes(x = long, y = lat, group=group),
fill=NA, size=0.2, color="orange")
While the call to get_map
can use a bounding box (mu.bbox
in code above) to define the basemap
extent, I’ve found that it’s easier to pass it a center point location (mu.centroid
note the use of the apply
function to find the mean in the easting and northing extents) and zoom
value to extract the desired extent. The resulting map looks pretty good! Take a look at the get_map
function manual page and try different options for maptype
(e.g., maptype="terrain"
).
Next we’ll add the forest inventory plots to the map. Begin by reading in the PEF forest inventory plot data held in “plots.csv”. Recall, foresters have measured forest variables at a set of locations (i.e., inventory plots) within each management unit. The following statement reads these data and displays the resulting data frame structure.
read.csv("PEF/plots.csv", stringsAsFactors=FALSE)
plots <-str(plots)
## 'data.frame': 451 obs. of 8 variables:
## $ mu_id : chr "U10" "U10" "U10" "U10" ...
## $ plot : int 11 13 21 22 23 24 31 32 33 34 ...
## $ easting : num 529699 529777 529774 529814 529850 ...
## $ northing : num 4966333 4966471 4966265 4966336 4966402 ...
## $ biomass_mg_ha : num 96.3 115.7 121.6 72 122.3 ...
## $ stems_ha : int 5453 2629 3385 7742 7980 10047 5039 5831 2505 7325 ...
## $ diameter_cm : num 4.8 6.9 6.1 3.1 4.7 1.6 4.1 5.2 5.7 3.3 ...
## $ basal_area_m2_ha: num 22 23.2 23 16.1 29.2 19.1 14.1 27.4 21.6 15 ...
In plots
each row is a forest inventory plot and columns are:
mu_id
identifies the management unit within which the plot is locatedplot
unique plot number within the management uniteasting
longitude coordinate in UTM zone 19northing
latitude coordinate in UTM zone 19biomass_mg_ha
tree biomass in metric ton (per hectare basis)stocking_stems_ha
number of tree (per hectare basis)diameter_cm
average tree diameter measured 130 cm up the tree trunkbasal_area_m2_ha
total cross-sectional area at 130 cm up the tree trunk (per hectare basis)
There is nothing inherently spatial about this data structure—it is simply a data frame. We make plots
into a spatial object by identifying which columns hold the coordinates. This is done below using the coordinates
function, which promotes the plots
data frame to a SpatialPointsDataFrame
.
coordinates(plots) <- ~easting+northing
class(plots)
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"
Although plots
is now a SpatialPointsDataFrame
, it does not know to which CRS the coordinates belong; hence, the NA
when proj4string(plots)
is called below. As noted in the plots
file description above, easting
and northing
are in UTM zone 19. This CRS is set using the second call to proj4string
below.
proj4string(plots)
## [1] NA
proj4string(plots) <- CRS("+proj=utm +zone=19 +datum=NAD83 +units=m
+no_defs +ellps=GRS80 +towgs84=0,0,0")
Now let’s reproject plots
to share a common CRS with mu
spTransform(plots, CRS("+proj=longlat +datum=WGS84")) plots <-
Note, because mu
is already in the projection we want for plots
, we could have replaced the second argument in the spTransform
call above with proj4string(mu)
and saved some typing.
We’re now ready to add the forest inventory plots to the existing basemap with management units. Specifically, let’s map the biomass_mg_ha
variable to show changes in biomass across the forest. No need to fortify plots
, ggplot
is happy to take geom_point
’s data
argument as a data frame (although we do need to convert plots
from a SpatialPointsDataFrame
to a data frame using the as.data.frame
function). Check out the scale_color_gradient
function in your favorite ggplot2
reference to understand how the color scale is set.
ggmap(basemap) +
geom_polygon(data=mu.f, aes(x = long, y = lat, group=group),
fill=NA, size=0.2, color="orange") +
geom_point(data=as.data.frame(plots),
aes(x = easting, y = northing, color=biomass_mg_ha)) +
scale_color_gradient(low="white", high="darkblue") +
labs(color = "Biomass (mg/ha)")
There is something subtle and easy to miss in the code above. Notice the aes
function arguments in geom_points
take geographic longitude and latitude, x
and y
respectively, from the points
data frame (but recall easting
and northing
were in UTM zone 19). This works because we applied spTransform
to reproject the points
SpatialPointsDataFrame
to geographic coordinates. sp
then replaces the values in easting
and northing
columns with the reprojected coordinate values when converting a SpatialPointsDataFrame
to a data frame via as.data.frame()
.
Foresters use the inventory plot measurements to estimate forest variables within management units, e.g., the average or total management unit biomass. Next we’ll make a plot with management unit polygons colored by average biomass_mg_ha
.
plots@data %>% group_by(mu_id) %>%
mu.bio <- summarize(biomass_mu = mean(biomass_mg_ha))
print(mu.bio)
## # A tibble: 33 × 2
## mu_id biomass_mu
## <chr> <dbl>
## 1 C12 124.
## 2 C15 49.9
## 3 C16 128.
## 4 C17 112.
## 5 C20 121.
## 6 C21 134.
## 7 C22 65.2
## 8 C23A 108.
## 9 C23B 153.
## 10 C24 126.
## # … with 23 more rows
Recall from Section 6.7.5 this one-liner can be read as “get the data frame from plots
’s SpatialPointsDataFrame
then group by management unit then make a new variable called biomass_mu
that is the average of biomass_mg_ha
and assign it to the mu.bio
tibble.”
The management unit specific biomass_mu
can now be joined to the mu
polygons using the common mu_id
value. Remember when we created the fortified version of mu
called mu.f
? The fortify
function region
argument was mu_id
which is the id
variable in the resulting mu.f
. This id
variable in mu.f
can be linked to the mu_id
variable in mu.bio
using dplyr
’s left_join
function as illustrated below.
head(mu.f, n=2)
## long lat order hole piece id group
## 1 -68.62 44.86 1 FALSE 1 C12 C12.1
## 2 -68.62 44.86 2 FALSE 1 C12 C12.1
left_join(mu.f, mu.bio, by = c('id' = 'mu_id'))
mu.f <-
head(mu.f, n=2)
## long lat order hole piece id group biomass_mu
## 1 -68.62 44.86 1 FALSE 1 C12 C12.1 123.7
## 2 -68.62 44.86 2 FALSE 1 C12 C12.1 123.7
The calls to head()
show the first few rows of mu.f
pre- and post-join. After the join, mu.f
includes biomass_mu
, which is used used below for geom_polygon
’s fill
argument to color the polygons accordingly.
ggmap(basemap) +
geom_polygon(data=mu.f, aes(x = long, y = lat,
group=group, fill=biomass_mu),
size=0.2, color="orange") +
scale_fill_gradient(low="white", high="darkblue",
na.value="transparent") +
labs(fill="Biomass (mg/ha)")
Let’s add the roads and some more descriptive labels as a finishing touch. The roads data include a variable called type
that identifies the road type. To color roads by type in the map, we need to join the roads
data frame with the fortified roads roads.f
using the common variable id
as a road segment specific identifier. Then geom_path
’s color
argument gets this type
variable as a factor to create road-specific color. The default coloring of the roads blends in too much with the polygon colors, so we manually set the road colors using the scale_color_brewer
function. The palette
argument in this function accepts a set of key words, e.g., "Dark2"
, that specify sets of diverging colors chosen to make map object difference optimally distinct (see, the manual page for scale_color_brewer
, http://colorbrewer2.org, and blog here.)52
readOGR("PEF", "roads") roads <-
## OGR data source with driver: ESRI Shapefile
## Source: "/home/jeffdoser/Dropbox/teaching/for875/textbook/PEF", layer: "roads"
## with 33 features
## It has 2 fields
spTransform(roads, proj4string(mu)) roads <-
## Warning in proj4string(mu): CRS object has comment, which is lost in output; in tests, see
## https://cran.r-project.org/web/packages/sp/vignettes/CRS_warnings.html
fortify(roads, region="id")
roads.f <- left_join(roads.f, roads@data, by = c('id' = 'id'))
roads.f <-
ggmap(basemap) +
geom_polygon(data=mu.f, aes(x = long, y = lat, group=group,
fill=biomass_mu),
size=0.2, color="orange") +
geom_path(data=roads.f, aes(x = long, y = lat,
group=group, color=factor(type))) +
scale_fill_gradient(low="white", high="darkblue",
na.value="transparent") +
scale_color_brewer(palette="Dark2") +
labs(fill="Biomass (mg/ha)", color="Road type", xlab="Longitude",
ylab="Latitude", title="PEF forest biomass")
## Warning: Removed 616 row(s) containing missing values
## (geom_path).
The second, and more cryptic, of the two warnings from this code occurs because some of the roads extend beyond the range of the map axes and are removed (nothing to worry about).
8.6 Illustration using leaflet
Leaflet is one of the most popular open-source JavaScript libraries for interactive maps. As noted on the official R leaflet website, it’s used by websites ranging from The New York Times and The Washington Post to GitHub and Flickr, as well as by GIS specialists like OpenStreetMap, Mapbox, and CartoDB.
The R leaflet website is an excellent resource to learn leaflet basics, and should serve as a reference to gain a better understanding of the topics we briefly explore below.
You create a leaflet map using these basic steps:
- Create a map by calling
leaflet()
- Add data layers to the map using layer functions such as,
addTiles()
,addMarkers()
,addPolygons()
,addCircleMarkers()
,addPolylines()
,addRasterImage()
and otheradd...
functions - Repeat step 2 to add more layers to the map
- Print the map to display it
Here’s a brief example.
library(leaflet)
leaflet() %>%
m <- addTiles() %>% # Add default OpenStreetMap map tiles
addMarkers(lng=-84.482004, lat=42.727516,
popup="<b>Here I am!</b>") # Add a clickable marker
# Print the map m
There are a couple things to note in the code. First, we use the pipe operator %>%
just like in dplyr
functions. Second, the popup
argument in addMarkers()
takes standard HTML and clicking on the marker makes the text popup. Third, the html version of this text provides the full interactive, dynamic map, so we encourage you to read and interact with the html version of this textbook for this section. The PDF document will simply display a static version of this map and will not do justice to how awesome leaflet
truly is!
As seen in the leaflet()
call above, the various add...
functions can take longitude (i.e., lng
) and latitude (i.e., lat
). Alternatively, these functions can extract the necessary spatial information from sp
objects, e.g., Table 8.1, when passed to the data argument (which greatly simplifies life compared with map making using ggmap
).
8.7 Subsetting Spatial Data
You can imagine that we might want to subset spatial objects to map specific points, lines, or polygons that meet some criteria, or perhaps extract values from polygons or raster surfaces at a set of points or geographic extent. These, and similar types, of operations are easy in R (as long as all spatial objects are in a common CRS). Recall from Chapter 4 how handy it is to subset data structures, e.g., vectors and data frames, using the []
operator and logical vectors? Well it’s just as easy to subset spatial objects, thanks to the authors of sp
, raster
, and other spatial data packages.
8.7.1 Fetching and Cropping Data using raster
In order to motivate our exploration of spatial data subsetting and to illustrate some useful functionality of the raster
package, let’s download some elevation data for the PEF. The raster
package has a rich set of functions for manipulating raster data as well as functions for downloading data from open source repositories. We’ll focus on the package’s getData
function, which, given a location in geographic longitude and latitude or location name, will download data from GADM, Shuttle Radar Topography Mission, Global Climate Data, and other sources commonly used in spatial data applications.
Let’s download SRTM surface elevation data for the PEF, check the resulting object’s class and CRS, and display it using the raster
package’s image
function along with the PEF forest inventory plots.
library(raster)
as.data.frame(plots) %>%
pef.centroid <- summarize(mu.x = mean(easting), mu.y = mean(northing))
getData("SRTM", lon = pef.centroid[, 1], lat = pef.centroid[, 2]) srtm <-
class(srtm)
## [1] "RasterLayer"
## attr(,"package")
## [1] "raster"
proj4string(srtm)
## [1] "+proj=longlat +datum=WGS84 +no_defs"
image(srtm)
plot(plots, add = TRUE)
A few things to notice in the code above. First the getData
function needs the longitude lon
and latitude lat
to identify which SRTM raster tile to return (SRTM data come in large raster tiles that cover the globe). As usual, look at the getData
function documentation for a description of the arguments. To estimate the PEF’s centroid coordinate, we averaged the forest inventory plots’ latitude and longitude then assigned the result to pef.centroid
(for fun, I used the apply
function to do the same task when creating mu.centroid
earlier in the chapter). Second, srtm
is in a longitude latitude geographic CRS (same as our other PEF data). Finally, the image shows SRTM elevation along the coast of Maine, the PEF plots are those tiny specks of black in the northwest quadrant, and the white region of the image is the Atlantic Ocean.
Okay, this is a start, but it would be good to crop the SRTM image to the PEF’s extent. This is done using raster
’s crop
function. This function can use many different kinds of spatial objects in the second argument to calculate the extent at which to crop the object in the first argument. Here, I set mu
as the second argument and save the resulting SRTM subset over the larger tile (the srtm
object).
crop(srtm, mu)
srtm <-
image(srtm)
plot(mu, add = TRUE)
The crop
is in effect doing a spatial subsetting of the raster data. We’ll return to the srtm
data and explore different kinds of subsetting in the subsequent sections.
8.7.2 Logical, Index, and Name Subsetting
As promised, we can subset spatial objects using the []
operator and a logical, index, or name vector. The key is that sp
objects behave like data frames, see Section 4.4. A logical or index vector to the left of the comma in [,]
accesses points, lines, polygons, or pixels. Similarly, a logical, index, or name vector to the right of the comma accesses variables.
For example, say we want to map forest inventory plots with more than 10,000 stems per hectare, stems_ha
(the min()
was added below to double check that the subset worked correctly.
min(plots$stems_ha)
## [1] 119
.10k <- plots[plots$stems_ha > 10000,]
plots
min(plots.10k$stems_ha)
## [1] 10008
You can also add new variables to the spatial objects.
$diameter_in <- plots$diameter_cm/2.54
plots
head(plots)
## mu_id plot biomass_mg_ha stems_ha diameter_cm
## 1 U10 11 96.35 5453 4.8
## 2 U10 13 115.70 2629 6.9
## 3 U10 21 121.58 3385 6.1
## 4 U10 22 71.97 7742 3.1
## 5 U10 23 122.26 7980 4.7
## 6 U10 24 85.85 10047 1.6
## basal_area_m2_ha diameter_in
## 1 22.0 1.8898
## 2 23.2 2.7165
## 3 23.0 2.4016
## 4 16.1 1.2205
## 5 29.2 1.8504
## 6 19.1 0.6299
8.7.3 Spatial Subsetting and Overlay
A spatial overlay retrieves the indexes or variables from object \(A\) using the location of object \(B\). With some spatial objects this operation can be done using the []
operator. For example, say we want to select and map all management units in mu
, i.e., \(A\), that contain plots with more than 10,000 stems per ha, i.e., \(B\).
.10k <- mu[plots.10k,]## A[B,]
mu
.10k.f <- fortify(mu.10k, region="mu_id")
mu
ggmap(basemap) +
geom_polygon(data=mu.10k.f, aes(x = long, y = lat, group=group),
fill="transparent", size=0.2, color="orange") +
geom_point(data=as.data.frame(plots.10k),
aes(x = easting, y = northing), color="white")
More generally, however, the over
function offers consistent overlay operations for sp
objects and can return either indexes or variables from object \(A\) given locations from object \(B\), i.e., over(B, A)
or, equivalently, B%over%A
. The code below duplicates the result from the preceding example using over
.
.10k <- mu[mu$mu_id %in% unique(over(plots.10k, mu)$mu_id),] mu
Yes, this requires more code but over
provides a more flexible and general purpose function for overlays on the variety of sp
objects. Let’s unpack this one-liner into its five steps.
over(plots.10k, mu)
i <- i$mu_id
ii <- unique(ii)
iii <- mu$mu_id %in% iii
iv <- mu[iv,] v <-
- The
over
function returns variables formu
’s polygons that coincide with thenrow(plots.10k@data)
points inplots.10k
. No points fall outside the polygons and the polygons do not overlap, so i should be a data frame withnrow(plots.10k@data)
rows. If polygons did overlap and a point fell within the overlap region, then variables for the coinciding polygons are returned. - Select the unique
mu
identifiermu_id
(this step is actually not necessary here becausemu
only has one variable). - Because some management units contain multiple plots there will be repeat values of
mu_id
in ii, so apply theunique
function to get rid of duplicates. - Use the
%in%
operator to create a logical vector that identifies which polygons should be in the final map. - Subset
mu
using the logical vector created in iv.
Now let’s do something similar using the srtm
elevation raster. Say we want to map elevation along trails, winter roads, and gravel roads across the PEF. We could subset srtm
using the roads
SpatialLinesDataFrame
; however, mapping the resulting pixel values along the road segments using ggmap
requires a bit more massaging. So, to simplify things for this example, roads
is first coerced into a SpatialPointsDataFrame
called roads.pts
that is used to extract spatially coinciding srtm
pixel values which themselves are coerced from raster
’s RasterLayer
to sp
’s SpatialPixelsDataFrame
called srtm.sp
so that we can use the over
function. We also choose a different basemap just for fun.
roads[roads$type %in% c("Trail", "Winter", "Gravel"),]
hikes <-
as(hikes, "SpatialPointsDataFrame")
hikes.pts <- as(srtm, "SpatialPixelsDataFrame")
srtm.sp <-
$srtm <- over(hikes.pts, srtm.sp)
hikes.pts
get_map(location=mu.centroid, zoom=14,
basemap <-maptype = "terrain", source = "google")
srtm@data@values[1:length(hikes.pts)]
color.vals <-
ggmap(basemap) +
geom_point(data=as.data.frame(hikes.pts),
aes(x = coords.x1, y = coords.x2, color = color.vals)) +
scale_color_gradient(low="green", high="red") +
labs(color = "Hiking trail elevation\n(m above sea level)",
xlab="Longitude", ylab="Latitude")
## Warning: Removed 483 rows containing missing values
## (geom_point).
In the call to geom_point
above, coords.x1
coords.x2
are the default names given to longitude and latitude, respectively, when sp
coerces hikes
to hikes.pts
. These points represent the vertices along line segments. I create the vector color.vals
that contains the values from srtm
that I use in the map argument color
.
Overlay operations involving lines and polygons over polygons require the rgeos
package which provides an interface to the Geometry Engine - Open Source (GEOS) C++
library for topology operations on geometries. We’ll leave it to you to explore these capabilities.
8.7.4 Spatial Aggregation
We have seen aggregation operations before when using dplyr
’s summarize
function. The summarize
function is particularly powerful when combined with group_by()
, which can apply a function specified in summarize()
to a variable partitioned using a grouping variable. The aggregate
function in sp
works in a similar manner, except groups are delineated by the spatial extent of a thematic variable. In fact, the work we did to create mu.bio
using dplyr
functions can be accomplished with aggregate()
. Using aggregate()
will, however, require a slightly different approach for joining the derived average biomass_mg_ha
to the fortified mu
. This is because the aggregate
function will apply the user specified function to all variables in the input object, which, in our case, results in an NA
for the linking variable mu_id
as demonstrated below.
aggregate(plots[,c("mu_id","biomass_mg_ha")], by=mu, FUN=mean)
mu.ag <-
head(mu.ag@data, n=2)
## mu_id biomass_mg_ha
## 0 <NA> 49.86
## 1 <NA> 112.17
With mu_id
rendered useless, we do not have a variable that uniquely identifies each polygon for use in fortify
’s region
argument; hence no way to subsequently join the unfortified and fortified versions of mu.bio.ag
. Here’s the work around. If the region
is not specified, fortify()
uses an internal unique polygon ID that is part of the sp
data object and accessed via row.names()
53 So, the trick is to add this unique polygon ID to the aggregate()
output prior to calling fortify()
as demonstrated below.
$id <- row.names(mu.ag)
mu.ag
fortify(mu.ag) mu.ag.f <-
## Regions defined for each Polygons
left_join(mu.ag.f, mu.ag@data) mu.ag.f <-
## Joining, by = "id"
ggmap(basemap) +
geom_polygon(data=mu.ag.f, aes(x = long, y = lat,
group=group, fill=biomass_mg_ha),
size=0.2, color="orange") +
scale_fill_gradient(low="white", high="darkblue",
na.value="transparent") +
labs(fill="Biomass (mg/ha)")
The aggregate()
function will work with all sp
objects. For example let’s map the variance of pixel values in srtm.sp
by management unit. Notice that aggregate()
is happy to take a user-specified function for FUN
.
aggregate(srtm.sp, by=mu,
mu.srtm <-FUN=function(x){sqrt(var(x))})
$id <- row.names(mu.srtm)
mu.srtm
fortify(mu.srtm) mu.srtm.f <-
## Regions defined for each Polygons
left_join(mu.srtm.f, mu.srtm@data) mu.srtm.f <-
## Joining, by = "id"
ggmap(basemap) +
geom_polygon(data=mu.srtm.f, aes(x = long, y = lat, group=group,
fill=srtm_23_04),
size=0.2, color="orange") +
scale_fill_gradient(low="green", high="red") +
labs(fill = "Elevation standard deviation\n(m above sea level)",
xlab="Longitude", ylab="Latitude")
8.8 Where to go from here
This chapter just scratches the surface of R’s spatial data manipulation and visualization capabilities. The basic ideas we presented here should allow you to take a deeper look into sp
, rgdal
, rgeos
, ggmap
, leaflet
, and a myriad of other excellent user-contributed R spatial data packages. A good place to start is with Edzer Pebesma’s excellent vignette on the use of the map overlay and spatial aggregation, available here, as well as Applied Spatial Data Analysis with R by .
References
Bivand, Roger S., Edzer Pebesma, and Virgilio Gomez-Rubio. 2013. Applied Spatial Data Analysis with R, Second Edition. Springer, NY.
Wegmann. 2010. “Raster Vector Tikz.png.” 2010. https://commons.wikimedia.org/wiki/File:Raster_vector_tikz.png.
A longer list of spatial data file formats is available at https://en.wikipedia.org/wiki/GIS_file_formats.↩︎
A more complete list of the
sp
package’s spatial data classes and methods is detailed in the package’s vignette https://cran.r-project.org/web/packages/sp/vignettes/intro_sp.pdf.↩︎This cheatsheet http://www.maths.lancs.ac.uk/~rowlings/Teaching/UseR2012/cheatsheet.html written by Barry Rowlingson is a useful reference http://www.maths.lancs.ac.uk/~rowlings/Teaching/UseR2012/cheatsheet.html↩︎
The
list.files
function does not read data into R; it simply prints the contents of a directory.↩︎The authors of the
rgdal
library decided to have some information about the version of GDAL and other software specifics be printed when the library is loaded. Don’t let it distract you.↩︎Much of the actual spatial information is hidden from you in other parts of the data structure, but is available if you ask nicely for it (see subsequent sections).↩︎
See partial list of map projections at https://en.wikipedia.org/wiki/List\_of\_map\_projections. See a humorous discussion of map projections at http://brilliantmaps.com/xkcd/.↩︎
If you start dealing with a lot of spatial data and reprojecting, http://spatialreference.org is an excellent resources for finding and specifying coordinate reference systems.↩︎
ggmap
depends onggplot2
soggplot2
will be automatically loaded when you calllibrary(ggmap)
.↩︎Install the
RColorBrewer
package and runlibrary(RColorBrewer); display.brewer.all()
to get a graphical list of available palettes.↩︎With other data, there is a chance the row names differ from the unique polygon IDs. Therefore a more reliable approach to getting a unique ID is to use
sapply(slot(mu.ag, 'polygons'), function(x) slot(x, 'ID'))
, but replacemu.ag
with yourSpatialPolygonsDataFrame
. Also, this approach will work with othersp
objects in the right column of Table 8.1.↩︎