All the material presented here, to the extent it is original, is available under CC-BY-SA. Parts build on joint tutorials with Edzer Pebesma.
I am running R 3.6.1, with recent update.packages()
.
needed <- c("sf", "mapview", "sp", "spdep", "elevatr", "raster", "stars", "classInt",
"RColorBrewer", "tmap", "tmaptools", "cartography", "ggplot2", "colorspace")
Script and data at https://github.com/rsbivand/geomed19-workshop/raw/master/Geomed19_I.zip. Download to suitable location, unzip and use as basis.
09:00-09:30 (20+10) Vector representation: sf replaces sp, rgeos and rgdal: vector
09:30-10:00 (20+10) Raster representation: stars/sf replace sp and rgdal: raster remains for now
10:00-10:40 (20+20) Visualization: tmap, cartography, ggplot2 and mapview
Spatial and spatio-temporal data are characterised by structures that distinguish them from typical tabular data
The geometric structures also have spatial reference system information, and can adhere to standards, which may ease geometrical operations
Satellite data and numerical model output data typically have regular grid structures, but these are often domain-specific
Computationally intensive tasks include interpolation, upsampling, focal operations, change of support and handling vector data with very detailed boundaries, as well as modelling using Bayesian inference
Spatial data typically combine position data in 2D (or 3D), attribute data and metadata related to the position data. Much spatial data could be called map data or GIS data. We collect and handle much more position data since global navigation satellite system (GNSS) like GPS came on stream 20 years ago, earth observation satellites have been providing data for longer.
Lapa et al. (2001) (Leprosy surveillance in Olinda, Brazil, using spatial analysis techniques) made available the underlying data set of Olinda census tracts (setor) in the Corrego Alegre 1970-72 / UTM zone 25S projection (EPSG:22525). Marilia Sá Carvalho and I wrote a tutorial in 2003/4 based on this data set; there is more information in the tutorial.
library(sf)
## Linking to GEOS 3.7.2, GDAL 3.0.1, PROJ 6.1.1
olinda <- st_read("data/olinda.gpkg")
## Reading layer `olinda' from data source `/home/rsb/presentations/geomed19-workshop/data/olinda.gpkg' using driver `GPKG'
## Simple feature collection with 243 features and 10 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 288955.3 ymin: 9111044 xmax: 298445.7 ymax: 9120528
## epsg (SRID): NA
## proj4string: +proj=utm +zone=25 +south +ellps=intl +towgs84=-205.57,168.77,-4.12,0,0,0,0 +units=m +no_defs
st_crs(olinda) <- 22525
library(mapview)
mapview(olinda)
Spatial vector data is based on points, from which other geometries are constructed. Vector data is often also termed object-based spatial data. The light rail tracks are 2D vector data. The points themselves are stored as double precision floating point numbers, typically without recorded measures of accuracy (GNSS provides a measure of accuracy). Here, lines are constructed from points.
all(st_is(olinda, "XY"))
## [1] TRUE
str(st_coordinates(st_geometry(olinda)[[1]]))
## num [1:13, 1:4] 295638 295599 295506 295599 295560 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:4] "X" "Y" "L1" "L2"
The sp package was a child of its time, using S4 formal classes, and the best compromise we then had of positional representation (not arc-node, but hard to handle holes in polygons). If we coerse byb
to the sp representation, we see the formal class structure. Input/output used OGR/GDAL vector drivers in the rgdal package, and topological operations used GEOS in the rgeos package.
library(sp)
str(slot(as(st_geometry(olinda), "Spatial"), "polygons")[[1]])
## Formal class 'Polygons' [package "sp"] with 5 slots
## ..@ Polygons :List of 1
## .. ..$ :Formal class 'Polygon' [package "sp"] with 5 slots
## .. .. .. ..@ labpt : num [1:2] 295460 9120358
## .. .. .. ..@ area : num 79139
## .. .. .. ..@ hole : logi FALSE
## .. .. .. ..@ ringDir: int 1
## .. .. .. ..@ coords : num [1:13, 1:2] 295638 295599 295506 295599 295560 ...
## ..@ plotOrder: int 1
## ..@ labpt : num [1:2] 295460 9120358
## ..@ ID : chr "ID1"
## ..@ area : num 79139
The recent sf package bundles GDAL and GEOS (sp just defined the classes and methods, leaving I/O and computational geometry to other packages). sf used data.frame
objects with one (or more) geometry column for vector data. The representation follows ISO 19125 (Simple Features), and has WKT (text) and WKB (binary) representations (used by GDAL and GEOS internally). The drivers include PostGIS and other database constructions permitting selection, and WFS for server APIs (rgdal does too, but requires more from the user).
strwrap(st_as_text(st_geometry(olinda)[[1]]))
## [1] "POLYGON ((295637.8 9120506, 295598.5 9120446, 295505.7 9120324,"
## [2] "295599.4 9120260, 295560.2 9120190, 295519.9 9120125, 295313.9"
## [3] "9120294, 295331 9120477, 295352.3 9120494, 295386.4 9120520,"
## [4] "295420.5 9120528, 295637.8 9120507, 295637.8 9120506))"
sf
sf uses the OSGEO stack of software:
sf_extSoftVersion()
## GEOS GDAL proj.4 GDAL_with_GEOS USE_PROJ_H
## "3.7.2" "3.0.1" "6.1.1" "true" "true"
Internal (R) and system dependencies look like this:
Package sf provides handling of feature data, where feature geometries are points, lines, polygons or combinations of those. It implements the full set of geometric functions described in the simple feature access standard, and some. The basic storage is very simple, and uses only base R types (list, matrix).
"sf"
objects, inheriting from "data.frame"
"sf"
objects have at least one simple feature geometry list-column of class "sfc"
[
"sfc"
geometry list-columns have a bounding box and a coordinate reference system as attribute, and a class attribute pointing out the common type (or "GEOMETRY"
in case of a mix)"sfg"
, and further classes pointing out dimension and typeStorage of simple feature geometry:
"POINT"
is a numeric vector"LINESTRING"
and "MULTIPOINT"
are numeric matrix, points/vertices in rows"POLYGON"
and "MULTILINESTRING"
are lists of matrices"MULTIPOLYGON"
is a lists of those"GEOMETRYCOLLECTION"
is a list of typed geometriesTo build from scratch:
p1 = st_point(c(3,5))
class(p1)
## [1] "XY" "POINT" "sfg"
p2 = st_point(c(4,6))
p3 = st_point(c(4,4))
pts = st_sfc(p1, p2, p3)
class(pts)
## [1] "sfc_POINT" "sfc"
sf = st_sf(a = c(3,2.5,4), b = c(1,2,4), geom = pts)
class(sf)
## [1] "sf" "data.frame"
sf
## Simple feature collection with 3 features and 2 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 3 ymin: 4 xmax: 4 ymax: 6
## epsg (SRID): NA
## proj4string: NA
## a b geom
## 1 3.0 1 POINT (3 5)
## 2 2.5 2 POINT (4 6)
## 3 4.0 4 POINT (4 4)
lwgeom is useful for fixing geometry probems and other tasks.
library(osmdata)
## Data (c) OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright
bbox <- opq(bbox = 'olinda brazil')
(osm_water <- osmdata_sf(add_osm_feature(bbox, key = 'natural', value = 'water'))$osm_points)
## Simple feature collection with 1823 features and 4 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -34.96294 ymin: -8.067045 xmax: -34.82622 ymax: -7.954524
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## First 10 features:
## amenity ferry natural public_transport
## 100179264 <NA> <NA> <NA> <NA>
## 100179822 <NA> <NA> <NA> <NA>
## 100180313 <NA> <NA> <NA> <NA>
## 100182875 <NA> <NA> <NA> <NA>
## 100183508 <NA> <NA> <NA> <NA>
## 100183586 <NA> <NA> <NA> <NA>
## 100184595 <NA> <NA> <NA> <NA>
## 100187700 <NA> <NA> <NA> <NA>
## 100188502 <NA> <NA> <NA> <NA>
## 100191024 <NA> <NA> <NA> <NA>
## geometry
## 100179264 POINT (-34.89981 -8.066445)
## 100179822 POINT (-34.86795 -8.042321)
## 100180313 POINT (-34.8687 -8.043948)
## 100182875 POINT (-34.86651 -8.04296)
## 100183508 POINT (-34.8691 -8.036723)
## 100183586 POINT (-34.866 -8.04497)
## 100184595 POINT (-34.87256 -8.050785)
## 100187700 POINT (-34.87331 -8.048795)
## 100188502 POINT (-34.86052 -8.026084)
## 100191024 POINT (-34.8734 -8.047688)
We write in EPSG:31985, SIRGAS 2000 UTM zone 25S projected coordinates, rather than WGS84 geographical coordinates.
if (!dir.exists("output")) dir.create("output")
tf <- "output/water-pts.gpkg"
st_write(st_transform(osm_water, 31985), dsn=tf, layer="water-pts", driver="GPKG", layer_options="OVERWRITE=YES")
## Updating layer `water-pts' to data source `output/water-pts.gpkg' using driver `GPKG'
## options: OVERWRITE=YES
## features: 1823
## fields: 4
## geometry type: Point
st_layers(tf)
## Driver: GPKG
## Available layers:
## layer_name geometry_type features fields
## 1 water-pts Point 1823 4
water_pts <- st_read(tf)
## Reading layer `water-pts' from data source `/home/rsb/presentations/geomed19-workshop/output/water-pts.gpkg' using driver `GPKG'
## Simple feature collection with 1823 features and 4 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 283669.2 ymin: 9107804 xmax: 298699.3 ymax: 9120281
## epsg (SRID): NA
## proj4string: +proj=utm +zone=25 +south +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
Here we transformed to SIRGAS 2000 UTM zone 25S (EPSG:31985), to match Landsat raster dat to be used later.
olinda_sirgas2000 <- st_transform(olinda, 31985)
st_crs(olinda_sirgas2000)
## Coordinate Reference System:
## EPSG: 31985
## proj4string: "+proj=utm +zone=25 +south +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"
if (!dir.exists("output")) dir.create("output")
st_write(olinda_sirgas2000, dsn="output/olinda_sirgas2000.gpkg", driver="GPKG", layer_options="OVERWRITE=YES")
## Updating layer `olinda_sirgas2000' to data source `output/olinda_sirgas2000.gpkg' using driver `GPKG'
## options: OVERWRITE=YES
## features: 243
## fields: 10
## geometry type: Polygon
Try doing this in pairs or small groups and discuss what is going on.
Look at the North Carolina SIDS data vignette for background:
library(sf)
nc <- st_read(system.file("shapes/sids.shp", package="spData")[1], quiet=TRUE)
st_crs(nc) <- "+proj=longlat +datum=NAD27"
row.names(nc) <- as.character(nc$FIPSNO)
head(nc)
## Simple feature collection with 6 features and 22 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -81.74107 ymin: 36.07282 xmax: -75.77316 ymax: 36.58965
## epsg (SRID): 4267
## proj4string: +proj=longlat +datum=NAD27 +no_defs
## CNTY_ID AREA PERIMETER CNTY_ NAME FIPS FIPSNO CRESS_ID
## 37009 1825 0.114 1.442 1825 Ashe 37009 37009 5
## 37005 1827 0.061 1.231 1827 Alleghany 37005 37005 3
## 37171 1828 0.143 1.630 1828 Surry 37171 37171 86
## 37053 1831 0.070 2.968 1831 Currituck 37053 37053 27
## 37131 1832 0.153 2.206 1832 Northampton 37131 37131 66
## 37091 1833 0.097 1.670 1833 Hertford 37091 37091 46
## BIR74 SID74 NWBIR74 BIR79 SID79 NWBIR79 east north x y
## 37009 1091 1 10 1364 0 19 164 176 -81.67 4052.29
## 37005 487 0 10 542 3 12 183 182 -50.06 4059.70
## 37171 3188 5 208 3616 6 260 204 174 -16.14 4043.76
## 37053 508 1 123 830 2 145 461 182 406.01 4035.10
## 37131 1421 9 1066 1606 3 1197 385 176 281.10 4029.75
## 37091 1452 7 954 1838 5 1237 411 176 323.77 4028.10
## lon lat L_id M_id geometry
## 37009 -81.48594 36.43940 1 2 MULTIPOLYGON (((-81.47276 3...
## 37005 -81.14061 36.52443 1 2 MULTIPOLYGON (((-81.23989 3...
## 37171 -80.75312 36.40033 1 2 MULTIPOLYGON (((-80.45634 3...
## 37053 -76.04892 36.45655 1 4 MULTIPOLYGON (((-76.00897 3...
## 37131 -77.44057 36.38799 1 4 MULTIPOLYGON (((-77.21767 3...
## 37091 -76.96474 36.38189 1 4 MULTIPOLYGON (((-76.74506 3...
The variables are largely count data, L_id
and M_id
are grouping variables. We can also read the original neighbour object:
library(spdep)
## Loading required package: spData
gal_file <- system.file("weights/ncCR85.gal", package="spData")[1]
ncCR85 <- read.gal(gal_file, region.id=nc$FIPSNO)
ncCR85
## Neighbour list object:
## Number of regions: 100
## Number of nonzero links: 492
## Percentage nonzero weights: 4.92
## Average number of links: 4.92
plot(st_geometry(nc), border="grey")
plot(ncCR85, st_centroid(st_geometry(nc), of_largest_polygon), add=TRUE, col="blue")
## Warning in st_centroid.sfc(st_geometry(nc), of_largest_polygon):
## st_centroid does not give correct centroids for longitude/latitude data
Now generate a random variable. Here I’ve set the seed - maybe choose your own, and compare outcomes with the people around you. With many people in the room, about 5 in 100 may get a draw that is autocorrelated when tested with Moran’s \(I\) (why?).
set.seed(1)
nc$rand <- rnorm(nrow(nc))
lw <- nb2listw(ncCR85, style="B")
moran.test(nc$rand, listw=lw, alternative="two.sided")
##
## Moran I test under randomisation
##
## data: nc$rand
## weights: lw
##
## Moran I statistic standard deviate = -0.82038, p-value = 0.412
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic Expectation Variance
## -0.060754158 -0.010101010 0.003812269
Now we’ll create a trend (maybe try plotting LM
to see the trend pattern). Do we get different test outcomes by varying beta and sigma (alpha is constant).
nc$LM <- as.numeric(interaction(nc$L_id, nc$M_id))
alpha <- 1
beta <- 0.5
sigma <- 2
nc$trend <- alpha + beta*nc$LM + sigma*nc$rand
moran.test(nc$trend, listw=lw, alternative="two.sided")
##
## Moran I test under randomisation
##
## data: nc$trend
## weights: lw
##
## Moran I statistic standard deviate = 6.1163, p-value = 9.58e-10
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic Expectation Variance
## 0.368410659 -0.010101010 0.003829901
To get back to reality, include the trend in a linear model, and test again.
lm.morantest(lm(trend ~ LM, nc), listw=lw, alternative="two.sided")
##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = trend ~ LM, data = nc)
## weights: lw
##
## Moran I statistic standard deviate = -0.50788, p-value = 0.6115
## alternative hypothesis: two.sided
## sample estimates:
## Observed Moran I Expectation Variance
## -0.049567147 -0.018718176 0.003689356
So we can manipulate a missing variable mis-specification to look like spatial autocorrelation. Is this informative?
Sometimes we only have data on a covariate for aggregates of our units of observation. What happens when we “copy out” these aggregate values to the less aggregated observations? First we’ll aggregate nc
by LM
, then make a neighbour object for the aggregate units
aggLM <- aggregate(nc[,"LM"], list(nc$LM), head, n=1)
(aggnb <- poly2nb(aggLM))
## Neighbour list object:
## Number of regions: 12
## Number of nonzero links: 46
## Percentage nonzero weights: 31.94444
## Average number of links: 3.833333
plot(st_geometry(aggLM))
Next, draw a random sample for the aggregated units:
set.seed(1)
LMrand <- rnorm(nrow(aggLM))
Check that it does not show any spatial autocorrelation:
moran.test(LMrand, nb2listw(aggnb, style="B"))
##
## Moran I test under randomisation
##
## data: LMrand
## weights: nb2listw(aggnb, style = "B")
##
## Moran I statistic standard deviate = -0.24911, p-value = 0.5984
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## -0.13101880 -0.09090909 0.02592483
Copy it out to the full data set, indexing on values of LM; the pattern now looks pretty autocorrelated
nc$LMrand <- LMrand[match(nc$LM, aggLM$LM)]
plot(nc[,"LMrand"])
which it is:
moran.test(nc$LMrand, listw=lw, alternative="two.sided")
##
## Moran I test under randomisation
##
## data: nc$LMrand
## weights: lw
##
## Moran I statistic standard deviate = 9.4301, p-value < 2.2e-16
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic Expectation Variance
## 0.575091242 -0.010101010 0.003850884
We’ve manipulated ourselves into a situation with abundant spatial autocorrelation at the level of the counties, but only by copying out from a more aggregated level. What is going on?
Spatial raster data is observed using rectangular (often square) cells, within which attribute data are observed. Raster data are very rarely object-based, very often they are field-based and could have been observed everywhere. We probably do not know where within the raster cell the observed value is correct; all we know is that at the chosen resolution, this is the value representing the whole cell area.
library(elevatr)
elevation <- get_elev_raster(as(olinda_sirgas2000, "Spatial"), z = 14)
## Merging DEMs
## Reprojecting DEM to original projection
## Note: Elevation units are in meters.
## Note: The coordinate reference system is:
## +proj=utm +zone=25 +south +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
elevation[elevation$layer < 1] <- NA
mapview(elevation, col=terrain.colors)
## Warning in rasterCheckSize(x, maxpixels = maxpixels): maximum number of pixels for Raster* viewing is 5e+05 ;
## the supplied Raster* has 9588312
## ... decreasing Raster* resolution to 5e+05 pixels
## to view full resolution set 'maxpixels = 9588312 '
The raster package S4 representation builds on the sp representation, using a GridTopology
S4 object to specify the grid, and a data frame to hold the data. raster defines RasterLayer
, and combinations of layers as stacks (may be different storage classes) or bricks (same storage class - array). Adding time remains an issue; raster can avoid reading data into memory using rgdal mechanisms.
elevation
## class : RasterLayer
## dimensions : 3096, 3097, 9588312 (nrow, ncol, ncell)
## resolution : 4.73, 4.7 (x, y)
## extent : 286508, 301156.8, 9106220, 9120771 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=25 +south +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
## source : memory
## names : layer
## values : 1, 93 (min, max)
sp::gridparameters(as(elevation, "SpatialGrid"))
## cellcentre.offset cellsize cells.dim
## s1 286510.4 4.73 3097
## s2 9106222.6 4.70 3096
library(stars)
## Loading required package: abind
e1 <- st_as_stars(elevation)
e1
## stars object with 2 dimensions and 1 attribute
## attribute(s), summary of first 1e+05 cells:
## layer
## Min. : 1.00
## 1st Qu.: 8.00
## Median :21.04
## Mean :25.04
## 3rd Qu.:37.50
## Max. :85.50
## NA's :50010
## dimension(s):
## from to offset delta refsys point values
## x 1 3097 286508 4.73 +proj=utm +zone=25 +south... NA NULL [x]
## y 1 3096 9120771 -4.7 +proj=utm +zone=25 +south... NA NULL [y]
if (!dir.exists("output")) dir.create("output")
write_stars(e1, "output/elevation.tif")
Package raster
has been around for a long time, is robust, well tested, versatile, and works for large rasters. We will not discuss it here.
The more recent stars
package has a different take on raster data, and will be used here. The following example loads 6 bands from a (section of a) Landsat 7 image, and plots it:
fn <- system.file("tif/L7_ETMs.tif", package = "stars")
system.time(L7 <- read_stars(fn))
## user system elapsed
## 0.014 0.002 0.019
L7
## stars object with 3 dimensions and 1 attribute
## attribute(s):
## L7_ETMs.tif
## Min. : 1.00
## 1st Qu.: 54.00
## Median : 69.00
## Mean : 68.91
## 3rd Qu.: 86.00
## Max. :255.00
## dimension(s):
## from to offset delta refsys point values
## x 1 349 288776 28.5 +proj=utm +zone=25 +south... FALSE NULL [x]
## y 1 352 9120761 -28.5 +proj=utm +zone=25 +south... FALSE NULL [y]
## band 1 6 NA NA NA NA NULL
plot(L7)
The irregular color breaks come from the default of using “type = quantile” to classInt::classIntervals()
, which causes histogram stretching; this is done over all 6 bands.
We can compute an index like ndvi, and plot it:
ndvi <- function(x) (x[4] - x[3])/(x[4] + x[3])
(s2.ndvi <- st_apply(L7, c("x", "y"), ndvi))
## stars object with 2 dimensions and 1 attribute
## attribute(s):
## ndvi
## Min. :-0.75342
## 1st Qu.:-0.20301
## Median :-0.06870
## Mean :-0.06432
## 3rd Qu.: 0.18667
## Max. : 0.58667
## dimension(s):
## from to offset delta refsys point values
## x 1 349 288776 28.5 +proj=utm +zone=25 +south... FALSE NULL [x]
## y 1 352 9120761 -28.5 +proj=utm +zone=25 +south... FALSE NULL [y]
if (!dir.exists("output")) dir.create("output")
write_stars(s2.ndvi, "output/L7_ndvi.tif")
system.time(plot(s2.ndvi))
## user system elapsed
## 0.103 0.000 0.103
As we can see, the proxy object contains no data, but only a pointer to the data. Plotting takes less time, because only pixels that can be seen are read:
system.time(L7 <- read_stars(fn, proxy=TRUE))
## user system elapsed
## 0.002 0.000 0.003
L7
## stars_proxy object with 1 attribute in file:
## $L7_ETMs.tif
## [1] "/home/rsb/lib/r_libs/stars/tif/L7_ETMs.tif"
##
## dimension(s):
## from to offset delta refsys point values
## x 1 349 288776 28.5 +proj=utm +zone=25 +south... FALSE NULL [x]
## y 1 352 9120761 -28.5 +proj=utm +zone=25 +south... FALSE NULL [y]
## band 1 6 NA NA NA NA NULL
In calculating the NDVI, lazy evaluation is used: s2.ndvi
still contains no pixels, but only plot
calls for them, and instructs st_apply
to only work on the resolution called for, and so optimizes plotting time too.
(s2.ndvi = st_apply(L7, c("x", "y"), ndvi))
## stars_proxy object with 1 attribute in file:
## $L7_ETMs.tif
## [1] "/home/rsb/lib/r_libs/stars/tif/L7_ETMs.tif"
##
## dimension(s):
## from to offset delta refsys point values
## x 1 349 288776 28.5 +proj=utm +zone=25 +south... FALSE NULL [x]
## y 1 352 9120761 -28.5 +proj=utm +zone=25 +south... FALSE NULL [y]
## band 1 6 NA NA NA NA NULL
## call list:
## [[1]]
## st_apply(X = X, MARGIN = c("x", "y"), FUN = ndvi)
system.time(plot(s2.ndvi))
## user system elapsed
## 0.246 0.000 0.248
OpenEO proposes proof-of-concept client-server API approaches. The sample data sets are from the southern Alps, and the project is under development.
if (!dir.exists("output")) dir.create("output")
library(openeo) # v 0.3.1
euracHost = "http://saocompute.eurac.edu/openEO_0_3_0/openeo/"
eurac = connect(host = euracHost,disable_auth = TRUE)
## Connected to host
pgb = eurac %>% pgb()
bb <- list(west=10.98,east=11.25,south=46.58,north=46.76)
tt <- c("2017-01-01T00:00:00Z","2017-01-31T00:00:00Z")
pgb$collection$S2_L2A_T32TPS_20M %>%
pgb$filter_bbox(extent=bb) %>%
pgb$filter_daterange(extent=tt) %>%
pgb$NDVI(red="B04",nir="B8A") %>%
pgb$min_time() -> task
tf <- "output/preview.tif"
preview <- preview(eurac, task=task, format="GTiff",
output_file=tf)
plot(read_stars(preview))
Earth Observation Data Cubes from Satellite Image Collections - extension of the stars proxy mechansim and the raster out-of-memory approach: (https://github.com/appelmar/gdalcubes_R).
Processing collections of Earth observation images as on-demand multispectral, multitemporal data cubes. Users define cubes by spatiotemporal extent, resolution, and spatial reference system and let ‘gdalcubes’ automatically apply cropping, reprojection, and resampling using the ‘Geospatial Data Abstraction Library’ (‘GDAL’).
Implemented functions on data cubes include reduction over space and time, applying arithmetic expressions on pixel band values, moving window aggregates over time, filtering by space, time, bands, and predicates on pixel values, materializing data cubes as ‘netCDF’ files, and plotting. User-defined ‘R’ functions can be applied over chunks of data cubes. The package implements lazy evaluation and multithreading. See also a one month old blog post.
library(stars)
tif <- system.file("tif/L7_ETMs.tif", package = "stars")
x <- read_stars(tif)
str(x)
## List of 1
## $ L7_ETMs.tif: num [1:349, 1:352, 1:6] 69 69 63 60 61 61 62 60 64 63 ...
## - attr(*, "dimensions")=List of 3
## ..$ x :List of 7
## .. ..$ from : num 1
## .. ..$ to : num 349
## .. ..$ offset: num 288776
## .. ..$ delta : num 28.5
## .. ..$ refsys: chr "+proj=utm +zone=25 +south +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"
## .. ..$ point : logi FALSE
## .. ..$ values: NULL
## .. ..- attr(*, "class")= chr "dimension"
## ..$ y :List of 7
## .. ..$ from : num 1
## .. ..$ to : num 352
## .. ..$ offset: num 9120761
## .. ..$ delta : num -28.5
## .. ..$ refsys: chr "+proj=utm +zone=25 +south +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"
## .. ..$ point : logi FALSE
## .. ..$ values: NULL
## .. ..- attr(*, "class")= chr "dimension"
## ..$ band:List of 7
## .. ..$ from : num 1
## .. ..$ to : Named int 6
## .. .. ..- attr(*, "names")= chr "band"
## .. ..$ offset: num NA
## .. ..$ delta : num NA
## .. ..$ refsys: chr NA
## .. ..$ point : logi NA
## .. ..$ values: NULL
## .. ..- attr(*, "class")= chr "dimension"
## ..- attr(*, "raster")=List of 3
## .. ..$ affine : num [1:2] 0 0
## .. ..$ dimensions : chr [1:2] "x" "y"
## .. ..$ curvilinear: logi FALSE
## .. ..- attr(*, "class")= chr "stars_raster"
## ..- attr(*, "class")= chr "dimensions"
## - attr(*, "class")= chr "stars"
rgb = c(3,2,1)
for RGB, and rgb = c(4,3,2)
for false color.plot(x, rgb = c(3, 2, 1))
plot(x, rgb = c(4, 3, 2))
x6 <- split(x, "band")
to create a 2-dimensional raster with 6 attributes(x6 <- split(x, "band"))
## stars object with 2 dimensions and 6 attributes
## attribute(s):
## X1 X2 X3 X4
## Min. : 47.00 Min. : 32.00 Min. : 21.00 Min. : 9.00
## 1st Qu.: 67.00 1st Qu.: 55.00 1st Qu.: 49.00 1st Qu.: 52.00
## Median : 78.00 Median : 66.00 Median : 63.00 Median : 63.00
## Mean : 79.15 Mean : 67.57 Mean : 64.36 Mean : 59.24
## 3rd Qu.: 89.00 3rd Qu.: 79.00 3rd Qu.: 77.00 3rd Qu.: 75.00
## Max. :255.00 Max. :255.00 Max. :255.00 Max. :255.00
## X5 X6
## Min. : 1.00 Min. : 1.00
## 1st Qu.: 63.00 1st Qu.: 32.00
## Median : 89.00 Median : 60.00
## Mean : 83.18 Mean : 59.98
## 3rd Qu.:112.00 3rd Qu.: 88.00
## Max. :255.00 Max. :255.00
## dimension(s):
## from to offset delta refsys point values
## x 1 349 288776 28.5 +proj=utm +zone=25 +south... FALSE NULL [x]
## y 1 352 9120761 -28.5 +proj=utm +zone=25 +south... FALSE NULL [y]
x6
. What has changed?plot(x6)
str(x6)
## List of 6
## $ X1: num [1:349, 1:352] 69 69 63 60 61 61 62 60 64 63 ...
## $ X2: num [1:349, 1:352] 56 57 52 45 52 50 51 51 51 52 ...
## $ X3: num [1:349, 1:352] 46 49 45 35 44 37 41 37 39 49 ...
## $ X4: num [1:349, 1:352] 79 75 66 66 76 78 76 78 68 70 ...
## $ X5: num [1:349, 1:352] 86 88 75 69 92 74 79 80 67 84 ...
## $ X6: num [1:349, 1:352] 46 49 41 38 60 38 43 38 32 53 ...
## - attr(*, "dimensions")=List of 2
## ..$ x:List of 7
## .. ..$ from : num 1
## .. ..$ to : num 349
## .. ..$ offset: num 288776
## .. ..$ delta : num 28.5
## .. ..$ refsys: chr "+proj=utm +zone=25 +south +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"
## .. ..$ point : logi FALSE
## .. ..$ values: NULL
## .. ..- attr(*, "class")= chr "dimension"
## ..$ y:List of 7
## .. ..$ from : num 1
## .. ..$ to : num 352
## .. ..$ offset: num 9120761
## .. ..$ delta : num -28.5
## .. ..$ refsys: chr "+proj=utm +zone=25 +south +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"
## .. ..$ point : logi FALSE
## .. ..$ values: NULL
## .. ..- attr(*, "class")= chr "dimension"
## ..- attr(*, "raster")=List of 3
## .. ..$ affine : num [1:2] 0 0
## .. ..$ dimensions : chr [1:2] "x" "y"
## .. ..$ curvilinear: logi FALSE
## .. ..- attr(*, "class")= chr "stars_raster"
## ..- attr(*, "class")= chr "dimensions"
## - attr(*, "class")= chr "stars"
x6
x6$mean <- (x6[[1]] + x6[[2]] + x6[[3]] + x6[[4]] + x6[[5]] + x6[[6]])/6
mean
over the x
and y
dimensions (essentially reducing “band”), by using st_apply
; compare the results of the two approachesxm <- st_apply(x, c("x", "y"), mean)
all.equal(xm[[1]], x6$mean)
## [1] TRUE
str(xm)
## List of 1
## $ mean: num [1:349, 1:352] 63.7 64.5 57 52.2 64.2 ...
## - attr(*, "dimensions")=List of 2
## ..$ x:List of 7
## .. ..$ from : num 1
## .. ..$ to : num 349
## .. ..$ offset: num 288776
## .. ..$ delta : num 28.5
## .. ..$ refsys: chr "+proj=utm +zone=25 +south +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"
## .. ..$ point : logi FALSE
## .. ..$ values: NULL
## .. ..- attr(*, "class")= chr "dimension"
## ..$ y:List of 7
## .. ..$ from : num 1
## .. ..$ to : num 352
## .. ..$ offset: num 9120761
## .. ..$ delta : num -28.5
## .. ..$ refsys: chr "+proj=utm +zone=25 +south +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"
## .. ..$ point : logi FALSE
## .. ..$ values: NULL
## .. ..- attr(*, "class")= chr "dimension"
## ..- attr(*, "raster")=List of 3
## .. ..$ affine : num [1:2] 0 0
## .. ..$ dimensions : chr [1:2] "x" "y"
## .. ..$ curvilinear: logi FALSE
## .. ..- attr(*, "class")= chr "stars_raster"
## ..- attr(*, "class")= chr "dimensions"
## - attr(*, "class")= chr "stars"
We have already seen some plot methods for "sf"
, "sfc"
and "nb"
objects in several packages for static plots, and mapview for interactive visualization. Let’s run through available packages, functions and methods quickly.
classInt provides the key class interval determination for thematic mapping of continuous variables. The classIntervals()
function takes a numeric vector (now also of classes POSIXt or units), a target number of intervals, and a style of class interval. Other arguments control the closure and precision of the intervals found.
library(classInt)
args(classIntervals)
## function (var, n, style = "quantile", rtimes = 3, ..., intervalClosure = c("left",
## "right"), dataPrecision = NULL, warnSmallN = TRUE, warnLargeN = TRUE,
## largeN = 3000L, samp_prop = 0.1, gr = c("[", "]"))
## NULL
We’ll find 7 intervals using Fisher natural breaks for the deprivation variable:
(cI <- classIntervals(olinda_sirgas2000$DEPRIV, n=7, style="fisher"))
## style: fisher
## [0,0.1215) [0.1215,0.244) [0.244,0.339) [0.339,0.439)
## 37 53 37 26
## [0.439,0.5435) [0.5435,0.6695) [0.6695,0.907]
## 37 33 20
We also need to assign a palette of graphical values, most often colours, to use to fill the intervals, and can inspect the intervals and fill colours with a plot method:
The RColorBrewer package gives by permission access to the ColorBrewer palettes accesible from the ColorBrewer website. Note that ColorBrewer limits the number of classes tightly, only 3–9 sequential classes
library(RColorBrewer)
pal <- RColorBrewer::brewer.pal((length(cI$brks)-1), "Reds")
plot(cI, pal)
We can also display all the ColorBrewer palettes:
display.brewer.all()
The sp package provided base graphics plot and image methods. sf provides plot methods using base graphics; the method for "sf"
objects re-arranges the plot window to provide a colour key, so extra steps are needed if overplotting is needed:
plot(olinda_sirgas2000[,"DEPRIV"], breaks=cI$brks, pal=pal)
(returns current par()
settings); the method also supports direct use of classInt:
plot(olinda_sirgas2000[,"DEPRIV"], nbreaks=7, breaks="fisher", pal=pal)
Earlier we used the plot method for "sfc"
objects which does not manipulate the graphics device, and is easier for overplotting.
mapview: Quickly and conveniently create interactive visualisations of spatial data with or without background maps. Attributes of displayed features are fully queryable via pop-up windows. Additional functionality includes methods to visualise true- and false-color raster images, bounding boxes, small multiples and 3D raster data cubes. It uses leaflet and other HTML packages.
mapview(olinda_sirgas2000, zcol="DEPRIV", col.regions=pal, at=cI$brks)
tmap: Thematic maps show spatial distributions. The theme refers to the phenomena that is shown, which is often demographical, social, cultural, or economic. The best known thematic map type is the choropleth, in which regions are colored according to the distribution of a data variable. The R package tmap offers a coherent plotting system for thematic maps that is based on the layered grammar of graphics. Thematic maps are created by stacking layers, where per layer, data can be mapped to one or more aesthetics. It is also possible to generate small multiples. Thematic maps can be further embellished by configuring the map layout and by adding map attributes, such as a scale bar and a compass. Besides plotting thematic maps on the graphics device, they can also be made interactive as an HTML widget. In addition, the R package tmaptools contains several convenient functions for reading and processing spatial data. See (Tennekes 2018) and Chapter 8 in (Lovelace, Nowosad, and Muenchow 2019).
The tmap package provides cartographically informed, grammar of graphics (gg) based functionality now, like ggplot2 using grid graphics. John McIntosh tried with ggplot2, with quite nice results. I suggested he look at tmap, and things got better, because tmap can switch between interactive and static viewing. tmap also provides direct access to classInt class intervals.
library(tmap)
tmap_mode("plot")
## tmap mode set to plotting
o <- tm_shape(olinda_sirgas2000) + tm_fill("DEPRIV", style="fisher", n=7, palette="Reds")
class(o)
## [1] "tmap"
returns a "tmap"
object, a grid GROB (graphics object), with print methods.
o
Since the objects are GROBs, they can be updated, as in lattice with latticeExtra or ggplot2:
o + tm_borders(alpha=0.5, lwd=0.5)
Using tmap_mode()
, we can switch between presentation ("plot"
) and interactive ("view"
) plotting:
tmap_mode("view")
## tmap mode set to interactive viewing
o + tm_borders(alpha=0.5, lwd=0.5)
tmap_mode("plot")
## tmap mode set to plotting
There is also a Shiny tool for exploring palettes:
tmaptools::palette_explorer()
cartography helps to design cartographic representations such as proportional symbols, choropleth, typology, flows or discontinuities maps. It also offers several features that improve the graphic presentation of maps, for instance, map palettes, layout elements (scale, north arrow, title…), labels or legends. (Giraud and Lambert 2016, 2017), http://riatelab.github.io/cartography/vignettes/cheatsheet/cartography_cheatsheet.pdf. The package is associated with rosm: Download and plot Open Street Map http://www.openstreetmap.org/, Bing Maps http://www.bing.com/maps and other tiled map sources. Use to create basemaps quickly and add hillshade to vector-based maps. https://cran.r-project.org/web/packages/rosm/vignettes/rosm.html
The package organizes extra palettes:
library(cartography)
display.carto.all()
The plotting functions (mot methods) use base graphics:
choroLayer(olinda_sirgas2000, var="DEPRIV", method="fisher-jenks", nclass=7, col=pal, legend.values.rnd=3)
(returns NULL)
The ggplot2 package provides the geom_sf()
facility for mapping:
library(ggplot2)
g <- ggplot(olinda_sirgas2000) + geom_sf(aes(fill=DEPRIV))
g
It is possible to set a theme that drops the arguably unnecessary graticule:
g + theme_void()
g + theme_void() + scale_fill_distiller(palette="Reds", direction=1)
but there is a lot of jumping through hoops to get a simple map. To get proper class intervals involves even more work, because ggplot2 takes specific, not general, positions on how graphics are observed. ColorBrewer eschews continuous colour scales based on cognitive research, but ggplot2 enforces them for continuous variables (similarly for graticules, which may make sense for data plots but not for maps).
Since the break is close, try exploring alternative class interval definitions and palettes, maybe also visiting http://hclwizard.org/ and its hclwizard()
Shiny app, returning a palette generating function on clicking the “Return to R” button:
library(colorspace)
hcl_palettes("sequential (single-hue)", n = 7, plot = TRUE)
pal <- hclwizard()
pal(6)
The end of rainbow discussion is informative:
wheel <- function(col, radius = 1, ...)
pie(rep(1, length(col)), col = col, radius = radius, ...)
opar <- par(mfrow=c(1,2))
wheel(rainbow_hcl(12))
wheel(rainbow(12))
par(opar)
Bivand, Roger S., Edzer Pebesma, and Virgilio Gomez-Rubio. 2013. Applied Spatial Data Analysis with R, Second Edition. Springer, NY. http://www.asdar-book.org/.
Giraud, Timothée, and Nicolas Lambert. 2016. “Cartography: Create and Integrate Maps in Your R Workflow.” JOSS 1 (4). The Open Journal. https://doi.org/10.21105/joss.00054.
———. 2017. “Reproducible Cartography.” In Advances in Cartography and Giscience. ICACI 2017. Lecture Notes in Geoinformation and Cartography., edited by Michael Peterson, 173–83. Cham, Switzerland: Springer. https://doi.org/10.1007/978-3-319-57336-6_13.
Lapa, Tiago, Ricardo Ximenes, Nilza Nunes Silva, Wayner Souza, Maria de Fátima Militão Albuquerque, and Gisele Campozana. 2001. “Vigilância da hanseníase em Olinda, Brasil, utilizando técnicas de análise espacial.” Cadernos de Saúde Pública 17 (October). scielo: 1153–62. https://doi.org/10.1590/S0102-311X2001000500016.
Lovelace, Robin, Jakub Nowosad, and Jannes Muenchow. 2019. Geocomputation with R. Boca Raton, FL: Chapman and Hall/CRC. https://geocompr.robinlovelace.net/.
Pebesma, Edzer, and Roger S. Bivand. n.d. Spatial Data Science; Uses Cases in R. CRC. https://r-spatial.org/book/.
Tennekes, Martijn. 2018. “Tmap: Thematic Maps in R.” Journal of Statistical Software, Articles 84 (6): 1–39. https://doi.org/10.18637/jss.v084.i06.