td <- tempfile()
dir.create(td)
Sys.setenv("PROJ_USER_WRITABLE_DIRECTORY"=td)
#Sys.setenv("PROJ_NETWORK"="ON")
options("rgdal_show_exportToProj4_warnings"="none")

Required current contributed CRAN packages:

I am running R 4.0.5, with recent update.packages().

needed <- c("RSQLite", "rgdal", "units", "gdistance", "igraph", "stars", "raster", 
"terra", "sp", "mapview", "ggplot2", "mapsf", "tmap", "colorspace", "RColorBrewer", 
"sf", "classInt")

Script

Script and data at https://github.com/rsbivand/UAM21_I/raw/main/UAM21_I_210510.zip. Download to suitable location, unzip and use as basis.

Schedule

  • Today, R as a GIS, planar and spherical geometries, projections and transformations
Time Topic
Wednesday 5/5
09.00-12.00 What is R: programming language, community, ecosystem? What may it be used for in analysing spatial data in a social science setting? What are the basic data structures in R? How can we start writing an R markdown notebook? How to access help in using R? How to use built-in data sets and why? How to write reproducible examples? What can we learn from code examples? How can R help us in furthering reproducible research?
13.00-16.00 What kinds of data objects are used in R? What is the structure of a data.frame? What is a list object? What kinds of data can be contained in data objects?
Thursday 6/5
09.00-12.00 How may we read data into R? From files, including spatial data files, and from online resources? How can we choose between output formats for notebooks and other output media? How can one choose between the basic graphics functions and devices in R?
13.00-16.00 When our data include spatial data objects, in which ways may they be represented in R? How can one make simple thematic maps using R? (sf, stars, tmap)
Monday 10/5
09.00-12.00 How can we use class intervals and colour palettes to communicate? Rather than “lying with maps,” how can we explore the impact of choices made in thematic cartography? How can we condition on continuous or discrete variables to permit visual comparison? How can we combine multiple graphical elements in data visualization? (classInt, sf, tmap, mapsf) May we use R “like a GIS?” How may we structure temporal and spatio-temporal data? Closer introduction to R-spatial (sf, stars, gdalcubes, terra, GDAL, GEOS)
13.00-16.00 Planar and spherical geometries, projections and transformations (s2, PROJ, tmap, mapview, leaflet, geogrid)
Tuesday 11/5
09.00-12.00 Doing things with spatial data … (osmdata, …)
13.00-16.00 Doing things with spatial data … (osmdata, …)
Thursday 20/11
09.00-12.00 Presentations/consultations/discussion
13.00-16.00 Presentations/consultations/discussion

Class intervals

classInt provides the key class interval determination for thematic mapping of continuous variables. The classIntervals() function takes a numeric vector (now also of classes POSIXt or units), a target number of intervals, and a style of class interval. Other arguments control the closure and precision of the intervals found.

library(classInt)
args(classIntervals)
## function (var, n, style = "quantile", rtimes = 3, ..., intervalClosure = c("left", 
##     "right"), dataPrecision = NULL, warnSmallN = TRUE, warnLargeN = TRUE, 
##     largeN = 3000L, samp_prop = 0.1, gr = c("[", "]")) 
## NULL

Lapa et al. (2001) (Leprosy surveillance in Olinda, Brazil, using spatial analysis techniques) made available the underlying data set of Olinda census tracts (setor) in the Corrego Alegre 1970-72 / UTM zone 25S projection (EPSG:22525). Marilia Sá Carvalho and I wrote a tutorial in 2003/4 based on this data set; there is more information in the tutorial.

We’ll find 7 intervals using Fisher natural breaks for the deprivation variable:

library(sf)
## Linking to GEOS 3.10.0dev, GDAL 3.3.0, PROJ 8.0.1
olinda_sirgas2000 <- st_read("../data/olinda_sirgas2000.gpkg")
## Reading layer `olinda_sirgas2000' from data source 
##   `/home/rsb/und/uam21/data/olinda_sirgas2000.gpkg' using driver `GPKG'
## Simple feature collection with 243 features and 10 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 288984.5 ymin: 9111038 xmax: 298474.8 ymax: 9120522
## CRS:           unknown
(cI <- classIntervals(olinda_sirgas2000$DEPRIV, n=7, style="fisher"))
## style: fisher
##      [0,0.1215)  [0.1215,0.244)   [0.244,0.339)   [0.339,0.439)  [0.439,0.5435) 
##              37              53              37              26              37 
## [0.5435,0.6695)  [0.6695,0.907] 
##              33              20
Sys.setenv("PROJ_NETWORK"="OFF")
sf_proj_network()
## [1] FALSE
list.files(sf_proj_search_paths()[1])
## character(0)
sf_extSoftVersion()
##           GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H 
##    "3.10.0dev"        "3.3.0"        "8.0.1"         "true"         "true" 
##           PROJ 
##        "8.0.1"

We also need to assign a palette of graphical values, most often colours, to use to fill the intervals, and can inspect the intervals and fill colours with a plot method:

The RColorBrewer package gives by permission access to the ColorBrewer palettes accesible from the ColorBrewer website. Note that ColorBrewer limits the number of classes tightly, only 3–9 sequential classes

library(RColorBrewer)
pal <- RColorBrewer::brewer.pal((length(cI$brks)-1), "Reds")
plot(cI, pal)

We can also display all the ColorBrewer palettes:

display.brewer.all()

Try exploring alternative class interval definitions and palettes, maybe also visiting http://hclwizard.org/ and its hclwizard() Shiny app, returning a palette generating function on clicking the “Return to R” button:

library(colorspace)
hcl_palettes("sequential (single-hue)", n = 7, plot = TRUE)

pal <- hclwizard()
pal(6)

The end of rainbow discussion is informative:

wheel <- function(col, radius = 1, ...)
  pie(rep(1, length(col)), col = col, radius = radius, ...) 
opar <- par(mfrow=c(1,2))
wheel(rainbow_hcl(12))
wheel(rainbow(12))

par(opar)

See recent R blog.

See also treatments in Fundamentals of Data Visualization.

Thematic mapping

The sp package provided base graphics plot and image methods. sf provides plot methods using base graphics; the method for "sf" objects re-arranges the plot window to provide a colour key, so extra steps are needed if overplotting is needed:

plot(olinda_sirgas2000[,"DEPRIV"], breaks=cI$brks, pal=pal)

(returns current par() settings); the method also supports direct use of classInt:

plot(olinda_sirgas2000[,"DEPRIV"], nbreaks=7, breaks="fisher", pal=pal)

Earlier we used the plot method for "sfc" objects which does not manipulate the graphics device, and is easier for overplotting.

The tmap package

tmap: Thematic maps show spatial distributions. The theme refers to the phenomena that is shown, which is often demographical, social, cultural, or economic. The best known thematic map type is the choropleth, in which regions are colored according to the distribution of a data variable. The R package tmap offers a coherent plotting system for thematic maps that is based on the layered grammar of graphics. Thematic maps are created by stacking layers, where per layer, data can be mapped to one or more aesthetics. It is also possible to generate small multiples. Thematic maps can be further embellished by configuring the map layout and by adding map attributes, such as a scale bar and a compass. Besides plotting thematic maps on the graphics device, they can also be made interactive as an HTML widget. In addition, the R package tmaptools contains several convenient functions for reading and processing spatial data. See (Tennekes 2018) and Chapter 8 in (Lovelace, Nowosad, and Muenchow 2019).

The tmap package provides cartographically informed, grammar of graphics (gg) based functionality now, like ggplot2 using grid graphics. John McIntosh tried with ggplot2, with quite nice results. I suggested he look at tmap, and things got better, because tmap can switch between interactive and static viewing. tmap also provides direct access to classInt class intervals.

library(tmap)
tmap_mode("plot")
## tmap mode set to plotting
o <- tm_shape(olinda_sirgas2000) + tm_fill("DEPRIV", style="fisher", n=7, palette="Reds")
class(o)
## [1] "tmap"

returns a "tmap" object, a grid GROB (graphics object), with print methods.

o

Since the objects are GROBs, they can be updated, as in lattice with latticeExtra or ggplot2:

o + tm_borders(alpha=0.5, lwd=0.5)

There is also a Shiny tool for exploring palettes:

tmaptools::palette_explorer()

The mapsf, formerly cartography package

cartography helped to design cartographic representations such as proportional symbols, choropleth, typology, flows or discontinuities maps. It also offers several features that improve the graphic presentation of maps, for instance, map palettes, layout elements (scale, north arrow, title…), labels or legends. (Giraud and Lambert 2016, 2017), http://riatelab.github.io/cartography/vignettes/cheatsheet/cartography_cheatsheet.pdf. The package is associated with rosm: Download and plot Open Street Map http://www.openstreetmap.org/, Bing Maps http://www.bing.com/maps and other tiled map sources. Use to create basemaps quickly and add hillshade to vector-based maps. https://cran.r-project.org/web/packages/rosm/vignettes/rosm.html. mapsf has followed up on suggestions to improve cartography, using recent changes in base R colour handling: https://developer.r-project.org/Blog/public/2019/11/21/a-new-palette-for-r/index.html.

The package organizes extra palettes:

library(mapsf)
cols <- mf_get_pal(n = c(5, 5), pal = c("Reds 2", "Greens"))
plot(1:10, rep(1, 10), bg = cols, pch = 22, cex = 4, axes=FALSE, ann=FALSE)

The plotting functions (not methods) use base graphics:

mf_choro(olinda_sirgas2000, var="DEPRIV", breaks="fisher", nbreaks=7, pal=pal, leg_val_rnd=3)

(returns NULL)

The ggplot2 package

The ggplot2 package provides the geom_sf() facility for mapping:

library(ggplot2)
g <- ggplot(olinda_sirgas2000) + geom_sf(aes(fill=DEPRIV))
g

It is possible to set a theme that drops the arguably unnecessary graticule:

g + theme_void()

g + theme_void() + scale_fill_distiller(palette="Reds", direction=1)

but there is a lot of jumping through hoops to get a simple map. To get proper class intervals involves even more work, because ggplot2 takes specific, not general, positions on how graphics are observed. ColorBrewer eschews continuous colour scales based on cognitive research, but ggplot2 enforces them for continuous variables (similarly for graticules, which may make sense for data plots but not for maps).

Interactive maps

tmap modes

Using tmap_mode(), we can switch between presentation ("plot") and interactive ("view") plotting:

tmap_mode("view")
## tmap mode set to interactive viewing
o + tm_borders(alpha=0.5, lwd=0.5)
tmap_mode("plot")
## tmap mode set to plotting

The mapview package

mapview: Quickly and conveniently create interactive visualisations of spatial data with or without background maps. Attributes of displayed features are fully queryable via pop-up windows. Additional functionality includes methods to visualise true- and false-color raster images, bounding boxes, small multiples and 3D raster data cubes. It uses leaflet and other HTML packages.

library(mapview)
if (sf:::CPL_gdal_version() >= "3.1.0") mapviewOptions(fgb = FALSE)
mapview(olinda_sirgas2000, zcol="DEPRIV", col.regions=pal, at=cI$brks)
## Warning: Found less unique colors (7) than unique zcol values (8)! 
## Interpolating color vector to match number of zcol values.

Further examples

data("pol_pres15", package = "spDataLarge")
pol_pres15 <- st_buffer(pol_pres15, dist=0)
library(tmap)
o <- tm_shape(pol_pres15) + tm_facets(free.scales=FALSE) + tm_borders(lwd=0.5, alpha=0.4) + tm_layout(panel.labels=c("Duda", "Komorowski"))
o + tm_fill(c("I_Duda_share", "I_Komorowski_share"), n=6, style="pretty", title="I round\nshare of votes")

o + tm_fill(c("II_Duda_share", "II_Komorowski_share"), n=6, style="pretty", title="II round\nshare of votes")

Spatial data

Spatial data typically combine position data in 2D (or 3D), attribute data and metadata related to the position data. Much spatial data could be called map data or GIS data. We collect and handle much more position data since global navigation satellite systems (GNSS) like GPS came on stream 20 years ago, earth observation satellites have been providing data for longer.

suppressPackageStartupMessages(library(osmdata))
library(sf)
bbox <- opq(bbox = 'bergen norway')
byb0 <- osmdata_sf(add_osm_feature(bbox, key = 'railway',
  value = 'light_rail'))$osm_lines
tram <- osmdata_sf(add_osm_feature(bbox, key = 'railway',
  value = 'tram'))$osm_lines
byb1 <- tram[!is.na(tram$name),]
o <- intersect(names(byb0), names(byb1))
byb <- rbind(byb0[,o], byb1[,o])
saveRDS(byb, file="byb.rds")

Spatial vector data is based on points, from which other geometries are constructed. Vector data is often also termed object-based spatial data. The light rail tracks are 2D vector data. The points themselves are stored as double precision floating point numbers, typically without recorded measures of accuracy (GNSS provides a measure of accuracy). Here, lines are constructed from points.

byb <- readRDS("byb.rds")
library(mapview)
mapviewOptions(fgb = FALSE)
mapview(byb)

Advancing from the sp representation

Representing spatial vector data in R (sp)

The sp package was a child of its time, using S4 formal classes, and the best compromise we then had of positional representation (not arc-node, but hard to handle holes in polygons). If we coerse byb to the sp representation, we see the formal class structure. Input/output used OGR/GDAL vector drivers in the rgdal package, and topological operations used GEOS in the rgeos package.

library(sp)
byb_sp <- as(byb, "Spatial")
str(byb_sp, max.level=2)
## Formal class 'SpatialLinesDataFrame' [package "sp"] with 4 slots
##   ..@ data       :'data.frame':  187 obs. of  11 variables:
##   ..@ lines      :List of 187
##   ..@ bbox       : num [1:2, 1:2] 5.23 60.29 5.36 60.39
##   .. ..- attr(*, "dimnames")=List of 2
##   ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
str(slot(byb_sp, "lines")[[1]])
## Formal class 'Lines' [package "sp"] with 2 slots
##   ..@ Lines:List of 1
##   .. ..$ :Formal class 'Line' [package "sp"] with 1 slot
##   .. .. .. ..@ coords: num [1:14, 1:2] 5.33 5.33 5.33 5.33 5.33 ...
##   .. .. .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. .. .. ..$ : chr [1:14] "4870663682" "331531217" "331531216" "331531215" ...
##   .. .. .. .. .. ..$ : chr [1:2] "lon" "lat"
##   ..@ ID   : chr "30098967"
library(terra)
## terra version 1.2.5
## 
## Attaching package: 'terra'
## The following object is masked from 'package:colorspace':
## 
##     coords
(byb_sv <- as(byb, "SpatVector"))
##  class       : SpatVector 
##  geometry    : lines 
##  dimensions  : 187, 11  (geometries, attributes)
##  extent      : 5.232051, 60.39211, 0, 0  (xmin, xmax, ymin, ymax)
##  coord. ref. : +proj=longlat +datum=WGS84 +no_defs 
##  names       :   osm_id    name  electrified frequency gauge layer maxspeed
##  type        :    <chr>   <chr>        <chr>     <chr> <chr> <chr>    <chr>
##  values      : 30098967 Bybanen contact_line         0  1435    NA       NA
##                30098968 Bybanen contact_line         0  1435    NA       NA
##                30098969 Bybanen contact_line         0  1435    -1       NA
##            name.en    railway          source voltage
##              <chr>      <chr>           <chr>   <chr>
##  Bergen light rail light_rail Bergen Light r~     800
##  Bergen light rail light_rail Bergen Light r~     800
##  Bergen light rail light_rail Bergen Light r~     800
str(byb_sv)
## Formal class 'SpatVector' [package "terra"] with 1 slot
##   ..@ ptr:Reference class 'Rcpp_SpatVector' [package "terra"] with 3 fields
##   .. ..$ df      :Reference class 'Rcpp_SpatDataFrame' [package "terra"] with 6 fields
##   .. .. ..$ iplace  : num [1:11] 0 1 2 3 4 5 6 7 8 9 ...
##   .. .. ..$ itype   : num [1:11] 2 2 2 2 2 2 2 2 2 2 ...
##   .. .. ..$ messages:Reference class 'Rcpp_SpatMessages' [package "terra"] with 2 fields
##   .. .. .. ..$ has_error  : logi FALSE
##   .. .. .. ..$ has_warning: logi FALSE
##   .. .. .. ..and 18 methods, of which 4 are  possibly relevant:
##   .. .. .. ..  finalize, getError, getWarnings, initialize
##   .. .. ..$ names   : chr [1:11] "osm_id" "name" "electrified" "frequency" ...
##   .. .. ..$ ncol    : num 11
##   .. .. ..$ nrow    : num 187
##   .. .. ..and 31 methods, of which 17 are  possibly relevant:
##   .. .. ..  add_column_double, add_column_long, add_column_string, cbind,
##   .. .. ..  finalize, get_datatypes, getError, getWarnings, has_error,
##   .. .. ..  has_warning, initialize, rbind, remove_column, subset_cols,
##   .. .. ..  subset_rows, unique, values
##   .. ..$ messages:Reference class 'Rcpp_SpatMessages' [package "terra"] with 2 fields
##   .. .. ..$ has_error  : logi FALSE
##   .. .. ..$ has_warning: logi FALSE
##   .. .. ..and 18 methods, of which 4 are  possibly relevant:
##   .. .. ..  finalize, getError, getWarnings, initialize
##   .. ..$ names   : chr [1:11] "osm_id" "name" "electrified" "frequency" ...
##   .. ..and 101 methods, of which 87 are  possibly relevant:
##   .. ..  add_column_double, add_column_empty, add_column_long,
##   .. ..  add_column_string, aggregate, aggregate_nofield, allerretour, append,
##   .. ..  area, as_lines, as_points, bienvenue, buffer, cbind, centroid, chull,
##   .. ..  coordinates, couldBeLonLat, cover, crop_ext, crop_vct, deepcopy,
##   .. ..  delauny, disaggregate, distance_other, distance_self, erase, extent,
##   .. ..  finalize, flip, geos_isvalid, geos_isvalid_msg, geosdist_other,
##   .. ..  geosdist_self, get_crs, get_datatypes, get_geometry, get_geometryDF,
##   .. ..  get_holes, getDF, getError, getGeometryWKT, getWarnings, has_error,
##   .. ..  has_warning, hex, initialize, intersect, is_valid, isLonLat, length,
##   .. ..  make_valid, ncol, near_between, near_within, nrow, project, rbind,
##   .. ..  read, relate_between, relate_first, relate_within, remove_column,
##   .. ..  remove_df, remove_holes, remove_rows, rescale, rotate, sample,
##   .. ..  sampleGeom, set_crs, set_holes, setGeometry, shift, size, subset_cols,
##   .. ..  subset_rows, symdif, transpose, type, union, union_self, union_unary,
##   .. ..  voronoi, wkb, wkt, write
geomtype(byb_sv)
## [1] "lines"
str(geom(byb_sv))
##  num [1:7906, 1:5] 1 1 1 1 1 1 1 1 1 1 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:5] "geom" "part" "x" "y" ...

Raster data

Spatial raster data is observed using rectangular (often square) cells, within which attribute data are observed. Raster data are very rarely object-based, very often they are field-based and could have been observed everywhere. We probably do not know where within the raster cell the observed value is correct; all we know is that at the chosen resolution, this is the value representing the whole cell area.

library(elevatr)
elevation <- get_elev_raster(byb_sp, z = 10)
is.na(elevation) <- elevation < 1
saveRDS(elevation, file="elevation.rds")
library(raster)
## 
## Attaching package: 'raster'
## The following object is masked from 'package:colorspace':
## 
##     RGB
(elevation <- readRDS("elevation.rds"))
## class      : RasterLayer 
## dimensions : 645, 1947, 1255815  (nrow, ncol, ncell)
## resolution : 0.0005417705, 0.0005417705  (x, y)
## extent     : 4.921875, 5.976702, 60.06441, 60.41385  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : memory
## names      : file1888081600ff6b 
## values     : 1, 1294  (min, max)
str(elevation, max.level=2)
## Formal class 'RasterLayer' [package "raster"] with 12 slots
##   ..@ file    :Formal class '.RasterFile' [package "raster"] with 13 slots
##   ..@ data    :Formal class '.SingleLayerData' [package "raster"] with 13 slots
##   ..@ legend  :Formal class '.RasterLegend' [package "raster"] with 5 slots
##   ..@ title   : chr(0) 
##   ..@ extent  :Formal class 'Extent' [package "raster"] with 4 slots
##   ..@ rotated : logi FALSE
##   ..@ rotation:Formal class '.Rotation' [package "raster"] with 2 slots
##   ..@ ncols   : int 1947
##   ..@ nrows   : int 645
##   ..@ crs     :Formal class 'CRS' [package "sp"] with 1 slot
##   ..@ history : list()
##   ..@ z       : list()
str(slot(elevation, "data"))
## Formal class '.SingleLayerData' [package "raster"] with 13 slots
##   ..@ values    : int [1:1255815] 12 13 12 10 7 3 NA NA NA NA ...
##   ..@ offset    : num 0
##   ..@ gain      : num 1
##   ..@ inmemory  : logi TRUE
##   ..@ fromdisk  : logi FALSE
##   ..@ isfactor  : logi FALSE
##   ..@ attributes: list()
##   ..@ haveminmax: logi TRUE
##   ..@ min       : int 1
##   ..@ max       : int 1294
##   ..@ band      : int 1
##   ..@ unit      : chr ""
##   ..@ names     : chr "file1888081600ff6b"
str(as(elevation, "SpatialGridDataFrame"), max.level=2)
## Formal class 'SpatialGridDataFrame' [package "sp"] with 4 slots
##   ..@ data       :'data.frame':  1255815 obs. of  1 variable:
##   ..@ grid       :Formal class 'GridTopology' [package "sp"] with 3 slots
##   ..@ bbox       : num [1:2, 1:2] 4.92 60.06 5.98 60.41
##   .. ..- attr(*, "dimnames")=List of 2
##   ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
mapview(elevation, col=terrain.colors)
## Warning in rasterCheckSize(x, maxpixels = maxpixels): maximum number of pixels for Raster* viewing is 5e+05 ; 
## the supplied Raster* has 1255815 
##  ... decreasing Raster* resolution to 5e+05 pixels
##  to view full resolution set 'maxpixels =  1255815 '
(elevation_sr <- as(elevation, "SpatRaster"))
## class       : SpatRaster 
## dimensions  : 645, 1947, 1  (nrow, ncol, nlyr)
## resolution  : 0.0005417705, 0.0005417705  (x, y)
## extent      : 4.921875, 5.976702, 60.06441, 60.41385  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs 
## source      : memory 
## name        : file1888081600ff6b 
## min value   :                  1 
## max value   :               1294
str(elevation_sr)
## Formal class 'SpatRaster' [package "terra"] with 1 slot
##   ..@ ptr:Reference class 'Rcpp_SpatRaster' [package "terra"] with 17 fields
##   .. ..$ depth    : num 0
##   .. ..$ extent   :Reference class 'Rcpp_SpatExtent' [package "terra"] with 2 fields
##   .. .. ..$ valid : logi TRUE
##   .. .. ..$ vector: num [1:4] 4.92 5.98 60.06 60.41
##   .. .. ..and 27 methods, of which 13 are  possibly relevant:
##   .. .. ..  align, as.points, ceil, compare, finalize, floor, initialize,
##   .. .. ..  intersect, round, sample, sampleRandom, sampleRegular, union
##   .. ..$ filenames: chr ""
##   .. ..$ hasRange : logi TRUE
##   .. ..$ hasTime  : logi FALSE
##   .. ..$ hasValues: logi TRUE
##   .. ..$ inMemory : logi TRUE
##   .. ..$ messages :Reference class 'Rcpp_SpatMessages' [package "terra"] with 2 fields
##   .. .. ..$ has_error  : logi FALSE
##   .. .. ..$ has_warning: logi FALSE
##   .. .. ..and 18 methods, of which 4 are  possibly relevant:
##   .. .. ..  finalize, getError, getWarnings, initialize
##   .. ..$ names    : chr "file1888081600ff6b"
##   .. ..$ origin   : num [1:2] -1.10e-04 -5.71e-05
##   .. ..$ range_max: num 1294
##   .. ..$ range_min: num 1
##   .. ..$ res      : num [1:2] 0.000542 0.000542
##   .. ..$ rgb      : logi FALSE
##   .. ..$ time     : num 0
##   .. ..$ timestep : chr "seconds"
##   .. ..$ units    : chr ""
##   .. ..and 192 methods, of which 178 are  possibly relevant:
##   .. ..  addSource, adjacent, aggregate, align, apply, area_by_value,
##   .. ..  arith_numb, arith_rast, as_points, as_polygons, atan2, bilinearValues,
##   .. ..  boundaries, buffer, canProcessInMemory, cellFromRowCol,
##   .. ..  cellFromRowColCombine, cellFromXY, chunkSize, clamp, classify,
##   .. ..  colFromX, collapse_sources, combineSources, compare_geom,
##   .. ..  couldBeLonLat, count, cover, createCategories, crop, cum, deepcopy,
##   .. ..  dense_extent, disaggregate, expand, extCells, extractCell,
##   .. ..  extractVector, extractVectorFlat, extractXY, finalize, flip, focal1,
##   .. ..  focal2, focal3, focalValues, freq, geometry, get_aggregate_dims,
##   .. ..  get_aggregates, get_crs, get_sourcenames, get_sourcenames_long,
##   .. ..  getBlockSize, getCategories, getCatIndex, getColors, getError,
##   .. ..  getLabels, getMessage, getNAflag, getRGB, getValues, getWarnings,
##   .. ..  global, global_weighted_mean, gridDistance, has_error, has_warning,
##   .. ..  hasCategories, hasColors, hasWindow, initf, initialize, initv, is_in,
##   .. ..  is_in_cells, isfinite, isGlobalLonLat, isinfinite, isLonLat, isnan,
##   .. ..  logic_numb, logic_rast, make_vrt, makeCategorical, mask_raster,
##   .. ..  mask_vector, math, math2, mem_needs, modal, ncol, nlyr, nlyrBySource,
##   .. ..  nrow, nsrc, patches, polygonize, quantile, rapply, rappvals,
##   .. ..  rastDistance, rasterize, rasterizeLyr, readAll, readStart, readStop,
##   .. ..  readValues, rectify, removeCategories, removeColors, removeRGB,
##   .. ..  removeWindow, replace, replaceValues, reverse, rgb2col, rotate,
##   .. ..  rowColFromCell, rowFromY, rst_area, same, sampleRandomRaster,
##   .. ..  sampleRandomValues, sampleRegularRaster, sampleRegularValues, scale,
##   .. ..  selRange, separate, set_crs, set_depth, set_resolution,
##   .. ..  set_sourcenames, set_sourcenames_long, set_units, setCategories,
##   .. ..  setCatIndex, setColors, setDepth, setLabels, setNAflag, setNames,
##   .. ..  setRange, setRGB, settime, setTime, setUnit, setValues, setWindow,
##   .. ..  shift, sieve, size, sources_to_disk, spatinit, stretch, subset,
##   .. ..  sum_area, summary, summary_numb, terrain, transpose, trig, trim,
##   .. ..  unique, vectCells, vectDistance, warp, wmean_rast, wmean_vect,
##   .. ..  writeRaster, writeStart, writeStop, writeValues, xFromCol, xyFromCell,
##   .. ..  yFromRow, zonal
str(values(elevation_sr))
##  num [1:1255815, 1] 12 13 12 10 7 3 NA NA NA NA ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr "file1888081600ff6b"

Raster data

The raster package complemented sp for handling raster objects and their interactions with vector objects.

It added to input/output using GDAL through rgdal, and better access to NetCDF files for GDAL built without the relevant drivers.

It may be mentioned in passing that thanks to help from CRAN administrators and especially Brian Ripley, CRAN binary builds of rgdal for Windows and Apple Mac OSX became available from 2006, but with a limited set of vector and raster drivers.

Support from CRAN adminstrators remains central to making packages available to users who are not able to install R source packages themselves, particularly linking to external libraries.

Initially, raster was written in R using functionalities in sp and rgdal with rgeos coming later.

It used a feature of GDAL raster drivers permitting the successive reading of subsets of rasters by row and column, permitting the processing of much larger objects than could be held in memory.

In addition, the concepts of bricks and stacks of rasters were introduced, diverging somewhat from the sp treatment of raster bands as stacked columns as vectors in a data frame.

From this year, a new package called terra steps away from sp class representations, linking directly to GDAL, PROJ and GEOS.

Questions arose

As raster evolved, two other packages emerged raising issues with the ways in which spatial objects had been conceptualized in sp.

The rgeos package used the C application programming interface (API) to the C++ GEOS library, which is itself a translation of the Java Topology Suite (JTS).

While the GDAL vector drivers did use the standard Simple Features representation of ector geometries, it was not strongly enforced.

This laxity now seems most closely associated with the use of ESRI Shapefiles as a de-facto file standard for representation, in which many Simple Features are not consistently representable.

Need for vector standards compliance

Both JTS and GEOS required a Simple Feature compliant representation, and led to the need for curious and fragile adaptations.

For example, these affected the representation of sp "Polygons" objects, which were originally conceptualized after the Shapefile specification: ring direction determined whether a ring was exterior or interior (a hole), but no guidance was given to show which exterior ring holes might belong to.

As R provides a way to add a character string comment to any object, comments were added to each "Polygons" object encoding the necessary information.

In this way, GEOS functionality could be used, but the fragility of vector representation in sp was made very obvious.

Spatio-temporal data

Another package affecting thinking about representation was spacetime, as it diverged from raster by stacking vectors for regular spatio-temporal objects with space varying faster than time.

So a single earth observation band observed repeatedly would be stored in a single vector in a data frame, rather than in the arguably more robust form of a four-dimensional array, with the band taking one position on the final dimension.

The second edition of (Bivand, Pebesma, and Gomez-Rubio 2013) took up all of these issues in one way or another, but after completing a spatial statistics special issue of the Journal of Statistical Software (Pebesma, Bivand, and Ribeiro 2015), it was time to begin fresh implementations of classes for spatial data.

Simple Features in R

It was clear that vector representations needed urgent attention, so the sf package was begun, aiming to implement the most frequently used parts of the specification (ISO 2004b; Kralidis 2008; Herring 2011).

Development was supported by a grant from the then newly started R Consortium, which brings together R developers and industry members.

A key breakthrough came at the useR! 2016 conference, following an earlier decision to re-base vector objects on data frames, rather than as in sp to embed a data frame inside a collection of spatial features of the same kind.

However, although data frame objects in S and R have always been able to take list columns as valid columns, such list columns were not seen as “tidy” (Wickham 2014).

Refresher: data frame objects

First, let us see that is behind the data.frame object: the list object. list objects are vectors that contain other objects, which can be addressed by name or by 1-based indices . Like the vectors we have already met, lists can be accessed and manipulated using square brackets []. Single list elements can be accessed and manipulated using double square brackets [[]]

Starting with four vectors of differing types, we can assemble a list object; as we see, its structure is quite simple. The vectors in the list may vary in length, and lists can (and do often) include lists

V1 <- 1:3
V2 <- letters[1:3]
V3 <- sqrt(V1)
V4 <- sqrt(as.complex(-V1))
L <- list(v1=V1, v2=V2, v3=V3, v4=V4)
str(L)
## List of 4
##  $ v1: int [1:3] 1 2 3
##  $ v2: chr [1:3] "a" "b" "c"
##  $ v3: num [1:3] 1 1.41 1.73
##  $ v4: cplx [1:3] 0+1i 0+1.41i 0+1.73i
L$v3[2]
## [1] 1.414214
L[[3]][2]
## [1] 1.414214

Our list object contains four vectors of different types but of the same length; conversion to a data.frame is convenient. Note that by default strings are converted into factors:

DF <- as.data.frame(L)
str(DF)
## 'data.frame':    3 obs. of  4 variables:
##  $ v1: int  1 2 3
##  $ v2: chr  "a" "b" "c"
##  $ v3: num  1 1.41 1.73
##  $ v4: cplx  0+1i 0+1.41i 0+1.73i
DF <- as.data.frame(L, stringsAsFactors=FALSE)
str(DF)
## 'data.frame':    3 obs. of  4 variables:
##  $ v1: int  1 2 3
##  $ v2: chr  "a" "b" "c"
##  $ v3: num  1 1.41 1.73
##  $ v4: cplx  0+1i 0+1.41i 0+1.73i

We can also provoke an error in conversion from a valid list made up of vectors of different length to a data.frame:

V2a <- letters[1:4]
V4a <- factor(V2a)
La <- list(v1=V1, v2=V2a, v3=V3, v4=V4a)
DFa <- try(as.data.frame(La, stringsAsFactors=FALSE), silent=TRUE)
message(DFa)
## Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
##   arguments imply differing number of rows: 3, 4

We can access data.frame elements as list elements, where the $ is effectively the same as [[]] with the list component name as a string:

DF$v3[2]
## [1] 1.414214
DF[[3]][2]
## [1] 1.414214
DF[["v3"]][2]
## [1] 1.414214

Since a data.frame is a rectangular object with named columns with equal numbers of rows, it can also be indexed like a matrix, where the rows are the first index and the columns (variables) the second:

DF[2, 3]
## [1] 1.414214
DF[2, "v3"]
## [1] 1.414214
str(DF[2, 3])
##  num 1.41
str(DF[2, 3, drop=FALSE])
## 'data.frame':    1 obs. of  1 variable:
##  $ v3: num 1.41

If we coerce a data.frame containing a character vector or factor into a matrix, we get a character matrix; if we extract an integer and a numeric column, we get a numeric matrix.

as.matrix(DF)
##      v1  v2  v3         v4           
## [1,] "1" "a" "1.000000" "0+1.000000i"
## [2,] "2" "b" "1.414214" "0+1.414214i"
## [3,] "3" "c" "1.732051" "0+1.732051i"
as.matrix(DF[,c(1,3)])
##      v1       v3
## [1,]  1 1.000000
## [2,]  2 1.414214
## [3,]  3 1.732051

The fact that data.frame objects descend from list objects is shown by looking at their lengths; the length of a matrix is not its number of columns, but its element count:

length(L)
## [1] 4
length(DF)
## [1] 4
length(as.matrix(DF))
## [1] 12

There are dim methods for data.frame objects and matrices (and arrays with more than two dimensions); matrices and arrays are seen as vectors with dimensions; list objects have no dimensions:

dim(L)
## NULL
dim(DF)
## [1] 3 4
dim(as.matrix(DF))
## [1] 3 4
str(as.matrix(DF))
##  chr [1:3, 1:4] "1" "2" "3" "a" "b" "c" "1.000000" "1.414214" "1.732051" ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:4] "v1" "v2" "v3" "v4"

data.frame objects have names and row.names, matrices have dimnames, colnames and rownames; all can be used for setting new values:

row.names(DF)
## [1] "1" "2" "3"
names(DF)
## [1] "v1" "v2" "v3" "v4"
names(DF) <- LETTERS[1:4]
names(DF)
## [1] "A" "B" "C" "D"
str(dimnames(as.matrix(DF)))
## List of 2
##  $ : NULL
##  $ : chr [1:4] "A" "B" "C" "D"

R objects have attributes that are not normally displayed, but which show their structure and class (if any); we can see that data.frame objects are quite different internally from matrices:

str(attributes(DF))
## List of 3
##  $ names    : chr [1:4] "A" "B" "C" "D"
##  $ class    : chr "data.frame"
##  $ row.names: int [1:3] 1 2 3
str(attributes(as.matrix(DF)))
## List of 2
##  $ dim     : int [1:2] 3 4
##  $ dimnames:List of 2
##   ..$ : NULL
##   ..$ : chr [1:4] "A" "B" "C" "D"

If the reason for different vector lengths was that one or more observations are missing on that variable, NA should be used; the lengths are then equal, and a rectangular table can be created:

V1a <- c(V1, NA)
V3a <- sqrt(V1a)
La <- list(v1=V1a, v2=V2a, v3=V3a, v4=V4a)
DFa <- as.data.frame(La, stringsAsFactors=FALSE)
str(DFa)
## 'data.frame':    4 obs. of  4 variables:
##  $ v1: int  1 2 3 NA
##  $ v2: chr  "a" "b" "c" "d"
##  $ v3: num  1 1.41 1.73 NA
##  $ v4: Factor w/ 4 levels "a","b","c","d": 1 2 3 4

Tidy list columns

DF$E <- list(d=1, e="1", f=TRUE)
str(DF)
## 'data.frame':    3 obs. of  5 variables:
##  $ A: int  1 2 3
##  $ B: chr  "a" "b" "c"
##  $ C: num  1 1.41 1.73
##  $ D: cplx  0+1i 0+1.41i 0+1.73i
##  $ E:List of 3
##   ..$ d: num 1
##   ..$ e: chr "1"
##   ..$ f: logi TRUE

At useR! in 2016, list columns were declared “tidy,” using examples including the difficulty of encoding polygon interior rings in non-list columns. The decision to accommodate “tidy” workflows as well as base-R workflows had already been made, as at least some users only know how to use ``tidy’’ workflows.

sf begins

(Pebesma 2018) shows the status of the sf towards the end of 2017, with a geometry list column containing R wrappers around objects adhering to Simple Features specification definitions. The feature geometries are stored in numeric vectors, matrices, or lists of matrices, and may also be subject to arithmetic operations. Features are held in the "XY" class if two-dimensional, or "XYZ", "XYM" or "XYZM" if such coordinates are available; all single features are "sfg" (Simple Feature geometry) objects:

pt1 <- st_point(c(1,3))
pt2 <- pt1 + 1
pt3 <- pt2 + 1
str(pt3)
##  'XY' num [1:2] 3 5

Geometries may be represented as “Well Known Text” (WKT):

st_as_text(pt3)
## [1] "POINT (3 5)"

or as “Well Known Binary” (WKB) as in databases’ “binary large objects” (BLOBs), resolving the problem of representation when working with GDAL vector drivers and functions, and with GEOS predicates and topological operations:

st_as_binary(pt3)
##  [1] 01 01 00 00 00 00 00 00 00 00 00 08 40 00 00 00 00 00 00 14 40

A column of simple feature geometries ("sfc") is constructed as a list of "sfg" objects, which do not have to belong to the same Simple Features category:

pt_sfc <- st_as_sfc(list(pt1, pt2, pt3))
str(pt_sfc)
## sfc_POINT of length 3; first list element:  'XY' num [1:2] 1 3

Finally, an "sfc" object, a geometry column, can be added to a data.frame object using st_geometry(), which sets a number of attributes on the object and defines it as also being an "sf" object (the "agg" attribute if populated shows how observations on non-geometry columns should be understood):

st_geometry(DF) <- pt_sfc
(DF)
## Simple feature collection with 3 features and 5 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 1 ymin: 3 xmax: 3 ymax: 5
## CRS:           NA
##   A B        C           D    E    geometry
## 1 1 a 1.000000 0+1.000000i    1 POINT (1 3)
## 2 2 b 1.414214 0+1.414214i    1 POINT (2 4)
## 3 3 c 1.732051 0+1.732051i TRUE POINT (3 5)

The sf package does not implement all of the Simple Features geometry categories, but geometries may be converted to the chosen subset, using for example the gdal_utils() function with util="ogr2ogr", options="-nlt CONVERT_TO_LINEAR" to convert curve geometries in an input file to linear geometries.

Many of the functions in the sf package begin with st_ as a reference to the same usage in PostGIS, where the letters were intended to symbolise space and time, but where time has not yet been implemented.

sf also integrates GEOS topological predicates and operations into the same framework, replacing the rgeos package for access to GEOS functionality. The precision and scale defaults differ between sf and rgeos slightly; both remain fragile with respect to invalid geometries, of which there are many in circulation.

(buf_DF <- st_buffer(DF, dist=0.3))
## Simple feature collection with 3 features and 5 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 0.7 ymin: 2.7 xmax: 3.3 ymax: 5.3
## CRS:           NA
##   A B        C           D    E                       geometry
## 1 1 a 1.000000 0+1.000000i    1 POLYGON ((1.3 3, 1.299589 2...
## 2 2 b 1.414214 0+1.414214i    1 POLYGON ((2.3 4, 2.299589 3...
## 3 3 c 1.732051 0+1.732051i TRUE POLYGON ((3.3 5, 3.299589 4...

Raster representations: stars

Like sf, stars was supported by an R Consortium grant, for scalable, spatio-temporal tidy arrays for R.

Spatio-temporal arrays were seen as an alternative way of representing multivariate spatio-temporal data from the choices made in the spacetime package, where a two-dimensional data frame contained stacked positions within stacked time points or intervals.

The proposed arrays might collapse to a raster layer if only one variable was chosen for one time point or interval.

More important, the development of the package was extended to accommodate a backend for earth data processing in which the data are retrieved and rescaled as needed from servers, most often cloud-based servers.

This example only covers a multivariate raster taken from a Landsat 7 view of a small part of the Brazilian coast. In the first part, a GeoTIFF file is read into memory, using three array dimensions, two in planar space, the third across six bands:

library(stars)
## Loading required package: abind
fn <- system.file("tif/L7_ETMs.tif", package = "stars")
L7 <- read_stars(fn)
L7
## stars object with 3 dimensions and 1 attribute
## attribute(s):
##   L7_ETMs.tif    
##  Min.   :  1.00  
##  1st Qu.: 54.00  
##  Median : 69.00  
##  Mean   : 68.91  
##  3rd Qu.: 86.00  
##  Max.   :255.00  
## dimension(s):
##      from  to  offset delta                       refsys point values x/y
## x       1 349  288776  28.5 UTM Zone 25, Southern Hem... FALSE   NULL [x]
## y       1 352 9120761 -28.5 UTM Zone 25, Southern Hem... FALSE   NULL [y]
## band    1   6      NA    NA                           NA    NA   NULL
(L7_R <- as(L7, "Raster"))
## class      : RasterBrick 
## dimensions : 352, 349, 122848, 6  (nrow, ncol, ncell, nlayers)
## resolution : 28.5, 28.5  (x, y)
## extent     : 288776.3, 298722.8, 9110729, 9120761  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=25 +south +ellps=GRS80 +units=m +no_defs 
## source     : memory
## names      : layer.1, layer.2, layer.3, layer.4, layer.5, layer.6 
## min values :      47,      32,      21,       9,       1,       1 
## max values :     255,     255,     255,     255,     255,     255 
## time       : 1, 2, 3, 4, 5, 6
(as(L7_R, "SpatRaster"))
## class       : SpatRaster 
## dimensions  : 352, 349, 6  (nrow, ncol, nlyr)
## resolution  : 28.5, 28.5  (x, y)
## extent      : 288776.3, 298722.8, 9110729, 9120761  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=utm +zone=25 +south +ellps=GRS80 +units=m +no_defs 
## source      : memory 
## names       : layer.1, layer.2, layer.3, layer.4, layer.5, layer.6 
## min values  :      47,      32,      21,       9,       1,       1 
## max values  :     255,     255,     255,     255,     255,     255

The bands can be operated on arithmetically, for example to generate a new object containing values of the normalized difference vegetation index through a function applied across the \(x\) and \(y\) spatial dimensions:

ndvi <- function(x) (x[4] - x[3])/(x[4] + x[3])
(s2.ndvi <- st_apply(L7, c("x", "y"), ndvi))
## stars object with 2 dimensions and 1 attribute
## attribute(s):
##      ndvi          
##  Min.   :-0.75342  
##  1st Qu.:-0.20301  
##  Median :-0.06870  
##  Mean   :-0.06432  
##  3rd Qu.: 0.18667  
##  Max.   : 0.58667  
## dimension(s):
##   from  to  offset delta                       refsys point values x/y
## x    1 349  288776  28.5 UTM Zone 25, Southern Hem... FALSE   NULL [x]
## y    1 352 9120761 -28.5 UTM Zone 25, Southern Hem... FALSE   NULL [y]

The same file can also be accessed using the proxy mechanism, shich creates a link to the external entity, here a file:

L7p <- read_stars(fn, proxy=TRUE)
L7p
## stars_proxy object with 1 attribute in 1 file(s):
## $L7_ETMs.tif
## [1] "[...]/L7_ETMs.tif"
## 
## dimension(s):
##      from  to  offset delta                       refsys point values x/y
## x       1 349  288776  28.5 UTM Zone 25, Southern Hem... FALSE   NULL [x]
## y       1 352 9120761 -28.5 UTM Zone 25, Southern Hem... FALSE   NULL [y]
## band    1   6      NA    NA                           NA    NA   NULL

The same function can also be applied across the same two spatial dimentions of the array, but no calculation is carried out until the data is needed and the output resolution known:

(L7p.ndvi = st_apply(L7p, c("x", "y"), ndvi))
## stars_proxy object with 1 attribute in 1 file(s):
## $L7_ETMs.tif
## [1] "[...]/L7_ETMs.tif"
## 
## dimension(s):
##      from  to  offset delta                       refsys point values x/y
## x       1 349  288776  28.5 UTM Zone 25, Southern Hem... FALSE   NULL [x]
## y       1 352 9120761 -28.5 UTM Zone 25, Southern Hem... FALSE   NULL [y]
## band    1   6      NA    NA                           NA    NA   NULL    
## call_list:
## [[1]]
## st_apply(X = X, MARGIN = MARGIN, FUN = FUN, CLUSTER = CLUSTER, 
##     PROGRESS = PROGRESS, FUTURE = FUTURE, rename = rename, .fname = .fname)
## attr(,".Environment")
## <environment: 0x12ae8cf0>

The array object can also be split, here on the band dimension, to yield a representation as six rasters in list form:

(x6 <- split(L7, "band"))
## stars object with 2 dimensions and 6 attributes
## attribute(s):
##       X1               X2               X3               X4         
##  Min.   : 47.00   Min.   : 32.00   Min.   : 21.00   Min.   :  9.00  
##  1st Qu.: 67.00   1st Qu.: 55.00   1st Qu.: 49.00   1st Qu.: 52.00  
##  Median : 78.00   Median : 66.00   Median : 63.00   Median : 63.00  
##  Mean   : 79.15   Mean   : 67.57   Mean   : 64.36   Mean   : 59.24  
##  3rd Qu.: 89.00   3rd Qu.: 79.00   3rd Qu.: 77.00   3rd Qu.: 75.00  
##  Max.   :255.00   Max.   :255.00   Max.   :255.00   Max.   :255.00  
##       X5               X6         
##  Min.   :  1.00   Min.   :  1.00  
##  1st Qu.: 63.00   1st Qu.: 32.00  
##  Median : 89.00   Median : 60.00  
##  Mean   : 83.18   Mean   : 59.98  
##  3rd Qu.:112.00   3rd Qu.: 88.00  
##  Max.   :255.00   Max.   :255.00  
## dimension(s):
##   from  to  offset delta                       refsys point values x/y
## x    1 349  288776  28.5 UTM Zone 25, Southern Hem... FALSE   NULL [x]
## y    1 352 9120761 -28.5 UTM Zone 25, Southern Hem... FALSE   NULL [y]

These rasters may also be subjected to arithmetical operations, and as may be seen, explicit arithmetic on the six rasters has the same outcome as applying the same calculatiob to the three-dimensional array:

x6$mean <- (x6[[1]] + x6[[2]] + x6[[3]] + x6[[4]] + x6[[5]] +
              x6[[6]])/6
xm <- st_apply(L7, c("x", "y"), mean)
all.equal(xm[[1]], x6$mean)
## [1] TRUE

openeo

OpenEO proposes proof-of-concept client-server API approaches. The project is under development.

gdalcubes

Earth Observation Data Cubes from Satellite Image Collections - extension of the stars proxy mechansim and the raster out-of-memory approach: (https://github.com/appelmar/gdalcubes_R).

Processing collections of Earth observation images as on-demand multispectral, multitemporal data cubes. Users define cubes by spatiotemporal extent, resolution, and spatial reference system and let ‘gdalcubes’ automatically apply cropping, reprojection, and resampling using the ‘Geospatial Data Abstraction Library’ (‘GDAL’).

Implemented functions on data cubes include reduction over space and time, applying arithmetic expressions on pixel band values, moving window aggregates over time, filtering by space, time, bands, and predicates on pixel values, materializing data cubes as ‘netCDF’ files, and plotting. User-defined ‘R’ functions can be applied over chunks of data cubes. The package implements lazy evaluation and multithreading. See also this blog post.

Support

Support expressses the relationship between the spatial and temporal entities of observation used for capturing underlying data generation processes, and those processes themselves.

The processes and their associated spatial and temporal scales (“footprints”) and strides may not be well-understood, so the ways that we conduct observations may or may not give use good “handles” on the underlying realities.

Since we are most often interested in drawing conclusions about the underlying realities, we should try to be aware of issues raised when we mix things up, the ecological fallacy being a typical example (Wakefield and Lyons 2010), involving the drawing of conclusions about individuals from aggregates.

Change of support occurs when the observational entities we are using differ in their spatial and/or temporal footprint, and we need to impute or interpolate from one support to another (Gotway and Young 2002). Areal interpolation is one of the pathways (Do, Thomas-Agnan, and Vanhems 2015).

Often we are not clear about the aggregation status of variables that we observe by entity either. In many entities, we are dealing with count aggregates. Say we have an accident count on a road segment, it is clear that if we subset the segment, we would need to impute the count to the subsegments that we have chosen. The variable is an aggregate, and subsetting should preserve the sum over all subsets rule.

byb <- readRDS("byb.rds")
names(attributes(byb))
## [1] "names"     "row.names" "sf_column" "agr"       "class"
library(sf)
st_agr(byb)
##      osm_id        name electrified   frequency       gauge       layer 
##        <NA>        <NA>        <NA>        <NA>        <NA>        <NA> 
##    maxspeed     name.en     railway      source     voltage 
##        <NA>        <NA>        <NA>        <NA>        <NA> 
## Levels: constant aggregate identity

Work by (Stasch et al. 2014; Scheider et al. 2016) has shown that misunderstandings about whether variable values are constant over a segment (we really want the gauge to be constant), whether they are identities (osm_id), or whether they are measured over the whole observed time period at the point, line segment, polygon, or raster cell by counting or other aggregation, are quite prevalent.

All "sf" objects have an "agr" attribute, set by default to unknown (NA) for each non-geometry column in the data frame. In this case the information is of very poor quality (many missing values, others guessed), but use can be made of the facility in other datasets.

byb$length <- st_length(byb)
summary(byb$length)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##    2.739   54.088  133.712  256.810  382.945 1217.250
str(byb$length)
##  Units: [m] num [1:187] 378 770 374 818 814 ...

Unfortunately, the simple examples given in SDSR do not work. The introduction of units, shown here and in (Pebesma, Mailund, and Hiebert 2016) also do not (yet) provide the background for issuing warnings with regard to support that were anticipated when the ideas were first considered. The idea underlying st_agr() has been to warn when an aggregate variable is copied across to a part of an entity as though it was a constant.

GEOS, topology operations

(precision in sf, scale in rgeos)

Broad Street Cholera Data

Even though we know that John Snow already had a working hypothesis about cholera epidemics, his data remain interesting, especially if we use a GIS to find the street distances from mortality dwellings to the Broad Street pump in Soho in central London. Brody et al. (2000) point out that John Snow did not use maps to find the Broad Street pump, the polluted water source behind the 1854 cholera epidemic, because he associated cholera with water contaminated with sewage, based on earlier experience.

The basic data to be used here were made available by Jim Detwiler, who had collated them for David O’Sullivan for use on the cover of O’Sullivan and Unwin (2003), based on earlier work by Waldo Tobler and others. The files were a shapefile of counts of deaths at front doors of houses, two shapefiles of pump locations and a georeferenced copy of the Snow map as an image; the files were registered in the British National Grid CRS. These have been converted to GPKG format. In GRASS, a suitable location was set up in this CRS and the image file was imported; the building contours were then digitised as a vector layer and cleaned.

We would like to find the line of equal distances shown on the extract from John Snow’s map shown in Brody et al. (2000) shown here, or equivalently find the distances from the pumps to the front doors of houses with mortalities following the roads, not the straight line distance. We should recall that we only have the locations of counts of mortalities, not of people at risk or of survivors.

library(sf)
bbo <- st_read("snow/bbo.gpkg")
## Reading layer `bbo' from data source `/home/rsb/und/uam21/units_1/snow/bbo.gpkg' using driver `GPKG'
## Simple feature collection with 1 feature and 1 field
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 528890.7 ymin: 180557.9 xmax: 529808.7 ymax: 181415.7
## Projected CRS: OSGB36 / British National Grid
buildings <- st_read("snow/buildings.gpkg", quiet=TRUE)
deaths <- st_read("snow/deaths.gpkg", quiet=TRUE)
sum(deaths$Num_Css)
## [1] 578
b_pump <- st_read("snow/b_pump.gpkg", quiet=TRUE)
nb_pump <- st_read("snow/nb_pump.gpkg", quiet=TRUE)

As there is a small difference between the CRS values, we copy across before conducting an intersection operation to clip the buildings to the boundary, then we buffer in the buildings object (to make the roads broader).

library(sf)
st_crs(buildings) <- st_crs(bbo)
buildings1 <- st_intersection(buildings, bbo)
buildings2 <- st_buffer(buildings1, dist=-4)
plot(st_geometry(buildings2))

Next we create a dummy raster using raster with 1 meter resolution in the extent of the buildings object (note that raster::extent() works with sf objects, but the CRS must be given as a string):

library(raster)
resolution <- 1
r <- raster(extent(buildings2), resolution=resolution, crs=st_crs(bbo)$proj4string)
r[] <- resolution
summary(r)
##         layer
## Min.        1
## 1st Qu.     1
## Median      1
## 3rd Qu.     1
## Max.        1
## NA's        0

One of the building3 component geometries was empty (permitted in sf, not in sp), so should be dropped before running raster::cellFromPolygon() to list raster cells in each geometry (so we need unlist() to assign NA to the in-buffered buildings):

buildings3 <- as(buildings2[!st_is_empty(buildings2),], "Spatial")
cfp <- cellFromPolygon(r, buildings3)
is.na(r[]) <- unlist(cfp)
summary(r)
##          layer
## Min.         1
## 1st Qu.      1
## Median       1
## 3rd Qu.      1
## Max.         1
## NA's    330537
plot(r)

Using gdistance, we create a symmetric transition object with an internal sparse matrix representation, from which shortest paths can be computed:

library(gdistance)
tr1 <- transition(r, transitionFunction=function(x) 1/mean(x), directions=8, symm=TRUE)

We need to find shortest paths from addresses with mortalities to the Broad Street pump first:

sp_deaths <- as(deaths, "Spatial")
d_b_pump <- st_length(st_as_sfc(shortestPath(tr1, as(b_pump, "Spatial"), sp_deaths, output="SpatialLines")))

and then in a loop from the same addresses to each of the other pumps in turn, finally taking the minimum:

res <- matrix(NA, ncol=nrow(nb_pump), nrow=nrow(deaths))
sp_nb_pump <- as(nb_pump, "Spatial")
for (i in 1:nrow(nb_pump)) res[,i] <- st_length(st_as_sfc(shortestPath(tr1, sp_nb_pump[i,], sp_deaths, output="SpatialLines")))
d_nb_pump <- apply(res, 1, min)

Because sf::st_length() uses units units, but they get lost in assigning to a matrix, we need to re-assign before testing whether the Broad Street pump is closer or not:

library(units)
## udunits database from /usr/share/udunits/udunits2.xml
units(d_nb_pump) <- "m"
deaths$b_nearer <- d_b_pump < d_nb_pump
by(deaths$Num_Css, deaths$b_nearer, sum)
## deaths$b_nearer: FALSE
## [1] 217
## ------------------------------------------------------------ 
## deaths$b_nearer: TRUE
## [1] 361

Coordinate reference systems: background

The usefulness of spatial data is linked to knowing its coordinate reference system. The coordinate reference system may be geographic, usually measured in decimal degrees, or projected, layered on a known geographic CRS, usually measured in metres (planar). The underlying geographical CRS must specify an ellipsoid, with associated major and minor axis lengths:

library(sp)
library(rgdal)
## rgdal: version: 1.5-24, (SVN revision 1131M)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 3.3.0, released 2021/04/26
## Path to GDAL shared files: /usr/local/share/gdal
## GDAL binary built with GEOS: TRUE 
## Loaded PROJ runtime: Rel. 8.0.1, March 5th, 2021, [PJ_VERSION: 801]
## Path to PROJ shared files: /tmp/RtmpC8XQwF/file1c31ca8265b56:/usr/local/share/proj:/usr/local/share/proj
## PROJ CDN enabled: FALSE
## Linking to sp version:1.4-6
## To mute warnings of possible GDAL/OSR exportToProj4() degradation,
## use options("rgdal_show_exportToProj4_warnings"="none") before loading rgdal.
## 
## Attaching package: 'rgdal'
## The following object is masked from 'package:terra':
## 
##     project
rgdal_extSoftVersion()
##           GDAL GDAL_with_GEOS           PROJ             sp           EPSG 
##        "3.3.0"         "TRUE"        "8.0.1"        "1.4-6"      "v10.018"
projInfo("ellps")
##         name          major                  ell
## 1      MERIT    a=6378137.0           rf=298.257
## 2      SGS85    a=6378136.0           rf=298.257
## 3      GRS80    a=6378137.0     rf=298.257222101
## 4      IAU76    a=6378140.0           rf=298.257
## 5       airy  a=6377563.396       rf=299.3249646
## 6     APL4.9    a=6378137.0            rf=298.25
## 7      NWL9D    a=6378145.0            rf=298.25
## 8   mod_airy  a=6377340.189        b=6356034.446
## 9     andrae   a=6377104.43             rf=300.0
## 10    danish a=6377019.2563             rf=300.0
## 11   aust_SA    a=6378160.0            rf=298.25
## 12     GRS67    a=6378160.0    rf=298.2471674270
## 13   GSK2011    a=6378136.5       rf=298.2564151
## 14    bessel  a=6377397.155       rf=299.1528128
## 15  bess_nam  a=6377483.865       rf=299.1528128
## 16    clrk66    a=6378206.4          b=6356583.8
## 17    clrk80  a=6378249.145          rf=293.4663
## 18 clrk80ign    a=6378249.2 rf=293.4660212936269
## 19       CPM    a=6375738.7            rf=334.29
## 20    delmbr     a=6376428.             rf=311.5
## 21   engelis   a=6378136.05          rf=298.2566
## 22   evrst30  a=6377276.345          rf=300.8017
## 23   evrst48  a=6377304.063          rf=300.8017
## 24   evrst56  a=6377301.243          rf=300.8017
## 25   evrst69  a=6377295.664          rf=300.8017
## 26   evrstSS  a=6377298.556          rf=300.8017
## 27   fschr60     a=6378166.             rf=298.3
## 28  fschr60m     a=6378155.             rf=298.3
## 29   fschr68     a=6378150.             rf=298.3
## 30   helmert     a=6378200.             rf=298.3
## 31     hough    a=6378270.0              rf=297.
## 32      intl    a=6378388.0              rf=297.
## 33     krass    a=6378245.0             rf=298.3
## 34     kaula     a=6378163.            rf=298.24
## 35     lerch     a=6378139.           rf=298.257
## 36     mprts     a=6397300.              rf=191.
## 37  new_intl    a=6378157.5          b=6356772.2
## 38   plessis     a=6376523.           b=6355863.
## 39      PZ90    a=6378136.0         rf=298.25784
## 40    SEasia    a=6378155.0       b=6356773.3205
## 41   walbeck    a=6376896.0       b=6355834.8467
## 42     WGS60    a=6378165.0             rf=298.3
## 43     WGS66    a=6378145.0            rf=298.25
## 44     WGS72    a=6378135.0            rf=298.26
## 45     WGS84    a=6378137.0     rf=298.257223563
## 46    sphere    a=6370997.0          b=6370997.0
##                                description
## 1                               MERIT 1983
## 2                Soviet Geodetic System 85
## 3                     GRS 1980(IUGG, 1980)
## 4                                 IAU 1976
## 5                                Airy 1830
## 6                      Appl. Physics. 1965
## 7                 Naval Weapons Lab., 1965
## 8                            Modified Airy
## 9               Andrae 1876 (Den., Iclnd.)
## 10          Andrae 1876 (Denmark, Iceland)
## 11         Australian Natl & S. Amer. 1969
## 12                       GRS 67(IUGG 1967)
## 13                                GSK-2011
## 14                             Bessel 1841
## 15                   Bessel 1841 (Namibia)
## 16                             Clarke 1866
## 17                        Clarke 1880 mod.
## 18                      Clarke 1880 (IGN).
## 19         Comm. des Poids et Mesures 1799
## 20                 Delambre 1810 (Belgium)
## 21                            Engelis 1985
## 22                            Everest 1830
## 23                            Everest 1948
## 24                            Everest 1956
## 25                            Everest 1969
## 26               Everest (Sabah & Sarawak)
## 27            Fischer (Mercury Datum) 1960
## 28                   Modified Fischer 1960
## 29                            Fischer 1968
## 30                            Helmert 1906
## 31                                   Hough
## 32 International 1924 (Hayford 1909, 1910)
## 33                        Krassovsky, 1942
## 34                              Kaula 1961
## 35                              Lerch 1979
## 36                         Maupertius 1738
## 37                  New International 1967
## 38                   Plessis 1817 (France)
## 39                                   PZ-90
## 40                          Southeast Asia
## 41                                 Walbeck
## 42                                  WGS 60
## 43                                  WGS 66
## 44                                  WGS 72
## 45                                  WGS 84
## 46               Normal Sphere (r=6370997)

Other parameters should be specified, such as the prime meridian, often taken as Greenwich. Before PROJ version 6, legacy PROJ (and GDAL) used a +datum= tag introduced after the library migrated beyond USGS (around version 4.4). The underlying problem was not that projection and inverse projection could not be carried out between projected CRS and geograpghical CRS, but that national mapping agencies defined often many datums, keying the specification of a geographical CRS to a national or regional datum. Some of these, especially for North America, were supported, but support for others was patchy. The +datum= tag supported a partly informal listing of values, themselves linked to three or seven coefficient datum transformation sets, used through the +towgs84= tag. Coefficient lookup through the +datum= tag, or direct specification of coefficients through the +towgs84= tag became a convenient way to handle datum transformation in addition to projection and inverse projection.

The default “hub” for transformation was to go through the then newly achieved WGS84 datum. Spatial data files often encoded the geographic and projected CRS with reference to these values, in some cases using PROJ 4 strings. These used a pseudo projection +proj=longlat to indicate a geographical CRS, and many other possible values of +proj= for projected CRS.

The Grids & Datums column in Photogrammetric Engineering & Remote Sensing gives insight into some of the peculiarities of national mapping agencies - authority is typically national but may be subnational:

data("GridsDatums")
GridsDatums[grep("Poland", GridsDatums$country),]
##                    country       month year ISO
## 28  The Republic of Poland (September) 2000 POL
## 234                 Poland       (May) 2018 POL

Beyond this, the database successively developed by the European Petroleum Survey Group was copied to local CSV files for PROJ and GDAL, providing lookup by code number. From PROJ 6, GDAL no longer uses these CSV files, and PROJ makes available a SQLite database copy of the EPSG database:

EPSG <- make_EPSG()
EPSG[grep("Poland", EPSG$note), 1:2]
##      code                               note
## 318  2171   Pulkovo 1942(58) / Poland zone I
## 319  2172  Pulkovo 1942(58) / Poland zone II
## 320  2173 Pulkovo 1942(58) / Poland zone III
## 321  2174  Pulkovo 1942(58) / Poland zone IV
## 322  2175   Pulkovo 1942(58) / Poland zone V
## 1971 3120   Pulkovo 1942(58) / Poland zone I
EPSG[grep("GUGiK", EPSG$note), 1:2]
##      code                        note
## 2791 3328 Pulkovo 1942(58) / GUGiK-80
EPSG[grep("PL", EPSG$note), 1:2]
##      code                             note
## 323  2176          ETRF2000-PL / CS2000/15
## 324  2177          ETRF2000-PL / CS2000/18
## 325  2178          ETRF2000-PL / CS2000/21
## 329  2179          ETRF2000-PL / CS2000/24
## 330  2180               ETRF2000-PL / CS92
## 6767 9651               EVRF2007-PL height
## 6768 9656 ETRF2000-PL + Baltic 1986 height
## 6769 9657 ETRF2000-PL + EVRF2007-PL height
## 6785 9700                      ETRF2000-PL
## 6786 9701                      ETRF2000-PL
## 6787 9702                      ETRF2000-PL

The GUGiK data downloaded online uses "EPSG:2180":

st_crs(2180)
## Coordinate Reference System:
##   User input: EPSG:2180 
##   wkt:
## PROJCRS["ETRF2000-PL / CS92",
##     BASEGEOGCRS["ETRF2000-PL",
##         DATUM["ETRF2000 Poland",
##             ELLIPSOID["GRS 1980",6378137,298.257222101,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",9702]],
##     CONVERSION["Poland CS92",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",19,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",0.9993,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",500000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",-5300000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["northing (x)",north,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["easting (y)",east,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["Topographic mapping (medium and small scale)."],
##         AREA["Poland - onshore and offshore."],
##         BBOX[49,14.14,55.93,24.15]],
##     ID["EPSG",2180]]

Up to and including PROJ 5, downstream software, like sf and rgdal, have been able to rely on the provision of ad-hoc transformation capabilities, with apparently predictable consequences. Everybody knew (or should have known) that each new release of the PROJ and GDAL CSV metadata files could update transformation coefficients enough to shift outcomes a little. Everyone further chose to ignore the timestamping of coordinates, or at least of datasets; we could guess (as above) that US Census tract boundaries for 1980 must use the NAD27 datum framework - suprisingly many used NAD83 anyway (both for Boston and the North Carolina SIDS data set).

Use of KML files to provide zoom and pan for these boundaries, and now leaflet and mapview exposes approximations mercilessly. Use of coefficients of transformation of an unknown degree of approximation, and authority “googled it” was reaching its limits, or likely had exceeded them.

sp classes used a PROJ string to define the CRS (in an S4 "CRS" object):

getClass("CRS")
## Class "CRS" [package "sp"]
## 
## Slots:
##                 
## Name:   projargs
## Class: character

sf used an S3 "crs" object with an integer EPSG code and a PROJ string; if instantiated from the EPSG code, both were provided. In current sf, the "crs" object has user input and wkt components, and methods to access the PROJ string and the EPSG code for backward compatibility, and from PROJ 8 and EPSG 10 we get datum ensembles:

st_crs(4326)
## Coordinate Reference System:
##   User input: EPSG:4326 
##   wkt:
## GEOGCRS["WGS 84",
##     ENSEMBLE["World Geodetic System 1984 ensemble",
##         MEMBER["World Geodetic System 1984 (Transit)"],
##         MEMBER["World Geodetic System 1984 (G730)"],
##         MEMBER["World Geodetic System 1984 (G873)"],
##         MEMBER["World Geodetic System 1984 (G1150)"],
##         MEMBER["World Geodetic System 1984 (G1674)"],
##         MEMBER["World Geodetic System 1984 (G1762)"],
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]],
##         ENSEMBLEACCURACY[2.0]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     USAGE[
##         SCOPE["Horizontal component of 3D system."],
##         AREA["World."],
##         BBOX[-90,-180,90,180]],
##     ID["EPSG",4326]]

Modernising PROJ and issues

PROJ

Because so much open source (and other) software uses the PROJ library and framework, many are affected when PROJ upgrades. Until very recently, PROJ has been seen as very reliable, and the changes taking place now are intended to confirm and reinforce this reliability. Before PROJ 5 (PROJ 6 was released in 2019, PROJ 7 was released in March 2020), the +datum= tag was used, perhaps with +towgs84= with three or seven coefficients, and possibly +nadgrids= where datum transformation grids were available. However, transformations from one projection to another first inversed to longitude-latitude in WGS84, then projected on to the target projection.

Fast-forward 35 years and PROJ.4 is everywhere: It provides coordinate handling for almost every geospatial program, open or closed source. Today, we see a drastical increase in the need for high accuracy GNSS coordinate handling, especially in the agricultural and construction engineering sectors. This need for geodetic-accuracy transformations is not satisfied by “classic PROJ.4.” But with the ubiquity of PROJ.4, we can provide these transformations “everywhere,” just by implementing them as part of PROJ.4 (Evers and Knudsen 2017).

Escaping the WGS84 hub/pivot: PROJ and OGC WKT2

Following the introduction of geodetic modules and pipelines in PROJ 5 (Knudsen and Evers 2017; Evers and Knudsen 2017), PROJ 6 moves further. Changes in the legacy PROJ representation and WGS84 transformation hub have been coordinated through the GDAL barn raising initiative. Crucially WGS84 often ceases to be the pivot for moving between datums. A new OGC WKT is coming, and an SQLite EPSG file database has replaced CSV files. SRS will begin to support 3D by default, adding time too as SRS change. See also PROJ migration notes.

There are very useful postings on the PROJ mailing list from Martin Desruisseaux, first proposing clarifications and a follow-up including a summary:

  • “Early binding” ≈ hub transformation technique.
  • “Late binding” ≈ hub transformation technique NOT used, replaced by a more complex technique consisting in searching parameters in the EPSG database after the transformation context (source, target, epoch, area of interest) is known.
  • The problem of hub transformation technique is independent of WGS84. It is caused by the fact that transformations to/from the hub are approximate. Any other hub we could invent in replacement of WGS84 will have the same problem, unless we can invent a hub for which transformations are exact (I think that if such hub existed, we would have already heard about it).

The solution proposed by ISO 19111 (in my understanding) is:

  • Forget about hub (WGS84 or other), unless the simplicity of early-binding is considered more important than accuracy.
  • Associating a CRS to a coordinate set (geometry or raster) is no longer sufficient. A {CRS, epoch} tuple must be associated. ISO 19111 calls this tuple “Coordinate metadata.” From a programmatic API point of view, this means that getCoordinateReferenceSystem() method in Geometry objects (for instance) needs to be replaced by a getCoordinateMetadata() method.

This page gives a picture of why the changes in PROJ matter - the arrows are in cm per year displacement:

To handle this level of detail, PROJ 7 introduces an on-demand content delivery network under user control as an alternative to dowloading many possibly unnecessary time-specific vertical and horizontal transformation grids. PROJ 8 will introduce datum ensembles, particularly for commonly used datums such as EPSG:4326, which have had (especially in Noth America) multiple, differing, instantiations as time has progressed; for now we do not know what the consequences of this will be - it is already in the EPSG database.

Upstream software dependencies of the R-spatial ecosystem

When changes occur in upstream external software, R packages using these libraries often need to adapt, but package maintainers try very hard to shield users from any consequences, so that legacy workflows continue to provide the same or at least similar results from the same data.

The code shown in (Bivand, Pebesma, and Gomez-Rubio 2008, 2013) is almost all run nightly on a platform with updated R packages and external software.

This does not necessarily trap all differences (figures are not compared), but is helpful in detecting impacts of changes in packages or external software.

It is also very helpful that CRAN servers using the released and development versions of R, and with different levels of external software also run nightly checks.

Again, sometimes changes are only noticed by users, but quite often checks run by maintainers and by CRAN alert us to impending challenges.

Tracking the development mailing lists of the external software communities, all open source, can also show how thinking is evolving, although sometimes code tidying in external software can have unexpected consequences, breaking not sf or sp with rgdal or rgeos, but a package further downstream.

(Bivand 2014) discusses open source geospatial software stacks more generally, but here we will consider ongoing changes in PROJ, see also (Bivand 2020).

(Knudsen and Evers 2017; Evers and Knudsen 2017) not only point out how the world has changed since a World Geodetic System of 1984 (WGS84) was adopted as a hub for coordinate transformation in PROJ, but also introduced transformation pipelines.

In using a transformation hub, PROJ had worked adequately when the errors introduced by transforming first to WGS84 and then from WGS84 to the target coordinate reference system, but with years passing from 1984, the world has undergone sufficient tectonic shifts for errors to increase.

In addition, the need for precision has risen in agriculture and engineering. So PROJ, as it was, risked ceasing to be fit for purpose as a fundamental component of the geospatial open source software stack.

Following major changes in successive iterations of the international standards for coordinate reference systems (ISO 2004a), PROJ is changing from preferring “late-binding” transformations, pivoting through a known transformation hub in going from input to target coordinate reference systems, to “early-binding” transformations.

This means that the user may be offered alternative paths from input to target coordinate reference systems, some of which may go directly, and more will use higher precision transformation grids, enlarging the existing practice of using North American Datum (NAD) grids.

In other cases, three or seven coefficient transformations may be offered, but the default fallback, where little is known about the input or target specification, may be less satisfactory than PROJ has previously offered.

Grid CDN mechanism

The grid CDN is available at https://cdn.proj.org, and can be turned on for use in rgdal; I’m not aware that sf has made control of it available yet. Files are now stored in a single GTiff format (I think cloud-optimised).

Transformation pipelines

In addition, the current iteration of the standard makes it more important to declare the epoch of interest of coordinates (when the position was recorded and how) and the region of interest.

A transformation pathway may have an undefined epoch and a global span, but cannot achieve optimal precision everywhere.

By bounding the region of interest say within a tectonic plate, and the epoch to a given five-year period, very high precision transformations may be possible.

These choices have not so far been required explicitly, but for example matching against the "area" table in the database may reduce the number of transformation pathways offered dramatically.

CRS in R before PROJ

The mapproj package provided coordinate reference system and projection support for the maps package. From mapproj/src/map.h, line 20, we can see that the eccentricity of the Earth is defined as 0.08227185422, corrresponding to the Clark 1866 ellipsoid (Iliffe and Lott 2008):

ellps <- sf_proj_info("ellps")
(clrk66 <- unlist(ellps[ellps$name=="clrk66",]))
##          name         major           ell   description 
##      "clrk66" "a=6378206.4" "b=6356583.8" "Clarke 1866"

With a very few exceptions, projections included in mapproj::mapproject() use the Clarke 1866 ellipsoid, with the remainder using a sphere with the Clarke 1866 major axis radius. The function returns coordinates for visualization in an unknown metric; no inverse projections are available.

eval(parse(text=clrk66["major"]))
eval(parse(text=clrk66["ell"]))
print(sqrt((a^2-b^2)/a^2), digits=10)
## [1] 0.08227185422

Approaching and implemented changes in PROJ

  • PROJ developers not only point out how the world has changed since a World Geodetic System of 1984 (WGS84) was adopted as a hub for coordinate transformation in PROJ, but also introduced transformation pipelines (Knudsen and Evers 2017; Evers and Knudsen 2017).

  • In using a transformation hub, PROJ had worked adequately when the errors introduced by transforming first to WGS84 and then from WGS84 to the target coordinate reference system were acceptable, but with years passing from 1984, the world has undergone sufficient tectonic shifts for errors to increase, just as needs for accuracy sharpen.

  • In addition, the need for accuracy has risen in agriculture and engineering.

  • So PROJ, as it was, risked ceasing to be fit for purpose as a fundamental component of the geospatial open source software stack.

PROJ will also become more tightly linked to authorities responsible for the specification components. While the original well-known text (WKT1) descriptions also contained authorities, WKT2 (2019) is substantially more stringent. PROJ continues to use the European Petroleum Survey Group (EPSG) database, the local copy PROJ uses is now an SQLite database, with a large number of tables:

sf_proj_search_paths()
## [1] "/tmp/RtmpC8XQwF/file1c31ca8265b56" "/usr/local/share/proj"            
## [3] "/usr/local/share/proj"
library(RSQLite)
DB0 <- sf_proj_search_paths()
DB <- file.path(DB0[length(DB0)], "proj.db")
db <- dbConnect(SQLite(), dbname=DB)
cat(strwrap(paste(dbListTables(db), collapse=", ")), sep="\n")
## alias_name, authority_list, authority_to_authority_preference, axis,
## celestial_body, compound_crs, concatenated_operation,
## concatenated_operation_step, conversion, conversion_method,
## conversion_param, conversion_table, coordinate_operation_method,
## coordinate_operation_view, coordinate_operation_with_conversion_view,
## coordinate_system, crs_view, deprecation, ellipsoid, extent,
## geodetic_crs, geodetic_datum, geodetic_datum_ensemble_member,
## geoid_model, grid_alternatives, grid_packages, grid_transformation,
## helmert_transformation, helmert_transformation_table, metadata,
## object_view, other_transformation, prime_meridian, projected_crs,
## scope, supersession, unit_of_measure, usage, vertical_crs,
## vertical_datum, vertical_datum_ensemble_member
dbDisconnect(db)
  • The initial use of coordinate reference systems for objects defined in sp was based on the PROJ string representation, which built on a simplified key=value form.

  • Keys began with plus (+), and the value format depended on the key.

  • If essential keys were missing, some might be added by default from a file that has now been eliminated as misleading; if +ellps= was missing and not added internally from other keys, +ellps=WGS84 would be added silently to refer to the World Geodetic System 1984 ellipsoid definition.

  • Accurate coordinate transformation has always been needed for the integration of data from different sources, but has become much more pressing as web mapping has become available in R, through the leaflet package (Cheng, Karambelkar, and Xie 2021), on which mapview and the "view" mode of tmap build.

  • As web mapping provides zooming and panning, possible infelicities that were too small to detect as mismatches in transformation jump into prominence.

  • The web mapping workflow transforms input objects to OGC:CRS84 (geographical CRS WGS 84, World area of relevance, WGS84 datum, visualization order) as expected by leaflet, then on to EPSG:3857 (WGS 84 / Pseudo-Mercator) for display on web map backgrounds (this is carried out internally in leaflet).

  • Objects shown in mapview and tmap are now coerced to sf classes, then st_transform() transforms to OGC:CRS84 if necessary (or until we are sure EPSG:4326 never swaps axes).

Broad Street Cholera Data

John Snow did not use maps to find the Broad Street pump, the polluted water source behind the 1854 cholera epidemic in Soho in central London, because he associated cholera with water contaminated with sewage, based on earlier experience (Brody et al. 2000). The basic data to be used here were made available by Jim Detwiler, who had collated them for David O’Sullivan for use on the cover of O’Sullivan and Unwin (2003), based on earlier work by Waldo Tobler and others.

Where is the Broad Street pump?

We’ll use the example of the location of the Broad Street pump in Soho, London, to be distributed with sf; the object has planar coordinates in the OSGB British National Grid projected CRS with the OSGB datum:

bp_file <- system.file("gpkg/b_pump.gpkg", package="sf")
b_pump_sf <- st_read(bp_file)
## Reading layer `b_pump' from data source `/home/rsb/lib/r_libs/sf/gpkg/b_pump.gpkg' using driver `GPKG'
## Simple feature collection with 1 feature and 1 field
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 529393.5 ymin: 181020.6 xmax: 529393.5 ymax: 181020.6
## Projected CRS: Transverse_Mercator

Proj4 string degradation

Before R packages upgraded the way coordinate reference systems were represented in early 2020, our Proj4 string representation suffered degradation. Taking the Proj4 string defined in PROJ 5 for the British National Grid, there is a +datum=OSGB36 key-value pair. But when processing this input with PROJ 6 and GDAL 3, this key is removed. Checking, we can see that reading the input string appears to work, but the output for the Proj4 string drops the +datum=OSGB36 key-value pair, introducing instead the ellipse implied by that datum:

proj5 <- paste0("+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717",
 " +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs")
legacy <- st_crs(proj5)
proj6 <- legacy$proj4string
proj5_parts <- unlist(strsplit(proj5, " "))
proj6_parts <- unlist(strsplit(proj6, " "))
proj5_parts[!is.element(proj5_parts, proj6_parts)]
## [1] "+datum=OSGB36"
proj6_parts[!is.element(proj6_parts, proj5_parts)]
## [1] "+ellps=airy"

We can emulate the problem seen following the release in May 2019 of GDAL 3.0.0 using PROJ 6, by inserting the degraded Proj4 string into the Broad Street pump object. The coordinate reference system representation is now ignorant of the proper datum specification. The apparent "proj4string" component of the sf "crs" is used to permit packages to adapt, even though its contents are degraded.

b_pump_sf1 <- b_pump_sf
st_crs(b_pump_sf1) <- st_crs(st_crs(b_pump_sf1)$proj4string)
## Warning: st_crs<- : replacing crs does not reproject data; use st_transform for
## that

Why does this matter? For visualization on a web map, for example using the mapview package, the projected geometries are transformed to the same WGS84 ellipse and datum (OGC:CRS84) that were used in PROJ 4 as a transformation hub. OGC:CRS84 is the visualization axis order equivalent of EPSG:4326. In leaflet, these are projected to Web Mercator (EPSG:3856). Inside mapview(), the sf::st_transform() function is used, so we will emulate this coordinate operation before handing on the geometries for display.

However, because the one of the objects now has a degraded Proj4 string representation of its coordinate reference system, the output points, apparently transformed identically to WGS84, are now some distance apart:

Sys.setenv("PROJ_NETWORK"="OFF")
sf_proj_network(FALSE)
## character(0)
list.files(sf_proj_search_paths()[1])
## character(0)
b_pump_sf_ll <- st_transform(b_pump_sf, "OGC:CRS84")
list.files(sf_proj_search_paths()[1])
## character(0)
b_pump_sf1_ll <- st_transform(b_pump_sf1, "OGC:CRS84")
list.files(sf_proj_search_paths()[1])
## character(0)
sf_use_s2(FALSE)
st_distance(b_pump_sf_ll, b_pump_sf1_ll)
## Units: [m]
##          [,1]
## [1,] 125.0578

The Broad Street pump is within 1m or 2m of the green point (relative accuracy preserved) but the red point is now in Ingestre Place, because of the loss of the datum specification.

library(mapview)
if (sf:::CPL_gdal_version() >= "3.1.0") mapviewOptions(fgb = FALSE)
pts <- rbind(b_pump_sf_ll, b_pump_sf1_ll)
pts$CRS <- c("original", "degraded")
mapview(pts, zcol="CRS", map.type="OpenStreetMap", col.regions=c("green", "red"), cex=18)

Implemented resolutions: WKT2 2019

  • Once PROJ 6 and GDAL 3 had stabilized in the summer of 2019, we identified the underlying threat as lying in the advertised degradation of GDAL’s exportToProj4() function.

  • When reading raster and vector files, the coordinate reference system representation using Proj4 strings would often be degraded, so that further transformation within R (also using GDAL/PROJ functionality) would be at risk of much greater inaccuracy than before.

  • Since then, sf, sp with rgdal and raster have adopted the 2019 version of the “Well-Known Text” coordinate reference system representation WKT2-2019 (ISO 2004a) instead of Proj4 strings to contain coordinate reference system definitions.

  • Accommodations have also been provided so that the S3 class "crs" objects used in objects defined in sf, and the formal S4 class "CRS" objects used objects defined in sp and raster, can continue to attempt to support Proj4 strings in addition, while other package maintainers and workflow users catch up.

  • Following an extended campaign of repeated checking about 900 reverse dependencies (packages depending on sp, rgdal and others) and dozens of github issues, most of the consequences of the switch to WKT2 among packages have now been addressed.

  • More recently (late August 2020), 115 packages are being offered rebuilt stored objects that had included "CRS" objects without WKT2 definitions.

  • This approach has ensured that spatial objects, whether created within R, read in from external data sources, or read as stored objects, all have WKT2 string representations of their coordinate reference systems, and for backward compatibility can represent these in addition as Proj4 strings.

  • Operations on objects should carry forward the new representations, which should also be written out to external data formats correctly.

Specified axis order

There is a minor divergence between sf and sp (and thus rgdal): in sf, the axis order of the CRS is preserved as instantiated, but objects do not have their axes swapped to accord with authorities unless sf::st_axis_order() is set `TRUE´. This can appear odd, because although the representation records a northings-eastings axis order, data are treated as eastings-northings in plotting, variogram construction and so on:

st_crs("EPSG:4326")
## Coordinate Reference System:
##   User input: EPSG:4326 
##   wkt:
## GEOGCRS["WGS 84",
##     ENSEMBLE["World Geodetic System 1984 ensemble",
##         MEMBER["World Geodetic System 1984 (Transit)"],
##         MEMBER["World Geodetic System 1984 (G730)"],
##         MEMBER["World Geodetic System 1984 (G873)"],
##         MEMBER["World Geodetic System 1984 (G1150)"],
##         MEMBER["World Geodetic System 1984 (G1674)"],
##         MEMBER["World Geodetic System 1984 (G1762)"],
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]],
##         ENSEMBLEACCURACY[2.0]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     USAGE[
##         SCOPE["Horizontal component of 3D system."],
##         AREA["World."],
##         BBOX[-90,-180,90,180]],
##     ID["EPSG",4326]]

In sp/rgdal, attempts are made to ensure that axis order is in the form termed GIS, traditional, or visualization, that is always eastings-northings. From very recent rgdal commits (from rev. 1060), PROJ rather than GDAL is used for instantiating CRS:

library(sp)
cat(wkt(CRS("EPSG:4326")), "\n")
## GEOGCRS["WGS 84 (with axis order normalized for visualization)",
##     ENSEMBLE["World Geodetic System 1984 ensemble",
##         MEMBER["World Geodetic System 1984 (Transit)",
##             ID["EPSG",1166]],
##         MEMBER["World Geodetic System 1984 (G730)",
##             ID["EPSG",1152]],
##         MEMBER["World Geodetic System 1984 (G873)",
##             ID["EPSG",1153]],
##         MEMBER["World Geodetic System 1984 (G1150)",
##             ID["EPSG",1154]],
##         MEMBER["World Geodetic System 1984 (G1674)",
##             ID["EPSG",1155]],
##         MEMBER["World Geodetic System 1984 (G1762)",
##             ID["EPSG",1156]],
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",7030]],
##         ENSEMBLEACCURACY[2.0],
##         ID["EPSG",6326]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433],
##         ID["EPSG",8901]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433,
##                 ID["EPSG",9122]]],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433,
##                 ID["EPSG",9122]]],
##     USAGE[
##         SCOPE["Horizontal component of 3D system."],
##         AREA["World."],
##         BBOX[-90,-180,90,180]],
##     REMARK["Axis order reversed compared to EPSG:4326"]]

The probability of confusion increases when coercing from sf to sp and vice-versa, with the representations most often remaining unchanged.

sf_from_sp <- st_crs(CRS("EPSG:4326"))
o <- strsplit(sf_from_sp$wkt, "\n")[[1]]
cat(paste(o[grep("CS|AXIS|ORDER", o)], collapse="\n"))
##     CS[ellipsoidal,2],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[1],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[2],

Both of these coercions are using the same underlying PROJ and GDAL versions, and the same PROJ metadata. Work on this question has not yet stabilized; we perhaps prefer all data to be GIS-order, but to be able to read/write from and to authority order.

sp_from_sf <- as(st_crs("EPSG:4326"), "CRS")
o <- strsplit(wkt(sp_from_sf), "\n")[[1]]
cat(paste(o[grep("CS|AXIS|ORDER", o)], collapse="\n"))
##     CS[ellipsoidal,2],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[1],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[2],

Current thinking is to avoid the EPSG:4326 axis order issue by recommending use of OGC:CRS84, which is the representation used in GeoJSON, and also known as urn:ogc:def:crs:OGC::CRS84. This specification is always eastings-northings:

cat(st_crs("OGC:CRS:84")$wkt, "\n")
## GEOGCRS["WGS 84",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]],
##         ID["EPSG",6326]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433],
##         ID["EPSG",8901]],
##     CS[ellipsoidal,2],
##         AXIS["longitude",east,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433,
##                 ID["EPSG",9122]]],
##         AXIS["latitude",north,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433,
##                 ID["EPSG",9122]]]]
cat(wkt(CRS("OGC:CRS84")), "\n")
## GEOGCRS["WGS 84 (CRS84)",
##     ENSEMBLE["World Geodetic System 1984 ensemble",
##         MEMBER["World Geodetic System 1984 (Transit)"],
##         MEMBER["World Geodetic System 1984 (G730)"],
##         MEMBER["World Geodetic System 1984 (G873)"],
##         MEMBER["World Geodetic System 1984 (G1150)"],
##         MEMBER["World Geodetic System 1984 (G1674)"],
##         MEMBER["World Geodetic System 1984 (G1762)"],
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]],
##         ENSEMBLEACCURACY[2.0]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     USAGE[
##         SCOPE["Not known."],
##         AREA["World."],
##         BBOX[-90,-180,90,180]],
##     ID["OGC","CRS84"]]

Coordinate operations

Transformation in sf uses code in GDAL, which in turn uses functions in PROJ; in sp/rgdal, PROJ is used directly for transformation. In order to demonstrate more of what is happening, sf_proj_pipelines() has been added to sf:

(o <- sf_proj_pipelines(st_crs(b_pump_sf), "OGC:CRS84"))
## Candidate coordinate operations found:  9 
## Strict containment:     FALSE 
## Axis order auth compl:  FALSE 
## Source:  PROJCRS["Transverse_Mercator",
##     BASEGEOGCRS["GCS_OSGB 1936",
##         DATUM["Ordnance Survey of Great Britain 1936",
##             ELLIPSOID["Airy 1830",6377563.396,299.3249646,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["Degree",0.0174532925199433]],
##         ID["EPSG",4277]],
##     CONVERSION["unnamed",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",49,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",-2,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",0.999601,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",400000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",-100000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["easting",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]],
##         AXIS["northing",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]]] 
## Target:  OGC:CRS84 
## Best instantiable operation has accuracy: 2 m
## Description: Inverse of unnamed + OSGB36 to WGS 84 (6) + axis order change
##              (2D)
## Definition:  +proj=pipeline +step +inv +proj=tmerc +lat_0=49 +lon_0=-2
##              +k=0.999601 +x_0=400000 +y_0=-100000 +ellps=airy
##              +step +proj=push +v_3 +step +proj=cart +ellps=airy
##              +step +proj=helmert +x=446.448 +y=-125.157
##              +z=542.06 +rx=0.15 +ry=0.247 +rz=0.842 +s=-20.489
##              +convention=position_vector +step +inv +proj=cart
##              +ellps=WGS84 +step +proj=pop +v_3 +step
##              +proj=unitconvert +xy_in=rad +xy_out=deg
## Operation 8 is lacking 1 grid with accuracy 1 m
## Missing grid: uk_os_OSTN15_NTv2_OSGBtoETRS.tif 
## URL: https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif
as.data.frame(o)[,c(5,8)]
##   accuracy instantiable
## 1        2         TRUE
## 2       21         TRUE
## 3       10         TRUE
## 4       21         TRUE
## 5       18         TRUE
## 6       35         TRUE
## 7        3         TRUE
## 8        1        FALSE
## 9       NA         TRUE

Areas of interest

In sf, areas-of-interest need to be given by the users. The provision of areas-of-interest is intended to reduce the number of candidate coordinate operations found by PROJ.

aoi0 <- sf_project(st_crs(b_pump_sf), "OGC:CRS84", matrix(unclass(st_bbox(b_pump_sf)), 2, 2, byrow=TRUE))
aoi <- c(t(aoi0 + c(-0.1, +0.1)))
(o_aoi <- sf_proj_pipelines(st_crs(b_pump_sf), "OGC:CRS84", AOI=aoi))
## Candidate coordinate operations found:  5 
## Strict containment:     FALSE 
## Axis order auth compl:  FALSE 
## Source:  PROJCRS["Transverse_Mercator",
##     BASEGEOGCRS["GCS_OSGB 1936",
##         DATUM["Ordnance Survey of Great Britain 1936",
##             ELLIPSOID["Airy 1830",6377563.396,299.3249646,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["Degree",0.0174532925199433]],
##         ID["EPSG",4277]],
##     CONVERSION["unnamed",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",49,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",-2,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",0.999601,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",400000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",-100000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["easting",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]],
##         AXIS["northing",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]]] 
## Target:  OGC:CRS84 
## Best instantiable operation has accuracy: 2 m
## Description: Inverse of unnamed + OSGB36 to WGS 84 (6) + axis order change
##              (2D)
## Definition:  +proj=pipeline +step +inv +proj=tmerc +lat_0=49 +lon_0=-2
##              +k=0.999601 +x_0=400000 +y_0=-100000 +ellps=airy
##              +step +proj=push +v_3 +step +proj=cart +ellps=airy
##              +step +proj=helmert +x=446.448 +y=-125.157
##              +z=542.06 +rx=0.15 +ry=0.247 +rz=0.842 +s=-20.489
##              +convention=position_vector +step +inv +proj=cart
##              +ellps=WGS84 +step +proj=pop +v_3 +step
##              +proj=unitconvert +xy_in=rad +xy_out=deg
## Operation 5 is lacking 1 grid with accuracy 1 m
## Missing grid: uk_os_OSTN15_NTv2_OSGBtoETRS.tif 
## URL: https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif
as.data.frame(o_aoi)[,c(5,8)]
##   accuracy instantiable
## 1        2         TRUE
## 2       10         TRUE
## 3       21         TRUE
## 4       21         TRUE
## 5        1        FALSE

sf::sf_proj_pipelines() accesses the PROJ metadata database to search through candidate coordinate operations, ranking them by accuracy, returning a data frame of operations. When an area-of-interest is provided, candidates not intersecting it are dropped. Coordinate operations that cannot be instantiated because of missing grids are also listed. Here without an area-of-interest: 9 candidate operations are found when the WKT string contains datum information. Of these, 8 may be instantiated, with 1 needing a grid. 4 operations cease to be candidates if we use an area-of-interest.

Coordinate operations

In sp/rgdal, the coordinate operation last used is returned, and can be retrieved using rgdal::get_last_coordOp(); coordinate operations are represented as pipelines (Knudsen and Evers 2017; Evers and Knudsen 2017), introduced in PROJ 5 and using the PROJ key-value pair notation:

(helm <- as.data.frame(o_aoi)[o_aoi$accuracy==2, "definition"])
## [1] "+proj=pipeline +step +inv +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.999601 +x_0=400000 +y_0=-100000 +ellps=airy +step +proj=push +v_3 +step +proj=cart +ellps=airy +step +proj=helmert +x=446.448 +y=-125.157 +z=542.06 +rx=0.15 +ry=0.247 +rz=0.842 +s=-20.489 +convention=position_vector +step +inv +proj=cart +ellps=WGS84 +step +proj=pop +v_3 +step +proj=unitconvert +xy_in=rad +xy_out=deg"
b_pump_sf_ll <- st_transform(b_pump_sf, "OGC:CRS84", pipe=helm, aoi=aoi)

Here we can see that an inverse projection from the specified Transverse Mercator projection is made to geographical coordinates, followed by a seven-parameter Helmert transformation to WGS84 ellipsoid and datum. The parameters are contained in the best instantiable coordinate operation retrieved from the PROJ database. The +push +v_3 and +pop +v_3 operations are used when only horizontal components are needed in the Helmert transformation.

(o_aoi1 <- sf_proj_pipelines(st_crs(b_pump_sf1), "OGC:CRS84", AOI=aoi))
## Candidate coordinate operations found:  1 
## Strict containment:     FALSE 
## Axis order auth compl:  FALSE 
## Source:  PROJCRS["unknown",
##     BASEGEOGCRS["unknown",
##         DATUM["Unknown based on Airy 1830 ellipsoid",
##             ELLIPSOID["Airy 1830",6377563.396,299.3249646,
##                 LENGTHUNIT["metre",1,
##                     ID["EPSG",9001]]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8901]]],
##     CONVERSION["unknown",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",49,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",-2,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",0.999601,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",400000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",-100000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]]] 
## Target:  OGC:CRS84 
## Best instantiable operation has only ballpark accuracy 
## Description: Inverse of unknown + Ballpark geographic offset from unknown to
##              WGS 84 (CRS84)
## Definition:  +proj=pipeline +step +inv +proj=tmerc +lat_0=49 +lon_0=-2
##              +k=0.999601 +x_0=400000 +y_0=-100000 +ellps=airy
##              +step +proj=unitconvert +xy_in=rad +xy_out=deg
b_pump_sf1_ll <- st_transform(b_pump_sf1, "OGC:CRS84")
st_distance(b_pump_sf_ll, b_pump_sf1_ll)
## Units: [m]
##          [,1]
## [1,] 125.0578

Going on to the case of the degraded representation, only 1 operation is found, with only ballpark accuracy. With our emulation of the dropping of +datum= support in GDAL’s exportToProj4(), we see that the coordinate operation pipeline only contains the inverse projection step, accounting for the observed shift of the Broad Street pump to Ingestre Place.

Using the content download network to access grids

Finally, sp/rgdal may use the provision of on-demand downloading of transformation grids to provide more accuracy (CDN, from PROJ 7, https://cdn.proj.org ). Before finding and choosing to use a coordinate operation using an on-demand downloaded grid, the designated directory is empty:

sf_proj_network()
## [1] FALSE
Sys.setenv("PROJ_NETWORK"="ON")
sf_proj_network(TRUE)
## [1] "https://cdn.proj.org"

Using the CDN, all the candidate operations are instantiable, and the pipeline now shows a horizontal grid transformation rather than a Helmert transformation.

(og <- sf_proj_pipelines(st_crs(b_pump_sf), "OGC:CRS84", AOI=aoi))
## Candidate coordinate operations found:  5 
## Strict containment:     FALSE 
## Axis order auth compl:  FALSE 
## Source:  PROJCRS["Transverse_Mercator",
##     BASEGEOGCRS["GCS_OSGB 1936",
##         DATUM["Ordnance Survey of Great Britain 1936",
##             ELLIPSOID["Airy 1830",6377563.396,299.3249646,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["Degree",0.0174532925199433]],
##         ID["EPSG",4277]],
##     CONVERSION["unnamed",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",49,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",-2,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",0.999601,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",400000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",-100000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["easting",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]],
##         AXIS["northing",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]]] 
## Target:  OGC:CRS84 
## Best instantiable operation has accuracy: 1 m
## Description: Inverse of unnamed + OSGB36 to WGS 84 (9) + axis order change
##              (2D)
## Definition:  +proj=pipeline +step +inv +proj=tmerc +lat_0=49 +lon_0=-2
##              +k=0.999601 +x_0=400000 +y_0=-100000 +ellps=airy
##              +step +proj=hgridshift
##              +grids=uk_os_OSTN15_NTv2_OSGBtoETRS.tif +step
##              +proj=unitconvert +xy_in=rad +xy_out=deg
list.files(sf_proj_search_paths()[1])
## character(0)
b_pump_sf_llgd <- st_transform(b_pump_sf, "OGC:CRS84")
list.files(sf_proj_search_paths()[1])
## character(0)
library(rgdal)
is_proj_CDN_enabled()
## [1] TRUE
b_pump_sp_llgd <- spTransform(as(b_pump_sf, "Spatial"), "OGC:CRS84")
list.files(sf_proj_search_paths()[1])
## [1] "cache.db"
DB <- file.path(DB0[1], "cache.db")
db <- dbConnect(SQLite(), dbname=DB)
cat(strwrap(paste(dbListTables(db), collapse=", ")), sep="\n")
## chunk_data, chunks, downloaded_file_properties, linked_chunks,
## linked_chunks_head_tail, properties, sqlite_sequence
dbReadTable(db, "chunks")
##    id                                                   url  offset data_id
## 1   1 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif       0       1
## 2   2 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif 2621440       2
## 3   3 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif 2637824       3
## 4   4 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif 2654208       4
## 5   5 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif 2670592       5
## 6   6 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif 2686976       6
## 7   7 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif 2703360       7
## 8   8 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif 2719744       8
## 9   9 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif 2736128       9
## 10 10 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif 2752512      10
## 11 11 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif 2768896      11
## 12 12 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif 2785280      12
## 13 13 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif 2801664      13
## 14 14 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif 2818048      14
## 15 15 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif 2834432      15
## 16 16 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif 2850816      16
##    data_size
## 1      16384
## 2      16384
## 3      16384
## 4      16384
## 5      16384
## 6      16384
## 7      16384
## 8      16384
## 9      16384
## 10     16384
## 11     16384
## 12     16384
## 13     16384
## 14     16384
## 15     16384
## 16     16384
dbDisconnect(db)
b_pump_sf_llgdsp <- st_as_sf(b_pump_sp_llgd)

Now the downloaded grid is cached in the database in the designated CDN directory, and may be used for other transformations using the same operation.

Sys.setenv("PROJ_NETWORK"="OFF")
sf_proj_network(FALSE)
## character(0)
c(st_distance(b_pump_sf_ll, b_pump_sf1_ll), st_distance(b_pump_sf_ll, b_pump_sf_llgd), st_distance(b_pump_sf_ll, b_pump_sf_llgdsp))
## Units: [m]
## [1] 125.057812   0.000000   1.751479

Once again, the distance between the point transformed from the sf object as read from file and the point with the degraded coordinate reference system emulating the effect of the change in behaviour of GDAL’s exportToProj4() in GDAL 6 and later is about 125m. Using the CDN shifts the output point by 1.7m, here using both the new s2 and legacy interfaces for measurements in sf using geographical coordinates (Dunnington, Pebesma, and Rubak 2021):

sf_use_s2(TRUE)
c(st_distance(b_pump_sf_ll, b_pump_sf1_ll), st_distance(b_pump_sf_ll, b_pump_sf_llgd), st_distance(b_pump_sf_ll, b_pump_sf_llgdsp))
## Units: [m]
## [1] 124.728570   0.000000   1.745973
sf_use_s2(FALSE)

To be sure that sf uses the CDN, it appears that you need to set Sys.setenv("PROJ_NETWORK"="ON") before sf is loaded and attached. This does not apply to rgdal, but possibly reflects the fact that rgdal uses PROJ directly, while sf::st_transform() uses PROJ through GDAL (and sf::sf_proj_pipelines() uses PROJ directly).

CRS representation: status

  • Although it appears that most of the consequences of the change in representation of coordinate reference systems from Proj4 to WKT2 strings have now been addressed, we still see signs on the mailing list and on Twitter that users, naturally directing their attention to their analytical or visualization work, may still be confused.

  • The extent of the spatial cluster of R packages is so great that it will undoubtedly take time before the dust settles.

  • However, we trust that the operation of upgrading representations is now largely complete.

  • Multiple warnings issued in sp workflows, now noisily drawing attention to possible degradations in workflows, will by default be muted when sp 1.5 and rgdal 1.6 are released.

R’s sessionInfo()

sessionInfo()
## R version 4.0.5 (2021-03-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Fedora 33 (Workstation Edition)
## 
## Matrix products: default
## BLAS:   /home/rsb/topics/R/R405-share/lib64/R/lib/libRblas.so
## LAPACK: /home/rsb/topics/R/R405-share/lib64/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
##  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
##  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] RSQLite_2.2.7      rgdal_1.5-24       units_0.7-1        gdistance_1.3-6   
##  [5] Matrix_1.3-3       igraph_1.2.6       stars_0.5-2        abind_1.4-5       
##  [9] raster_3.4-5       terra_1.2-5        sp_1.4-6           osmdata_0.1.5     
## [13] mapview_2.9.9      ggplot2_3.3.3      mapsf_0.2.0        tmap_3.3-1        
## [17] colorspace_2.0-0   RColorBrewer_1.1-2 sf_0.9-9           classInt_0.4-3    
## 
## loaded via a namespace (and not attached):
##  [1] satellite_1.0.2         bit64_4.0.5             lubridate_1.7.10       
##  [4] webshot_0.5.2           httr_1.4.2              tools_4.0.5            
##  [7] bslib_0.2.4             utf8_1.2.1              R6_2.5.0               
## [10] KernSmooth_2.23-18      DBI_1.1.1               withr_2.4.2            
## [13] tidyselect_1.1.1        leaflet_2.0.4.1         bit_4.0.4              
## [16] curl_4.3.1              compiler_4.0.5          leafem_0.1.3           
## [19] rvest_1.0.0             xml2_1.3.2              labeling_0.4.2         
## [22] sass_0.3.1              scales_1.1.1            proxy_0.4-25           
## [25] systemfonts_1.0.1       stringr_1.4.0           digest_0.6.27          
## [28] rmarkdown_2.7           svglite_2.0.0           base64enc_0.1-3        
## [31] dichromat_2.0-0         pkgconfig_2.0.3         htmltools_0.5.1.1      
## [34] fastmap_1.1.0           highr_0.9               htmlwidgets_1.5.3      
## [37] rlang_0.4.11            jquerylib_0.1.4         farver_2.1.0           
## [40] generics_0.1.0          jsonlite_1.7.2          crosstalk_1.1.1        
## [43] dplyr_1.0.5             magrittr_2.0.1          s2_1.0.4.9000          
## [46] Rcpp_1.0.6              munsell_0.5.0           fansi_0.4.2            
## [49] lifecycle_1.0.0         stringi_1.5.3           leafsync_0.1.0         
## [52] yaml_2.2.1              tmaptools_3.1-1         blob_1.2.1             
## [55] grid_4.0.5              parallel_4.0.5          crayon_1.4.1           
## [58] lattice_0.20-44         leafpop_0.0.6           knitr_1.33             
## [61] pillar_1.6.0            uuid_0.1-4              codetools_0.2-18       
## [64] stats4_4.0.5            wk_0.4.1                XML_3.99-0.6           
## [67] glue_1.4.2              evaluate_0.14           leaflet.providers_1.9.0
## [70] png_0.1-7               vctrs_0.3.8             gtable_0.3.0           
## [73] purrr_0.3.4             assertthat_0.2.1        cachem_1.0.4           
## [76] xfun_0.22               lwgeom_0.2-6            e1071_1.7-6            
## [79] class_7.3-18            viridisLite_0.4.0       tibble_3.1.1           
## [82] memoise_2.0.0           ellipsis_0.3.2          brew_1.0-6
Bivand, Roger. 2014. “Geocomputation and Open Source Software: Components and Software Stacks.” In Geocomputation, edited by Robert J. Abrahart and Linda M. See, 329–55. Boca Raton: CRC Press. http://hdl.handle.net/11250/163358.
———. 2020. “Progress in the R Ecosystem for Representing and Handling Spatial Data.” Journal of Geographical Systems. https://doi.org/10.1007/s10109-020-00336-0.
Bivand, Roger, Edzer Pebesma, and Virgilio Gomez-Rubio. 2008. Applied Spatial Data Analysis with R. Springer, NY. https://asdar-book.org/.
———. 2013. Applied Spatial Data Analysis with R, Second Edition. Springer, NY. https://asdar-book.org/.
Brody, H., M. R. Rip, P. Vinten-Johansen, N. Paneth, and S.Rachman. 2000. “Map-Making and Myth-Making in Broad Street: The London Cholera Epidemic, 1854.” Lancet 356: 64–68.
Cheng, Joe, Bhaskar Karambelkar, and Yihui Xie. 2021. Leaflet: Create Interactive Web Maps with the JavaScript ’Leaflet’ Library. https://CRAN.R-project.org/package=leaflet.
Do, Van Huyen, Christine Thomas-Agnan, and Anne Vanhems. 2015. “Accuracy of Areal Interpolation Methods for Count Data.” Spatial Statistics 14: 412–38. https://doi.org/10.1016/j.spasta.2015.07.005.
Dunnington, Dewey, Edzer Pebesma, and Ege Rubak. 2021. S2: Spherical Geometry Operators Using the S2 Geometry Library; Package Version 1.0.2. https://CRAN.R-project.org/package=s2.
Evers, Kristian, and Thomas Knudsen. 2017. Transformation Pipelines for PROJ.4. https://www.fig.net/resources/proceedings/fig\_proceedings/fig2017/papers/iss6b/ISS6B\_evers\_knudsen\_9156.pdf.
Giraud, Timothée, and Nicolas Lambert. 2016. cartography: Create and Integrate Maps in Your R Workflow.” Journal of Open Source Software 1 (4). https://doi.org/10.21105/joss.00054.
———. 2017. “Reproducible Cartography.” In Advances in Cartography and GIScience. ICACI 2017. Lecture Notes in Geoinformation and Cartography, edited by Michael Peterson, 173–83. Cham, Switzerland: Springer. https://doi.org/10.1007/978-3-319-57336-6\_13.
Gotway, C. A., and L. J. Young. 2002. “Combining Incompatible Spatial Data.” Journal of the American Statistical Association 97: 632–48.
Herring, John R. 2011. “OpenGIS Implementation Standard for Geographic Information-Simple Feature Access-Part 1: Common Architecture.” Open Geospatial Consortium Inc, 111. http://portal.opengeospatial.org/files/?artifact\_id=25355.
Iliffe, Jonathan, and Roger Lott. 2008. Datums and Map Projections: For Remote Sensing, GIS and Surveying. Boca Raton: CRC.
ISO. 2004a. ISO 19111:2019 Geographic Information – Referencing by Coordinates. https://www.iso.org/standard/74039.html.
———. 2004b. ISO 19125-1:2004 Geographic Information – Simple Feature Access – Part 1: Common Architecture. https://www.iso.org/standard/40114.html.
Knudsen, Thomas, and Kristian Evers. 2017. Transformation Pipelines for PROJ.4. https://meetingorganizer.copernicus.org/EGU2017/EGU2017-8050.pdf.
Kralidis, Athanasios Tom. 2008. “Geospatial Open Source and Open Standards Convergences.” In Open Source Approaches in Spatial Data Handling, edited by G. B. Hall and M. Leahy, 1–20. Berlin: Springer-Verlag.
Lapa, Tiago, Ricardo Ximenes, Nilza Nunes Silva, Wayner Souza, Maria de Fátima Militão Albuquerque, and Gisele Campozana. 2001. Vigilância da hanseníase em Olinda, Brasil, utilizando técnicas de análise espacial.” Cadernos de Saúde Pública 17 (October): 1153–62. https://doi.org/10.1590/S0102-311X2001000500016.
Lovelace, Robin, Jakub Nowosad, and Jannes Muenchow. 2019. Geocomputation with R. Boca Raton, FL: Chapman and Hall/CRC. https://geocompr.robinlovelace.net/.
O’Sullivan, D., and D. J. Unwin. 2003. Geographical Information Analysis. Hoboken, NJ: Wiley.
Pebesma, Edzer. 2018. Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal 10 (1): 439–46. https://doi.org/10.32614/RJ-2018-009.
Pebesma, Edzer, Roger Bivand, and Paulo Ribeiro. 2015. “Software for Spatial Statistics.” Journal of Statistical Software, Articles 63 (1): 1–8. https://doi.org/10.18637/jss.v063.i01.
Pebesma, Edzer, Thomas Mailund, and James Hiebert. 2016. Measurement Units in R.” The R Journal 8 (2): 486–94. https://doi.org/10.32614/RJ-2016-061.
Scheider, Simon, Benedikt Gräler, Edzer Pebesma, and Christoph Stasch. 2016. “Modeling Spatiotemporal Information Generation.” International Journal of Geographical Information Science 30 (10): 1980–2008. https://doi.org/10.1080/13658816.2016.1151520.
Stasch, Christoph, Simon Scheider, Edzer Pebesma, and Werner Kuhn. 2014. “Meaningful Spatial Prediction and Aggregation.” Environmental Modelling & Software 51: 149–65. https://doi.org/10.1016/j.envsoft.2013.09.006.
Tennekes, Martijn. 2018. tmap: Thematic Maps in R.” Journal of Statistical Software 84 (6): 1–39. https://www.jstatsoft.org/v084/i06.
Wakefield, J. C., and H. Lyons. 2010. “Spatial Aggregation and the Ecological Fallacy.” In Handbook of Spatial Statistics, edited by Alan E. Gelfand, Peter Diggle, Peter Guttorp, and Montserrat Fuentes, 541–58. Boca Raton: Chapman & Hall/CRC.
Wickham, Hadley. 2014. “Tidy Data.” Journal of Statistical Software, Articles 59 (10): 1–23. https://doi.org/10.18637/jss.v059.i10.