Required current contributed CRAN packages:

I am running R 3.6.1, with recent update.packages().

needed <- c("sf", "mapview", "sp", "raster", "classInt", "RColorBrewer", "tmap", "rgdal", "rgrass7", "units", "gdistance", "rgeos", "lwgeom", "maptools", "RSQLite")

I also have GRASS 7.6.1 (https://grass.osgeo.org/download/software/), but this is not essential (learning GRASS in 30 minutes is not easy).

Script

Script at https://github.com/rsbivand/geomed19-workshop/raw/master/Geomed19_II.zip. Download to suitable location and use as basis.

Session II

  • 11:10-11:40 (20+10) Ongoing changes in external sofware (GEOS, GDAL), including software and standards used for representing spatial reference systems (PROJ)

  • 11:40-12:10 (20+10) GIS bridges (description and using GRASS and rgrass7)

  • 12:10-12:50 (20+20) Using R as a GIS (topological operations)

Ongoing changes in external sofware (GEOS, GDAL, PROJ)

PROJ

Because so much open source (and other) software uses the PROJ library and framework, many are affected when PROJ upgrades. Until very recently, PROJ has been seen as very reliable, and the changes taking place now are intended to confirm and reinforce this reliability. Before PROJ 5 (PROJ 6 is out now, PROJ 7 is coming early in 2020), the +datum= tag was used, perhaps with +towgs84= with three or seven coefficients, and possibly +nadgrids= where datum transformation grids were available. However, transformations from one projection to another first inversed to longitude-latitude in WGS84, then projected on to the target projection.

Big bump coming:

‘Fast-forward 35 years and PROJ.4 is everywhere: It provides coordinate handling for almost every geospatial program, open or closed source. Today, we see a drastical increase in the need for high accuracy GNSS coordinate handling, especially in the agricultural and construction engineering sectors. This need for geodetic-accuracy transformations is not satisfied by “classic PROJ.4”. But with the ubiquity of PROJ.4, we can provide these transformations “everywhere”, just by implementing them as part of PROJ.4’ (Evers and Knudsen 2017).

Escaping the WGS84 hub/pivot: PROJ and OGC WKT2

Following the introduction of geodetic modules and pipelines in PROJ 5 (Knudsen and Evers 2017; Evers and Knudsen 2017), PROJ 6 moves further. Changes in the legacy PROJ representation and WGS84 transformation hub have been coordinated through the GDAL barn raising initiative. Crucially WGS84 often ceases to be the pivot for moving between datums. A new OGC WKT is coming, and an SQLite EPSG file database has replaced CSV files. SRS will begin to support 3D by default, adding time too as SRS change. See also PROJ migration notes.

There are very useful postings on the PROJ mailing list from Martin Desruisseaux, first proposing clarifications and a follow-up including a summary:

  • “Early binding” ≈ hub transformation technique.
  • “Late binding” ≈ hub transformation technique NOT used, replaced by a more complex technique consisting in searching parameters in the EPSG database after the transformation context (source, target, epoch, area of interest) is known.
  • The problem of hub transformation technique is independent of WGS84. It is caused by the fact that transformations to/from the hub are approximate. Any other hub we could invent in replacement of WGS84 will have the same problem, unless we can invent a hub for which transformations are exact (I think that if such hub existed, we would have already heard about it).

The solution proposed by ISO 19111 (in my understanding) is:

  • Forget about hub (WGS84 or other), unless the simplicity of early-binding is considered more important than accuracy.
  • Associating a CRS to a coordinate set (geometry or raster) is no longer sufficient. A {CRS, epoch} tuple must be associated. ISO 19111 calls this tuple “Coordinate metadata”. From a programmatic API point of view, this means that getCoordinateReferenceSystem() method in Geometry objects (for instance) needs to be replaced by a getCoordinateMetadata() method.

In QGIS built on current PROJ 6 with the proj.h API (and GDAL built on current PROJ 6 with the proj.h API), we see the following sequence of GUI windows when trying to open the olinda.gpkg file.

Instead of using the declared coordinate reference system of the added layer to provide a transformation/conversion relationship to possible WGS84 geographical coordinate or web mapping backgrounds, the user of the most recent QGIS version with PROJ 6 faces a choice of three alternatives with varying availabilities and precisions:

The third alternative has better precision, but depends on finding and installing an NTv2 grid file in the PROJ shared/proj metadata folder:

If we install the file, the choices change to promote the more precise NTv2-based path to the first position:

library(sf)
## Linking to GEOS 3.7.2, GDAL 3.0.1, PROJ 6.1.1
packageVersion("sf")
## [1] '0.8.0'

The final element reported by sf::sf_extSoftVersion() shows whether sf was built with the proj.h interface to PROJ, or the legacy proj.api.h interface. However, GDAL also has to be built with the proj.h interface for everything to line up:

sf_extSoftVersion()
##           GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H 
##        "3.7.2"        "3.0.1"        "6.1.1"         "true"         "true"
st_crs(22525)
## Coordinate Reference System:
##   EPSG: 22525 
##   proj4string: "+proj=utm +zone=25 +south +ellps=intl +towgs84=-205.57,168.77,-4.12,0,0,0,0 +units=m +no_defs"

The OGC WTK2 definition now contains a usage/scope term showing where the definition may be used; there may also be a temporal frame for a definition.

cat(system("projinfo EPSG:22525", intern=TRUE), sep="\n")
## PROJ.4 string:
## +proj=utm +zone=25 +south +ellps=intl +towgs84=-205.57,168.77,-4.12,0,0,0,0 +units=m +no_defs +type=crs
## 
## WKT2_2018 string:
## PROJCRS["Corrego Alegre 1970-72 / UTM zone 25S",
##     BASEGEOGCRS["Corrego Alegre 1970-72",
##         DATUM["Corrego Alegre 1970-72",
##             ELLIPSOID["International 1924",6378388,297,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4225]],
##     CONVERSION["UTM zone 25S",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",-33,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",0.9996,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",500000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",10000000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["Brazil - east of 36°W onshore"],
##         BBOX[-10.1,-36,-4.99,-34.74]],
##     ID["EPSG",22525]]

If we ask about possible transformations/conversions, we see choices we saw among those represented in QGIS (I work on two apparently identical systems, which may give different choice counts)

cat(system("projinfo -s EPSG:22525 -t EPSG:31985", intern=TRUE), sep="\n")
## Candidate operations found: 2
## -------------------------------------
## Operation n°1:
## 
## unknown id, Inverse of UTM zone 25S + Corrego Alegre 1970-72 to SIRGAS 2000 (2) + UTM zone 25S, 5 m, Brazil - Corrego Alegre 1970-1972
## 
## PROJ string:
## +proj=pipeline +step +inv +proj=utm +zone=25 +south +ellps=intl +step +proj=push +v_3 +step +proj=cart +ellps=intl +step +proj=helmert +x=-206.05 +y=168.28 +z=-3.82 +step +inv +proj=cart +ellps=GRS80 +step +proj=pop +v_3 +step +proj=utm +zone=25 +south +ellps=GRS80
## 
## WKT2_2018 string:
## CONCATENATEDOPERATION["Inverse of UTM zone 25S + Corrego Alegre 1970-72 to SIRGAS 2000 (2) + UTM zone 25S",
##     SOURCECRS[
##         PROJCRS["Corrego Alegre 1970-72 / UTM zone 25S",
##             BASEGEOGCRS["Corrego Alegre 1970-72",
##                 DATUM["Corrego Alegre 1970-72",
##                     ELLIPSOID["International 1924",6378388,297,
##                         LENGTHUNIT["metre",1]]],
##                 PRIMEM["Greenwich",0,
##                     ANGLEUNIT["degree",0.0174532925199433]],
##                 ID["EPSG",4225]],
##             CONVERSION["UTM zone 25S",
##                 METHOD["Transverse Mercator",
##                     ID["EPSG",9807]],
##                 PARAMETER["Latitude of natural origin",0,
##                     ANGLEUNIT["degree",0.0174532925199433],
##                     ID["EPSG",8801]],
##                 PARAMETER["Longitude of natural origin",-33,
##                     ANGLEUNIT["degree",0.0174532925199433],
##                     ID["EPSG",8802]],
##                 PARAMETER["Scale factor at natural origin",0.9996,
##                     SCALEUNIT["unity",1],
##                     ID["EPSG",8805]],
##                 PARAMETER["False easting",500000,
##                     LENGTHUNIT["metre",1],
##                     ID["EPSG",8806]],
##                 PARAMETER["False northing",10000000,
##                     LENGTHUNIT["metre",1],
##                     ID["EPSG",8807]]],
##             CS[Cartesian,2],
##                 AXIS["(E)",east,
##                     ORDER[1],
##                     LENGTHUNIT["metre",1]],
##                 AXIS["(N)",north,
##                     ORDER[2],
##                     LENGTHUNIT["metre",1]],
##             ID["EPSG",22525]]],
##     TARGETCRS[
##         PROJCRS["SIRGAS 2000 / UTM zone 25S",
##             BASEGEOGCRS["SIRGAS 2000",
##                 DATUM["Sistema de Referencia Geocentrico para las AmericaS 2000",
##                     ELLIPSOID["GRS 1980",6378137,298.257222101,
##                         LENGTHUNIT["metre",1]]],
##                 PRIMEM["Greenwich",0,
##                     ANGLEUNIT["degree",0.0174532925199433]],
##                 ID["EPSG",4674]],
##             CONVERSION["UTM zone 25S",
##                 METHOD["Transverse Mercator",
##                     ID["EPSG",9807]],
##                 PARAMETER["Latitude of natural origin",0,
##                     ANGLEUNIT["degree",0.0174532925199433],
##                     ID["EPSG",8801]],
##                 PARAMETER["Longitude of natural origin",-33,
##                     ANGLEUNIT["degree",0.0174532925199433],
##                     ID["EPSG",8802]],
##                 PARAMETER["Scale factor at natural origin",0.9996,
##                     SCALEUNIT["unity",1],
##                     ID["EPSG",8805]],
##                 PARAMETER["False easting",500000,
##                     LENGTHUNIT["metre",1],
##                     ID["EPSG",8806]],
##                 PARAMETER["False northing",10000000,
##                     LENGTHUNIT["metre",1],
##                     ID["EPSG",8807]]],
##             CS[Cartesian,2],
##                 AXIS["(E)",east,
##                     ORDER[1],
##                     LENGTHUNIT["metre",1]],
##                 AXIS["(N)",north,
##                     ORDER[2],
##                     LENGTHUNIT["metre",1]],
##             ID["EPSG",31985]]],
##     STEP[
##         CONVERSION["Inverse of UTM zone 25S",
##             METHOD["Inverse of Transverse Mercator",
##                 ID["INVERSE(EPSG)",9807]],
##             PARAMETER["Latitude of natural origin",0,
##                 ANGLEUNIT["degree",0.0174532925199433],
##                 ID["EPSG",8801]],
##             PARAMETER["Longitude of natural origin",-33,
##                 ANGLEUNIT["degree",0.0174532925199433],
##                 ID["EPSG",8802]],
##             PARAMETER["Scale factor at natural origin",0.9996,
##                 SCALEUNIT["unity",1],
##                 ID["EPSG",8805]],
##             PARAMETER["False easting",500000,
##                 LENGTHUNIT["metre",1],
##                 ID["EPSG",8806]],
##             PARAMETER["False northing",10000000,
##                 LENGTHUNIT["metre",1],
##                 ID["EPSG",8807]],
##             ID["INVERSE(EPSG)",16125]]],
##     STEP[
##         COORDINATEOPERATION["Corrego Alegre 1970-72 to SIRGAS 2000 (2)",
##             VERSION["PBS-Bra 2005"],
##             SOURCECRS[
##                 GEOGCRS["Corrego Alegre 1970-72",
##                     DATUM["Corrego Alegre 1970-72",
##                         ELLIPSOID["International 1924",6378388,297,
##                             LENGTHUNIT["metre",1]]],
##                     PRIMEM["Greenwich",0,
##                         ANGLEUNIT["degree",0.0174532925199433]],
##                     CS[ellipsoidal,2],
##                         AXIS["geodetic latitude (Lat)",north,
##                             ORDER[1],
##                             ANGLEUNIT["degree",0.0174532925199433]],
##                         AXIS["geodetic longitude (Lon)",east,
##                             ORDER[2],
##                             ANGLEUNIT["degree",0.0174532925199433]],
##                     ID["EPSG",4225]]],
##             TARGETCRS[
##                 GEOGCRS["SIRGAS 2000",
##                     DATUM["Sistema de Referencia Geocentrico para las AmericaS 2000",
##                         ELLIPSOID["GRS 1980",6378137,298.257222101,
##                             LENGTHUNIT["metre",1]]],
##                     PRIMEM["Greenwich",0,
##                         ANGLEUNIT["degree",0.0174532925199433]],
##                     CS[ellipsoidal,2],
##                         AXIS["geodetic latitude (Lat)",north,
##                             ORDER[1],
##                             ANGLEUNIT["degree",0.0174532925199433]],
##                         AXIS["geodetic longitude (Lon)",east,
##                             ORDER[2],
##                             ANGLEUNIT["degree",0.0174532925199433]],
##                     ID["EPSG",4674]]],
##             METHOD["Geocentric translations (geog2D domain)",
##                 ID["EPSG",9603]],
##             PARAMETER["X-axis translation",-206.05,
##                 LENGTHUNIT["metre",1],
##                 ID["EPSG",8605]],
##             PARAMETER["Y-axis translation",168.28,
##                 LENGTHUNIT["metre",1],
##                 ID["EPSG",8606]],
##             PARAMETER["Z-axis translation",-3.82,
##                 LENGTHUNIT["metre",1],
##                 ID["EPSG",8607]],
##             OPERATIONACCURACY[5.0],
##             ID["EPSG",6193]]],
##     STEP[
##         CONVERSION["UTM zone 25S",
##             METHOD["Transverse Mercator",
##                 ID["EPSG",9807]],
##             PARAMETER["Latitude of natural origin",0,
##                 ANGLEUNIT["degree",0.0174532925199433],
##                 ID["EPSG",8801]],
##             PARAMETER["Longitude of natural origin",-33,
##                 ANGLEUNIT["degree",0.0174532925199433],
##                 ID["EPSG",8802]],
##             PARAMETER["Scale factor at natural origin",0.9996,
##                 SCALEUNIT["unity",1],
##                 ID["EPSG",8805]],
##             PARAMETER["False easting",500000,
##                 LENGTHUNIT["metre",1],
##                 ID["EPSG",8806]],
##             PARAMETER["False northing",10000000,
##                 LENGTHUNIT["metre",1],
##                 ID["EPSG",8807]],
##             ID["EPSG",16125]]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["Brazil - Corrego Alegre 1970-1972"],
##         BBOX[-33.78,-58.16,-2.68,-34.74]]]
## 
## -------------------------------------
## Operation n°2:
## 
## unknown id, Inverse of UTM zone 25S + Corrego Alegre 1970-72 to SIRGAS 2000 (1) + UTM zone 25S, 2 m, Brazil - Corrego Alegre 1970-1972, at least one grid missing
## 
## PROJ string:
## +proj=pipeline +step +inv +proj=utm +zone=25 +south +ellps=intl +step +proj=hgridshift +grids=CA7072_003.gsb +step +proj=utm +zone=25 +south +ellps=GRS80
## 
## WKT2_2018 string:
## CONCATENATEDOPERATION["Inverse of UTM zone 25S + Corrego Alegre 1970-72 to SIRGAS 2000 (1) + UTM zone 25S",
##     SOURCECRS[
##         PROJCRS["Corrego Alegre 1970-72 / UTM zone 25S",
##             BASEGEOGCRS["Corrego Alegre 1970-72",
##                 DATUM["Corrego Alegre 1970-72",
##                     ELLIPSOID["International 1924",6378388,297,
##                         LENGTHUNIT["metre",1]]],
##                 PRIMEM["Greenwich",0,
##                     ANGLEUNIT["degree",0.0174532925199433]],
##                 ID["EPSG",4225]],
##             CONVERSION["UTM zone 25S",
##                 METHOD["Transverse Mercator",
##                     ID["EPSG",9807]],
##                 PARAMETER["Latitude of natural origin",0,
##                     ANGLEUNIT["degree",0.0174532925199433],
##                     ID["EPSG",8801]],
##                 PARAMETER["Longitude of natural origin",-33,
##                     ANGLEUNIT["degree",0.0174532925199433],
##                     ID["EPSG",8802]],
##                 PARAMETER["Scale factor at natural origin",0.9996,
##                     SCALEUNIT["unity",1],
##                     ID["EPSG",8805]],
##                 PARAMETER["False easting",500000,
##                     LENGTHUNIT["metre",1],
##                     ID["EPSG",8806]],
##                 PARAMETER["False northing",10000000,
##                     LENGTHUNIT["metre",1],
##                     ID["EPSG",8807]]],
##             CS[Cartesian,2],
##                 AXIS["(E)",east,
##                     ORDER[1],
##                     LENGTHUNIT["metre",1]],
##                 AXIS["(N)",north,
##                     ORDER[2],
##                     LENGTHUNIT["metre",1]],
##             ID["EPSG",22525]]],
##     TARGETCRS[
##         PROJCRS["SIRGAS 2000 / UTM zone 25S",
##             BASEGEOGCRS["SIRGAS 2000",
##                 DATUM["Sistema de Referencia Geocentrico para las AmericaS 2000",
##                     ELLIPSOID["GRS 1980",6378137,298.257222101,
##                         LENGTHUNIT["metre",1]]],
##                 PRIMEM["Greenwich",0,
##                     ANGLEUNIT["degree",0.0174532925199433]],
##                 ID["EPSG",4674]],
##             CONVERSION["UTM zone 25S",
##                 METHOD["Transverse Mercator",
##                     ID["EPSG",9807]],
##                 PARAMETER["Latitude of natural origin",0,
##                     ANGLEUNIT["degree",0.0174532925199433],
##                     ID["EPSG",8801]],
##                 PARAMETER["Longitude of natural origin",-33,
##                     ANGLEUNIT["degree",0.0174532925199433],
##                     ID["EPSG",8802]],
##                 PARAMETER["Scale factor at natural origin",0.9996,
##                     SCALEUNIT["unity",1],
##                     ID["EPSG",8805]],
##                 PARAMETER["False easting",500000,
##                     LENGTHUNIT["metre",1],
##                     ID["EPSG",8806]],
##                 PARAMETER["False northing",10000000,
##                     LENGTHUNIT["metre",1],
##                     ID["EPSG",8807]]],
##             CS[Cartesian,2],
##                 AXIS["(E)",east,
##                     ORDER[1],
##                     LENGTHUNIT["metre",1]],
##                 AXIS["(N)",north,
##                     ORDER[2],
##                     LENGTHUNIT["metre",1]],
##             ID["EPSG",31985]]],
##     STEP[
##         CONVERSION["Inverse of UTM zone 25S",
##             METHOD["Inverse of Transverse Mercator",
##                 ID["INVERSE(EPSG)",9807]],
##             PARAMETER["Latitude of natural origin",0,
##                 ANGLEUNIT["degree",0.0174532925199433],
##                 ID["EPSG",8801]],
##             PARAMETER["Longitude of natural origin",-33,
##                 ANGLEUNIT["degree",0.0174532925199433],
##                 ID["EPSG",8802]],
##             PARAMETER["Scale factor at natural origin",0.9996,
##                 SCALEUNIT["unity",1],
##                 ID["EPSG",8805]],
##             PARAMETER["False easting",500000,
##                 LENGTHUNIT["metre",1],
##                 ID["EPSG",8806]],
##             PARAMETER["False northing",10000000,
##                 LENGTHUNIT["metre",1],
##                 ID["EPSG",8807]],
##             ID["INVERSE(EPSG)",16125]]],
##     STEP[
##         COORDINATEOPERATION["Corrego Alegre 1970-72 to SIRGAS 2000 (1)",
##             VERSION["IBGE-Bra"],
##             SOURCECRS[
##                 GEOGCRS["Corrego Alegre 1970-72",
##                     DATUM["Corrego Alegre 1970-72",
##                         ELLIPSOID["International 1924",6378388,297,
##                             LENGTHUNIT["metre",1]]],
##                     PRIMEM["Greenwich",0,
##                         ANGLEUNIT["degree",0.0174532925199433]],
##                     CS[ellipsoidal,2],
##                         AXIS["geodetic latitude (Lat)",north,
##                             ORDER[1],
##                             ANGLEUNIT["degree",0.0174532925199433]],
##                         AXIS["geodetic longitude (Lon)",east,
##                             ORDER[2],
##                             ANGLEUNIT["degree",0.0174532925199433]],
##                     ID["EPSG",4225]]],
##             TARGETCRS[
##                 GEOGCRS["SIRGAS 2000",
##                     DATUM["Sistema de Referencia Geocentrico para las AmericaS 2000",
##                         ELLIPSOID["GRS 1980",6378137,298.257222101,
##                             LENGTHUNIT["metre",1]]],
##                     PRIMEM["Greenwich",0,
##                         ANGLEUNIT["degree",0.0174532925199433]],
##                     CS[ellipsoidal,2],
##                         AXIS["geodetic latitude (Lat)",north,
##                             ORDER[1],
##                             ANGLEUNIT["degree",0.0174532925199433]],
##                         AXIS["geodetic longitude (Lon)",east,
##                             ORDER[2],
##                             ANGLEUNIT["degree",0.0174532925199433]],
##                     ID["EPSG",4674]]],
##             METHOD["NTv2",
##                 ID["EPSG",9615]],
##             PARAMETERFILE["Latitude and longitude difference file","CA7072_003.gsb"],
##             OPERATIONACCURACY[2.0],
##             ID["EPSG",5526]]],
##     STEP[
##         CONVERSION["UTM zone 25S",
##             METHOD["Transverse Mercator",
##                 ID["EPSG",9807]],
##             PARAMETER["Latitude of natural origin",0,
##                 ANGLEUNIT["degree",0.0174532925199433],
##                 ID["EPSG",8801]],
##             PARAMETER["Longitude of natural origin",-33,
##                 ANGLEUNIT["degree",0.0174532925199433],
##                 ID["EPSG",8802]],
##             PARAMETER["Scale factor at natural origin",0.9996,
##                 SCALEUNIT["unity",1],
##                 ID["EPSG",8805]],
##             PARAMETER["False easting",500000,
##                 LENGTHUNIT["metre",1],
##                 ID["EPSG",8806]],
##             PARAMETER["False northing",10000000,
##                 LENGTHUNIT["metre",1],
##                 ID["EPSG",8807]],
##             ID["EPSG",16125]]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["Brazil - Corrego Alegre 1970-1972"],
##         BBOX[-33.78,-58.16,-2.68,-34.74]]]
## 
## Grid CA7072_003.gsb needed but not found on the system.

The input data use the Corrego Alegre 1970-1972 setting, and still provide a +towgs84= key representation for pivoting through WGS84:

olinda <- st_read("data/olinda.gpkg", quiet=TRUE)
st_crs(olinda)
## Coordinate Reference System:
##   No EPSG code
##   proj4string: "+proj=utm +zone=25 +south +ellps=intl +towgs84=-205.57,168.77,-4.12,0,0,0,0 +units=m +no_defs"

We’ll just use one point to check things out:

xy_c <- st_centroid(st_geometry(olinda[  1,]))
st_coordinates(xy_c)
##          X       Y
## 1 295460.1 9120358

If we manually pivot through WGS84 on the way back to SIRGAS2000 UTM, we get:

st_coordinates(st_transform(st_transform(xy_c, 4326), 31985))
##          X       Y
## 1 295489.3 9120352

Without the NTv2 grid file CA7072_003.gsb we seem to get the same:

# without CA7072_003.gsb
st_coordinates(st_transform(xy_c, 31985))
##          X       Y
## 1 295489.3 9120352

but we also get the same with the grid file if we leave the +towgs84= key in the PROJ string:

# with CA7072_003.gsb
st_coordinates(st_transform(xy_c, 31985))
#          X       Y
# 1 295489.3 9120352

If however we manipulate the PROJ string to specify the grid file instead of the +towgs84= key, we can get the improved precision:

# with CA7072_003.gsb
xy_c1 <- xy_c
st_crs(xy_c1) <- "+proj=utm +zone=25 +south +ellps=intl +units=m +nadgrids=CA7072_003.gsb"
print(st_coordinates(st_transform(xy_c1, 31985)), digits=9)
#            X          Y
# 1 295486.396 9120350.62

Let’s try to use the PROJ utility program cs2cs in its PROJ 6 version. The cs2cs version when the grid file is present matches sf::st_transform() when the input CRS is modified to point to the grid file:

# with CA7072_003.gsb
cat(system(paste0("echo ", paste(xy, collapse=" "), " | cs2cs EPSG:22525 EPSG:31985"), intern=TRUE))
# 295486.40 9120350.62 0.00

cs2cs without the grid file gives:

xy <- st_coordinates(xy_c)
# without CA7072_003.gsb
cat(system(paste0("echo ", paste(xy, collapse=" "), " | cs2cs EPSG:22525 EPSG:31985"), intern=TRUE))
## 295488.64    9120352.16 0.00

This matches the second set of +towgs84= coefficients:

# without CA7072_003.gsb
xy_c2 <- xy_c
st_crs(xy_c2) <- "+proj=utm +zone=25 +south +ellps=intl +units=m +towgs84=-206.05,168.28,-3.82,0,0,0,0"
st_coordinates(st_transform(xy_c2, 31985))
##          X       Y
## 1 295488.6 9120352

Using the lwgeom::st_transform_proj() for now uses the proh_api.h interface:

# without CA7072_003.gsb
# -DACCEPT_USE_OF_DEPRECATED_PROJ_API_H
st_coordinates(lwgeom::st_transform_proj(xy_c, 31985))
##          X       Y
## 1 295489.3 9120352

Our reprojected objects in SIRGAS2000 used the WGS84 pivot with one of two possible sets of +towgs84= coefficients:

olinda <- st_read("output/olinda_sirgas2000.gpkg", quiet=TRUE)
xy_c <- st_centroid(st_geometry(olinda[  1,]))
st_coordinates(xy_c)
##          X       Y
## 1 295489.3 9120352

This is the EPSG description of the grid file: https://epsg.io/5541

It was retrieved from: https://www.eye4software.com/files/ntv2/ca70.zip

This page gives a picture of why the changes in PROJ matter - the arrows are in cm per year displacement.

Some grid files are available from https://proj.org/download.html, but because many others are not as freely available (yet), they may need to be dwnloaded from national mapping agencies. Most are relatively large, and also need to be versioned. Do read the README files in the zip archives!

GEOS

A recent upgrade of GEOS from 3.7.1 to 3.7.2 on a CRAN test server led to failures in three packages using rgeos for topological operations. rgeos 0.4-3 set the checkValidity= argument to for example gIntersection() to FALSE (TRUE threw an error if either geometry was invalid). An issue was opened on the sf github repository (rgeos is developed on R-Forge). The test objects (from an example from inlmisc) will be used here:

rgeos::version_GEOS0()
## [1] "3.7.2"
## attr(,"rev")
## [1] "b55d2125"

For rgeos <= 0.4-3, the default was not to check input geometries for validity before trying topological operations, for >= 0.5-1, the default changes when GEOS > 3.7.1 to check for validity. The mode of the argument also changes to integer from logical:

cV_old_default <- ifelse(rgeos::version_GEOS0() >= "3.7.2", 0L, FALSE)
yy <- rgeos::readWKT(readLines("data/invalid.wkt"))
rgeos::gIsValid(yy, byid=TRUE, reason=TRUE)
##                             1                             2 
## "Ring Self-intersection[2 3]"              "Valid Geometry" 
##                             3                             4 
##              "Valid Geometry"              "Valid Geometry"
sf::sf_extSoftVersion()
##           GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H 
##        "3.7.2"        "3.0.1"        "6.1.1"         "true"         "true"

The same underlyng GEOS code is used in sf:

sf::st_is_valid(sf::st_as_sf(yy), reason=TRUE)
## [1] "Ring Self-intersection[2 3]" "Valid Geometry"             
## [3] "Valid Geometry"              "Valid Geometry"

The geometries were also invalid in GEOS 3.7.1, but the operations succeeded:

ply <- rgeos::readWKT(readLines("data/ply.wkt"))
oo <- try(rgeos::gIntersection(yy, ply, byid=TRUE, checkValidity=cV_old_default), silent=TRUE)
print(attr(oo, "condition")$message)
## [1] "TopologyException: Input geom 0 is invalid: Ring Self-intersection at or near point 2 3 at 2 3"
ooo <- try(sf::st_intersection(sf::st_as_sf(yy), sf::st_as_sf(ply)), silent=TRUE)
print(attr(oo, "condition")$message)
## [1] "TopologyException: Input geom 0 is invalid: Ring Self-intersection at or near point 2 3 at 2 3"

In rgeos 0.5-1 and GEOS 3.7.2, new warnings are provided, and advice to check validity.

cV_new_default <- ifelse(rgeos::version_GEOS0() >= "3.7.2", 1L, TRUE)
try(rgeos::gIntersection(yy, ply, byid=TRUE, checkValidity=cV_new_default), silent=TRUE)
## Warning in RGEOSUnaryPredFunc(spgeom, byid, "rgeos_isvalid"): Ring Self-
## intersection at or near point 2 3
## yy is invalid
## Warning in rgeos::gIntersection(yy, ply, byid = TRUE, checkValidity
## = cV_new_default): Invalid objects found; consider using
## set_RGEOS_CheckValidity(2L)

New options are provided, get_RGEOS_CheckValidity() and set_RGEOS_CheckValidity(), because in some packages the use of topological operations may happen through other packages, such as raster::crop() calling rgeos::gIntersection() without access to the arguments of the latter function.

If we follow the advice, zero-width buffering is used to try to rectify the invalidity:

oo <- rgeos::gIntersection(yy, ply, byid=TRUE, checkValidity=2L)
## Warning in RGEOSUnaryPredFunc(spgeom, byid, "rgeos_isvalid"): Ring Self-
## intersection at or near point 2 3
## yy is invalid
## Attempting to make yy valid by zero-width buffering
rgeos::gIsValid(oo)
## [1] TRUE

equivalently:

oo <- rgeos::gIntersection(rgeos::gBuffer(yy, byid=TRUE, width=0), ply, byid=TRUE, checkValidity=1L)
rgeos::gIsValid(oo)
## [1] TRUE

and by extension to sf until GEOS 3.7.2 is accommodated:

ooo <- sf::st_intersection(sf::st_buffer(sf::st_as_sf(yy), dist=0), sf::st_as_sf(ply))
all(sf::st_is_valid(ooo))
## [1] TRUE

The actual cause was the use of an ESRI/shapefile style/understanding of the self-touching exterior ring. In OGC style, an interior ring is required, but not in shapefile style. Martin Davis responded in the issue:

The problem turned out to be a noding robustness issue, which caused the valid input linework to have a self-touch after noding. This caused the output to be invalid. The fix was to tighten up the internal overlay noding validation check to catch this situation. This has the side-effect of detecting (and failing) all self-touches in input geometry. Previously, vertex-vertex self-touches were not detected, and in many cases they would simply propagate through the overlay algorithm. (This made the output invalid as well, but since the inputs were already invalid this behaviour was considered acceptable).

The change in GEOS behaviour was not planned as such, but has consequences, fortunately detected because CRAN checks by default much more than say Travis by default. Zero-width buffering will not repair all cases of invalidity, but does work here.

Exercise and review

For a later exercise, we’ll be using the Soho cholera data set; I converted the shapefiles from https://asdar-book.org/bundles2ed/die_bundle.zip to GPKG to be more modern (using ogr2ogr in GDAL 3 built against PROJ 6. sf is installed using the proj.h interface in PROJ 6:

buildings <- sf::st_read("data/snow/buildings.gpkg", quiet=TRUE)
st_crs(buildings)
## Coordinate Reference System:
##   No EPSG code
##   proj4string: "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.999601 +x_0=400000 +y_0=-100000 +ellps=airy +units=m +no_defs"

To make an interactive display in mapview(), conversion/transformation to “Web Mercator” is needed - this uses a WGS84 datum. But PROJ 6 has dropped the +datum= tag, so the display is not correctly registered.

library(mapview)
mapview(buildings)

The CRS/SRS values in the GPKG file (it is a multi-table SQLite database) include the datum definition:

library(RSQLite)
db = dbConnect(SQLite(), dbname="data/snow/buildings.gpkg")
dbReadTable(db, "gpkg_spatial_ref_sys")$definition[4]
## [1] "PROJCS[\"Transverse_Mercator\",GEOGCS[\"GCS_OSGB 1936\",DATUM[\"OSGB_1936\",SPHEROID[\"Airy 1830\",6377563.396,299.3249646,AUTHORITY[\"EPSG\",\"7001\"]],AUTHORITY[\"EPSG\",\"6277\"]],PRIMEM[\"Greenwich\",0],UNIT[\"Degree\",0.0174532925199433],AUTHORITY[\"EPSG\",\"4277\"]],PROJECTION[\"Transverse_Mercator\"],PARAMETER[\"latitude_of_origin\",49],PARAMETER[\"central_meridian\",-2],PARAMETER[\"scale_factor\",0.999601],PARAMETER[\"false_easting\",400000],PARAMETER[\"false_northing\",-100000],UNIT[\"metre\",1,AUTHORITY[\"EPSG\",\"9001\"]],AXIS[\"Easting\",EAST],AXIS[\"Northing\",NORTH]]"
dbDisconnect(db)

Maybe using rgdal which is built using PROJ 6 but the legacy proj_api.h interface, and the shapefile as shipped with ASDAR reproduction materials will help?

buildings1 <- rgdal::readOGR("data/snow/buildings.shp", verbose=FALSE)
sp::proj4string(buildings1)
## [1] "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.999601 +x_0=400000 +y_0=-100000 +ellps=airy +units=m +no_defs"

No, same problem:

mapview(buildings1)

But the shapefile has the datum definition:

readLines("data/snow/buildings.prj")
## [1] "PROJCS[\"Transverse_Mercator\",GEOGCS[\"GCS_OSGB 1936\",DATUM[\"D_OSGB_1936\",SPHEROID[\"Airy_1830\",6377563.396,299.3249646]],PRIMEM[\"Greenwich\",0],UNIT[\"Degree\",0.017453292519943295]],PROJECTION[\"Transverse_Mercator\"],PARAMETER[\"latitude_of_origin\",49.00000000000002],PARAMETER[\"central_meridian\",-2],PARAMETER[\"scale_factor\",0.999601],PARAMETER[\"false_easting\",400000],PARAMETER[\"false_northing\",-100000],UNIT[\"Meter\",1]]"

So in both cases with PROJ 6, we need to manipulate the CRS read in with the file to insert our choice of how to make the transformation, because the definition as read no longer contains it:

fixed <- "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +nadgrids=OSTN15_NTv2_OSGBtoETRS.gsb +units=m +no_defs"
st_crs(buildings) <- fixed
sp::proj4string(buildings1) <- sp::CRS(fixed)
mapview(buildings)
mapview(buildings1)

GIS bridges (description and using GRASS and rgrass7)

GIS interfaces

Because GIS can be used as databases, and their tools can be better suited to some analyses and operations, it may be sensible to use one in addition to data analysis software. There is an extra effort required when using linked software systems, because they may not integrate easily. Since R is open source, and R spatial packages use open source components, staying with open source GIS means that many of the underlying software components are shared. This certainly applies to R and GRASS, and through GRASS, also to R and QGIS — QGIS is more file-based than GRASS, which has an underlying data storage specification.

GIS interfaces can be as simple as just reading and writing files using loose coupling, once the file formats have been worked out, that is. The GRASS 7 interface rgrass7 on CRAN is the current, stable interface. In addition to the GRASS interface, which is actively maintained, there are several others: link2GI packages interfaces to several GI systems; RQGIS is for QGIS but links through to GRASS and SAGA (Muenchow, Schratz, and Brenning 2017) using reticulate; RSAGA links to, scripting and running SAGA from R; rpostgis is for PostGIS (Bucklin and Basille 2018). The arcgisbinding package is published and distributed by ESRI using Github, and provides some file exchange facilities for vector and attribute data (newer versions may have raster too).

Layering of shells

The interface between R and GRASS uses the fact that GRASS modules can be run as command line programs at the shell prompt. The shell has certain environment variables for GRASS set, for example saying where the data is stored, but is otherwise a regular shell, from which R can be started. This instance of R inherits the environment variables set by GRASS

Finally, although for research purposes it may be prefered to have data analysis software, such as R, facing the user, it is possible to try to embed this component in the workflow, so that the end user does not need so much training — but then an ``expert’’ has to codify the steps needed in advance.

Two sides of the R/GRASS interface

The R/GRASS interface came into being in 1998/1999, and is covered in Bivand (2000) and a conference paper by Bivand and Neteler; and Bivand (2014). R was started in a GRASS LOCATION, and spatial data was exchanged between GRASS and R, running as it were in tandem; the workflows were not integrated. spgrass6 and its use discussed in Neteler and Mitasova (2008) continued this approach, but about that time steps were taken to permit scripting GRASS from R in existing LOCATIONs, like RSAGA. Shortly afterwards, spgrass6 and now rgrass7 introduced the possibility of creating a temporary GRASS LOCATION permitting GIS operations on data from the R side.

GRASS sessions

The package may be used in two ways, either in an R session started from within a GRASS session from the command line, or with the initGRASS() function. The function may be used with an existing GRASS location and mapset, or with a one-time throw-away location, and takes the GRASS installation directory as its first argument. It then starts a GRASS session within the R session, and is convenient for scripting GRASS in R, rather than Python, which is be the GRASS scripting language in GRASS 7. Other arguments to initGRASS() may be used to set up the default region using standard tools like Sys.setenv; resolution and projection may be set or reset subsequently.

Running GRASS from R

Each GRASS command takes an --interface-description flag, which when run returns an XML description of its flags and parameters. These descriptions are used by the GRASS GUI to populate its menus, and are also used in rgrass7 to check that GRASS commands are used correctly. This also means that the parseGRASS function can set up an object in a searchable list on the R side of the interface, to avoid re-parsing interface descriptions that have already been encountered in a session.

The middle function is doGRASS, which takes the flags and parameters chosen, checks their validity — especially type (real, integer, string), and constructs a command string. Note that multiple parameter values should be a vector of values of the correct type. Finally, execGRASS uses the system or system2 function to execute the GRASS command with the chosen flag and parameter values; the intern= argument asks that what GRASS returns be placed in an R object.

In general use, execGRASS calls doGRASS, which in turn calls parseGRASS. Use of execGRASS has been simplified to permit parameters to be passed through the R ellipsis (\(\ldots\)) argument structure. Consequently, the scripter can readily compare the help page of any GRASS command with the version of the value returned by parseGRASS showing the parameters and flags expected. GRASS add-ons are also accommodated in the same parseGRASS procedure of parsing and caching. We will not need more complex setups here, but it is easy to see that for example execGRASS may be run in an R loop with varying parameter values.

Initialize temporary GRASS session

Here we need three objects to be created, and also set override= to TRUE, as this document may be run many times. initGRASS() looks for an environment variable that GRASS sessions set (GISRC) pointing to a file of GRASS environment variables. Real GRASS sessions remove it on exit, but this interface does not (yet) provide for its removal, hence the need here to override.

library(sf)
olinda_sirgas2000 <- st_read("output/olinda_sirgas2000.gpkg", quiet=TRUE)
bounds <- st_sf(st_union(olinda_sirgas2000))
SG <- maptools::Sobj_SpatialGrid(as(bounds, "Spatial"), n=1000000)$SG

From rgrass7 0.2-1, the user needs to flag whether sf/stars or sp/rgdal object representations are being used, with use_sp() or use_sf(). This is only needed when objects rather than commands move across the interface; because no stars support is yet present, we need to use sp and rgdal support to set the location resolution.

library(rgrass7)
## Loading required package: XML
## GRASS GIS interface loaded with GRASS version: (GRASS not running)
packageVersion("rgrass7")
## [1] '0.2.1'
use_sp()
myGRASS <- "/home/rsb/topics/grass/g761/grass76"
myPROJSHARE <- "/usr/local/share/proj"
if (Sys.getenv("GRASS_PROJSHARE") == "") Sys.setenv(GRASS_PROJSHARE=myPROJSHARE)
loc <- initGRASS(myGRASS, tempdir(), SG=SG, override=TRUE)

Setting the projection correctly

As yet initGRASS does not set the projection from the input "SpatialGrid" object, so we have to do it ourselves, showing how to pass R objects to GRASS parameters:

execGRASS("g.mapset", mapset="PERMANENT", flag="quiet")
execGRASS("g.proj", flag="c", proj4=st_crs(bounds)$proj4string)
## Default region was updated to the new projection, but if you have multiple
## mapsets `g.region -d` should be run in each to update the region from the
## default
## Projection information updated
execGRASS("g.mapset", mapset=loc$MAPSET, flag="quiet")
execGRASS("g.region", flag="d")

We read the elevation data downloaded before into the GRASS location directly:

execGRASS("r.in.gdal", flag=c("overwrite", "quiet"), input="output/elevation.tif", output="dem")
execGRASS("g.region", raster="dem")

Next, we run r.watershed on this high resolution digital elevation model, outputting raster stream lines, then thinned with r.thin:

execGRASS("r.watershed", flag=c("overwrite", "quiet"), elevation="dem", stream="stream", threshold=2500L, convergence=5L, memory=300L)
execGRASS("r.thin", flag=c("overwrite", "quiet"), input="stream", output="stream1", iterations=200L)

To mask the output object we switch to the sf vector representation, copy bounds to GRASS, and set a raster mask using the bounds of the union of tracts. Then we convert the thinned stream lines within the mask to vector representation, and copy this object from GRASS to the R workspace. In both cases, we use GPKG representation for intermediate files.

use_sf()
writeVECT(bounds, "bounds", v.in.ogr_flags=c("overwrite", "quiet"))
## Writing layer `bounds' to data source `/tmp/RtmpLRygg1/file7373da7de50/file73735995f0ef/.tmp/reclus.nhh.no/75.0.gpkg' using driver `GPKG'
## options:        OVERWRITE=YES 
## features:       1
## fields:         0
## geometry type:  Polygon
execGRASS("r.mask", vector="bounds", flag=c("overwrite", "quiet"))
execGRASS("r.to.vect", flag=c("overwrite", "quiet"), input="stream1", output="stream", type="line")
## Warning in execGRASS("r.to.vect", flag = c("overwrite", "quiet"), input = "stream1", : The command:
## r.to.vect --overwrite --quiet input=stream1 output=stream type=line
## produced at least one warning during execution:
## WARNING: Memory leak: 36 points are still in use
## WARNING: Memory leak: 36 points are still in use
imputed_streams <- readVECT("stream", ignore.stderr=TRUE)
library(mapview)
mapview(imputed_streams)

We can also calculate geomorphometric values, including the simple slope and aspect values for the masked raster using r.slope.aspect. If we then move the Olinda setor boundaries to GRASS, we can use v.rast.stats to summarize the raster values falling within each setor, here for the geomorphometric measures.

execGRASS("r.slope.aspect", elevation="dem", slope="slope", aspect="aspect", flag=c("quiet", "overwrite"))
writeVECT(olinda_sirgas2000[, "SETOR_"], "olinda", ignore.stderr=TRUE, v.in.ogr_flags=c("overwrite", "quiet"))
execGRASS("v.rast.stats", map="olinda", raster=c("slope", "aspect"), method=c("first_quartile", "median", "third_quartile"), column_prefix=c("slope", "aspect"), flag=c("c", "quiet"))

We can do the same for the Landsat 7 NDVI values:

execGRASS("r.in.gdal", flag=c("overwrite", "quiet"), input="output/L7_ndvi.tif", output="ndvi")
execGRASS("g.region", raster="ndvi")
execGRASS("v.rast.stats", map="olinda", raster="ndvi", method=c("first_quartile", "median", "third_quartile"), column_prefix="ndvi", flag=c("c", "quiet"))
olinda_gmm_ndvi <- readVECT("olinda", ignore.stderr=TRUE)
head(olinda_gmm_ndvi)
## Simple feature collection with 6 features and 11 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 295330.3 ymin: 9119339 xmax: 296729.1 ymax: 9120522
## epsg (SRID):    NA
## proj4string:    +proj=utm +zone=25 +south +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
##   cat SETOR_ slope_first_quartile slope_median slope_third_quartile
## 1   1      2             1.017030      2.55044              3.02056
## 2   2      3             1.270000      2.60012              3.04049
## 3   3      4             0.574040      2.57144              3.62007
## 4   4      5             0.136963      2.05871              2.98215
## 5   5      6             0.747998      2.62333              3.76261
## 6   6      7             0.503164      1.94426              2.80776
##   aspect_first_quartile aspect_median aspect_third_quartile
## 1               36.2456      117.8670               233.862
## 2               90.1409      135.0760               180.256
## 3               29.8119       89.2267               155.906
## 4                2.1586       90.2566               144.705
## 5               85.8933      123.0810               180.247
## 6               25.9332       83.2747               132.841
##   ndvi_first_quartile ndvi_median ndvi_third_quartile
## 1            0.253968    0.338464           0.4067800
## 2           -0.157143    0.191008           0.3777780
## 3           -0.194175   -0.159420          -0.1081080
## 4           -0.161290   -0.106383           0.0000000
## 5           -0.189873   -0.155280          -0.1063830
## 6           -0.187879   -0.154762          -0.0909091
##                             geom
## 1 POLYGON ((295549.1 9120119,...
## 2 POLYGON ((295589.3 9120184,...
## 3 POLYGON ((295938.3 9120468,...
## 4 POLYGON ((295374.4 9119990,...
## 5 POLYGON ((295589.3 9120184,...
## 6 POLYGON ((296634.3 9119829,...

Exercises and review

Broad Street Cholera Data

Even though we know that John Snow already had a working hypothesis about cholera epidemics, his data remain interesting, especially if we use a GIS to find the street distances from mortality dwellings to the Broad Street pump in Soho in central London. Brody et al. (2000) point out that John Snow did not use maps to find the Broad Street pump, the polluted water source behind the 1854 cholera epidemic, because he associated cholera with water contaminated with sewage, based on earlier experience.

Broad Street Cholera Data

The basic data to be used here were made available by Jim Detwiler, who had collated them for David O’Sullivan for use on the cover of O’Sullivan and Unwin (2003), based on earlier work by Waldo Tobler and others. The files were a shapefile of counts of deaths at front doors of houses, two shapefiles of pump locations and a georeferenced copy of the Snow map as an image; the files were registered in the British National Grid CRS. These have been converted to GPKG format. In GRASS, a suitable location was set up in this CRS and the image file was imported; the building contours were then digitised as a vector layer and cleaned.

We would like to find the line of equal distances shown on the extract from John Snow’s map shown in Brody et al. (2000) shown here, or equivalently find the distances from the pumps to the front doors of houses with mortalities following the roads, not the straight line distance. We should recall that we only have the locations of counts of mortalities, not of people at risk or of survivors.

library(sf)
bbo <- st_read("data/snow/bbo.gpkg")
## Reading layer `bbo' from data source `/home/rsb/presentations/geomed19-workshop/data/snow/bbo.gpkg' using driver `GPKG'
## Simple feature collection with 1 feature and 1 field
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 528890.7 ymin: 180557.9 xmax: 529808.7 ymax: 181415.7
## epsg (SRID):    NA
## proj4string:    +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +units=m +no_defs
fixed <- "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +nadgrids=OSTN15_NTv2_OSGBtoETRS.gsb +units=m +no_defs"
st_crs(bbo) <- fixed
library(rgrass7)
myPROJSHARE <- "/usr/local/share/proj"
if (Sys.getenv("GRASS_PROJSHARE") == "") Sys.setenv(GRASS_PROJSHARE=myPROJSHARE)
myGRASS <- "/home/rsb/topics/grass/g761/grass76"
td <- tempdir()
SG <- maptools::Sobj_SpatialGrid(as(bbo, "Spatial"))$SG
use_sp()
soho <- initGRASS(gisBase=myGRASS, home=td, SG=SG, override=TRUE)
soho
## gisdbase    /tmp/RtmpLRygg1 
## location    file737342f9e487 
## mapset      file73734724a564 
## rows        94 
## columns     100 
## north       181420.7 
## south       180557.9 
## west        528890.7 
## east        529808.7 
## nsres       9.178723 
## ewres       9.18 
## projection  NA
MAPSET <- execGRASS("g.mapset", flags="p", intern=TRUE)
execGRASS("g.mapset", mapset="PERMANENT", flags="quiet")
execGRASS("g.proj", flags=c("p", "quiet"))
## XY location (unprojected)
execGRASS("g.proj", proj4=st_crs(bbo)$proj4string, flags=c("c", "quiet"))
execGRASS("g.mapset", mapset=MAPSET, flags="quiet")
execGRASS("g.region", flags="p", intern=TRUE)[3:11]
## [1] "north:      181420.7"  "south:      180557.9"  "west:       528890.7" 
## [4] "east:       529808.7"  "nsres:      9.1787234" "ewres:      9.18"     
## [7] "rows:       94"        "cols:       100"       "cells:      9400"
execGRASS("g.region", flags="a", res="1")
execGRASS("g.region", flags="p", intern=TRUE)[3:11]
## [1] "north:      181421" "south:      180557" "west:       528890"
## [4] "east:       529809" "nsres:      1"      "ewres:      1"     
## [7] "rows:       864"    "cols:       919"    "cells:      794016"
buildings <- st_read("data/snow/buildings.gpkg", quiet=TRUE)
st_crs(buildings) <- fixed
deaths <- st_read("data/snow/deaths.gpkg", quiet=TRUE)
st_crs(deaths) <- fixed
sum(deaths$Num_Css)
## [1] 578
b_pump <- st_read("data/snow/b_pump.gpkg", quiet=TRUE)
st_crs(b_pump) <- fixed
nb_pump <- st_read("data/snow/nb_pump.gpkg", quiet=TRUE)
st_crs(nb_pump) <- fixed
use_sf()
fl <- c("overwrite", "quiet")
writeVECT(bbo, vname="bbo", v.in.ogr_flags=c("o", fl), ignore.stderr=TRUE)
writeVECT(buildings[,1], vname="buildings", v.in.ogr_flags=c("o", fl), ignore.stderr=TRUE)
writeVECT(b_pump, vname="b_pump", v.in.ogr_flags=c("o", fl), ignore.stderr=TRUE)
writeVECT(nb_pump, vname="nb_pump", v.in.ogr_flags=c("o", fl), ignore.stderr=TRUE)
writeVECT(deaths, vname="deaths", v.in.ogr_flags=c("o", fl), ignore.stderr=TRUE)
execGRASS("g.list", type="vector", intern=TRUE)
## [1] "b_pump"    "bbo"       "buildings" "deaths"    "nb_pump"

GIS workflow

The buildings vector layer should be converted to its inverse (not buildings), and these roads should then be buffered to include the front doors (here 4m). These operations can be done in the raster or vector representation, but the outcome here will be a raster object from which to find the cost in 1 metre resolution of moving from each front door to each pump. We then need to extract the distance to the Broad Street pump, and to the nearest other pump, for each front door. We could also use vector street centre lines to build a network, and used graph-based methods to find the shortest paths from each front door to the pumps.

Create roads and convert to raster

First, we cut the buildings out of the extent polygon to leave the roads. Having set the region resolution to 1x1m squares we can convert the vector roads to raster, and can tabulate raster cell values, where asterisks are missing data cells:

execGRASS("v.overlay", ainput="buildings", binput="bbo", operator="xor", output="roads", flags=fl, ignore.stderr = TRUE)
execGRASS("v.to.rast", input="roads", output="rroads", use="val", value=1, flags=fl)
execGRASS("r.stats", input="rroads", flags=c("c", "quiet"))
## 1 263192
## * 530824

Buffer and reclass

We also need to buffer out the roads by an amount sufficient to include the the front door points within the roads — 4m was found by trial and error and may be too much, giving shorter distances than a thinner buffer would yield. Reclassification of the raster to give only unit cost is also needed:

execGRASS("r.buffer", input="rroads", output="rroads4", distances=4, flags=fl)
execGRASS("r.stats", input="rroads4", flags=c("c", "quiet"))
## 1 263192
## 2 152446
## * 378378
tf <- tempfile()
cat("1 2 = 1\n", file=tf)
execGRASS("r.reclass", input="rroads4", output="rroads4a", rules=tf, flags=fl)
execGRASS("r.stats", input="rroads4a", flags=c("c", "quiet"))
## 1 415638
## * 378378

Generate distance maps

The r.cost command returns a raster with cells set as the cost of moving from the vector start point or points to each cell; we do this twice, once for the Broad Street pump, and then for the other pumps:

execGRASS("r.cost", input="rroads4a", output="dist_broad", start_points="b_pump", flags=fl)
execGRASS("r.cost", input="rroads4a", output="dist_not_broad", start_points="nb_pump", flags=fl)

Pump to front door distances

Finally, we examine the values of these two distance maps at the front door points, and add these fields (columns) to the vector mortality map:

execGRASS("v.db.addcolumn", map="deaths", columns="broad double precision", flags="quiet")
execGRASS("v.what.rast", map="deaths", raster="dist_broad", column="broad", flags="quiet")
execGRASS("v.db.addcolumn", map="deaths", columns="not_broad double precision", flags="quiet")
execGRASS("v.what.rast", map="deaths", raster="dist_not_broad", column="not_broad", flags="quiet")

Mortality counts by pump nearness

Moving the data back to R from GRASS permits operations on the distance values. We set the logical variable b_nearer to TRUE if the distance to the Broad Street pump is less than the distance to the nearest other pump:

deaths1 <- readVECT("deaths", ignore.stderr=TRUE)
deaths1$b_nearer <- deaths1$broad < deaths1$not_broad
by(deaths1$Num_Css, deaths1$b_nearer, sum)
## deaths1$b_nearer: FALSE
## [1] 221
## -------------------------------------------------------- 
## deaths1$b_nearer: TRUE
## [1] 357

Using R as a GIS (topological and other operations)

There is a recently published article (https://onlinelibrary.wiley.com/doi/10.1111/ecog.04617) on the landscapemetrics package (Hesselbarth et al., n.d.); there are now many R packages for operations previously performed in GIS. It may be the case that even for moderate data set sizes, GIS are more performant, and this will almost certainly be the case for larger data sets, although sensible use of proxy datasets not in memory, and scaling operations to evaluate for the required output resolution may help. Here we first use the imputed streams to show topological operations, first the proportion of the area of setors that are within 50m of an imputed stream:

water_buf_50 <- st_buffer(imputed_streams, dist=50)
setor_area <- st_area(olinda_sirgas2000)
near_water0 <- st_intersection(olinda_sirgas2000[,"SETOR_"], water_buf_50[,"cat"])
near_water <- aggregate(near_water0, by=list(near_water0$SETOR_), head, n=1)
area_near_water <- st_area(near_water)
olinda_sirgas2000$setor_area <- setor_area
o <- match(near_water$SETOR_, olinda_sirgas2000$SETOR_)
olinda_sirgas2000$area_near_water <- 0
olinda_sirgas2000$area_near_water[o] <- area_near_water
olinda_sirgas2000$prop_near_water <- olinda_sirgas2000$area_near_water/olinda_sirgas2000$setor_area
summary(olinda_sirgas2000$prop_near_water)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00000 0.07342 0.19402 0.23066 0.35662 0.90232
library(tmap)
tm_shape(olinda_sirgas2000) + tm_fill("prop_near_water", palette="Blues", style="fisher", n=5)

Next, we examine the length of imputed streams per unit area by setor, again using intersection followed by aggregation to the setors:

streams_by_setor <- st_intersection(olinda_sirgas2000[,"SETOR_"], imputed_streams[,"cat"])
lngths0 <- aggregate(streams_by_setor, by=list(streams_by_setor$SETOR_), head, n=1)
lngths <- st_length(lngths0)
units(lngths) <- "mm"
o <- match(lngths0$SETOR_, olinda_sirgas2000$SETOR_)
olinda_sirgas2000$lngths <- 0
olinda_sirgas2000$lngths[o] <- lngths
olinda_sirgas2000$lngth_area <- olinda_sirgas2000$lngths/olinda_sirgas2000$setor_area
tm_shape(olinda_sirgas2000) + tm_fill("lngth_area", palette="Blues", style="fisher", n=5)

These two measures are highly correlated.

cor(olinda_sirgas2000$lngth_area, olinda_sirgas2000$prop_near_water)
## [1] 0.9655318

We can also use raster for operations on the raster objects that we have at our disposal.

library(raster)
## Loading required package: sp
r <- raster("output/elevation.tif")
r
## class      : RasterLayer 
## dimensions : 3096, 3097, 9588312  (nrow, ncol, ncell)
## resolution : 4.73, 4.7  (x, y)
## extent     : 286508, 301156.8, 9106220, 9120771  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=25 +south +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
## source     : /home/rsb/presentations/geomed19-workshop/output/elevation.tif 
## names      : elevation

We can use raster::terrain() to calculate geomorphometric measures:

slope_aspect <- terrain(r, opt=c('slope','aspect'), unit='degrees', neighbors=8)

and extract median values by setor polygon:

slopes <- extract(slope_aspect, olinda_sirgas2000, fun=median)

as well as the median NDVI values:

ndvi <- extract(raster("output/L7_ndvi.tif"), olinda_sirgas2000, fun=median)
summary(ndvi[,1])
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -0.26136 -0.15532 -0.10843 -0.07168 -0.02278  0.33846
if (exists("olinda_gmm_ndvi")) summary(olinda_gmm_ndvi$ndvi_median)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -0.26136 -0.15532 -0.10843 -0.07173 -0.02278  0.33846
summary(slopes[,1])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   1.050   2.542   3.067   4.602   6.406  12.280      13
if (exists("olinda_gmm_ndvi")) summary(olinda_gmm_ndvi$slope_median)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.050   2.551   3.021   4.511   6.123  12.280
summary(slopes[,2])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   64.53  139.78  177.27  171.60  195.18  312.25      13
if (exists("olinda_gmm_ndvi")) summary(olinda_gmm_ndvi$aspect_median)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   24.98  133.60  173.99  171.87  206.51  318.07

Exercises and review

As there is a small difference between the CRS values, we copy across before conducting an intersection operation to clip the buildings to the boundary, then we buffer in the buildings object (to make the roads broader).

library(sf)
buildings1 <- st_intersection(buildings, bbo)
buildings2 <- st_buffer(buildings1, dist=-4)
library(mapview)
mapview(buildings2)

Next we create a dummy raster using raster with 1 meter resolution in the extent of the buildings object (note that raster::extent() works with sf objects, but the CRS must be given as a string):

library(raster)
resolution <- 1
r <- raster(extent(buildings2), resolution=resolution, crs=fixed)
r[] <- resolution
summary(r)
##         layer
## Min.        1
## 1st Qu.     1
## Median      1
## 3rd Qu.     1
## Max.        1
## NA's        0

One of the building3 component geometries was empty (permitted in sf, not in sp), so should be dropped before running raster::cellFromPolygon() to list raster cells in each geometry (so we need unlist() to assign NA to the in-buffered buildings):

buildings3 <- as(buildings2[!st_is_empty(buildings2),], "Spatial")
cfp <- cellFromPolygon(r, buildings3)
is.na(r[]) <- unlist(cfp)
summary(r)
##          layer
## Min.         1
## 1st Qu.      1
## Median       1
## 3rd Qu.      1
## Max.         1
## NA's    330537
library(mapview)
mapview(r)

Using gdistance, we create a symmetric transition object with an internal sparse matrix representation, from which shortest paths can be computed:

library(gdistance)
tr1 <- transition(r, transitionFunction=function(x) 1/mean(x), directions=8, symm=TRUE)

We need to find shortest paths from addresses with mortalities to the Broad Street pump first:

sp_deaths <- as(deaths, "Spatial")
d_b_pump <- st_length(st_as_sfc(shortestPath(tr1, as(b_pump, "Spatial"), sp_deaths, output="SpatialLines")))

and then in a loop from the same addresses to each of the other pumps in turn, finally taking the minimum:

res <- matrix(NA, ncol=nrow(nb_pump), nrow=nrow(deaths))
sp_nb_pump <- as(nb_pump, "Spatial")
for (i in 1:nrow(nb_pump)) res[,i] <- st_length(st_as_sfc(shortestPath(tr1, sp_nb_pump[i,], sp_deaths, output="SpatialLines")))
d_nb_pump <- apply(res, 1, min)

Because sf::st_length() uses units units, but they get lost in assigning to a matrix, we need to re-assign before testing whether the Broad Street pump is closer or not:

library(units)
## udunits system database from /usr/share/udunits
units(d_nb_pump) <- "m"
deaths$b_nearer <- d_b_pump < d_nb_pump
by(deaths$Num_Css, deaths$b_nearer, sum)
## deaths$b_nearer: FALSE
## [1] 217
## -------------------------------------------------------- 
## deaths$b_nearer: TRUE
## [1] 361

Bivand, R. S. 2000. “Using the Statistical Data Analysis Language on GRASS 5.0 GIS Data Base Files.” Computers and Geosciences 26: 1043–52.

———. 2014. “GeoComputation and Open-Source Software : Components and Software Component Stacks.” In GeoComputation (Second Edition), edited by Robert J. Abrahart and Linda See, 329–255. Boca Raton, FL: CRC Press.

Brody, H., M. R. Rip, P. Vinten-Johansen, N. Paneth, and S. Rachman. 2000. “Map-Making and Myth-Making in Broad Street: The London Cholera Epidemic, 1854.” Lancet 356: 64–68.

Bucklin, David, and Mathieu Basille. 2018. “Rpostgis: Linking R with a PostGIS Spatial Database.” The R Journal 10 (1): 251–68. https://journal.r-project.org/archive/2018/RJ-2018-025/index.html.

Evers, Kristian, and Thomas Knudsen. 2017. Transformation Pipelines for Proj.4. https://www.fig.net/resources/proceedings/fig_proceedings/fig2017/papers/iss6b/ISS6B_evers_knudsen_9156.pdf.

Hesselbarth, Maximilian H. K., Marco Sciaini, Kimberly A. With, Kerstin Wiegand, and Jakub Nowosad. n.d. “Landscapemetrics: An Open-Source R Tool to Calculate Landscape Metrics.” Ecography 0 (0). https://doi.org/10.1111/ecog.04617.

Knudsen, Thomas, and Kristian Evers. 2017. Transformation Pipelines for Proj.4. https://meetingorganizer.copernicus.org/EGU2017/EGU2017-8050.pdf.

Muenchow, Jannes, Patrick Schratz, and Alexander Brenning. 2017. “RQGIS: Integrating R with QGIS for Statistical Geocomputing.” The R Journal 9 (2): 409–28. https://rjournal.github.io/archive/2017/RJ-2017-067/RJ-2017-067.pdf.

Neteler, M., and H. Mitasova. 2008. Open Source GIS: A GRASS GIS Approach, Third Edition. New York: Springer.

O’Sullivan, D., and D. J. Unwin. 2003. Geographical Information Analysis. Hoboken, NJ: Wiley.