Required current contributed CRAN packages:

I am running GRASS 8.2.0 and R 4.2.1, with recent update.packages(). Please install a recent GRASS https://grass.osgeo.org/download and a recent R https://cran.r-project.org/ before reaching Florence. The CRAN contributed packages used are:

needed <- c("rgrass", "XML", "raster", "stars", "abind", "sp", "sf", "terra")

and may be installed when online using this code (you will be asked which repository to access):

inst <- match(needed, .packages(all=TRUE))
need <- which(is.na(inst))
if (length(need) > 0) install.packages(needed[need])

Script

Script and data at https://github.com/rsbivand/foss4g_2022/raw/main/modernizing_220822.zip. Download to suitable location, unzip and use as basis.

Testing installation

Download and unzip the data and scripts, and, after editing the string containing the GRASS installation GISBASE, try to run:

source("modernizing_220822.R", echo=TRUE)

FOSS foundations and barn-raising

Both R and GRASS are FOSS. Both build on FOSS foundations, GRASS directly and R through contributed packages that the user may choose to install in R libraries.

Before barn-raising

So what are the shared FOSS foundations? Briefly, PROJ, GEOS and GDAL, first shown in the pre-barn-raising state:

PROJ (then often called Proj4) provides coordinate reference systems (CRS) definitions, and projection and datum transformation operations; GEOS provides planar topological predicates and operations; GDAL provides abstractions for reading, writing, manipulating and transforming GIS data using drivers for raster and vector data.

PROJ < 6 and GDAL < 3 used to ship with shared flat text files providing metadata about coordinate reference systems. Pre-PROJ 6, PROJ 5 introduced pipelines for projection and datum transformation operations Evers and Knudsen (2017). PROJ 4 had been stuck with an outdated WGS84 (World Geodetic System 1984 and +towgs84= key-value pairs) datum transformation hub introducing errors of up to 2m; all datum transformations were forced through WGS84 as an intermediate product.

Barn-raising

Following the Barn Raising project (https://gdalbarn.com/), PROJ 6 and GDAL 3 were released together, with GDAL 3 requiring PROJ >= 6. The most obvious changes were that PROJ now stored CRS, projection and transformation metadata in an SQLite database, also used by GDAL, and that most of the legacy flat text files were replaced by database tables. The database file made it much easier to adopt OGC WKT2 (2019) for specifying CRS. This provided the information needed to drop the WGS84 transformation hub in most cases, and to construct direct transformation pipelines from database lookup:

CDN or bundle grid files?

The most accurate direct transformation pipelines may need transformation grids rather than 3 or 7 parameter transformation models applied equally over the area of interest. While the legacy proj-datumgrid zipfile never exceeded 6.3 MiB, the current proj-data zipfile is 562.5 MiB (https://download.osgeo.org/proj/). Since it is unlikely that users need transformation grids for all continents, one can download by continent (Oceania, North America, Europe), or for global transformations, but these are still all large. The volume of transformation grids released under permissive licenses and hosted by the CDN will continue to increase rapidly.

Instead of installing lots of unneeded grids, which may become stale, use can be made of an on-demand content download network https://cdn.proj.org from within PROJ, using CURL for download and TIFF as the unified grid format (Cloud Optimized GeoTiff). On-demand use creates a user-writeable cache.db SQLite database of grid chunks, which may be used across applications looking at the same PROJ_LIB directory:

Modernizing R packages

Pre barn-raising

When the first R-GRASS interface package was written (R. S. Bivand 2000), a small subset of GRASS GIS library was included in the R package GRASS, to permit GRASS 5 rasters to be read and written. When the R package rgdal became available, the next two interface iterations spgrass6 for GRASS 6 and rgrass7 for GRASS 7 used it to read and write temporary raster and vector files using GDAL. In addition to the GRASS interface packages, many other R packages used the functionality of PROJ and GDAL then provided by rgdal:

One complication on platforms such as Windows and macOS for which the R archive network provides binary packages, is that the PROJ and GDAL shared metadata files are bundled:

The versions of these metadata files may differ from those bundled with other binary software (such as OSGeo4W or the GRASS Windows standalone).

Addressing barn-raising in R packages

R. Bivand (2021) describes the steps taken to modernize rgdal as a useful place to prototype adaptations - see also R. Bivand (2022). For vector data, the sf simple features package is strongly recommended for new work, while for vector and raster spatial and spatio-temporal arrays, stars may be used. Both of these are part of the informal r-spatial GitHub organization (https://r-spatial.org, https://github.com/r-spatial). sf links to PROJ, GDAL and GEOS, so is able to replace rgdal and rgeos. rgdal perhaps uses PROJ directly rather than PROJ through GDAL, but most of what was learned about changes in PROJ and GDAL is shared.

In addition, rgdal had been used extensively by the raster package, which itself did not link to external software libraries. raster is being re-implemented in the terra package, which is now almost complete. terra, like sf, links to PROJ, GDAL and GEOS, so provides the same file handling functionality as rgdal with regard to the R-GRASS interface, but for both vector and raster objects. rgdal, sf and terra can use the WKT2 (2019) CRS representation.

GDAL RRASTER driver

GDAL has had an RRASTER raster driver to read and write files created by the R raster package since 2.2 (https://gdal.org/drivers/raster/rraster.html#raster-rraster). Until 3.5, the driver used Proj4 strings to represent CRS, but from 3.5.0 may also use WKT2 (2019) as the preferred representation (for terra and raster; the latter uses pure R code not GDAL).

Modernizing the R-GRASS interface

Choices and issues

Simply permitting the rgrass7 package to interface GRASS 8 as well as GRASS 7 was not sufficient, because rgdal will be retired during 2023. So rgrass is the new interface package. It has been tested with GRASS 7.8.6 as well as GRASS 8. After examining the use of sf and stars compared to terra, it was found that of the packages providing services from PROJ, GEOS and GDAL, terra was a better fit, in having object structures more like traditional GIS. sf is innovating fast, including the use of s2 for spherical geometry topology predicates and operations, and spatio-temporal arrays in stars; these may be needed later, but for now terra is a good fit for the needs of the interface:

Before continuing to a proper description of the interface, one practical problem needs to be mentioned: bundling. R packages for Windows and macOS are built as static binary packages: if they contain compiled C, C++ or Fortran code linking to external software libraries like PROJ, GDAL or GEOS, they are statically linked to those libraries. This means not only that the versions of the external software libraries are fixed, but that (at present) each such R package also bundles a copy of the shared PROJ and GDAL metadata directories. The GRASS distributions for macOS and Windows (both stand-alone and OSGeo4W) bundle shared PROJ and GDAL metadata directories. The Windows distributions further bundle the complete PROJ set of transformation grids.

If the user is not careful, and if the underlying version of PROJ or GDAL is upgraded during an R minor version (say 4.2, during the calendar year approximately April to April), sf and terra may have different versions too. This has for example emerged in https://github.com/rsbivand/rgrass/issues/57; it would be useful if the internal PROJ function found in the issue was exposed in the API for downstream code (like R packages and GRASS) to use.

Introduction to the interface

The original R-GRASS interface (R. S. Bivand 2000; Neteler and Mitasova 2008) was written mainly to permit R objects represented as sp class objects to be moved to GRASS, and GRASS objects to be moved to R for statistical analysis. From spgrass6 0.6-3 (April 2009) following a fruitful workshop at Queen’s University, Belfast, the interface was re-written to use the --interface-description flag provided for each GRASS command, also used by the Python interface to GRASS commands. Command interface descriptions are parsed from XML and cached as R objects for efficiency. The current version of the R-GRASS interface is rgrass. In addition, an R function initGRASS() was written to permit GRASS to be started from within R to which we will return below.

It is worth stating that the number of years that the current package maintainer can continue in that role is limited, and it would be excellent if the maintainer role is transferred at the latest during 2024 (that is, shortly after the retirement of rgdal, rgeos and maptools).

Starting R inside GRASS

When starting GRASS GIS from a terminal console (here GRASS 8.2.0), one can continue in the GRASS terminal console, starting an interactive R session from there (here R 4.2.1). Loading and attaching the R-GRASS interface package rgrass in the R session, we see that the current GRASS location is detected and reported:

$ /home/rsb/topics/grass/g820/bin/grass /home/rsb/topics/grassdata/nc_basic_spm_grass7/rsb
Starting GRASS GIS...
Cleaning up temporary files...

          __________  ___   __________    _______________
         / ____/ __ \/   | / ___/ ___/   / ____/  _/ ___/
        / / __/ /_/ / /| | \__ \\_  \   / / __ / / \__ \
       / /_/ / _, _/ ___ |___/ /__/ /  / /_/ // / ___/ /
       \____/_/ |_/_/  |_/____/____/   \____/___//____/

Welcome to GRASS GIS 8.2.0
GRASS GIS homepage:                      https://grass.osgeo.org
This version running through:            Bash Shell (/bin/bash)
Help is available with the command:      g.manual -i
See the licence terms with:              g.version -c
See citation options with:               g.version -x
If required, restart the GUI with:       g.gui wxpython
When ready to quit enter:                exit

Launching <wxpython> GUI in the background, please wait...
GRASS nc_basic_spm_grass7/rsb:github-rsb > R

R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

...

> library(rgrass)
Loading required package: XML
GRASS GIS interface loaded with GRASS version: GRASS 8.2.0 (2022)
and location: nc_basic_spm_grass7

Since rgrass knows the current location, we can for example use execGRASS() to list GRASS rasters in the PERMANENT mapset in the standard North Carolina basic data set (https://grass.osgeo.org/sampledata/north_carolina/nc_basic_spm_grass7.zip):

> execGRASS("g.list", type="raster", mapset="PERMANENT")
basins
elevation
elevation_shade
geology
lakes
landuse
soils
> q()
Save workspace image? [y/n/c]: n
GRASS nc_basic_spm_grass7/rsb:github-rsb > exit
Cleaning up default sqlite database ...
Cleaning up temporary files...
Done.

Goodbye from GRASS GIS

Leaving R returns us to the GRASS terminal console, which we can also exit.

R can also be started within the GRASS GUI, by choosing the console tab, and entering for example rstudio, or another R graphical user interface on Windows or macOS. This screendump shows the same listing of rasters in PERMANENT in rstudio:

Starting GRASS inside R

From spgrass6 0.6-3, it has also been possible to start a GRASS session from a running R session using the initGRASS() function. This is done by setting GRASS and environment variables from the R session (https://grass.osgeo.org/grass80/manuals/variables.html). Starting GRASS from R may use a temporary location, or may use an existing GRASS LOCATION.

Temporary GRASS location

It may be useful to set an environmental variable to the value of GISBASE, as shown for example in the GRASS terminal console:

GRASS nc_basic_spm_grass7/rsb:github-rsb > echo $GISBASE
/home/rsb/topics/grass/g820/grass82

Starting R with such a suitable environment variable set lets us retrieve it later when needed. When loaded and attached, rgrass reports that it seems that GRASS is not running:

$ GRASS_INSTALLATION=/home/rsb/topics/grass/g820/grass82 R

R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

...

> library(rgrass)
Loading required package: XML
GRASS GIS interface loaded with GRASS version: (GRASS not running)

Here we’ll use a raster file provided with terra:

> library(terra)
terra 1.6.0
> f <- system.file("ex/elev.tif", package="terra")
> r <- rast(f)

The first argument to initGRASS() is gisBase=, which here we are passing the value of the environmental variable after checking that it is a directory. The second argument is where to write the GISRC file. The third is a template raster (here a "SpatRaster" object) from which to clone the temporary location size, position, resolution and projection:

> GRASS_INSTALLATION <- Sys.getenv("GRASS_INSTALLATION")
> file.info(GRASS_INSTALLATION)$isdir
[1] TRUE
> loc <- initGRASS(gisBase=GRASS_INSTALLATION, home=tempdir(), SG=r, override=TRUE)

As can be seen, initGRASS() creates not only environment and GRASS variables, but also many files in the location mapsets; g.gisenv also shows details:

> list.files(file.path(loc$GISDBASE, loc$LOCATION_NAME), recursive=TRUE)
[1] "file28aaf6dc1c44c/WIND"          "file28aaf6dc1c44c/windows/input"
[3] "PERMANENT/DEFAULT_WIND"          "PERMANENT/PROJ_EPSG"            
[5] "PERMANENT/PROJ_INFO"             "PERMANENT/PROJ_SRID"            
[7] "PERMANENT/PROJ_UNITS"            "PERMANENT/PROJ_WKT"             
[9] "PERMANENT/WIND"                 
> execGRASS("g.gisenv")
GISDBASE='/tmp/Rtmpe7QdVd';
LOCATION_NAME='file28aaf5be4b905';
MAPSET='file28aaf6dc1c44c';
GRASS_GUI='text';

We may now write R objects to the temporary GRASS location for manipulation and analysis, here calculating slope and aspect layers:

> write_RAST(r, vname="terra_elev")
Importing raster map <terra_elev>...
 100%
SpatRaster read into GRASS using r.in.gdal from file 
> execGRASS("g.list", type="raster", mapset=loc$MAPSET)
terra_elev
> execGRASS("r.slope.aspect", flags="overwrite", elevation="terra_elev", slope="slope", aspect="aspect")
 100%
Aspect raster map <aspect> complete
Slope raster map <slope> complete
> u1 <- read_RAST(c("terra_elev", "slope", "aspect"))
Checking GDAL data type and nodata value...
 100%
Using GDAL data type <UInt16>
Exporting raster data to RRASTER format...
 100%
r.out.gdal complete. File </tmp/RtmpLsAbI6/file29f4b33cd20f6.grd> created.
Checking GDAL data type and nodata value...
 100%
Using GDAL data type <Float32>
Exporting raster data to RRASTER format...
 100%
r.out.gdal complete. File </tmp/RtmpLsAbI6/file29f4b2d2e045b.grd> created.
Checking GDAL data type and nodata value...
 100%
Using GDAL data type <Float32>
Exporting raster data to RRASTER format...
 100%
r.out.gdal complete. File </tmp/RtmpLsAbI6/file29f4b52cab445.grd> created.
> u1
class       : SpatRaster 
dimensions  : 90, 95, 3  (nrow, ncol, nlyr)
resolution  : 0.008333326, 0.008333333  (x, y)
extent      : 5.741667, 6.533333, 49.44167, 50.19167  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
sources     : file29f4b33cd20f6.grd  
              file29f4b2d2e045b.grd  
              file29f4b52cab445.grd  
names       :   terra_elev,        slope,       aspect 
min values  : 141.00000000,   0.01416342,   0.08974174 
max values  :   547.000000,     7.229438,   360.000000 

Existing GRASS location

Similarly, GRASS may be started in an R session by providing the gisDbase=, “location= and mapset= arguments with valid values:

$ GRASS_INSTALLATION=/home/rsb/topics/grass/g820/grass82 R

R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

...

> library(rgrass)
Loading required package: XML
GRASS GIS interface loaded with GRASS version: (GRASS not running)
> GRASS_INSTALLATION <- Sys.getenv("GRASS_INSTALLATION")
> loc <- initGRASS(GRASS_INSTALLATION, home=tempdir(), gisDbase="/home/rsb/topics/grassdata", location="nc_basic_spm_grass7", mapset="rsb", override=TRUE)
> execGRASS("g.gisenv")
GISDBASE='/home/rsb/topics/grassdata';
LOCATION_NAME='nc_basic_spm_grass7';
MAPSET='rsb';
GRASS_GUI='text';
> execGRASS("g.list", type="raster", mapset="PERMANENT")
basins
elevation
elevation_shade
geology
lakes
landuse
soils

Coercion between object representations

The current version of the R-GRASS interface has been simplified to use the terra package because it, like sf and rgdal before it, links to the important external libraries. The workhorse driver is known as RRASTER, and has been widely used in raster and terra (see also (https://rspatial.org)). It uses GDAL but writes a flat uncompressed binary file. Using terra::rast() also appears to preserve category names and colour tables, but needs further testing (see (https://github.com/rsbivand/rgrass/issues/42)).

From GDAL 3.5.0, the RRASTER driver also supports WKT2_2019 CRS representations; in earlier versions of GDAL, the driver only supported the proj-string representation (https://github.com/rsbivand/rgrass/issues/51).

These changes mean that users transferring data between R and GRASS will need to coerce between terra classes SpatVector and SpatRaster and the class system of choice. In addition, SpatRaster is only read into memory from file when this is required, so requiring some care.

On loading and attaching, terra displays its version:

library("terra")
## terra 1.6.17

This description is constructed conditioning on the availability of suggested packages. terra::gdal() tells us the versions of the external libraries being used by terra:

gdal(lib="all")
##     gdal     proj     geos 
##  "3.5.2"  "9.1.0" "3.11.0"

When using CRAN binary packages built static for Windows and macOS, the R packages will use the same versions of the external libraries, but not necessarily the same versions as those against which GRASS was installed.

"SpatVector" coercion

In the terra package (Hijmans 2022b), vector data are held in "SpatVector" objects. This means that when read_VECT() is used, a "SpatVector" object is returned, and the same class of object is needed for write_VECT() for writing to GRASS.

fv <- system.file("ex/lux.shp", package="terra")
(v <- vect(fv))
##  class       : SpatVector 
##  geometry    : polygons 
##  dimensions  : 12, 6  (geometries, attributes)
##  extent      : 5.74414, 6.528252, 49.44781, 50.18162  (xmin, xmax, ymin, ymax)
##  source      : lux.shp
##  coord. ref. : lon/lat WGS 84 (EPSG:4326) 
##  names       :  ID_1   NAME_1  ID_2   NAME_2  AREA   POP
##  type        : <num>    <chr> <num>    <chr> <num> <int>
##  values      :     1 Diekirch     1 Clervaux   312 18081
##                    1 Diekirch     2 Diekirch   218 32543
##                    1 Diekirch     3  Redange   259 18664

These objects are always held in memory, so there is no inMemory() method:

try(inMemory(v))
## Error in (function (classes, fdef, mtable)  : 
##   unable to find an inherited method for function 'inMemory' for signature '"SpatVector"'

The coordinate reference system is expressed in WKT2-2019 form:

cat(crs(v), "\n")
## GEOGCRS["WGS 84",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     ID["EPSG",4326]]

"sf"

Most new work should use vector classes defined in the sf package (Pebesma 2022, 2018).

library("sf")
## Linking to GEOS 3.11.0, GDAL 3.5.2, PROJ 9.0.1; sf_use_s2() is TRUE

In this case, coercion uses st_as_sf():

v_sf <- st_as_sf(v)
v_sf
## Simple feature collection with 12 features and 6 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 5.74414 ymin: 49.44781 xmax: 6.528252 ymax: 50.18162
## Geodetic CRS:  WGS 84
## First 10 features:
##    ID_1       NAME_1 ID_2           NAME_2 AREA    POP
## 1     1     Diekirch    1         Clervaux  312  18081
## 2     1     Diekirch    2         Diekirch  218  32543
## 3     1     Diekirch    3          Redange  259  18664
## 4     1     Diekirch    4          Vianden   76   5163
## 5     1     Diekirch    5            Wiltz  263  16735
## 6     2 Grevenmacher    6       Echternach  188  18899
## 7     2 Grevenmacher    7           Remich  129  22366
## 8     2 Grevenmacher   12     Grevenmacher  210  29828
## 9     3   Luxembourg    8         Capellen  185  48187
## 10    3   Luxembourg    9 Esch-sur-Alzette  251 176820
##                          geometry
## 1  POLYGON ((6.026519 50.17767...
## 2  POLYGON ((6.178368 49.87682...
## 3  POLYGON ((5.881378 49.87015...
## 4  POLYGON ((6.131309 49.97256...
## 5  POLYGON ((5.977929 50.02602...
## 6  POLYGON ((6.385532 49.83703...
## 7  POLYGON ((6.316665 49.62337...
## 8  POLYGON ((6.425158 49.73164...
## 9  POLYGON ((5.998312 49.69992...
## 10 POLYGON ((6.039474 49.44826...

and the vect() method to get from sf to terra:

v_sf_rt <- vect(v_sf)
v_sf_rt
##  class       : SpatVector 
##  geometry    : polygons 
##  dimensions  : 12, 6  (geometries, attributes)
##  extent      : 5.74414, 6.528252, 49.44781, 50.18162  (xmin, xmax, ymin, ymax)
##  coord. ref. : lon/lat WGS 84 (EPSG:4326) 
##  names       :  ID_1   NAME_1  ID_2   NAME_2  AREA   POP
##  type        : <num>    <chr> <num>    <chr> <num> <int>
##  values      :     1 Diekirch     1 Clervaux   312 18081
##                    1 Diekirch     2 Diekirch   218 32543
##                    1 Diekirch     3  Redange   259 18664
all.equal(v_sf_rt, v, check.attributes=FALSE)
## [1] TRUE

"Spatial"

To coerce to and from vector classes defined in the sp package (Roger S. Bivand, Pebesma, and Gomez-Rubio 2013), use the classes defined in sf as an intermediate step:

Sys.setenv("_SP_EVOLUTION_STATUS_"="2")
library("sp")
v_sp <- as(v_sf, "Spatial")
print(summary(v_sp))
## Object of class SpatialPolygonsDataFrame
## Coordinates:
##        min       max
## x  5.74414  6.528252
## y 49.44781 50.181622
## Is projected: FALSE 
## proj4string : [+proj=longlat +datum=WGS84 +no_defs]
## Data attributes:
##       ID_1          NAME_1               ID_2          NAME_2         
##  Min.   :1.000   Length:12          Min.   : 1.00   Length:12         
##  1st Qu.:1.000   Class :character   1st Qu.: 3.75   Class :character  
##  Median :2.000   Mode  :character   Median : 6.50   Mode  :character  
##  Mean   :1.917                      Mean   : 6.50                     
##  3rd Qu.:3.000                      3rd Qu.: 9.25                     
##  Max.   :3.000                      Max.   :12.00                     
##       AREA            POP        
##  Min.   : 76.0   Min.   :  5163  
##  1st Qu.:187.2   1st Qu.: 18518  
##  Median :225.5   Median : 26097  
##  Mean   :213.4   Mean   : 50167  
##  3rd Qu.:253.0   3rd Qu.: 36454  
##  Max.   :312.0   Max.   :182607
v_sp_rt <- vect(st_as_sf(v_sp))
all.equal(v_sp_rt, v, check.attributes=FALSE)
## [1] TRUE

"SpatRaster" coercion

In the terra package, raster data are held in "SpatRaster" objects. This means that when read_RAST() is used, a "SpatRaster" object is returned, and the same class of object is needed for write_RAST() for writing to GRASS.

fr <- system.file("ex/elev.tif", package="terra")
(r <- rast(fr))
## class       : SpatRaster 
## dimensions  : 90, 95, 1  (nrow, ncol, nlyr)
## resolution  : 0.008333333, 0.008333333  (x, y)
## extent      : 5.741667, 6.533333, 49.44167, 50.19167  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## source      : elev.tif 
## name        : elevation 
## min value   :       141 
## max value   :       547

In general, "SpatRaster" objects are files, rather than data held in memory:

try(inMemory(r))
## [1] FALSE

"stars"

The stars package (Pebesma 2021) uses GDAL through sf:

library("stars")
## Loading required package: abind

A coercion method is provided from "SpatRaster" to "stars":

r_stars <- st_as_stars(r)
print(r_stars)
## stars object with 2 dimensions and 1 attribute
## attribute(s):
##           Min. 1st Qu. Median     Mean 3rd Qu. Max. NA's
## elev.tif   141     291    333 348.3366     406  547 3942
## dimension(s):
##   from to  offset       delta refsys point values x/y
## x    1 95 5.74167  0.00833333 WGS 84 FALSE   NULL [x]
## y    1 90 50.1917 -0.00833333 WGS 84 FALSE   NULL [y]

which round-trips in memory.

(r_stars_rt <- rast(r_stars))
## class       : SpatRaster 
## dimensions  : 90, 95, 1  (nrow, ncol, nlyr)
## resolution  : 0.008333333, 0.008333333  (x, y)
## extent      : 5.741667, 6.533333, 49.44167, 50.19167  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## source      : memory 
## name        : lyr.1 
## min value   :   141 
## max value   :   547

When coercing to "stars_proxy" the same applies:

(r_stars_p <- st_as_stars(r, proxy=TRUE))
## stars_proxy object with 1 attribute in 1 file(s):
## $elev.tif
## [1] "[...]/elev.tif"
## 
## dimension(s):
##   from to  offset       delta refsys point values x/y
## x    1 95 5.74167  0.00833333 WGS 84 FALSE   NULL [x]
## y    1 90 50.1917 -0.00833333 WGS 84 FALSE   NULL [y]

with coercion from "stars_proxy" also not reading data into memory:

(r_stars_p_rt <- rast(r_stars_p))
## class       : SpatRaster 
## dimensions  : 90, 95, 1  (nrow, ncol, nlyr)
## resolution  : 0.008333333, 0.008333333  (x, y)
## extent      : 5.741667, 6.533333, 49.44167, 50.19167  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## source      : elev.tif 
## name        : elevation 
## min value   :       141 
## max value   :       547

"Spatial"

Coercion to the sp "SpatialGridDataFrame" representation is provided by stars:

r_sp <- as(r_stars, "Spatial")
summary(r_sp)
## Object of class SpatialGridDataFrame
## Coordinates:
##         min       max
## x  5.741667  6.533333
## y 49.441667 50.191667
## Is projected: FALSE 
## proj4string : [+proj=longlat +datum=WGS84 +no_defs]
## Grid attributes:
##   cellcentre.offset    cellsize cells.dim
## x          5.745833 0.008333333        95
## y         49.445833 0.008333333        90
## Data attributes:
##     elev.tif    
##  Min.   :141.0  
##  1st Qu.:291.0  
##  Median :333.0  
##  Mean   :348.3  
##  3rd Qu.:406.0  
##  Max.   :547.0  
##  NA's   :3942

and can be round-tripped:

(r_sp_rt <- rast(st_as_stars(r_sp)))
## class       : SpatRaster 
## dimensions  : 90, 95, 1  (nrow, ncol, nlyr)
## resolution  : 0.008333333, 0.008333333  (x, y)
## extent      : 5.741667, 6.533333, 49.44167, 50.19167  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## source      : memory 
## name        : lyr.1 
## min value   :   141 
## max value   :   547

"RasterLayer"

updated 2022-10-12 following publication of raster 3.6-3 on CRAN.

From version 3.6-3 the raster package (Hijmans 2022a) uses terra for all GDAL operations. Because of this, coercing a "SpatRaster" object to a "RasterLayer" object is simple:

library(raster)
(r_RL <- raster(r))
## class      : RasterLayer 
## dimensions : 90, 95, 8550  (nrow, ncol, ncell)
## resolution : 0.008333333, 0.008333333  (x, y)
## extent     : 5.741667, 6.533333, 49.44167, 50.19167  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : elev.tif 
## names      : elevation 
## values     : 141, 547  (min, max)
inMemory(r_RL)
## [1] FALSE

This object (held on file rather than in memory) can be round-tripped:

(r_RL_rt <- rast(r_RL))
## class       : SpatRaster 
## dimensions  : 90, 95, 1  (nrow, ncol, nlyr)
## resolution  : 0.008333333, 0.008333333  (x, y)
## extent      : 5.741667, 6.533333, 49.44167, 50.19167  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## source      : elev.tif 
## name        : elevation 
## min value   :       141 
## max value   :       547

Example: Broad Street Cholera outbreak 1854

Even though we know that John Snow already had a working hypothesis about cholera epidemics, his data remain interesting, especially if we use a GIS to find the street distances from mortality dwellings to the Broad Street pump in Soho in central London. Brody et al. (2000) point out that John Snow did not use maps to ‘find’ the Broad Street pump, the polluted water source behind the 1854 cholera epidemic, because he associated cholera with water contaminated with sewage, based on earlier experience.

Broad Street Cholera Data

The basic data to be used here were made available by Jim Detwiler, who had collated them for David O’Sullivan for use on the cover of O’Sullivan and Unwin (2003), based on earlier work by Waldo Tobler and others. The files were a shapefile of counts of deaths at front doors of houses, two shapefiles of pump locations and a georeferenced copy of the Snow map as an image; the files were registered in the British National Grid CRS. In GRASS, a suitable location was set up in this CRS and the image file was imported; the building contours were then digitised as a vector layer and cleaned.

We would like to find the line of equal distances shown on the extract from John Snow’s map shown in Brody et al. (2000) shown here, or equivalently find the distances from the pumps to the front doors of houses with mortalities following the roads, not the straight line distance. We should recall that we only have the locations of counts of mortalities, not of people at risk or of survivors.

We’ll use "SpatVector" as a basis for spatial object representation, and the terra package for reading and writing spatial data files, here the extent of the digitized buildings:

library(terra)
bbo <- vect("data/bbo.gpkg")
bbo
##  class       : SpatVector 
##  geometry    : polygons 
##  dimensions  : 1, 1  (geometries, attributes)
##  extent      : 528890.7, 529808.7, 180557.9, 181415.7  (xmin, xmax, ymin, ymax)
##  source      : bbo.gpkg
##  coord. ref. : OSGB36 / British National Grid (EPSG:27700) 
##  names       :   cat
##  type        : <int>
##  values      :     1

Next, we create a "SpatRaster" object with 1m resolution covering the extent of the buildings in EW/NS orientation (a bounding box):

bbo_r <- rast(bbo, resolution=1)
bbo_r
## class       : SpatRaster 
## dimensions  : 858, 918, 1  (nrow, ncol, nlyr)
## resolution  : 1, 1  (x, y)
## extent      : 528890.7, 529808.7, 180557.9, 181415.9  (xmin, xmax, ymin, ymax)
## coord. ref. : OSGB36 / British National Grid (EPSG:27700)

We’ll read the vector data: digitized building outlines, front door points, and pump locations, into the R session:

buildings <- vect("data/buildings.gpkg")
deaths <- vect("data/deaths.gpkg")
b_pump <- vect("data/b_pump.gpkg")
nb_pump <- vect("data/nb_pump.gpkg")
plot(buildings)
points(deaths)
points(b_pump, pch=7, col="red", cex=1.5)
points(nb_pump, pch=1, col="blue", cex=1.5)

Instantiating GRASS

As noted above, the value of the environment variable GISBASE set in a live GRASS session is where GRASS lives:

Sys.setenv("GRASS_INSTALLATION"="/home/rsb/topics/grass/g820/grass82")
library(rgrass)
## Loading required package: XML
## GRASS GIS interface loaded with GRASS version: (GRASS not running)
GRASS_INSTALLATION <- Sys.getenv("GRASS_INSTALLATION")
file.info(GRASS_INSTALLATION)$isdir[1]
## [1] TRUE
GRASS_INSTALLATION
## [1] "/home/rsb/topics/grass/g820/grass82"

We also need a place to put the temporary GRASS location:

td <- tempdir()
td
## [1] "/tmp/Rtmp10ZF4e"

Here we need the three objects we have already created, and also set override=TRUE, as this document may be run many times. initGRASS() looks for an environment variable that GRASS sessions set (GISRC) pointing to a file of GRASS environment variables. Real GRASS sessions remove it on exit, but this interface does not (yet) provide for its removal, hence the need here to override.

With rgrass and using a "SpatRaster" object bbo_r to set up the GRASS location, we can set the projection, extent and resolution much more easily than in previous versions of the interface:

soho <- initGRASS(gisBase=GRASS_INSTALLATION, home=td, SG=bbo_r, override=TRUE)
soho
## gisdbase    /tmp/Rtmp10ZF4e 
## location    filed06c64f79d54 
## mapset      filed06c16fec7f 
## rows        858 
## columns     918 
## north       181415.9 
## south       180557.9 
## west        528890.7 
## east        529808.7 
## nsres       1 
## ewres       1 
## projection:
##  PROJCRS["OSGB36 / British National Grid",
##     BASEGEOGCRS["OSGB36",
##         DATUM["Ordnance Survey of Great Britain 1936",
##             ELLIPSOID["Airy 1830",6377563.396,299.3249646,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4277]],
##     CONVERSION["British National Grid",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",49,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",-2,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",0.9996012717,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",400000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",-100000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["Engineering survey, topographic mapping."],
##         AREA["United Kingdom (UK) - offshore to boundary of UKCS within 49°45'N to 61°N and 9°W to 2°E; onshore Great Britain (England, Wales and Scotland). Isle of Man onshore."],
##         BBOX[49.75,-9,61.01,2.01]],
##     ID["EPSG",27700]]

Next we write the spatial data objects to the GRASS temporary LOCATION and MAPSET, taking only the first column of the buildings object. Now the interface by default uses Geopackage (GPKG) intermediate files, supporting long field names and UTF-8 encoding:

fl <- c("overwrite", "quiet")
write_VECT(bbo, vname="bbo", flags=fl)
## Warning in x@ptr$write(filename, layer, filetype, insert[1], overwrite[1], :
## GDAL Message 6: dataset /tmp/Rtmp10ZF4e/filed06c417c3735.gpkg does not support
## layer creation option ENCODING
write_VECT(buildings[,1], vname="buildings", flags=fl)
## Warning in x@ptr$write(filename, layer, filetype, insert[1], overwrite[1], :
## GDAL Message 6: dataset /tmp/Rtmp10ZF4e/filed06c8bf8bd8.gpkg does not support
## layer creation option ENCODING
write_VECT(b_pump, vname="b_pump", flags=fl)
## Warning in x@ptr$write(filename, layer, filetype, insert[1], overwrite[1], :
## GDAL Message 6: dataset /tmp/Rtmp10ZF4e/filed06c5d8b659e.gpkg does not support
## layer creation option ENCODING
write_VECT(nb_pump, vname="nb_pump", flags=fl)
## Warning in x@ptr$write(filename, layer, filetype, insert[1], overwrite[1], :
## GDAL Message 6: dataset /tmp/Rtmp10ZF4e/filed06c5c8c3b90.gpkg does not support
## layer creation option ENCODING
write_VECT(deaths, vname="deaths", flags=fl)
## Warning in x@ptr$write(filename, layer, filetype, insert[1], overwrite[1], :
## GDAL Message 6: dataset /tmp/Rtmp10ZF4e/filed06c5cb9e5c6.gpkg does not support
## layer creation option ENCODING
execGRASS("g.list", type="vector", intern=TRUE)
## [1] "b_pump"    "bbo"       "buildings" "deaths"    "nb_pump"

Data processing

First, we cut the buildings out of the extent polygon to leave the roads. Having set the region resolution to 1x1m squares we can convert the vector roads to raster, and can tabulate raster cell values, where the asterisk (*) are missing data cells:

execGRASS("v.overlay", ainput="buildings", binput="bbo", operator="xor", output="roads", flags=fl)
execGRASS("v.to.rast", input="roads", output="rroads", use="val", value=1, flags=fl)
execGRASS("r.stats", input="rroads", flags=c("c", "quiet"))
## 1 263191
## * 524453

We also need to buffer out the roads by an amount sufficient to include the the front door points within the roads - 4m was found by trial and error and may be too much, giving shorter distances than a thinner buffer would yield. Reclassification of the raster to give only unit cost is also needed:

execGRASS("r.buffer", input="rroads", output="rroads4", distances=4, flags=fl)
execGRASS("r.stats", input="rroads4", flags=c("c", "quiet"))
## 1 263191
## 2 151899
## * 372554
tf <- tempfile()
cat("1 2 = 1\n", file=tf)
execGRASS("r.reclass", input="rroads4", output="rroads4a", rules=tf, flags=fl)
execGRASS("r.stats", input="rroads4a", flags=c("c", "quiet"))
## 1 415090
## * 372554

The r.cost command returns a raster with cells set as the cost of moving from the vector start point or points to each cell; we do this twice, once for the Broad Street pump, and then for the other pumps:

execGRASS("r.cost", input="rroads4a", output="dist_broad", start_points="b_pump", flags=fl)
execGRASS("r.cost", input="rroads4a", output="dist_not_broad", start_points="nb_pump", flags=fl)

Finally, we examine the values of these two distance maps at the front door points, and add these fields (columns) to the vector mortality map:

execGRASS("v.db.addcolumn", map="deaths", columns="broad double precision", flags="quiet")
execGRASS("v.what.rast", map="deaths", raster="dist_broad", column="broad", flags="quiet")
execGRASS("v.db.addcolumn", map="deaths", columns="not_broad double precision", flags="quiet")
execGRASS("v.what.rast", map="deaths", raster="dist_not_broad", column="not_broad", flags="quiet")

Moving the data back to R from GRASS permits operations on the distance values. We set the logical variable b_nearer to TRUE if the distance to the Broad Street pump is less than the distance to the nearest other pump:

deaths1 <- read_VECT("deaths", flags=fl)
deaths1$b_nearer <- deaths1$broad < deaths1$not_broad
by(deaths1$Num_Css, deaths1$b_nearer, sum)
## deaths1$b_nearer: FALSE
## [1] 221
## ------------------------------------------------------------ 
## deaths1$b_nearer: TRUE
## [1] 357

Just with terra?

In https://rsbivand.github.io/geomed19-workshop/Geomed19_II.html, it was shown that raster and gdistance could be used to analyze the same problem. Times have moved on, so let’s try terra, first buffering the roads out by 4m (buildings in by 4m):

buildings_4 <- buffer(buildings, width=-4)

Next we rasterize the negative buffered buildings using bbo_r created above:

buildings_4_r <- rasterize(buildings_4, bbo_r, field="cat")

and recode the values to set the negative buffered buildings to a large value and the roads to the value of the resolution:

values(buildings_4_r)[!is.na(values(buildings_4_r))] <- 99999
values(buildings_4_r)[is.na(values(buildings_4_r))] <- 1

Next, rasterize the pump locations to the same resolution template:

b_pump_r <- rasterize(b_pump, bbo_r, field="cat")
nb_pump_r <- rasterize(nb_pump, bbo_r, field="cat")

Copy the recoded negative buffered buildings raster, and overwrite with the location of the Broad Street pump set to zero, then run costDistance():

b_buildings_4_r <- buildings_4_r
values(b_buildings_4_r)[which(values(b_pump_r) == 1)] <- 0
res_b <- costDistance(b_buildings_4_r, target=0)

Repeat for the non-Broad Street pumps:

nb_buildings_4_r <- buildings_4_r
values(nb_buildings_4_r)[which(values(nb_pump_r) > 0)] <- 0
res_nb <- costDistance(nb_buildings_4_r, target=0)

Extract the distances to the Broad Street and nearest non-Broad Street pumps by the locations of the front doors of houses where deaths were registered:

b_deaths <- extract(res_b, deaths)
nb_deaths <- extract(res_nb, deaths)

Find, for each such address, whether the Broad Street pump was nearer or not:

broad_nearer <- b_deaths$cat < nb_deaths$cat

Sum how many deaths occurred for addresses for which the Broad Street pump was nearer or not:

by(deaths$Num_Css, broad_nearer, sum)
## broad_nearer: FALSE
## [1] 221
## ------------------------------------------------------------ 
## broad_nearer: TRUE
## [1] 357

The outcome is the same as the analysis above using GRASS.

Open questions

There are clearly numerous questions, and hopefully this session will both define others, and maybe recruit contributors to ones already identified.

Continuous integration testing

One of the main conclusions in https://github.com/rsbivand/rgrass/issues/34 was that a testing framework using github actions is needed for the rgrass package. This could extend to the tinytest approach, but simply having some way to instantiate a platform with R, GRASS, rgrass and terra (where GRASS and terra need external software libraries) would be very helpful.

Spatio-temporal objects

The current state of rgrass does not extend to spatio-temporal objects, but stars and terra on the R side, and GRASS, do have object representations which should be interfaced. All are somewhat moving targets now, but deserve attention.

Proxy-based file transfer

One further open question, possibly linked to spatio-temporal objects, is that contemporary and near-future data volumes may become such that proxying data objects will come into focus. Both terra and stars can work in this way; it might be worth considering revisiting the GDAL GRASS read-only plug-in for raster objects, perhaps permitting larger data volumes to be handled by storing once on disk in the GRASS location. Then terra and stars could access the GRASS location by proxy. However, for really large data volumes in earth observation, this would not be sufficient.

sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Fedora Linux 36 (Workstation Edition)
## 
## Matrix products: default
## BLAS:   /home/rsb/topics/R/R421-share/lib64/R/lib/libRblas.so
## LAPACK: /home/rsb/topics/R/R421-share/lib64/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
##  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
##  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] rgrass_0.3-6  XML_3.99-0.11 raster_3.6-3  stars_0.5-6   abind_1.4-5  
## [6] sp_1.5-1      sf_1.0-9      terra_1.6-17 
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.9         pillar_1.8.1       bslib_0.4.0        compiler_4.2.1    
##  [5] jquerylib_0.1.4    highr_0.9          class_7.3-20       tools_4.2.1       
##  [9] digest_0.6.29      lattice_0.20-45    tibble_3.1.8       jsonlite_1.8.2    
## [13] evaluate_0.17      lifecycle_1.0.3    pkgconfig_2.0.3    rlang_1.0.6       
## [17] DBI_1.1.3          cli_3.4.1          rstudioapi_0.14    parallel_4.2.1    
## [21] yaml_2.3.5         xfun_0.33          fastmap_1.1.0      e1071_1.7-11      
## [25] stringr_1.4.1      dplyr_1.0.10       knitr_1.40         generics_0.1.3    
## [29] sass_0.4.2         vctrs_0.4.2        tidyselect_1.2.0   classInt_0.4-8    
## [33] grid_4.2.1         glue_1.6.2         R6_2.5.1           fansi_1.0.3       
## [37] rmarkdown_2.17     magrittr_2.0.3     codetools_0.2-18   htmltools_0.5.3   
## [41] units_0.8-0        assertthat_0.2.1   utf8_1.2.2         KernSmooth_2.23-20
## [45] stringi_1.7.8      proxy_0.4-27       lwgeom_0.2-9       cachem_1.0.6

References

Bivand, R. S. 2000. “Using the R Statistical Data Analysis Language on GRASS 5.0 GIS Data Base Files.” Computers and Geosciences 26: 1043–52.
Bivand, Roger. 2021. “Progress in the R Ecosystem for Representing and Handling Spatial Data.” Journal of Geographical Systems 23: 515–46. https://doi.org/10.1007/s10109-020-00336-0.
———. 2022. R Packages for Analyzing Spatial Data: A Comparative Case Study with Areal Data.” Geographical Analysis 54 (3): 488–518. https://doi.org/10.1111/gean.12319.
Bivand, Roger S., Edzer Pebesma, and Virgilio Gomez-Rubio. 2013. Applied Spatial Data Analysis with R, Second Edition. Springer, NY. https://asdar-book.org/.
Brody, H., M. R. Rip, P. Vinten-Johansen, N. Paneth, and S. Rachman. 2000. “Map-Making and Myth-Making in Broad Street: The London Cholera Epidemic, 1854.” Lancet 356: 64–68.
Evers, Kristian, and Thomas Knudsen. 2017. Transformation Pipelines for PROJ.4. https://www.fig.net/resources/proceedings/fig\_proceedings/fig2017/papers/iss6b/ISS6B\_evers\_knudsen\_9156.pdf.
Hijmans, Robert J. 2022a. raster: Geographic Data Analysis and Modeling. https://cran.r-project.org/package=raster.
———. 2022b. terra: Spatial Data Analysis. https://cran.r-project.org/package=terra.
Knudsen, Thomas, and Kristian Evers. 2017. Transformation Pipelines for PROJ.4. https://meetingorganizer.copernicus.org/EGU2017/EGU2017-8050.pdf.
Neteler, M., and H. Mitasova. 2008. Open Source GIS: A GRASS GIS Approach, Third Edition. New York: Springer.
O’Sullivan, D., and D. J. Unwin. 2003. Geographical Information Analysis. Hoboken, NJ: Wiley.
Pebesma, Edzer. 2018. Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal 10 (1): 439–46. https://doi.org/10.32614/RJ-2018-009.
———. 2021. stars: Spatiotemporal Arrays, Raster and Vector Data Cubes. https://CRAN.R-project.org/package=stars.
———. 2022. sf: Simple Features for R. https://cran.r-project.org/package=sf.