Required current contributed CRAN packages:

I am running R 4.0.3, with recent update.packages().

needed <- c("RSQLite", "mapview", "sf", "rgdal", "sp")

Script

Script and data at https://github.com/rsbivand/ECS530_h20/raw/main/ECS530_III.zip. Download to suitable location, unzip and use as basis.

Schedule

  • 9/11 (I) Spatial data representation, (II) Support+topology, input/output

  • 10/11 (III) Coordinate reference systems, (IV) Visualization

  • 11/11 (VI) Spatial autocorrelation, project surgery

  • 12/11 (VII) Spatial regression, (VIII) Spatial multilevel regression

  • 13/11 (IX) Interpolation, point processes, project surgery, presentation

  • 14/11 Presentations

Session III

  • 09:15-09:45 Coordinate reference systems: background

  • 09:45-10:15 Modernising PROJ and issues

  • 10:15-11:00 Proposed developments (using sp and rgdal as prototypes)

Coordinate reference systems: background

The usefulness of spatial data is linked to knowing its coordinate reference system. The coordinate reference system may be geographic, usually measured in decimal degrees, or projected, layered on a known geographic CRS, usually measured in metres (planar). The underlying geographical CRS must specify an ellipsoid, with associated major and minor axis lengths:

library(sp)
library(rgdal)
## rgdal: version: 1.5-18, (SVN revision 1082)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 3.2.0, released 2020/10/26
## Path to GDAL shared files: /usr/local/share/gdal
## GDAL binary built with GEOS: TRUE 
## Loaded PROJ runtime: Rel. 7.2.0, November 1st, 2020, [PJ_VERSION: 720]
## Path to PROJ shared files: /tmp/Rtmpkvg72D/file3ed76b3cfb2:/usr/local/share/proj:/usr/local/share/proj
## PROJ CDN enabled: FALSE
## Linking to sp version:1.4-4
## To mute warnings of possible GDAL/OSR exportToProj4() degradation,
## use options("rgdal_show_exportToProj4_warnings"="none") before loading rgdal.
projInfo("ellps")
##         name          major                  ell
## 1      MERIT    a=6378137.0           rf=298.257
## 2      SGS85    a=6378136.0           rf=298.257
## 3      GRS80    a=6378137.0     rf=298.257222101
## 4      IAU76    a=6378140.0           rf=298.257
## 5       airy  a=6377563.396       rf=299.3249646
## 6     APL4.9    a=6378137.0            rf=298.25
## 7      NWL9D    a=6378145.0            rf=298.25
## 8   mod_airy  a=6377340.189        b=6356034.446
## 9     andrae   a=6377104.43             rf=300.0
## 10    danish a=6377019.2563             rf=300.0
## 11   aust_SA    a=6378160.0            rf=298.25
## 12     GRS67    a=6378160.0    rf=298.2471674270
## 13   GSK2011    a=6378136.5       rf=298.2564151
## 14    bessel  a=6377397.155       rf=299.1528128
## 15  bess_nam  a=6377483.865       rf=299.1528128
## 16    clrk66    a=6378206.4          b=6356583.8
## 17    clrk80  a=6378249.145          rf=293.4663
## 18 clrk80ign    a=6378249.2 rf=293.4660212936269
## 19       CPM    a=6375738.7            rf=334.29
## 20    delmbr     a=6376428.             rf=311.5
## 21   engelis   a=6378136.05          rf=298.2566
## 22   evrst30  a=6377276.345          rf=300.8017
## 23   evrst48  a=6377304.063          rf=300.8017
## 24   evrst56  a=6377301.243          rf=300.8017
## 25   evrst69  a=6377295.664          rf=300.8017
## 26   evrstSS  a=6377298.556          rf=300.8017
## 27   fschr60     a=6378166.             rf=298.3
## 28  fschr60m     a=6378155.             rf=298.3
## 29   fschr68     a=6378150.             rf=298.3
## 30   helmert     a=6378200.             rf=298.3
## 31     hough    a=6378270.0              rf=297.
## 32      intl    a=6378388.0              rf=297.
## 33     krass    a=6378245.0             rf=298.3
## 34     kaula     a=6378163.            rf=298.24
## 35     lerch     a=6378139.           rf=298.257
## 36     mprts     a=6397300.              rf=191.
## 37  new_intl    a=6378157.5          b=6356772.2
## 38   plessis     a=6376523.           b=6355863.
## 39      PZ90    a=6378136.0         rf=298.25784
## 40    SEasia    a=6378155.0       b=6356773.3205
## 41   walbeck    a=6376896.0       b=6355834.8467
## 42     WGS60    a=6378165.0             rf=298.3
## 43     WGS66    a=6378145.0            rf=298.25
## 44     WGS72    a=6378135.0            rf=298.26
## 45     WGS84    a=6378137.0     rf=298.257223563
## 46    sphere    a=6370997.0          b=6370997.0
##                        description
## 1                       MERIT 1983
## 2        Soviet Geodetic System 85
## 3             GRS 1980(IUGG, 1980)
## 4                         IAU 1976
## 5                        Airy 1830
## 6              Appl. Physics. 1965
## 7         Naval Weapons Lab., 1965
## 8                    Modified Airy
## 9       Andrae 1876 (Den., Iclnd.)
## 10  Andrae 1876 (Denmark, Iceland)
## 11 Australian Natl & S. Amer. 1969
## 12               GRS 67(IUGG 1967)
## 13                        GSK-2011
## 14                     Bessel 1841
## 15           Bessel 1841 (Namibia)
## 16                     Clarke 1866
## 17                Clarke 1880 mod.
## 18              Clarke 1880 (IGN).
## 19 Comm. des Poids et Mesures 1799
## 20         Delambre 1810 (Belgium)
## 21                    Engelis 1985
## 22                    Everest 1830
## 23                    Everest 1948
## 24                    Everest 1956
## 25                    Everest 1969
## 26       Everest (Sabah & Sarawak)
## 27    Fischer (Mercury Datum) 1960
## 28           Modified Fischer 1960
## 29                    Fischer 1968
## 30                    Helmert 1906
## 31                           Hough
## 32    International 1909 (Hayford)
## 33                Krassovsky, 1942
## 34                      Kaula 1961
## 35                      Lerch 1979
## 36                 Maupertius 1738
## 37          New International 1967
## 38           Plessis 1817 (France)
## 39                           PZ-90
## 40                  Southeast Asia
## 41                         Walbeck
## 42                          WGS 60
## 43                          WGS 66
## 44                          WGS 72
## 45                          WGS 84
## 46       Normal Sphere (r=6370997)

Other parameters should be specified, such as the prime meridian, often taken as Greenwich. Before PROJ version 6, legacy PROJ (and GDAL) used a +datum= tag introduced after the library migrated beyond USGS (around version 4.4). The underlying problem was not that projection and inverse projection could not be carried out between projected CRS and geograpghical CRS, but that national mapping agencies defined often many datums, keying the specification of a geographical CRS to a national or regional datum. Some of these, especially for North America, were supported, but support for others was patchy. The +datum= tag supported a partly informal listing of values, themselves linked to three or seven coefficient datum transformation sets, used through the +towgs84= tag. Coefficient lookup through the +datum= tag, or direct specification of coefficients through the +towgs84= tag became a convenient way to handle datum transformation in addition to projection and inverse projection.

The default “hub” for transformation was to go through the then newly achieved WGS84 datum. Spatial data files often encoded the geographic and projected CRS with reference to these values, in some cases using PROJ 4 strings. These used a pseudo projection +proj=longlat to indicate a geographical CRS, and many other possible values of +proj= for projected CRS.

The Grids & Datums column in Photogrammetric Engineering & Remote Sensing gives insight into some of the peculiarities of national mapping agencies - authority is typically national but may be subnational:

data("GridsDatums")
GridsDatums[grep("Norway", GridsDatums$country),]
##                   country     month year ISO
## 17  The Kingdom of Norway (October) 1999 NOR
## 226                Norway    (July) 2017 NOR

Beyond this, the database successively developed by the European Petroleum Survey Group was copied to local CSV files for PROJ and GDAL, providing lookup by code number. From PROJ 6, GDAL no longer uses these CSV files, and PROJ makes available a SQLite database copy of the EPSG database:

EPSG <- make_EPSG()
EPSG[grep("Oslo", EPSG$note), 1:2]
##       code                            note
## 515   4817                 NGO 1948 (Oslo)
## 2435 27391    NGO 1948 (Oslo) / NGO zone I
## 2436 27392   NGO 1948 (Oslo) / NGO zone II
## 2437 27393  NGO 1948 (Oslo) / NGO zone III
## 2438 27394   NGO 1948 (Oslo) / NGO zone IV
## 2439 27395    NGO 1948 (Oslo) / NGO zone V
## 2440 27396   NGO 1948 (Oslo) / NGO zone VI
## 2441 27397  NGO 1948 (Oslo) / NGO zone VII
## 2442 27398 NGO 1948 (Oslo) / NGO zone VIII
CRS("+init=epsg:4817")
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded ellps Bessel Modified in CRS definition: +proj=longlat
## +a=6377492.018 +rf=299.1528128 +pm=oslo +no_defs
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum NGO_1948_Oslo in CRS definition
## CRS arguments:
##  +proj=longlat +a=6377492.018 +rf=299.1528128 +pm=oslo +no_defs

The lookup prior to PROJ 6 used to provide a +towgs84= value of 278.3,93,474.5,7.889,0.05,-6.61,6.21, but in the new regime only reveals transformation coefficients in the context of a coordinate operation:

list_coordOps("EPSG:4817", "EPSG:4326")
## Candidate coordinate operations found:  1 
## Strict containment:  FALSE 
## Visualization order:  TRUE 
## Source: EPSG:4817 
## Target: EPSG:4326 
## Best instantiable operation has accuracy: 3 m
## Description: axis order change (2D) + NGO 1948 (Oslo) to NGO1948 (1) + NGO
##              1948 to WGS 84 (1) + axis order change (2D)
## Definition:  +proj=pipeline +step +proj=unitconvert +xy_in=deg +xy_out=rad
##              +step +inv +proj=longlat +a=6377492.018
##              +rf=299.1528128 +pm=oslo +step +inv +proj=longlat
##              +a=6377492.018 +rf=299.1528128 +step +proj=push
##              +v_3 +step +proj=cart +a=6377492.018
##              +rf=299.1528128 +step +proj=helmert +x=278.3 +y=93
##              +z=474.5 +rx=7.889 +ry=0.05 +rz=-6.61 +s=6.21
##              +convention=position_vector +step +inv +proj=cart
##              +ellps=WGS84 +step +proj=pop +v_3 +step
##              +proj=unitconvert +xy_in=rad +xy_out=deg

Up to and including PROJ 5, downstream software, like sf and rgdal, have been able to rely on the provision of ad-hoc transformation capabilities, with apparently predictable consequences. Everybody knew (or should have known) that each new release of the PROJ and GDAL CSV metadata files could update transformation coefficients enough to shift outcomes a little. Everyone further chose to ignore the timestamping of coordinates, or at least of datasets; we could guess (as above) that US Census tract boundaries for 1980 must use the NAD27 datum framework - suprisingly many used NAD83 anyway (both for Boston and the North Carolina SIDS data set).

Use of KML files to provide zoom and pan for these boundaries, and now leaflet and mapview exposes approximations mercilessly. Use of coefficients of transformation of an unknown degree of approximation, and authority “googled it” was reaching its limits, or likely had exceeded them.

sp classes used a PROJ string to define the CRS (in an S4 "CRS" object):

getClass("CRS")
## Class "CRS" [package "sp"]
## 
## Slots:
##                 
## Name:   projargs
## Class: character

sf used an S3 "crs" object with an integer EPSG code and a PROJ string; if instantiated from the EPSG code, both were provided. In current sf, the "crs" object has user input and wkt components, and methods to access the PROJ string and the EPSG code for backward compatibility:

library(sf)
## Linking to GEOS 3.9.0dev, GDAL 3.2.0, PROJ 7.2.0
st_crs(4326)
## Coordinate Reference System:
##   User input: EPSG:4326 
##   wkt:
## GEOGCRS["WGS 84",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     USAGE[
##         SCOPE["Horizontal component of 3D system."],
##         AREA["World."],
##         BBOX[-90,-180,90,180]],
##     ID["EPSG",4326]]

Modernising PROJ and issues

PROJ

Because so much open source (and other) software uses the PROJ library and framework, many are affected when PROJ upgrades. Until very recently, PROJ has been seen as very reliable, and the changes taking place now are intended to confirm and reinforce this reliability. Before PROJ 5 (PROJ 6 was released in 2019, PROJ 7 was released in March 2020), the +datum= tag was used, perhaps with +towgs84= with three or seven coefficients, and possibly +nadgrids= where datum transformation grids were available. However, transformations from one projection to another first inversed to longitude-latitude in WGS84, then projected on to the target projection.

Fast-forward 35 years and PROJ.4 is everywhere: It provides coordinate handling for almost every geospatial program, open or closed source. Today, we see a drastical increase in the need for high accuracy GNSS coordinate handling, especially in the agricultural and construction engineering sectors. This need for geodetic-accuracy transformations is not satisfied by “classic PROJ.4”. But with the ubiquity of PROJ.4, we can provide these transformations “everywhere”, just by implementing them as part of PROJ.4 (Evers and Knudsen 2017).

Escaping the WGS84 hub/pivot: PROJ and OGC WKT2

Following the introduction of geodetic modules and pipelines in PROJ 5 (Knudsen and Evers 2017; Evers and Knudsen 2017), PROJ 6 moves further. Changes in the legacy PROJ representation and WGS84 transformation hub have been coordinated through the GDAL barn raising initiative. Crucially WGS84 often ceases to be the pivot for moving between datums. A new OGC WKT is coming, and an SQLite EPSG file database has replaced CSV files. SRS will begin to support 3D by default, adding time too as SRS change. See also PROJ migration notes.

There are very useful postings on the PROJ mailing list from Martin Desruisseaux, first proposing clarifications and a follow-up including a summary:

  • “Early binding” ≈ hub transformation technique.
  • “Late binding” ≈ hub transformation technique NOT used, replaced by a more complex technique consisting in searching parameters in the EPSG database after the transformation context (source, target, epoch, area of interest) is known.
  • The problem of hub transformation technique is independent of WGS84. It is caused by the fact that transformations to/from the hub are approximate. Any other hub we could invent in replacement of WGS84 will have the same problem, unless we can invent a hub for which transformations are exact (I think that if such hub existed, we would have already heard about it).

The solution proposed by ISO 19111 (in my understanding) is:

  • Forget about hub (WGS84 or other), unless the simplicity of early-binding is considered more important than accuracy.
  • Associating a CRS to a coordinate set (geometry or raster) is no longer sufficient. A {CRS, epoch} tuple must be associated. ISO 19111 calls this tuple “Coordinate metadata”. From a programmatic API point of view, this means that getCoordinateReferenceSystem() method in Geometry objects (for instance) needs to be replaced by a getCoordinateMetadata() method.

This page gives a picture of why the changes in PROJ matter - the arrows are in cm per year displacement:

To handle this level of detail, PROJ 7 introduces an on-demand content delivery network under user control as an alternative to dowloading many possibly unnecessary time-specific vertical and horizontal transformation grids. PROJ 8 will introduce datum ensembles, particularly for commonly used datums such as EPSG:4326, which have had (especially in Noth America) multiple, differing, instantiations as time has progressed; for now we do not know what the consequences of this will be - it is already in the EPSG database.

Upstream software dependencies of the R-spatial ecosystem

When changes occur in upstream external software, R packages using these libraries often need to adapt, but package maintainers try very hard to shield users from any consequences, so that legacy workflows continue to provide the same or at least similar results from the same data.

The code shown in (Bivand, Pebesma, and Gomez-Rubio 2008, 2013) is almost all run nightly on a platform with updated R packages and external software.

This does not necessarily trap all differences (figures are not compared), but is helpful in detecting impacts of changes in packages or external software.

It is also very helpful that CRAN servers using the released and development versions of R, and with different levels of external software also run nightly checks.

Again, sometimes changes are only noticed by users, but quite often checks run by maintainers and by CRAN alert us to impending challenges.

Tracking the development mailing lists of the external software communities, all open source, can also show how thinking is evolving, although sometimes code tidying in external software can have unexpected consequences, breaking not sf or sp with rgdal or rgeos, but a package further downstream.

(Bivand 2014) discusses open source geospatial software stacks more generally, but here we will consider ongoing changes in PROJ.

(Knudsen and Evers 2017; Evers and Knudsen 2017) not only point out how the world has changed since a World Geodetic System of 1984 (WGS84) was adopted as a hub for coordinate transformation in PROJ, but also introduced transformation pipelines.

In using a transformation hub, PROJ had worked adequately when the errors introduced by transforming first to WGS84 and then from WGS84 to the target coordinate reference system, but with years passing from 1984, the world has undergone sufficient tectonic shifts for errors to increase.

In addition, the need for precision has risen in agriculture and engineering. So PROJ, as it was, risked ceasing to be fit for purpose as a fundamental component of the geospatial open source software stack.

Following major changes in successive iterations of the international standards for coordinate reference systems (ISO 2004), PROJ is changing from preferring “late-binding” transformations, pivoting through a known transformation hub in going from input to target coordinate reference systems, to “early-binding” transformations.

This means that the user may be offered alternative paths from input to target coordinate reference systems, some of which may go directly, and more will use higher precision transformation grids, enlarging the existing practice of using North American Datum (NAD) grids.

In other cases, three or seven coefficient transformations may be offered, but the default fallback, where little is known about the input or target specification, may be less satisfactory than PROJ has previously offered.

Grid CDN mechanism

The grid CDN is available at https://cdn.proj.org, and can be turned on for use in rgdal; I’m not aware that sf has made control of it available yet. Files are now stored in a single GTiff format (I think cloud-optimised).

Transformation pipelines

In addition, the current iteration of the standard makes it more important to declare the epoch of interest of coordinates (when the position was recorded and how) and the region of interest.

A transformation pathway may have an undefined epoch and a global span, but cannot achieve optimal precision everywhere.

By bounding the region of interest say within a tectonic plate, and the epoch to a given five-year period, very high precision transformations may be possible.

These choices have not so far been required explicitly, but for example matching against the "area" table in the database may reduce the number of transformation pathways offered dramatically.

CRS in R before PROJ

The mapproj package provided coordinate reference system and projection support for the maps package. From mapproj/src/map.h, line 20, we can see that the eccentricity of the Earth is defined as 0.08227185422, corrresponding to the Clark 1866 ellipsoid (Iliffe and Lott 2008):

ellps <- sf_proj_info("ellps")
(clrk66 <- unlist(ellps[ellps$name=="clrk66",]))
##          name         major           ell   description 
##      "clrk66" "a=6378206.4" "b=6356583.8" "Clarke 1866"

With a very few exceptions, projections included in mapproj::mapproject() use the Clarke 1866 ellipsoid, with the remainder using a sphere with the Clarke 1866 major axis radius. The function returns coordinates for visualization in an unknown metric; no inverse projections are available.

eval(parse(text=clrk66["major"]))
eval(parse(text=clrk66["ell"]))
print(sqrt((a^2-b^2)/a^2), digits=10)
## [1] 0.08227185422

Approaching and implemented changes in PROJ

  • PROJ developers not only point out how the world has changed since a World Geodetic System of 1984 (WGS84) was adopted as a hub for coordinate transformation in PROJ, but also introduced transformation pipelines (Knudsen and Evers 2017; Evers and Knudsen 2017).

  • In using a transformation hub, PROJ had worked adequately when the errors introduced by transforming first to WGS84 and then from WGS84 to the target coordinate reference system were acceptable, but with years passing from 1984, the world has undergone sufficient tectonic shifts for errors to increase, just as needs for accuracy sharpen.

  • In addition, the need for accuracy has risen in agriculture and engineering.

  • So PROJ, as it was, risked ceasing to be fit for purpose as a fundamental component of the geospatial open source software stack.

PROJ will also become more tightly linked to authorities responsible for the specification components. While the original well-known text (WKT1) descriptions also contained authorities, WKT2 (2019) is substantially more stringent. PROJ continues to use the European Petroleum Survey Group (EPSG) database, the local copy PROJ uses is now an SQLite database, with a large number of tables:

library(RSQLite)
DB0 <- strsplit(sf:::CPL_get_data_dir(), .Platform$path.sep)[[1]]
DB <- file.path(DB0[length(DB0)], "proj.db")
db <- dbConnect(SQLite(), dbname=DB)
cat(strwrap(paste(dbListTables(db), collapse=", ")), sep="\n")
## alias_name, authority_list, authority_to_authority_preference, axis,
## celestial_body, compound_crs, concatenated_operation,
## concatenated_operation_step, conversion, conversion_method,
## conversion_param, conversion_table, coordinate_operation_method,
## coordinate_operation_view, coordinate_operation_with_conversion_view,
## coordinate_system, crs_view, deprecation, ellipsoid, extent,
## geodetic_crs, geodetic_datum, geodetic_datum_ensemble_member,
## geoid_model, grid_alternatives, grid_packages, grid_transformation,
## helmert_transformation, helmert_transformation_table, metadata,
## object_view, other_transformation, prime_meridian, projected_crs,
## scope, supersession, unit_of_measure, usage, vertical_crs,
## vertical_datum, vertical_datum_ensemble_member
dbDisconnect(db)
  • The initial use of coordinate reference systems for objects defined in sp was based on the PROJ string representation, which built on a simplified key=value form.

  • Keys began with plus (+), and the value format depended on the key.

  • If essential keys were missing, some might be added by default from a file that has now been eliminated as misleading; if +ellps= was missing and not added internally from other keys, +ellps=WGS84 would be added silently to refer to the World Geodetic System 1984 ellipsoid definition.

  • Accurate coordinate transformation has always been needed for the integration of data from different sources, but has become much more pressing as web mapping has become available in R, through the leaflet package (Cheng, Karambelkar, and Xie 2018), on which mapview and the "view" mode of tmap build.

  • As web mapping provides zooming and panning, possible infelicities that were too small to detect as mismatches in transformation jump into prominence.

  • The web mapping workflow transforms input objects to OGC:CRS84 (geographical CRS WGS 84, World area of relevance, WGS84 datum, visualization order) as expected by leaflet, then on to EPSG:3857 (WGS 84 / Pseudo-Mercator) for display on web map backgrounds (this is carried out internally in leaflet).

  • Objects shown in mapview and tmap are now coerced to sf classes, then st_transform() transforms to OGC:CRS84 if necessary (or until we are sure EPSG:4326 never swaps axes).

Broad Street Cholera Data

John Snow did not use maps to find the Broad Street pump, the polluted water source behind the 1854 cholera epidemic in Soho in central London, because he associated cholera with water contaminated with sewage, based on earlier experience (Brody et al. 2000). The basic data to be used here were made available by Jim Detwiler, who had collated them for David O’Sullivan for use on the cover of O’Sullivan and Unwin (2003), based on earlier work by Waldo Tobler and others.

Where is the Broad Street pump?

We’ll use the example of the location of the Broad Street pump in Soho, London, to be distributed with sf; the object has planar coordinates in the OSGB British National Grid projected CRS with the OSGB datum:

library(sf)
bp_file <- system.file("gpkg/b_pump.gpkg", package="sf")
b_pump_sf <- st_read(bp_file)
## Reading layer `b_pump' from data source `/home/rsb/lib/r_libs/sf/gpkg/b_pump.gpkg' using driver `GPKG'
## Simple feature collection with 1 feature and 1 field
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 529393.5 ymin: 181020.6 xmax: 529393.5 ymax: 181020.6
## projected CRS:  Transverse_Mercator

Proj4 string degradation

Before R packages upgraded the way coordinate reference systems were represented in early 2020, our Proj4 string representation suffered degradation. Taking the Proj4 string defined in PROJ 5 for the British National Grid, there is a +datum=OSGB36 key-value pair. But when processing this input with PROJ 6 and GDAL 3, this key is removed. Checking, we can see that reading the input string appears to work, but the output for the Proj4 string drops the +datum=OSGB36 key-value pair, introducing instead the ellipse implied by that datum:

proj5 <- paste0("+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717",
 " +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs")
legacy <- st_crs(proj5)
proj6 <- legacy$proj4string
proj5_parts <- unlist(strsplit(proj5, " "))
proj6_parts <- unlist(strsplit(proj6, " "))
proj5_parts[!is.element(proj5_parts, proj6_parts)]
## [1] "+datum=OSGB36"
proj6_parts[!is.element(proj6_parts, proj5_parts)]
## [1] "+ellps=airy"

We can emulate the problem seen following the release in May 2019 of GDAL 3.0.0 using PROJ 6, by inserting the degraded Proj4 string into the Broad Street pump object. The coordinate reference system representation is now ignorant of the proper datum specification. The apparent "proj4string" component of the sf "crs" is used to permit packages to adapt, even though its contents are degraded.

b_pump_sf1 <- b_pump_sf
st_crs(b_pump_sf1) <- st_crs(st_crs(b_pump_sf1)$proj4string)
## Warning: st_crs<- : replacing crs does not reproject data; use st_transform for
## that

Why does this matter? For visualization on a web map, for example using the mapview package, the projected geometries are transformed to the same WGS84 ellipse and datum (OGC:CRS84) that were used in PROJ 4 as a transformation hub. OGC:CRS84 is the visualization axis order equivalent of EPSG:4326. In leaflet, these are projected to Web Mercator (EPSG:3856). Inside mapview(), the sf::st_transform() function is used, so we will emulate this coordinate operation before handing on the geometries for display.

However, because the one of the objects now has a degraded Proj4 string representation of its coordinate reference system, the output points, apparently transformed identically to WGS84, are now some distance apart:

b_pump_sf_ll <- st_transform(b_pump_sf, "OGC:CRS84")
b_pump_sf1_ll <- st_transform(b_pump_sf1, "OGC:CRS84")
st_distance(b_pump_sf_ll, b_pump_sf1_ll)
## Units: [m]
##          [,1]
## [1,] 125.0578

The Broad Street pump is within 2m of the green point (relative accuracy preserved) but the red point is now in Ingestre Place, because of the loss of the datum specification.

library(mapview)
## GDAL version >= 3.1.0 | setting mapviewOptions(fgb = TRUE)
if (sf:::CPL_gdal_version() >= "3.1.0") mapviewOptions(fgb = FALSE)
pts <- rbind(b_pump_sf_ll, b_pump_sf1_ll)
pts$CRS <- c("original", "degraded")
mapview(pts, zcol="CRS", map.type="OpenStreetMap", col.regions=c("green", "red"), cex=18)

Implemented resolutions: WKT2 2019

  • Once PROJ 6 and GDAL 3 had stabilized in the summer of 2019, we identified the underlying threat as lying in the advertised degradation of GDAL’s exportToProj4() function.

  • When reading raster and vector files, the coordinate reference system representation using Proj4 strings would often be degraded, so that further transformation within R (also using GDAL/PROJ functionality) would be at risk of much greater inaccuracy than before.

  • Since then, sf, sp with rgdal and raster have adopted the 2019 version of the “Well-Known Text” coordinate reference system representation WKT2-2019 (ISO 2004) instead of Proj4 strings to contain coordinate reference system definitions.

  • Accommodations have also been provided so that the S3 class "crs" objects used in objects defined in sf, and the formal S4 class "CRS" objects used objects defined in sp and raster, can continue to attempt to support Proj4 strings in addition, while other package maintainers and workflow users catch up.

  • Following an extended campaign of repeated checking about 900 reverse dependencies (packages depending on sp, rgdal and others) and dozens of github issues, most of the consequences of the switch to WKT2 among packages have now been addressed.

  • More recently (late August 2020), 115 packages are being offered rebuilt stored objects that had included "CRS" objects without WKT2 definitions.

  • This approach has ensured that spatial objects, whether created within R, read in from external data sources, or read as stored objects, all have WKT2 string representations of their coordinate reference systems, and for backward compatibility can represent these in addition as Proj4 strings.

  • Operations on objects should carry forward the new representations, which should also be written out to external data formats correctly.

Specified axis order

There is a minor divergence between sf and sp (and thus rgdal): in sf, the axis order of the CRS is preserved as instantiated, but objects do not have their axes swapped to accord with authorities unless sf::st_axis_order() is set `TRUE´. This can appear odd, because although the representation records a northings-eastings axis order, data are treated as eastings-northings in plotting, variogram construction and so on:

st_crs("EPSG:4326")
## Coordinate Reference System:
##   User input: EPSG:4326 
##   wkt:
## GEOGCRS["WGS 84",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     USAGE[
##         SCOPE["Horizontal component of 3D system."],
##         AREA["World."],
##         BBOX[-90,-180,90,180]],
##     ID["EPSG",4326]]

In sp/rgdal, attempts are made to ensure that axis order is in the form termed GIS, traditional, or visualization, that is always eastings-northings. From very recent rgdal commits (from rev. 1060), PROJ rather than GDAL is used for instantiating CRS:

library(sp)
cat(wkt(CRS("EPSG:4326")), "\n")
## GEOGCRS["WGS 84 (with axis order normalized for visualization)",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]],
##         ID["EPSG",6326]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433],
##         ID["EPSG",8901]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433,
##                 ID["EPSG",9122]]],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433,
##                 ID["EPSG",9122]]]]

The probability of confusion increases when coercing from sf to sp and vice-versa, with the representations most often remaining unchanged.

sf_from_sp <- st_crs(CRS("EPSG:4326"))
o <- strsplit(sf_from_sp$wkt, "\n")[[1]]
cat(paste(o[grep("CS|AXIS|ORDER", o)], collapse="\n"))
##     CS[ellipsoidal,2],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[1],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[2],

Both of these coercions are using the same underlying PROJ and GDAL versions, and the same PROJ metadata. Work on this question has not yet stabilized; we perhaps prefer all data to be GIS-order, but to be able to read/write from and to authority order.

sp_from_sf <- as(st_crs("EPSG:4326"), "CRS")
o <- strsplit(wkt(sp_from_sf), "\n")[[1]]
cat(paste(o[grep("CS|AXIS|ORDER", o)], collapse="\n"))
##     CS[ellipsoidal,2],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[1],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[2],

Current thinking is to avoid the EPSG:4326 axis order issue by recommending use of OGC:CRS84, which is the representation used in GeoJSON, and also known as urn:ogc:def:crs:OGC::CRS84. This specification is always eastings-northings:

cat(st_crs("OGC:CRS:84")$wkt, "\n")
## GEOGCRS["WGS 84",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]],
##         ID["EPSG",6326]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433],
##         ID["EPSG",8901]],
##     CS[ellipsoidal,2],
##         AXIS["longitude",east,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433,
##                 ID["EPSG",9122]]],
##         AXIS["latitude",north,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433,
##                 ID["EPSG",9122]]]]
cat(wkt(CRS("OGC:CRS84")), "\n")
## GEOGCRS["WGS 84 (CRS84)",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     USAGE[
##         SCOPE["Not known."],
##         AREA["World."],
##         BBOX[-90,-180,90,180]],
##     ID["OGC","CRS84"]]

PROJ or GDAL SRS?

As just mentioned, from rgdal rev. 1060, PROJ is preferred for instantiating "CRS" objects, but GDAL may still be chosen; sf uses GDAL. On the previous slide, the sp and sf WKT2 renderings differ for this reason, but can be made equal. The two are also see as strictly equivalent by PROJ:

rgdal::set_prefer_proj(FALSE)
GDAL_SRS <- wkt(CRS("OGC:CRS84"))
all.equal(st_crs("OGC:CRS:84")$wkt, GDAL_SRS)
## [1] TRUE
rgdal::set_prefer_proj(TRUE)
cat(GDAL_SRS, "\n")
## GEOGCRS["WGS 84",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]],
##         ID["EPSG",6326]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433],
##         ID["EPSG",8901]],
##     CS[ellipsoidal,2],
##         AXIS["longitude",east,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433,
##                 ID["EPSG",9122]]],
##         AXIS["latitude",north,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433,
##                 ID["EPSG",9122]]]]
rgdal::compare_CRS(CRS("OGC:CRS84"), as(st_crs("OGC:CRS:84"), "CRS"))
##                       strict                   equivalent 
##                        FALSE                         TRUE 
## equivalent_except_axis_order 
##                         TRUE

Also see: https://twitter.com/geospacedman/status/1321206393152692229

Coordinate operations

Transformation in sf uses code in GDAL, which in turn uses functions in PROJ; in sp/rgdal, PROJ is used directly for transformation. In order to demonstrate more of what is happening, let us coerce these sf objects to sp (they are both planar with an x–y axis order):

b_pump_sp <- as(b_pump_sf, "Spatial")
b_pump_sp1 <- as(b_pump_sf1, "Spatial")

We will also set up a temporary directory for use with the on-demand grid download functionality in PROJ 7; this must be done before rgdal is loaded:

td <- tempfile()
dir.create(td)
Sys.setenv("PROJ_USER_WRITABLE_DIRECTORY"=td)
library(rgdal)

Areas of interest

In sf, areas-of-interest need to be given by the users, while in transformation and projection in rgdal, these are calculated from the object being projected or transformed. The provision of areas-of-interest is intended to reduce the number of candidate coordinate operations found by PROJ.

WKT <- wkt(b_pump_sp)
o <- list_coordOps(WKT, "EPSG:4326")
aoi0 <- project(t(unclass(bbox(b_pump_sp))), WKT, inv=TRUE)
aoi <- c(t(aoi0 + c(-0.1, +0.1)))
o_aoi <- list_coordOps(WKT, "EPSG:4326", area_of_interest=aoi)

rgdal::list_coordOps() accesses the PROJ metadata database to search through candidate coordinate operations, ranking them by accuracy, returning a data frame of operations. When an area-of-interest is provided, candidates not intersecting it are dropped. Coordinate operations that cannot be instantiated because of missing grids are also listed. Here without an area-of-interest: 9 candidate operations are found when the WKT string contains datum information. Of these, 8 may be instantiated, with 1 needing a grid. 4 operations cease to be candidates if we use an area-of-interest.

Coordinate operations

In sp/rgdal, the coordinate operation last used is returned, and can be retrieved using rgdal::get_last_coordOp(); coordinate operations are represented as pipelines (Knudsen and Evers 2017; Evers and Knudsen 2017), introduced in PROJ 5 and using the PROJ key-value pair notation:

b_pump_sp_ll <- spTransform(b_pump_sp, "OGC:CRS84")
cat(strwrap(get_last_coordOp()), sep="\n")
## +proj=pipeline +step +inv +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.999601
## +x_0=400000 +y_0=-100000 +ellps=airy +step +proj=push +v_3 +step
## +proj=cart +ellps=airy +step +proj=helmert +x=446.448 +y=-125.157
## +z=542.06 +rx=0.15 +ry=0.247 +rz=0.842 +s=-20.489
## +convention=position_vector +step +inv +proj=cart +ellps=WGS84 +step
## +proj=pop +v_3 +step +proj=unitconvert +xy_in=rad +xy_out=deg

Here we can see that an inverse projection from the specified Transverse Mercator projection is made to geographical coordinates, followed by a seven-parameter Helmert transformation to WGS84 ellipsoid and datum. The parameters are contained in the best instantiable coordinate operation retrieved from the PROJ database. The +push +v_3 and +pop +v_3 operations are used when only horizontal components are needed in the Helmert transformation.

o <- list_coordOps(wkt(b_pump_sp1), "OGC:CRS84",
  area_of_interest=aoi)
b_pump_sp1_ll <- spTransform(b_pump_sp1, "OGC:CRS84")
cat(strwrap(get_last_coordOp()), sep="\n")
## +proj=pipeline +step +inv +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.999601
## +x_0=400000 +y_0=-100000 +ellps=airy +step +proj=unitconvert +xy_in=rad
## +xy_out=deg

Going on to the case of the degraded representation, only 1 operation is found, with only ballpark accuracy. With our emulation of the dropping of +datum= support in GDAL’s exportToProj4(), we see that the coordinate operation pipeline only contains the inverse projection step, accounting for the observed shift of the Broad Street pump to Ingestre Place.

Using the content download network to access grids

Finally, sp/rgdal may use the provision of on-demand downloading of transformation grids to provide more accuracy (CDN, from PROJ 7, https://cdn.proj.org ). Before finding and choosing to use a coordinate operation using an on-demand downloaded grid, the designated directory is empty:

enable_proj_CDN()
## [1] "Using: /tmp/Rtmpkvg72D/file3ed76b3cfb2"
list.files(td)
## character(0)

Using the CDN, all the candidate operations are instantiable, and the pipeline now shows a horizontal grid transformation rather than a Helmert transformation.

b_pump_sp_llg <- spTransform(b_pump_sp, "OGC:CRS84")
cat(strwrap(get_last_coordOp()), sep="\n")
## +proj=pipeline +step +inv +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.999601
## +x_0=400000 +y_0=-100000 +ellps=airy +step +proj=hgridshift
## +grids=uk_os_OSTN15_NTv2_OSGBtoETRS.tif +step +proj=unitconvert
## +xy_in=rad +xy_out=deg

Now the downloaded grid is cached in the database in the designated CDN directory, and may be used for other transformations using the same operation.

(fls <- list.files(td))
## [1] "cache.db"
file.size(file.path(td, fls[1]))
## [1] 319488
disable_proj_CDN()
ll <- st_as_sf(b_pump_sp_ll)
ll1 <- st_as_sf(b_pump_sp1_ll)
llg <- st_as_sf(b_pump_sp_llg)
sf_use_s2(FALSE)
c(st_distance(ll, ll1), st_distance(ll, llg))
## Units: [m]
## [1] 125.057812   1.751479

Once again, the distance between the point transformed from the sf object as read from file and the point with the degraded coordinate reference system emulating the effect of the change in behaviour of GDAL’s exportToProj4() in GDAL 6 and later is about 125m. Using the CDN shifts the output point by 1.7m, here using both the new s2 and legacy interfaces for measurements in sf using geographical coordinates (Dunnington, Pebesma, and Rubak 2020):

sf_use_s2(TRUE)
c(st_distance(ll, ll1), st_distance(ll, llg))
## Units: [m]
## [1] 124.728570   1.745973
sf_use_s2(FALSE)

CRS representation: status

  • Although it appears that most of the consequences of the change in representation of coordinate reference systems from Proj4 to WKT2 strings have now been addressed, we still see signs on the mailing list and on Twitter that users, naturally directing their attention to their analytical or visualization work, may still be confused.

  • The extent of the spatial cluster of R packages is so great that it will undoubtedly take time before the dust settles.

  • However, we trust that the operation of upgrading representations is now largely complete.

  • Multiple warnings issued in sp workflows, now noisily drawing attention to possible degradations in workflows, will by default be muted when sp 1.5 and rgdal 1.6 are released.

Bivand, Roger. 2014. “Geocomputation and Open Source Software: Components and Software Stacks.” In Geocomputation, edited by Robert J. Abrahart and Linda M. See, 329–55. Boca Raton: CRC Press. http://hdl.handle.net/11250/163358.

Bivand, Roger, Edzer Pebesma, and Virgilio Gomez-Rubio. 2008. Applied Spatial Data Analysis with R. Springer, NY. https://asdar-book.org/.

———. 2013. Applied Spatial Data Analysis with R, Second Edition. Springer, NY. https://asdar-book.org/.

Brody, H., M. R. Rip, P. Vinten-Johansen, N. Paneth, and S. Rachman. 2000. “Map-Making and Myth-Making in Broad Street: The London Cholera Epidemic, 1854.” Lancet 356: 64–68.

Cheng, Joe, Bhaskar Karambelkar, and Yihui Xie. 2018. Leaflet: Create Interactive Web Maps with the Javascript ’Leaflet’ Library. https://CRAN.R-project.org/package=leaflet.

Dunnington, Dewey, Edzer Pebesma, and Ege Rubak. 2020. S2: Spherical Geometry Operators Using the S2 Geometry Library; Package Version 1.0.2. https://CRAN.R-project.org/package=s2.

Evers, Kristian, and Thomas Knudsen. 2017. Transformation Pipelines for Proj.4. https://www.fig.net/resources/proceedings/fig\_proceedings/fig2017/papers/iss6b/ISS6B\_evers\_knudsen\_9156.pdf.

Iliffe, Jonathan, and Roger Lott. 2008. Datums and Map Projections: For Remote Sensing, GIS and Surveying. Boca Raton: CRC.

ISO. 2004. ISO 19111:2019 Geographic Information – Referencing by Coordinates. https://www.iso.org/standard/74039.html.

Knudsen, Thomas, and Kristian Evers. 2017. Transformation Pipelines for Proj.4. https://meetingorganizer.copernicus.org/EGU2017/EGU2017-8050.pdf.

O’Sullivan, D., and D. J. Unwin. 2003. Geographical Information Analysis. Hoboken, NJ: Wiley.