Required current contributed CRAN packages:

I am running R 3.6.1, with recent update.packages().

needed <- c("RSQLite", "mapview", "sf", "rgdal", "sp")

Script

Script and data at https://github.com/rsbivand/ECS530_h19/raw/master/ECS530_III.zip. Download to suitable location, unzip and use as basis.

Schedule

  • 2/12 (I) Spatial data representation, (II) Support+topology, input/output

  • 3/12 (III) Coordinate reference systems, (IV) Visualization

  • 4/12 (V) R/GIS interfaces, project surgery

  • 5/12 (VI) Spatial autocorrelation, (VII) Spatial regression

  • 6/12 (VIII) Interpolation, point processes, project surgery

  • 7/12 Presentations

Session III

  • 09:15-09:45 Coordinate reference systems: background

  • 09:45-10:15 Modernising PROJ and issues

  • 10:15-11:00 Proposed developments (using sp and rgdal as prototypes)

Coordinate reference systems: background

The usefulness of spatial data is linked to knowing its coordinate reference system. The coordinate reference system may be geographic, usually measured in decimal degrees, or projected, layered on a known geographic CRS, usually measured in metres (planar). The underlying geographical CRS must specify an ellipsoid, with associated major and minor axis lengths:

library(sp)
library(rgdal)
## rgdal: version: 1.5-2, (SVN revision 896)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 3.0.2, released 2019/10/28
##  Path to GDAL shared files: /usr/local/share/gdal
##  GDAL binary built with GEOS: TRUE 
##  Loaded PROJ.4 runtime: Rel. 6.2.1, November 1st, 2019, [PJ_VERSION: 621]
##  Path to PROJ.4 shared files: /usr/local/share/proj
##  Linking to sp version: 1.3-3
projInfo("ellps")
##         name          major                  ell
## 1      MERIT    a=6378137.0           rf=298.257
## 2      SGS85    a=6378136.0           rf=298.257
## 3      GRS80    a=6378137.0     rf=298.257222101
## 4      IAU76    a=6378140.0           rf=298.257
## 5       airy  a=6377563.396       rf=299.3249646
## 6     APL4.9    a=6378137.0            rf=298.25
## 7      NWL9D    a=6378145.0            rf=298.25
## 8   mod_airy  a=6377340.189        b=6356034.446
## 9     andrae   a=6377104.43             rf=300.0
## 10    danish a=6377019.2563             rf=300.0
## 11   aust_SA    a=6378160.0            rf=298.25
## 12     GRS67    a=6378160.0    rf=298.2471674270
## 13   GSK2011    a=6378136.5       rf=298.2564151
## 14    bessel  a=6377397.155       rf=299.1528128
## 15  bess_nam  a=6377483.865       rf=299.1528128
## 16    clrk66    a=6378206.4          b=6356583.8
## 17    clrk80  a=6378249.145          rf=293.4663
## 18 clrk80ign    a=6378249.2 rf=293.4660212936269
## 19       CPM    a=6375738.7            rf=334.29
## 20    delmbr     a=6376428.             rf=311.5
## 21   engelis   a=6378136.05          rf=298.2566
## 22   evrst30  a=6377276.345          rf=300.8017
## 23   evrst48  a=6377304.063          rf=300.8017
## 24   evrst56  a=6377301.243          rf=300.8017
## 25   evrst69  a=6377295.664          rf=300.8017
## 26   evrstSS  a=6377298.556          rf=300.8017
## 27   fschr60     a=6378166.             rf=298.3
## 28  fschr60m     a=6378155.             rf=298.3
## 29   fschr68     a=6378150.             rf=298.3
## 30   helmert     a=6378200.             rf=298.3
## 31     hough    a=6378270.0              rf=297.
## 32      intl    a=6378388.0              rf=297.
## 33     krass    a=6378245.0             rf=298.3
## 34     kaula     a=6378163.            rf=298.24
## 35     lerch     a=6378139.           rf=298.257
## 36     mprts     a=6397300.              rf=191.
## 37  new_intl    a=6378157.5          b=6356772.2
## 38   plessis     a=6376523.           b=6355863.
## 39      PZ90    a=6378136.0         rf=298.25784
## 40    SEasia    a=6378155.0       b=6356773.3205
## 41   walbeck    a=6376896.0       b=6355834.8467
## 42     WGS60    a=6378165.0             rf=298.3
## 43     WGS66    a=6378145.0            rf=298.25
## 44     WGS72    a=6378135.0            rf=298.26
## 45     WGS84    a=6378137.0     rf=298.257223563
## 46    sphere    a=6370997.0          b=6370997.0
##                        description
## 1                       MERIT 1983
## 2        Soviet Geodetic System 85
## 3             GRS 1980(IUGG, 1980)
## 4                         IAU 1976
## 5                        Airy 1830
## 6              Appl. Physics. 1965
## 7         Naval Weapons Lab., 1965
## 8                    Modified Airy
## 9       Andrae 1876 (Den., Iclnd.)
## 10  Andrae 1876 (Denmark, Iceland)
## 11 Australian Natl & S. Amer. 1969
## 12               GRS 67(IUGG 1967)
## 13                        GSK-2011
## 14                     Bessel 1841
## 15           Bessel 1841 (Namibia)
## 16                     Clarke 1866
## 17                Clarke 1880 mod.
## 18              Clarke 1880 (IGN).
## 19 Comm. des Poids et Mesures 1799
## 20         Delambre 1810 (Belgium)
## 21                    Engelis 1985
## 22                    Everest 1830
## 23                    Everest 1948
## 24                    Everest 1956
## 25                    Everest 1969
## 26       Everest (Sabah & Sarawak)
## 27    Fischer (Mercury Datum) 1960
## 28           Modified Fischer 1960
## 29                    Fischer 1968
## 30                    Helmert 1906
## 31                           Hough
## 32    International 1909 (Hayford)
## 33                Krassovsky, 1942
## 34                      Kaula 1961
## 35                      Lerch 1979
## 36                 Maupertius 1738
## 37          New International 1967
## 38           Plessis 1817 (France)
## 39                           PZ-90
## 40                  Southeast Asia
## 41                         Walbeck
## 42                          WGS 60
## 43                          WGS 66
## 44                          WGS 72
## 45                          WGS 84
## 46       Normal Sphere (r=6370997)

Other parameters should be specified, such as the prime meridian, often taken as Greenwich. Before PROJ version 6, legacy PROJ (and GDAL) used a +datum= tag introduced after the library migrated beyond USGS (around version 4.4). The underlying problem was not that projection and inverse projection could not be carried out between projected CRS and geograpghical CRS, but that national mapping agencies defined often many datums, keying the specification of a geographical CRS to a national or regional datum. Some of these, especially for North America, were supported, but support for others was patchy. The +datum= tag supported a partly informal listing of values, themselves linked to three or seven coefficient datum transformation sets, used through the +towgs84= tag. Coefficient lookup through the +datum= tag, or direct specification of coefficients through the +towgs84= tag became a convenient way to handle datum transformation in addition to projection and inverse projection.

The default “hub” for transformation was to go through the then newly achieved WGS84 datum. Spatial data files often encoded the geographic and projected CRS with reference to these values, in some cases using PROJ 4 strings. These used a pseudo projection +proj=longlat to indicate a geographical CRS, and many other possible values of +proj= for projected CRS.

The Grids & Datums column in Photogrammetric Engineering & Remote Sensing gives insight into some of the peculiarities of national mapping agencies - authority is typically national but may be subnational:

data("GridsDatums")
GridsDatums[grep("Norway", GridsDatums$country),]
##                   country     month year ISO
## 17  The Kingdom of Norway (October) 1999 NOR
## 226                Norway    (July) 2017 NOR

Beyond this, the database successively developed by the European Petroleum Survey Group was copied to local CSV files for PROJ and GDAL, providing lookup by code number.

EPSG <- make_EPSG()
EPSG[grep("Oslo", EPSG$note), 1:2]
##       code                            note
## 515   4817                 NGO 1948 (Oslo)
## 2408 27391    NGO 1948 (Oslo) / NGO zone I
## 2409 27392   NGO 1948 (Oslo) / NGO zone II
## 2410 27393  NGO 1948 (Oslo) / NGO zone III
## 2411 27394   NGO 1948 (Oslo) / NGO zone IV
## 2412 27395    NGO 1948 (Oslo) / NGO zone V
## 2413 27396   NGO 1948 (Oslo) / NGO zone VI
## 2414 27397  NGO 1948 (Oslo) / NGO zone VII
## 2415 27398 NGO 1948 (Oslo) / NGO zone VIII
CRS("+init=epsg:4817")
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
## datum NGO_1948_Oslo in CRS definition
## CRS arguments:
##  +proj=longlat +a=6377492.018 +rf=299.1528128 +pm=oslo +no_defs

The lookup prior to PROJ 6 used to provide a +towgs84= value of 278.3,93,474.5,7.889,0.05,-6.61,6.21, but in the new regime only reveals transformation coefficients in the context of a coordinate operation (only in the unreleased development version of rgdal):

list_coordOps("EPSG:4817", "EPSG:4326")
## Candidate coordinate operations found:  1 
## Strict containment:  FALSE 
## Visualization order:  TRUE 
## Source: EPSG:4817 
## Target: EPSG:4326 
## Best instantiable operation has accuracy: 3 m
## Description: axis order change (2D) + NGO 1948 (Oslo) to NGO1948 (1) + NGO
##              1948 to WGS 84 (1) + axis order change (2D)
## Definition:  +proj=pipeline +step +proj=unitconvert +xy_in=deg +xy_out=rad
##              +step +inv +proj=longlat +a=6377492.018
##              +rf=299.1528128 +pm=oslo +step +inv +proj=longlat
##              +a=6377492.018 +rf=299.1528128 +step +proj=push
##              +v_3 +step +proj=cart +a=6377492.018
##              +rf=299.1528128 +step +proj=helmert +x=278.3 +y=93
##              +z=474.5 +rx=7.889 +ry=0.05 +rz=-6.61 +s=6.21
##              +convention=position_vector +step +inv +proj=cart
##              +ellps=WGS84 +step +proj=pop +v_3 +step
##              +proj=unitconvert +xy_in=rad +xy_out=deg

Up to and including PROJ 5, downstream software, like sf and rgdal, have been able to rely on the provision of ad-hoc transformtion capabilities, with apparently predictable consequences. Everybody knew (or should have known) that each new release of the PROJ and GDAL CSV metadata files could update transformation coefficients enough to shift outcomes a little. Everyone further chose to ignore the timestamping of coordinates, or at least of datasets; we could guess (as above) that US Census tract boundaries for 1980 must use the NAD27 datum framework - suprisingly many used NAD83 anyway (both for Boston and the North Carolina SIDS data set).

Use of KML files to provide zoom and pan for these boundaries, and now leaflet and mapview exposes approximations mercilessly. Use of coefficients of transformation of an unknown degree of approximation, and authority “googled it” was reaching its limits, or likely had exceeded them.

sp classes used a PROJ string to define the CRS (in an S4 "CRS" object):

getClass("CRS")
## Class "CRS" [package "sp"]
## 
## Slots:
##                 
## Name:   projargs
## Class: character

while sf uses an S3 "crs" object with an integer EPSG code and a PROJ string; if instantiated from the EPSG code, both are provided, here for now retaining the fragile +towgs84= key because the central OGRSpatialReference function exportToProj4() is not (yet) being called (it is called when reading from file):

library(sf)
## Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1
st_crs(4326)
## Coordinate Reference System:
##   EPSG: 4326 
##   proj4string: "+proj=longlat +datum=WGS84 +no_defs"

Modernising PROJ and issues

PROJ

Because so much open source (and other) software uses the PROJ library and framework, many are affected when PROJ upgrades. Until very recently, PROJ has been seen as very reliable, and the changes taking place now are intended to confirm and reinforce this reliability. Before PROJ 5 (PROJ 6 is out now, PROJ 7 is coming early in 2020), the +datum= tag was used, perhaps with +towgs84= with three or seven coefficients, and possibly +nadgrids= where datum transformation grids were available. However, transformations from one projection to another first inversed to longitude-latitude in WGS84, then projected on to the target projection.

Fast-forward 35 years and PROJ.4 is everywhere: It provides coordinate handling for almost every geospatial program, open or closed source. Today, we see a drastical increase in the need for high accuracy GNSS coordinate handling, especially in the agricultural and construction engineering sectors. This need for geodetic-accuracy transformations is not satisfied by “classic PROJ.4”. But with the ubiquity of PROJ.4, we can provide these transformations “everywhere”, just by implementing them as part of PROJ.4 (Evers and Knudsen 2017).

Escaping the WGS84 hub/pivot: PROJ and OGC WKT2

Following the introduction of geodetic modules and pipelines in PROJ 5 (Knudsen and Evers 2017; Evers and Knudsen 2017), PROJ 6 moves further. Changes in the legacy PROJ representation and WGS84 transformation hub have been coordinated through the GDAL barn raising initiative. Crucially WGS84 often ceases to be the pivot for moving between datums. A new OGC WKT is coming, and an SQLite EPSG file database has replaced CSV files. SRS will begin to support 3D by default, adding time too as SRS change. See also PROJ migration notes.

There are very useful postings on the PROJ mailing list from Martin Desruisseaux, first proposing clarifications and a follow-up including a summary:

  • “Early binding” ≈ hub transformation technique.
  • “Late binding” ≈ hub transformation technique NOT used, replaced by a more complex technique consisting in searching parameters in the EPSG database after the transformation context (source, target, epoch, area of interest) is known.
  • The problem of hub transformation technique is independent of WGS84. It is caused by the fact that transformations to/from the hub are approximate. Any other hub we could invent in replacement of WGS84 will have the same problem, unless we can invent a hub for which transformations are exact (I think that if such hub existed, we would have already heard about it).

The solution proposed by ISO 19111 (in my understanding) is:

  • Forget about hub (WGS84 or other), unless the simplicity of early-binding is considered more important than accuracy.
  • Associating a CRS to a coordinate set (geometry or raster) is no longer sufficient. A {CRS, epoch} tuple must be associated. ISO 19111 calls this tuple “Coordinate metadata”. From a programmatic API point of view, this means that getCoordinateReferenceSystem() method in Geometry objects (for instance) needs to be replaced by a getCoordinateMetadata() method.

Upstream software dependencies of the R-spatial ecosystem

When changes occur in upstream external software, R packages using these libraries often need to adapt, but package maintainers try very hard to shield users from any consequences, so that legacy workflows continue to provide the same or at least similar results from the same data.

The code shown in (Bivand, Pebesma, and Gomez-Rubio 2008, 2013) is almost all run nightly on a platform with updated R packages and external software.

This does not necessarily trap all differences (figures are not compared), but is helpful in detecting impacts of changes in packages or external software.

It is also very helpful that CRAN servers using the released and development versions of R, and with different levels of external software also run nightly checks.

Again, sometimes changes are only noticed by users, but quite often checks run by maintainers and by CRAN alert us to impending challenges.

Tracking the development mailing lists of the external software communities, all open source, can also show how thinking is evolving, although sometimes code tidying in external software can have unexpected consequences, breaking not sf or sp with rgdal or rgeos, but a package further downstream.

(Bivand 2014) discusses open source geospatial software stacks more generally, but here we will consider ongoing changes in PROJ.

(Knudsen and Evers 2017; Evers and Knudsen 2017) not only point out how the world has changed since a World Geodetic System of 1984 (WGS84) was adopted as a hub for coordinate transformation in PROJ, but also introduced transformation pipelines.

In using a transformation hub, PROJ had worked adequately when the errors introduced by transforming first to WGS84 and then from WGS84 to the target coordinate reference system, but with years passing from 1984, the world has undergone sufficient tectonic shifts for errors to increase.

In addition, the need for precision has risen in agriculture and engineering. So PROJ, as it was, risked ceasing to be fit for purpose as a fundamental component of the geospatial open source software stack.

Following major changes in successive iterations of the international standards for coordinate reference systems (ISO 2004), PROJ is changing from preferring “early-binding” transformations, pivoting through a known transformation hub in going from input to target coordinate reference systems, to “late-binding” transformations (early/late were reversed before 2020-04-09, thanks to Floris Vanderhaeghe for the correction).

This means that the user may be offered alternative paths from input to target coordinate reference systems, some of which may go directly, and more will use higher precision transformation grids, enlarging the existing practice of using North American Datum (NAD) grids.

In other cases, three or seven coefficient transformations may be offered, but the default fallback, where little is known about the input or target specification, may be less satisfactory than PROJ has previously offered.

PROJ will also become more tightly linked to authorities responsible for the specification components. While the original well-known text (WKT1) descriptions also contained authorities, WKT2-2018 is substantially more stringent. PROJ continues to use the European Petroleum Survey Group (EPSG) database, the local copy PROJ uses is now an SQLite database, with a large number of tables:

library(RSQLite)
db <- dbConnect(SQLite(), dbname="/usr/local/share/proj/proj.db")
cat(strwrap(paste(dbListTables(db), collapse=", ")), sep="\n")
## alias_name, area, authority_list, authority_to_authority_preference,
## axis, celestial_body, compound_crs, concatenated_operation, conversion,
## conversion_method, conversion_param, conversion_table,
## coordinate_operation_method, coordinate_operation_view,
## coordinate_operation_with_conversion_view, coordinate_system, crs_view,
## deprecation, ellipsoid, geodetic_crs, geodetic_datum,
## grid_alternatives, grid_packages, grid_transformation,
## helmert_transformation, helmert_transformation_table, metadata,
## object_view, other_transformation, prime_meridian, projected_crs,
## supersession, unit_of_measure, vertical_crs, vertical_datum
dbDisconnect(db)

Grid CDN mechanism

Current discussions now relate to mechanisms for caching downloaded grids, and advertising their availability to all programs using PROJ, for example GRASS GIS or QGIS.

Up to now, PROJ metadata files have usually been stored in a directory with only read access for users.

New facilities have been added to add to the search path for PROJ metadata files, but downloading often bulky grid files on-the-fly is not seen as a sensible use of resources.

Transformation pipelines

In addition, the current iteration of the standard makes it more important to declare the epoch of interest of coordinates (when the position was recorded and how) and the region of interest.

A transformation pathway may have an undefined epoch and a global span, but cannot achieve optimal precision everywhere.

By bounding the region of interest say within a tectonic plate, and the epoch to a given five-year period, very high precision transformations may be possible.

These choices have not so far been required explicitly, but for example matching against the "area" table in the database may reduce the number of transformation pathways offered dramatically.

CRS status before GDAL3 and PROJ6

The initial use of coordinate reference systems for objects defined in sp was based on the PROJ.4 string representation, which built on a simplified key=value form.

Keys began with plus (+), and the value format depended on the key.

If essential keys were missing, some might be added by default from a file that has now been eliminated as misleading; if +ellps= was missing and not added internally from other keys, +ellps=WGS84 would be added silently.

Accurate coordinate transformation has always been needed for the integration of data from different sources, but has become much more pressing as web mapping has become available in R, through the leaflet package (Cheng, Karambelkar, and Xie 2018), on which mapview and the "view" mode of tmap.

As web mapping provides zooming and panning, possible infelicities that were too small to detect as mismatches in transformation jump into prominence.

The web mapping workflow transforms input objects to EPSG:4326 (geographical CRS WGS 84, World area of relevance, WGS84 datum) as expected by leaflet, then on to EPSG:3857 (WGS 84 / Pseudo-Mercator) for display on web map backgrounds (this is carried out internally in leaflet.

We’ll be using the Soho cholera data set; I converted the shapefiles from https://asdar-book.org/bundles2ed/die_bundle.zip to GPKG to be more modern (using ogr2ogr in GDAL 3 built against PROJ 6. sf is installed using the proj.h interface in PROJ 6:

buildings <- sf::st_read("snow/buildings.gpkg", quiet=TRUE)
st_crs(buildings)
## Coordinate Reference System:
##   No EPSG code
##   proj4string: "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.999601 +x_0=400000 +y_0=-100000 +ellps=airy +units=m +no_defs"

To make an interactive display in mapview(), conversion/transformation to “Web Mercator” is needed - this uses a WGS84 datum. But PROJ 6 has dropped the +datum= tag, so the display is not correctly registered.

library(mapview)
mapview(buildings)

The CRS/SRS values in the GPKG file (it is a multi-table SQLite database) include the datum definition:

library(RSQLite)
db = dbConnect(SQLite(), dbname="snow/buildings.gpkg")
dbReadTable(db, "gpkg_spatial_ref_sys")$definition[4]
## [1] "PROJCS[\"Transverse_Mercator\",GEOGCS[\"GCS_OSGB 1936\",DATUM[\"OSGB_1936\",SPHEROID[\"Airy 1830\",6377563.396,299.3249646,AUTHORITY[\"EPSG\",\"7001\"]],AUTHORITY[\"EPSG\",\"6277\"]],PRIMEM[\"Greenwich\",0],UNIT[\"Degree\",0.0174532925199433],AUTHORITY[\"EPSG\",\"4277\"]],PROJECTION[\"Transverse_Mercator\"],PARAMETER[\"latitude_of_origin\",49],PARAMETER[\"central_meridian\",-2],PARAMETER[\"scale_factor\",0.999601],PARAMETER[\"false_easting\",400000],PARAMETER[\"false_northing\",-100000],UNIT[\"metre\",1,AUTHORITY[\"EPSG\",\"9001\"]],AXIS[\"Easting\",EAST],AXIS[\"Northing\",NORTH]]"
dbDisconnect(db)

Maybe using rgdal which is built using PROJ 6 but the legacy proj_api.h interface, and the shapefile as shipped with ASDAR reproduction materials will help?

buildings1 <- rgdal::readOGR("snow/buildings.shp", verbose=FALSE)
## Warning in OGRSpatialRef(dsn, layer, morphFromESRI = morphFromESRI, dumpSRS =
## dumpSRS): Discarded datum OSGB_1936 in CRS definition: +proj=tmerc +lat_0=49
## +lon_0=-2 +k=0.999601 +x_0=400000 +y_0=-100000 +ellps=airy +units=m +no_defs
## Warning in OGRSpatialRef(dsn = dsn, layer = layer, morphFromESRI =
## morphFromESRI, : Discarded datum OSGB_1936 in CRS definition: +proj=tmerc
## +lat_0=49 +lon_0=-2 +k=0.999601 +x_0=400000 +y_0=-100000 +ellps=airy +units=m
## +no_defs
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
## datum Unknown_based_on_Airy_1830_ellipsoid in CRS definition
proj4string(buildings1)
## [1] "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.999601 +x_0=400000 +y_0=-100000 +ellps=airy +units=m +no_defs"

No, same problem:

mapview(buildings1)

But the shapefile has the datum definition:

readLines("snow/buildings.prj")
## [1] "PROJCS[\"Transverse_Mercator\",GEOGCS[\"GCS_OSGB 1936\",DATUM[\"D_OSGB_1936\",SPHEROID[\"Airy_1830\",6377563.396,299.3249646]],PRIMEM[\"Greenwich\",0],UNIT[\"Degree\",0.017453292519943295]],PROJECTION[\"Transverse_Mercator\"],PARAMETER[\"latitude_of_origin\",49.00000000000002],PARAMETER[\"central_meridian\",-2],PARAMETER[\"scale_factor\",0.999601],PARAMETER[\"false_easting\",400000],PARAMETER[\"false_northing\",-100000],UNIT[\"Meter\",1]]"

There are a number of components to the PROJ/GDAL changes taking place.

One concerns the use of transformation pipelines to represent coordinate operations. These pipelines (there may be many candidates) vary by area of interest, accuracy of transformation coordinates if used, and the availability of grids.

A second component concerns the representation of CRS; if CRS are represented by PROJ strings, and go through the GDAL function OGRSpatialReference exportToProj4(), most +datum= tags will be stripped (see function documentation).

A third component adds area of interest and possibly epoch to the WKT2_2018 version of ISO 19111 as a forward-looking text representation of a CRS.

Proposed developments (using sp and rgdal as prototypes)

The current proposals now exposed in my fork of sp on github (>= 1.3-3) and the development version of rgdal on R-Forge involve the following steps, over and above backward compatibility (no change in handling CRS for PROJ < 6 and GDAL < 3; try to handle wrong case of GDAL < 3 with PROJ 6):

For PROJ >= 6 && GDAL >= 3: supplement the "CRS" PROJ string with a full WKT2_2018 representation. If the object is instantiated by reading a file through GDAL, then exportToProj4() will degrade most CRS when creating a PROJ string. The WKT string is stored as a comment() to the "CRS" object, which is permissable but not desirable in an S4 context, but is backwards compatible. We can see that the WKT string represents the CRS seen in the vector file:

comment(slot(buildings1, "proj4string"))
## [1] "PROJCRS[\"Transverse_Mercator\",BASEGEOGCRS[\"GCS_OSGB 1936\",DATUM[\"OSGB 1936\",ELLIPSOID[\"Airy 1830\",6377563.396,299.3249646,LENGTHUNIT[\"metre\",1]],ID[\"EPSG\",6277]],PRIMEM[\"Greenwich\",0,ANGLEUNIT[\"Degree\",0.0174532925199433]]],CONVERSION[\"unnamed\",METHOD[\"Transverse Mercator\",ID[\"EPSG\",9807]],PARAMETER[\"Latitude of natural origin\",49,ANGLEUNIT[\"Degree\",0.0174532925199433],ID[\"EPSG\",8801]],PARAMETER[\"Longitude of natural origin\",-2,ANGLEUNIT[\"Degree\",0.0174532925199433],ID[\"EPSG\",8802]],PARAMETER[\"Scale factor at natural origin\",0.999601,SCALEUNIT[\"unity\",1],ID[\"EPSG\",8805]],PARAMETER[\"False easting\",400000,LENGTHUNIT[\"metre\",1],ID[\"EPSG\",8806]],PARAMETER[\"False northing\",-100000,LENGTHUNIT[\"metre\",1],ID[\"EPSG\",8807]]],CS[Cartesian,2],AXIS[\"(E)\",east,ORDER[1],LENGTHUNIT[\"metre\",1,ID[\"EPSG\",9001]]],AXIS[\"(N)\",north,ORDER[2],LENGTHUNIT[\"metre\",1,ID[\"EPSG\",9001]]]]"

Using the previous direct instantiation mechanism, we see that the PROJ string is degraded

(o <- CRS("+init=epsg:27700"))
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
## datum OSGB_1936 in CRS definition
## CRS arguments:
##  +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000
## +y_0=-100000 +ellps=airy +units=m +no_defs

but that the WKT2 payload is safe, and is actually better specified than that from file:

comment(o)
## [1] "PROJCRS[\"OSGB 1936 / British National Grid\",BASEGEOGCRS[\"OSGB 1936\",DATUM[\"OSGB 1936\",ELLIPSOID[\"Airy 1830\",6377563.396,299.3249646,LENGTHUNIT[\"metre\",1]]],PRIMEM[\"Greenwich\",0,ANGLEUNIT[\"degree\",0.0174532925199433]],ID[\"EPSG\",4277]],CONVERSION[\"British National Grid\",METHOD[\"Transverse Mercator\",ID[\"EPSG\",9807]],PARAMETER[\"Latitude of natural origin\",49,ANGLEUNIT[\"degree\",0.0174532925199433],ID[\"EPSG\",8801]],PARAMETER[\"Longitude of natural origin\",-2,ANGLEUNIT[\"degree\",0.0174532925199433],ID[\"EPSG\",8802]],PARAMETER[\"Scale factor at natural origin\",0.9996012717,SCALEUNIT[\"unity\",1],ID[\"EPSG\",8805]],PARAMETER[\"False easting\",400000,LENGTHUNIT[\"metre\",1],ID[\"EPSG\",8806]],PARAMETER[\"False northing\",-100000,LENGTHUNIT[\"metre\",1],ID[\"EPSG\",8807]],ID[\"EPSG\",19916]],CS[Cartesian,2],AXIS[\"(E)\",east,ORDER[1],LENGTHUNIT[\"metre\",1,ID[\"EPSG\",9001]]],AXIS[\"(N)\",north,ORDER[2],LENGTHUNIT[\"metre\",1,ID[\"EPSG\",9001]]],USAGE[SCOPE[\"unknown\"],AREA[\"UK - Britain and UKCS 49°46'N to 61°01'N, 7°33'W to 3°33'E\"],BBOX[49.75,-9.2,61.14,2.88]]]"

The new rgdal::showSRID() function replaces the legacy rgdal::showWKT(), and is used internally in `rgdal::checkCRSArgs_ng(), which gets an additional argument to pass through a CRS string in a different format:

cat(showSRID("+init=epsg:27700", multiline="YES"), "\n")
## PROJCRS["OSGB 1936 / British National Grid",
##     BASEGEOGCRS["OSGB 1936",
##         DATUM["OSGB 1936",
##             ELLIPSOID["Airy 1830",6377563.396,299.3249646,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4277]],
##     CONVERSION["British National Grid",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",49,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",-2,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",0.9996012717,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",400000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",-100000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]],
##         ID["EPSG",19916]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["UK - Britain and UKCS 49°46'N to 61°01'N, 7°33'W to 3°33'E"],
##         BBOX[49.75,-9.2,61.14,2.88]]]
showSRID("+init=epsg:27700", "PROJ")
## Warning in showSRID("+init=epsg:27700", "PROJ"): Discarded datum OSGB_1936 in
## CRS definition
## [1] "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +units=m +no_defs"

While previously rgdal::checkCRSArgs() was called by sp::CRS() and checked or expanded the PROJ string, in the GDAL >= 3 and PROJ >= 6 setting, the new function will use rgdal::showSRID() to provide both a checked PROJ string and a WKT2 string, which are then used to populate the "CRS" object and its comment.

checkCRSArgs_ng
## function (uprojargs = NA_character_, SRS_string = NULL) 
## {
##     no_SRS <- is.null(SRS_string)
##     no_PROJ <- is.na(uprojargs)
##     res <- vector(mode = "list", length = 3L)
##     res[[1]] <- FALSE
##     res[[2]] <- NA_character_
##     res[[3]] <- NA_character_
##     if (!no_SRS) {
##         stopifnot(is.character(SRS_string))
##         stopifnot(length(SRS_string) == 1L)
##     }
##     if (!no_PROJ) {
##         stopifnot(is.character(uprojargs))
##         stopifnot(length(uprojargs) == 1L)
##     }
##     if (!no_SRS) {
##         uprojargs1 <- try(showSRID(SRS_string, format = "PROJ", 
##             multiline = "NO"))
##         if (inherits(uprojargs1, "try-error")) {
##             res[[1]] <- FALSE
##             res[[2]] <- NA_character_
##         }
##         else {
##             res[[1]] <- TRUE
##             res[[2]] <- uprojargs1
##         }
##         wkt2 <- try(showSRID(SRS_string, format = "WKT2", multiline = "NO"))
##         if (!inherits(wkt2, "try-error")) 
##             res[[3]] <- wkt2
##     }
##     else if (!no_PROJ) {
##         uprojargs <- sub("^\\s+", "", uprojargs)
##         uprojargs1 <- try(showSRID(uprojargs, format = "PROJ", 
##             multiline = "NO"))
##         if (inherits(uprojargs1, "try-error")) {
##             res[[1]] <- FALSE
##             res[[2]] <- NA_character_
##         }
##         else {
##             res[[1]] <- TRUE
##             res[[2]] <- uprojargs1
##         }
##         wkt2 <- try(showSRID(uprojargs, format = "WKT2", multiline = "NO"))
##         if (!inherits(wkt2, "try-error")) 
##             res[[3]] <- wkt2
##     }
##     res
## }
## <bytecode: 0x76671a8>
## <environment: namespace:rgdal>

Because sp::CRS() is called whenever an object is created, for example when reading from a file, newly instantiated "Spatial" objects should receive updated "CRS" objects with WKT2 comments.

The "Spatial" objects that will not receive updated "CRS" objects are those that have been serialized, as "RDA" or "RDS" objects, for example in package data directories, or for example from (https://gadm.org).

con <- url("https://biogeo.ucdavis.edu/data/gadm3.6/Rsp/gadm36_NOR_0_sp.rds")
nor <- readRDS(con)
close(con)
proj4string(nor)
## [1] "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
comment(slot(nor, "proj4string"))
## NULL

There is no automatic way to update these "CRS" objects, so user intervention will be needed to avoid possible degradation:

(o <- CRS(proj4string(nor)))
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded datum WGS_1984 in CRS definition,
##  but +towgs84= values preserved
## CRS arguments:
##  +proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs
comment(o)
## [1] "BOUNDCRS[SOURCECRS[GEOGCRS[\"unknown\",DATUM[\"World Geodetic System 1984\",ELLIPSOID[\"WGS 84\",6378137,298.257223563,LENGTHUNIT[\"metre\",1]],ID[\"EPSG\",6326]],PRIMEM[\"Greenwich\",0,ANGLEUNIT[\"degree\",0.0174532925199433],ID[\"EPSG\",8901]],CS[ellipsoidal,2],AXIS[\"longitude\",east,ORDER[1],ANGLEUNIT[\"degree\",0.0174532925199433,ID[\"EPSG\",9122]]],AXIS[\"latitude\",north,ORDER[2],ANGLEUNIT[\"degree\",0.0174532925199433,ID[\"EPSG\",9122]]]]],TARGETCRS[GEOGCRS[\"WGS 84\",DATUM[\"World Geodetic System 1984\",ELLIPSOID[\"WGS 84\",6378137,298.257223563,LENGTHUNIT[\"metre\",1]]],PRIMEM[\"Greenwich\",0,ANGLEUNIT[\"degree\",0.0174532925199433]],CS[ellipsoidal,2],AXIS[\"latitude\",north,ORDER[1],ANGLEUNIT[\"degree\",0.0174532925199433]],AXIS[\"longitude\",east,ORDER[2],ANGLEUNIT[\"degree\",0.0174532925199433]],ID[\"EPSG\",4326]]],ABRIDGEDTRANSFORMATION[\"Transformation from unknown to WGS84\",METHOD[\"Geocentric translations (geog2D domain)\",ID[\"EPSG\",9603]],PARAMETER[\"X-axis translation\",0,ID[\"EPSG\",8605]],PARAMETER[\"Y-axis translation\",0,ID[\"EPSG\",8606]],PARAMETER[\"Z-axis translation\",0,ID[\"EPSG\",8607]]]]"

The new function rgdal::list_coordOps() can be used to explore what alternatives are returned on searching the proj.db database and the grids present on the running platform. If all we are using is the degraded PROJ string, we only find one ballpark alternative, leading to the mis-placement of buildings in Soho (mapview now converts all "Spatial" objects to their sf equivalents, so buildings1 is also mis-placed):

(c0 <- list_coordOps(paste0(proj4string(buildings1), " +type=crs"), "EPSG:4326"))
## Candidate coordinate operations found:  1 
## Strict containment:  FALSE 
## Visualization order:  TRUE 
## Source: +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.999601 +x_0=400000
##         +y_0=-100000 +ellps=airy +units=m +no_defs +type=crs
## Target: EPSG:4326 
## Best instantiable operation has only ballpark accuracy 
## Description: Inverse of unknown + Ballpark geographic offset from unknown to
##              WGS 84 + axis order change (2D)
## Definition:  +proj=pipeline +step +inv +proj=tmerc +lat_0=49 +lon_0=-2
##              +k=0.999601 +x_0=400000 +y_0=-100000 +ellps=airy
##              +step +proj=unitconvert +xy_in=rad +xy_out=deg

If instead we use the WKT comment, we get 7 alternatives, with the best available having 2m accuracy. A 1m accuracy alternative could be available if a grid (URL given) is downloaded and installed (this will follow later).

(c1 <- list_coordOps(comment(slot(buildings1, "proj4string")), "EPSG:4326"))
## Candidate coordinate operations found:  7 
## Strict containment:  FALSE 
## Visualization order:  TRUE 
## Source: PROJCRS["Transverse_Mercator", BASEGEOGCRS["GCS_OSGB 1936",
##         DATUM["OSGB 1936", ELLIPSOID["Airy 1830", 6377563.396,
##         299.3249646, LENGTHUNIT["metre", 1]], ID["EPSG",
##         6277]], PRIMEM["Greenwich", 0, ANGLEUNIT["Degree",
##         0.0174532925199433]]], CONVERSION["unnamed",
##         METHOD["Transverse Mercator", ID["EPSG", 9807]],
##         PARAMETER["Latitude of natural origin", 49,
##         ANGLEUNIT["Degree", 0.0174532925199433], ID["EPSG",
##         8801]], PARAMETER["Longitude of natural origin", -2,
##         ANGLEUNIT["Degree", 0.0174532925199433], ID["EPSG",
##         8802]], PARAMETER["Scale factor at natural origin",
##         0.999601, SCALEUNIT["unity", 1], ID["EPSG", 8805]],
##         PARAMETER["False easting", 400000, LENGTHUNIT["metre",
##         1], ID["EPSG", 8806]], PARAMETER["False northing",
##         -100000, LENGTHUNIT["metre", 1], ID["EPSG", 8807]]],
##         CS[Cartesian, 2], AXIS["(E)", east, ORDER[1],
##         LENGTHUNIT["metre", 1, ID["EPSG", 9001]]], AXIS["(N)",
##         north, ORDER[2], LENGTHUNIT["metre", 1, ID["EPSG",
##         9001]]]]
## Target: EPSG:4326 
## Best instantiable operation has accuracy: 2 m
## Description: Inverse of unnamed + OSGB 1936 to WGS 84 (6) + axis order
##              change (2D)
## Definition:  +proj=pipeline +step +inv +proj=tmerc +lat_0=49 +lon_0=-2
##              +k=0.999601 +x_0=400000 +y_0=-100000 +ellps=airy
##              +step +proj=push +v_3 +step +proj=cart +ellps=airy
##              +step +proj=helmert +x=446.448 +y=-125.157
##              +z=542.06 +rx=0.15 +ry=0.247 +rz=0.842 +s=-20.489
##              +convention=position_vector +step +inv +proj=cart
##              +ellps=WGS84 +step +proj=pop +v_3 +step
##              +proj=unitconvert +xy_in=rad +xy_out=deg
## Operation 6 is lacking 1 grid with accuracy 1 m
## Missing grid: OSTN15_NTv2_OSGBtoETRS.gsb 
## URL: https://download.osgeo.org/proj/proj-datumgrid-europe-latest.zip

rgdal::spTransform() methods have been supplemented to use CRS comments if available, falling back on PROJ strings. The coordinate operation chosen (the best available on the running platform) is searched for and found once, and re-used for all the geometries in the object. The last coordinate operation used is also cached, but it would be up to the users to re-use this pipeline if desired (probably a class for pipeline objects is required):

buildings1_ll <- spTransform(buildings1, CRS("+init=epsg:4326"))
get(".last_coordOp", envir=rgdal:::.RGDAL_CACHE)
## [1] "+proj=pipeline +step +inv +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.999601 +x_0=400000 +y_0=-100000 +ellps=airy +step +proj=push +v_3 +step +proj=cart +ellps=airy +step +proj=helmert +x=446.448 +y=-125.157 +z=542.06 +rx=0.15 +ry=0.247 +rz=0.842 +s=-20.489 +convention=position_vector +step +inv +proj=cart +ellps=WGS84 +step +proj=pop +v_3 +step +proj=unitconvert +xy_in=rad +xy_out=deg"

In these cases we have not so far been further confused by axis swapping - we do not yet know how this may affect workflows.

Having transformed the building outlines using the CRS WKT comment, we have retrieved 2m accuracy:

mapview(buildings1_ll)

Let’s try the Broad Street pump itself: does the proposed solution deliver a work-around (and arguably a robust solution going forward)?

bp <- readOGR("snow/b_pump.gpkg")
## Warning in OGRSpatialRef(dsn, layer, morphFromESRI = morphFromESRI, dumpSRS =
## dumpSRS): Discarded datum OSGB_1936 in CRS definition: +proj=tmerc +lat_0=49
## +lon_0=-2 +k=0.999601 +x_0=400000 +y_0=-100000 +ellps=airy +units=m +no_defs
## OGR data source with driver: GPKG 
## Source: "/home/rsb/und/ecs530/h19/snow/b_pump.gpkg", layer: "b_pump"
## with 1 features
## It has 1 fields
## Integer64 fields read as strings:  cat
## Warning in OGRSpatialRef(dsn = dsn, layer = layer, morphFromESRI =
## morphFromESRI, : Discarded datum OSGB_1936 in CRS definition: +proj=tmerc
## +lat_0=49 +lon_0=-2 +k=0.999601 +x_0=400000 +y_0=-100000 +ellps=airy +units=m
## +no_defs
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
## datum Unknown_based_on_Airy_1830_ellipsoid in CRS definition
comment(slot(bp, "proj4string"))
## [1] "PROJCRS[\"Transverse_Mercator\",BASEGEOGCRS[\"GCS_OSGB 1936\",DATUM[\"OSGB 1936\",ELLIPSOID[\"Airy 1830\",6377563.396,299.3249646,LENGTHUNIT[\"metre\",1]]],PRIMEM[\"Greenwich\",0,ANGLEUNIT[\"Degree\",0.0174532925199433]],ID[\"EPSG\",4277]],CONVERSION[\"unnamed\",METHOD[\"Transverse Mercator\",ID[\"EPSG\",9807]],PARAMETER[\"Latitude of natural origin\",49,ANGLEUNIT[\"degree\",0.0174532925199433],ID[\"EPSG\",8801]],PARAMETER[\"Longitude of natural origin\",-2,ANGLEUNIT[\"degree\",0.0174532925199433],ID[\"EPSG\",8802]],PARAMETER[\"Scale factor at natural origin\",0.999601,SCALEUNIT[\"unity\",1],ID[\"EPSG\",8805]],PARAMETER[\"False easting\",400000,LENGTHUNIT[\"metre\",1],ID[\"EPSG\",8806]],PARAMETER[\"False northing\",-100000,LENGTHUNIT[\"metre\",1],ID[\"EPSG\",8807]]],CS[Cartesian,2],AXIS[\"easting\",east,ORDER[1],LENGTHUNIT[\"metre\",1,ID[\"EPSG\",9001]]],AXIS[\"northing\",north,ORDER[2],LENGTHUNIT[\"metre\",1,ID[\"EPSG\",9001]]]]"

The untreated view (coercion to sf ignoring CRS comment then ballpark transformation to 4326 before internal conversion to pseudo-Mercator):

mapview(bp)

If we transform using the CRS WKT comment, we retrieve the pre-PROJ6/GDAL3 position, but can sharpen this if we download and use a higher precision grid:

mapview(spTransform(bp, CRS("+init=epsg:4326")))

A further problem is that packages like raster and mapview use verbatim PROJ strings to check CRS equivalence. This has not been resolved.

Bivand, Roger. 2014. “Geocomputation and Open Source Software: Components and Software Stacks.” In Geocomputation, edited by Robert J. Abrahart and Linda M. See, 329–55. Boca Raton: CRC Press. http://hdl.handle.net/11250/163358.

Bivand, Roger, Edzer Pebesma, and Virgilio Gomez-Rubio. 2008. Applied Spatial Data Analysis with R. Springer, NY. https://asdar-book.org/.

———. 2013. Applied Spatial Data Analysis with R, Second Edition. Springer, NY. https://asdar-book.org/.

Cheng, Joe, Bhaskar Karambelkar, and Yihui Xie. 2018. Leaflet: Create Interactive Web Maps with the Javascript ’Leaflet’ Library. https://CRAN.R-project.org/package=leaflet.

Evers, Kristian, and Thomas Knudsen. 2017. Transformation Pipelines for Proj.4. https://www.fig.net/resources/proceedings/fig\_proceedings/fig2017/papers/iss6b/ISS6B\_evers\_knudsen\_9156.pdf.

ISO. 2004. ISO 19111:2019 Geographic Information – Referencing by Coordinates. https://www.iso.org/standard/74039.html.

Knudsen, Thomas, and Kristian Evers. 2017. Transformation Pipelines for Proj.4. https://meetingorganizer.copernicus.org/EGU2017/EGU2017-8050.pdf.