td <- tempfile()
dir.create(td)
Sys.setenv("PROJ_USER_WRITABLE_DIRECTORY"=td)
#Sys.setenv("PROJ_NETWORK"="ON")
options("rgdal_show_exportToProj4_warnings"="none")
All the material presented here, to the extent it is original, is available under CC-BY-SA.
I am running R 4.1.2, with recent update.packages()
.
needed <- c("ggplot2", "mapsf", "colorspace", "RColorBrewer", "classInt",
"tmap", "rgdal", "mapview", "sp", "RSQLite", "sf")
Script and data at https://github.com/rsbivand/ECS530_h21/raw/main/ECS530_211116.zip. Download to suitable location, unzip and use as basis.
The usefulness of spatial data is linked to knowing its coordinate reference system. The coordinate reference system may be geographic, usually measured in decimal degrees, or projected, layered on a known geographic CRS, usually measured in metres (planar). The underlying geographical CRS must specify an ellipsoid, with associated major and minor axis lengths:
library(sf)
## Linking to GEOS 3.10.1, GDAL 3.4.0, PROJ 8.2.0
#library(rgdal)
sf_extSoftVersion()
## GEOS GDAL proj.4 GDAL_with_GEOS USE_PROJ_H
## "3.10.1" "3.4.0" "8.2.0" "true" "true"
## PROJ
## "8.2.0"
head(sf_proj_info("ellps"))
## name major ell description
## 1 MERIT a=6378137.0 rf=298.257 MERIT 1983
## 2 SGS85 a=6378136.0 rf=298.257 Soviet Geodetic System 85
## 3 GRS80 a=6378137.0 rf=298.257222101 GRS 1980(IUGG, 1980)
## 4 IAU76 a=6378140.0 rf=298.257 IAU 1976
## 5 airy a=6377563.396 rf=299.3249646 Airy 1830
## 6 APL4.9 a=6378137.0 rf=298.25 Appl. Physics. 1965
Other parameters should be specified, such as the prime meridian, often taken as Greenwich. Before PROJ version 6, legacy PROJ (and GDAL) used a +datum=
tag introduced after the library migrated beyond USGS (around version 4.4). The underlying problem was not that projection and inverse projection could not be carried out between projected CRS and geograpghical CRS, but that national mapping agencies defined often many datums, keying the specification of a geographical CRS to a national or regional datum. Some of these, especially for North America, were supported, but support for others was patchy. The +datum=
tag supported a partly informal listing of values, themselves linked to three or seven coefficient datum transformation sets, used through the +towgs84=
tag. Coefficient lookup through the +datum=
tag, or direct specification of coefficients through the +towgs84=
tag became a convenient way to handle datum transformation in addition to projection and inverse projection.
The default “hub” for transformation was to go through the then newly achieved WGS84 datum. Spatial data files often encoded the geographic and projected CRS with reference to these values, in some cases using PROJ 4 strings. These used a pseudo projection +proj=longlat
to indicate a geographical CRS, and many other possible values of +proj=
for projected CRS.
The Grids & Datums column in Photogrammetric Engineering & Remote Sensing gives insight into some of the peculiarities of national mapping agencies - authority is typically national but may be subnational:
data("GridsDatums", package="rgdal")
GridsDatums[grep("Norway", GridsDatums$country),]
## country month year ISO
## 17 The Kingdom of Norway (October) 1999 NOR
## 226 Norway (July) 2017 NOR
Beyond this, the database successively developed by the European Petroleum Survey Group was copied to local CSV files for PROJ and GDAL, providing lookup by code number. From PROJ 6, GDAL no longer uses these CSV files. PROJ continues to use the European Petroleum Survey Group (EPSG) database, supplemented from other sources, the local copy PROJ uses is now an SQLite database, with a large number of tables:
library(RSQLite)
(DB0 <- sf_proj_search_paths())
## [1] "/tmp/RtmpNeLEVm/file26d0a951b362fb" "/usr/local/share/proj"
## [3] "/usr/local/share/proj"
DB <- file.path(DB0[length(DB0)], "proj.db")
db <- dbConnect(SQLite(), dbname=DB)
cat(strwrap(paste(dbListTables(db), collapse=", ")), sep="\n")
## alias_name, authority_list, authority_to_authority_preference, axis,
## celestial_body, compound_crs, concatenated_operation,
## concatenated_operation_step, conversion, conversion_method,
## conversion_param, conversion_table, coordinate_operation_method,
## coordinate_operation_view, coordinate_operation_with_conversion_view,
## coordinate_system, crs_view, deprecation, ellipsoid, extent,
## geodetic_crs, geodetic_datum, geodetic_datum_ensemble_member,
## geoid_model, grid_alternatives, grid_packages, grid_transformation,
## helmert_transformation, helmert_transformation_table, metadata,
## object_view, other_transformation, prime_meridian, projected_crs,
## scope, sqlite_stat1, supersession, unit_of_measure, usage,
## versioned_auth_name_mapping, vertical_crs, vertical_datum,
## vertical_datum_ensemble_member
PROJ.DB <- dbReadTable(db, "crs_view")
## Warning in result_fetch(res@ptr, n = n): Column `code`: mixed type, first seen
## values of type integer, coercing other values of type string
dbDisconnect(db)
PROJ.DB[grep("Norway", PROJ.DB$name), 2:5]
## auth_name code name type
## 4354 EPSG 5939 WGS 84 / EPSG Norway Polar Stereographic projected
## 7547 ESRI 102101 NGO_1948_Norway_Zone_1 projected
## 7548 ESRI 102102 NGO_1948_Norway_Zone_2 projected
## 7549 ESRI 102103 NGO_1948_Norway_Zone_3 projected
## 7550 ESRI 102104 NGO_1948_Norway_Zone_4 projected
## 7551 ESRI 102105 NGO_1948_Norway_Zone_5 projected
## 7552 ESRI 102106 NGO_1948_Norway_Zone_6 projected
## 7553 ESRI 102107 NGO_1948_Norway_Zone_7 projected
## 7554 ESRI 102108 NGO_1948_Norway_Zone_8 projected
## 11781 EPSG 9672 CD Norway depth vertical
PROJ.DB[grep("Oslo", PROJ.DB$name), 2:5]
## auth_name code name type
## 515 EPSG 4817 NGO 1948 (Oslo) geographic 2D
## 6488 EPSG 27391 NGO 1948 (Oslo) / NGO zone I projected
## 6489 EPSG 27392 NGO 1948 (Oslo) / NGO zone II projected
## 6490 EPSG 27393 NGO 1948 (Oslo) / NGO zone III projected
## 6491 EPSG 27394 NGO 1948 (Oslo) / NGO zone IV projected
## 6492 EPSG 27395 NGO 1948 (Oslo) / NGO zone V projected
## 6493 EPSG 27396 NGO 1948 (Oslo) / NGO zone VI projected
## 6494 EPSG 27397 NGO 1948 (Oslo) / NGO zone VII projected
## 6495 EPSG 27398 NGO 1948 (Oslo) / NGO zone VIII projected
## 7584 ESRI 102138 NGO_1948_Oslo_Kommune projected
## 7893 ESRI 102450 NGO_1948_Oslo_Baerum_Kommune projected
## 7894 ESRI 102451 NGO_1948_Oslo_Bergenhalvoen projected
## 7895 ESRI 102452 NGO_1948_Oslo_Oslo_Kommune projected
Goodbye PROJ.4! How to specify a coordinate reference system in R?
st_crs("EPSG:25831")
## Coordinate Reference System:
## User input: EPSG:25831
## wkt:
## PROJCRS["ETRS89 / UTM zone 31N",
## BASEGEOGCRS["ETRS89",
## ENSEMBLE["European Terrestrial Reference System 1989 ensemble",
## MEMBER["European Terrestrial Reference Frame 1989"],
## MEMBER["European Terrestrial Reference Frame 1990"],
## MEMBER["European Terrestrial Reference Frame 1991"],
## MEMBER["European Terrestrial Reference Frame 1992"],
## MEMBER["European Terrestrial Reference Frame 1993"],
## MEMBER["European Terrestrial Reference Frame 1994"],
## MEMBER["European Terrestrial Reference Frame 1996"],
## MEMBER["European Terrestrial Reference Frame 1997"],
## MEMBER["European Terrestrial Reference Frame 2000"],
## MEMBER["European Terrestrial Reference Frame 2005"],
## MEMBER["European Terrestrial Reference Frame 2014"],
## ELLIPSOID["GRS 1980",6378137,298.257222101,
## LENGTHUNIT["metre",1]],
## ENSEMBLEACCURACY[0.1]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4258]],
## CONVERSION["UTM zone 31N",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",0,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",3,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",0.9996,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",500000,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",0,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["(E)",east,
## ORDER[1],
## LENGTHUNIT["metre",1]],
## AXIS["(N)",north,
## ORDER[2],
## LENGTHUNIT["metre",1]],
## USAGE[
## SCOPE["Engineering survey, topographic mapping."],
## AREA["Europe between 0°E and 6°E: Andorra; Belgium - onshore and offshore; Denmark - offshore; Germany - offshore; Jan Mayen - offshore; Norway including Svalbard - onshore and offshore; Spain - onshore and offshore."],
## BBOX[37,0,82.45,6.01]],
## ID["EPSG",25831]]
Up to and including PROJ 5, downstream software, like sf and rgdal, have been able to rely on the provision of ad-hoc transformation capabilities, with apparently predictable consequences. Everybody knew (or should have known) that each new release of the PROJ and GDAL CSV metadata files could update transformation coefficients enough to shift outcomes a little. Everyone further chose to ignore the timestamping of coordinates, or at least of datasets; we could guess (as above) that US Census tract boundaries for 1980 must use the NAD27 datum framework - suprisingly many used NAD83 anyway (both for Boston and the North Carolina SIDS data set).
Use of KML files to provide zoom and pan for these boundaries, and now leaflet and mapview exposes approximations mercilessly. Use of coefficients of transformation of an unknown degree of approximation, and authority “googled it” was reaching its limits, or likely had exceeded them.
sp classes used a PROJ string to define the CRS (in an S4 "CRS"
object):
getClassDef("CRS", package="sp")
## Loading required package: sp
## Class "CRS" [package "sp"]
##
## Slots:
##
## Name: projargs
## Class: character
sf used an S3 "crs"
object with an integer EPSG code and a PROJ string; if instantiated from the EPSG code, both were provided. In current sf, the "crs"
object has user input and wkt components, and methods to access the PROJ string and the EPSG code for backward compatibility, and from PROJ 8 and EPSG 10 we get datum ensembles:
st_crs(4326)
## Coordinate Reference System:
## User input: EPSG:4326
## wkt:
## GEOGCRS["WGS 84",
## ENSEMBLE["World Geodetic System 1984 ensemble",
## MEMBER["World Geodetic System 1984 (Transit)"],
## MEMBER["World Geodetic System 1984 (G730)"],
## MEMBER["World Geodetic System 1984 (G873)"],
## MEMBER["World Geodetic System 1984 (G1150)"],
## MEMBER["World Geodetic System 1984 (G1674)"],
## MEMBER["World Geodetic System 1984 (G1762)"],
## MEMBER["World Geodetic System 1984 (G2139)"],
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]],
## ENSEMBLEACCURACY[2.0]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["geodetic latitude (Lat)",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["geodetic longitude (Lon)",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## USAGE[
## SCOPE["Horizontal component of 3D system."],
## AREA["World."],
## BBOX[-90,-180,90,180]],
## ID["EPSG",4326]]
Because so much open source (and other) software uses the PROJ library and framework, many are affected when PROJ upgrades. Until very recently, PROJ has been seen as very reliable, and the changes taking place now are intended to confirm and reinforce this reliability. Before PROJ 5 (PROJ 6 was released in 2019, PROJ 7 was released in March 2020), the +datum=
tag was used, perhaps with +towgs84=
with three or seven coefficients, and possibly +nadgrids=
where datum transformation grids were available. However, transformations from one projection to another first inversed to longitude-latitude in WGS84, then projected on to the target projection.
Fast-forward 35 years and PROJ.4 is everywhere: It provides coordinate handling for almost every geospatial program, open or closed source. Today, we see a drastical increase in the need for high accuracy GNSS coordinate handling, especially in the agricultural and construction engineering sectors. This need for geodetic-accuracy transformations is not satisfied by “classic PROJ.4.” But with the ubiquity of PROJ.4, we can provide these transformations “everywhere,” just by implementing them as part of PROJ.4 (Evers and Knudsen 2017).
Following the introduction of geodetic modules and pipelines in PROJ 5 (Knudsen and Evers 2017; Evers and Knudsen 2017), PROJ 6 moves further. Changes in the legacy PROJ representation and WGS84 transformation hub have been coordinated through the GDAL barn raising initiative. Crucially WGS84 often ceases to be the pivot for moving between datums. SRS will begin to support 3D by default, adding time too as SRS change. See also PROJ migration notes.
There are very useful postings on the PROJ mailing list from Martin Desruisseaux, first proposing clarifications and a follow-up including a summary:
- “Early binding” ≈ hub transformation technique.
- “Late binding” ≈ hub transformation technique NOT used, replaced by a more complex technique consisting in searching parameters in the EPSG database after the transformation context (source, target, epoch, area of interest) is known.
- The problem of hub transformation technique is independent of WGS84. It is caused by the fact that transformations to/from the hub are approximate. Any other hub we could invent in replacement of WGS84 will have the same problem, unless we can invent a hub for which transformations are exact (I think that if such hub existed, we would have already heard about it).
The solution proposed by ISO 19111 (in my understanding) is:
- Forget about hub (WGS84 or other), unless the simplicity of early-binding is considered more important than accuracy.
- Associating a CRS to a coordinate set (geometry or raster) is no longer sufficient. A {CRS, epoch} tuple must be associated. ISO 19111 calls this tuple “Coordinate metadata.” From a programmatic API point of view, this means that getCoordinateReferenceSystem() method in Geometry objects (for instance) needs to be replaced by a getCoordinateMetadata() method.
This page gives a picture of why the changes in PROJ matter - the arrows are in cm per year displacement:
To handle this level of detail, PROJ 7 introduces an on-demand content delivery network under user control as an alternative to dowloading many possibly unnecessary time-specific vertical and horizontal transformation grids. PROJ 8 will introduce datum ensembles, particularly for commonly used datums such as EPSG:4326, which have had (especially in Noth America) multiple, differing, instantiations as time has progressed; for now we do not know what the consequences of this will be - it is already in the EPSG database.
When changes occur in upstream external software, R packages using these libraries often need to adapt, but package maintainers try very hard to shield users from any consequences, so that legacy workflows continue to provide the same or at least similar results from the same data.
The code shown in (Bivand, Pebesma, and Gomez-Rubio 2008, 2013) is almost all run nightly on a platform with updated R packages and external software.
This does not necessarily trap all differences (figures are not compared), but is helpful in detecting impacts of changes in packages or external software.
It is also very helpful that CRAN servers using the released and development versions of R, and with different levels of external software also run nightly checks.
Again, sometimes changes are only noticed by users, but quite often checks run by maintainers and by CRAN alert us to impending challenges.
Tracking the development mailing lists of the external software communities, all open source, can also show how thinking is evolving, although sometimes code tidying in external software can have unexpected consequences, breaking not sf or sp with rgdal or rgeos, but a package further downstream.
(Bivand 2014) discusses open source geospatial software stacks more generally, but here we will consider ongoing changes in PROJ, see also (Bivand 2020).
(Knudsen and Evers 2017; Evers and Knudsen 2017) not only point out how the world has changed since a World Geodetic System of 1984 (WGS84) was adopted as a hub for coordinate transformation in PROJ, but also introduced transformation pipelines.
In using a transformation hub, PROJ had worked adequately when the errors introduced by transforming first to WGS84 and then from WGS84 to the target coordinate reference system, but with years passing from 1984, the world has undergone sufficient tectonic shifts for errors to increase.
In addition, the need for precision has risen in agriculture and engineering. So PROJ, as it was, risked ceasing to be fit for purpose as a fundamental component of the geospatial open source software stack.
Following major changes in successive iterations of the international standards for coordinate reference systems (ISO 2004), PROJ is changing from preferring “late-binding” transformations, pivoting through a known transformation hub in going from input to target coordinate reference systems, to “early-binding” transformations.
This means that the user may be offered alternative paths from input to target coordinate reference systems, some of which may go directly, and more will use higher precision transformation grids, enlarging the existing practice of using North American Datum (NAD) grids.
In other cases, three or seven coefficient transformations may be offered, but the default fallback, where little is known about the input or target specification, may be less satisfactory than PROJ has previously offered.
The grid CDN is available at https://cdn.proj.org, and can be turned on for use in sf, where setting an environment variable also seems necessary: Sys.setenv("PROJ_NETWORK"="ON")
. Files are now stored in a single GTiff format (I think cloud-optimised).
In addition, the current iteration of the standard makes it more important to declare the epoch of interest of coordinates (when the position was recorded and how) and the region of interest.
A transformation pathway may have an undefined epoch and a global span, but cannot achieve optimal precision everywhere.
By bounding the region of interest say within a tectonic plate, and the epoch to a given five-year period, very high precision transformations may be possible.
These choices have not so far been required explicitly, but for example matching against the "area"
table in the database may reduce the number of transformation pathways offered dramatically.
The mapproj package provided coordinate reference system and projection support for the maps package. From mapproj/src/map.h
, line 20, we can see that the eccentricity of the Earth is defined as 0.08227185422
, corrresponding to the Clark 1866 ellipsoid (Iliffe and Lott 2008):
ellps <- sf_proj_info("ellps")
(clrk66 <- unlist(ellps[ellps$name=="clrk66",]))
## name major ell description
## "clrk66" "a=6378206.4" "b=6356583.8" "Clarke 1866"
With a very few exceptions, projections included in mapproj::mapproject()
use the Clarke 1866 ellipsoid, with the remainder using a sphere with the Clarke 1866 major axis radius. The function returns coordinates for visualization in an unknown metric; no inverse projections are available.
eval(parse(text=clrk66["major"]))
eval(parse(text=clrk66["ell"]))
print(sqrt((a^2-b^2)/a^2), digits=10)
## [1] 0.08227185422
PROJ developers not only point out how the world has changed since a World Geodetic System of 1984 (WGS84) was adopted as a hub for coordinate transformation in PROJ, but also introduced transformation pipelines (Knudsen and Evers 2017; Evers and Knudsen 2017).
In using a transformation hub, PROJ had worked adequately when the errors introduced by transforming first to WGS84 and then from WGS84 to the target coordinate reference system were acceptable, but with years passing from 1984, the world has undergone sufficient tectonic shifts for errors to increase, just as needs for accuracy sharpen.
In addition, the need for accuracy has risen in agriculture and engineering.
So PROJ, as it was, risked ceasing to be fit for purpose as a fundamental component of the geospatial open source software stack.
PROJ will also become more tightly linked to authorities responsible for the specification components. While the original well-known text (WKT1) descriptions also contained authorities, WKT2 (2019) is substantially more stringent.
The initial use of coordinate reference systems for objects defined in sp was based on the PROJ string representation, which built on a simplified key=value form.
Keys began with plus (+
), and the value format depended on the key.
If essential keys were missing, some might be added by default from a file that has now been eliminated as misleading; if +ellps=
was missing and not added internally from other keys, +ellps=WGS84
would be added silently to refer to the World Geodetic System 1984 ellipsoid definition.
Accurate coordinate transformation has always been needed for the integration of data from different sources, but has become much more pressing as web mapping has become available in R, through the leaflet package (Cheng, Karambelkar, and Xie 2021), on which mapview and the "view"
mode of tmap build.
As web mapping provides zooming and panning, possible infelicities that were too small to detect as mismatches in transformation jump into prominence.
The web mapping workflow transforms input objects to OGC:CRS84
(geographical CRS WGS 84, World area of relevance, WGS84 datum, visualization order) as expected by leaflet, then on to EPSG:3857
(WGS 84 / Pseudo-Mercator) for display on web map backgrounds (this is carried out internally in leaflet).
Objects shown in mapview and tmap are now coerced to sf classes, then st_transform()
transforms to OGC:CRS84
if necessary (or until we are sure EPSG:4326
never swaps axes).
John Snow did not use maps to find the Broad Street pump, the polluted water source behind the 1854 cholera epidemic in Soho in central London, because he associated cholera with water contaminated with sewage, based on earlier experience (Brody et al. 2000). The basic data to be used here were made available by Jim Detwiler, who had collated them for David O’Sullivan for use on the cover of O’Sullivan and Unwin (2003), based on earlier work by Waldo Tobler and others.
We’ll use the example of the location of the Broad Street pump in Soho, London, to be distributed with sf; the object has planar coordinates in the OSGB British National Grid projected CRS with the OSGB datum:
bp_file <- system.file("gpkg/b_pump.gpkg", package="sf")
b_pump_sf <- st_read(bp_file)
## Reading layer `b_pump' from data source `/home/rsb/lib/r_libs/sf/gpkg/b_pump.gpkg' using driver `GPKG'
## Simple feature collection with 1 feature and 1 field
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 529393.5 ymin: 181020.6 xmax: 529393.5 ymax: 181020.6
## Projected CRS: Transverse_Mercator
Before R packages upgraded the way coordinate reference systems were represented in early 2020, our Proj4 string representation suffered degradation. Taking the Proj4 string defined in PROJ 5 for the British National Grid, there is a +datum=OSGB36
key-value pair. But when processing this input with PROJ 6 and GDAL 3, this key is removed. Checking, we can see that reading the input string appears to work, but the output for the Proj4 string drops the +datum=OSGB36
key-value pair, introducing instead the ellipse implied by that datum:
proj5 <- paste0("+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717",
" +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs")
legacy <- st_crs(proj5)
proj6 <- legacy$proj4string
proj5_parts <- unlist(strsplit(proj5, " "))
proj6_parts <- unlist(strsplit(proj6, " "))
proj5_parts[!is.element(proj5_parts, proj6_parts)]
## [1] "+datum=OSGB36"
proj6_parts[!is.element(proj6_parts, proj5_parts)]
## [1] "+ellps=airy"
We can emulate the problem seen following the release in May 2019 of GDAL 3.0.0 using PROJ 6, by inserting the degraded Proj4 string into the Broad Street pump object. The coordinate reference system representation is now ignorant of the proper datum specification. The apparent "proj4string"
component of the sf "crs"
is used to permit packages to adapt, even though its contents are degraded.
b_pump_sf1 <- b_pump_sf
st_crs(b_pump_sf1) <- st_crs(st_crs(b_pump_sf1)$proj4string)
## Warning: st_crs<- : replacing crs does not reproject data; use st_transform for
## that
Why does this matter? For visualization on a web map, for example using the mapview package, the projected geometries are transformed to the same WGS84 ellipse and datum (OGC:CRS84
) that were used in PROJ 4 as a transformation hub. OGC:CRS84
is the visualization axis order equivalent of EPSG:4326
. In leaflet, these are projected to Web Mercator (EPSG:3856
). Inside mapview()
, the sf::st_transform()
function is used, so we will emulate this coordinate operation before handing on the geometries for display.
However, because the one of the objects now has a degraded Proj4 string representation of its coordinate reference system, the output points, apparently transformed identically to WGS84, are now some distance apart:
Sys.setenv("PROJ_NETWORK"="OFF")
sf_proj_network(FALSE)
## character(0)
list.files(sf_proj_search_paths()[1])
## character(0)
b_pump_sf_ll <- st_transform(b_pump_sf, "OGC:CRS84")
list.files(sf_proj_search_paths()[1])
## character(0)
b_pump_sf1_ll <- st_transform(b_pump_sf1, "OGC:CRS84")
list.files(sf_proj_search_paths()[1])
## character(0)
sf_use_s2(FALSE)
## Spherical geometry (s2) switched off
st_distance(b_pump_sf_ll, b_pump_sf1_ll)
## Units: [m]
## [,1]
## [1,] 125.0578
The Broad Street pump is within 1m or 2m of the green point (relative accuracy preserved) but the red point is now in Ingestre Place, because of the loss of the datum specification.
library(mapview)
if (sf:::CPL_gdal_version() >= "3.1.0") mapviewOptions(fgb = FALSE)
pts <- rbind(b_pump_sf_ll, b_pump_sf1_ll)
pts$CRS <- c("original", "degraded")
mapview(pts, zcol="CRS", map.type="OpenStreetMap", col.regions=c("green", "red"), cex=18)
Once PROJ 6 and GDAL 3 had stabilized in the summer of 2019, we identified the underlying threat as lying in the advertised degradation of GDAL’s exportToProj4()
function.
When reading raster and vector files, the coordinate reference system representation using Proj4 strings would often be degraded, so that further transformation within R (also using GDAL/PROJ functionality) would be at risk of much greater inaccuracy than before.
Since then, sf, sp with rgdal and raster have adopted the 2019 version of the “Well-Known Text” coordinate reference system representation WKT2-2019 (ISO 2004) instead of Proj4 strings to contain coordinate reference system definitions.
Accommodations have also been provided so that the S3 class "crs"
objects used in objects defined in sf, and the formal S4 class "CRS"
objects used objects defined in sp and raster, can continue to attempt to support Proj4 strings in addition, while other package maintainers and workflow users catch up.
Following an extended campaign of repeated checking about 900 reverse dependencies (packages depending on sp, rgdal and others) and dozens of github issues, most of the consequences of the switch to WKT2 among packages have now been addressed.
More recently (late August 2020), 115 packages are being offered rebuilt stored objects that had included "CRS"
objects without WKT2 definitions.
This approach has ensured that spatial objects, whether created within R, read in from external data sources, or read as stored objects, all have WKT2 string representations of their coordinate reference systems, and for backward compatibility can represent these in addition as Proj4 strings.
Operations on objects should carry forward the new representations, which should also be written out to external data formats correctly.
There is a minor divergence between sf and sp (and thus rgdal): in sf, the axis order of the CRS is preserved as instantiated, but objects do not have their axes swapped to accord with authorities unless sf::st_axis_order()
is set `TRUE´. This can appear odd, because although the representation records a northings-eastings axis order, data are treated as eastings-northings in plotting, variogram construction and so on:
st_crs("EPSG:4326")
## Coordinate Reference System:
## User input: EPSG:4326
## wkt:
## GEOGCRS["WGS 84",
## ENSEMBLE["World Geodetic System 1984 ensemble",
## MEMBER["World Geodetic System 1984 (Transit)"],
## MEMBER["World Geodetic System 1984 (G730)"],
## MEMBER["World Geodetic System 1984 (G873)"],
## MEMBER["World Geodetic System 1984 (G1150)"],
## MEMBER["World Geodetic System 1984 (G1674)"],
## MEMBER["World Geodetic System 1984 (G1762)"],
## MEMBER["World Geodetic System 1984 (G2139)"],
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]],
## ENSEMBLEACCURACY[2.0]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["geodetic latitude (Lat)",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["geodetic longitude (Lon)",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## USAGE[
## SCOPE["Horizontal component of 3D system."],
## AREA["World."],
## BBOX[-90,-180,90,180]],
## ID["EPSG",4326]]
In sp/rgdal, attempts are made to ensure that axis order is in the form termed GIS, traditional, or visualization, that is always eastings-northings. From very recent rgdal commits (from rev. 1060), PROJ rather than GDAL is used for instantiating CRS:
library(sp)
cat(wkt(CRS("EPSG:4326")), "\n")
## GEOGCRS["WGS 84 (with axis order normalized for visualization)",
## ENSEMBLE["World Geodetic System 1984 ensemble",
## MEMBER["World Geodetic System 1984 (Transit)",
## ID["EPSG",1166]],
## MEMBER["World Geodetic System 1984 (G730)",
## ID["EPSG",1152]],
## MEMBER["World Geodetic System 1984 (G873)",
## ID["EPSG",1153]],
## MEMBER["World Geodetic System 1984 (G1150)",
## ID["EPSG",1154]],
## MEMBER["World Geodetic System 1984 (G1674)",
## ID["EPSG",1155]],
## MEMBER["World Geodetic System 1984 (G1762)",
## ID["EPSG",1156]],
## MEMBER["World Geodetic System 1984 (G2139)",
## ID["EPSG",1309]],
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1],
## ID["EPSG",7030]],
## ENSEMBLEACCURACY[2.0],
## ID["EPSG",6326]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8901]],
## CS[ellipsoidal,2],
## AXIS["geodetic longitude (Lon)",east,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433,
## ID["EPSG",9122]]],
## AXIS["geodetic latitude (Lat)",north,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433,
## ID["EPSG",9122]]],
## USAGE[
## SCOPE["Horizontal component of 3D system."],
## AREA["World."],
## BBOX[-90,-180,90,180]],
## REMARK["Axis order reversed compared to EPSG:4326"]]
The probability of confusion increases when coercing from sf to sp and vice-versa, with the representations most often remaining unchanged.
sf_from_sp <- st_crs(CRS("EPSG:4326"))
o <- strsplit(sf_from_sp$wkt, "\n")[[1]]
cat(paste(o[grep("CS|AXIS|ORDER", o)], collapse="\n"))
## CS[ellipsoidal,2],
## AXIS["geodetic longitude (Lon)",east,
## ORDER[1],
## AXIS["geodetic latitude (Lat)",north,
## ORDER[2],
Both of these coercions are using the same underlying PROJ and GDAL versions, and the same PROJ metadata. Work on this question has not yet stabilized; we perhaps prefer all data to be GIS-order, but to be able to read/write from and to authority order.
sp_from_sf <- as(st_crs("EPSG:4326"), "CRS")
o <- strsplit(wkt(sp_from_sf), "\n")[[1]]
cat(paste(o[grep("CS|AXIS|ORDER", o)], collapse="\n"))
## CS[ellipsoidal,2],
## AXIS["geodetic latitude (Lat)",north,
## ORDER[1],
## AXIS["geodetic longitude (Lon)",east,
## ORDER[2],
Current thinking is to avoid the EPSG:4326
axis order issue by recommending use of OGC:CRS84
, which is the representation used in GeoJSON, and also known as urn:ogc:def:crs:OGC::CRS84
. This specification is always eastings-northings:
cat(st_crs("OGC:CRS:84")$wkt, "\n")
## GEOGCRS["WGS 84",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]],
## ID["EPSG",6326]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8901]],
## CS[ellipsoidal,2],
## AXIS["longitude",east,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433,
## ID["EPSG",9122]]],
## AXIS["latitude",north,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433,
## ID["EPSG",9122]]]]
cat(wkt(CRS("OGC:CRS84")), "\n")
## GEOGCRS["WGS 84 (CRS84)",
## ENSEMBLE["World Geodetic System 1984 ensemble",
## MEMBER["World Geodetic System 1984 (Transit)"],
## MEMBER["World Geodetic System 1984 (G730)"],
## MEMBER["World Geodetic System 1984 (G873)"],
## MEMBER["World Geodetic System 1984 (G1150)"],
## MEMBER["World Geodetic System 1984 (G1674)"],
## MEMBER["World Geodetic System 1984 (G1762)"],
## MEMBER["World Geodetic System 1984 (G2139)"],
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]],
## ENSEMBLEACCURACY[2.0]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["geodetic longitude (Lon)",east,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["geodetic latitude (Lat)",north,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## USAGE[
## SCOPE["Not known."],
## AREA["World."],
## BBOX[-90,-180,90,180]],
## ID["OGC","CRS84"]]
Transformation in sf uses code in GDAL, which in turn uses functions in PROJ; in sp/rgdal, PROJ is used directly for transformation. In order to demonstrate more of what is happening, sf_proj_pipelines()
has been added to sf:
(o <- sf_proj_pipelines(st_crs(b_pump_sf), "OGC:CRS84"))
## Candidate coordinate operations found: 9
## Strict containment: FALSE
## Axis order auth compl: FALSE
## Source: PROJCRS["Transverse_Mercator",
## BASEGEOGCRS["GCS_OSGB 1936",
## DATUM["OSGB_1936",
## ELLIPSOID["Airy 1830",6377563.396,299.3249646,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["Degree",0.0174532925199433]],
## ID["EPSG",4277]],
## CONVERSION["unnamed",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",49,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",-2,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",0.999601,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",400000,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",-100000,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["easting",east,
## ORDER[1],
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]],
## AXIS["northing",north,
## ORDER[2],
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]]]
## Target: OGC:CRS84
## Best instantiable operation has accuracy: 2 m
## Description: Inverse of unnamed + OSGB36 to WGS 84 (6) + axis order change
## (2D)
## Definition: +proj=pipeline +step +inv +proj=tmerc +lat_0=49 +lon_0=-2
## +k=0.999601 +x_0=400000 +y_0=-100000 +ellps=airy
## +step +proj=push +v_3 +step +proj=cart +ellps=airy
## +step +proj=helmert +x=446.448 +y=-125.157
## +z=542.06 +rx=0.15 +ry=0.247 +rz=0.842 +s=-20.489
## +convention=position_vector +step +inv +proj=cart
## +ellps=WGS84 +step +proj=pop +v_3 +step
## +proj=unitconvert +xy_in=rad +xy_out=deg
## Operation 8 is lacking 1 grid with accuracy 1 m
## Missing grid: uk_os_OSTN15_NTv2_OSGBtoETRS.tif
## URL: https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif
as.data.frame(o)[,c(5,8)]
## accuracy instantiable
## 1 2 TRUE
## 2 21 TRUE
## 3 10 TRUE
## 4 21 TRUE
## 5 18 TRUE
## 6 35 TRUE
## 7 3 TRUE
## 8 1 FALSE
## 9 NA TRUE
In sf, areas-of-interest need to be given by the users. The provision of areas-of-interest is intended to reduce the number of candidate coordinate operations found by PROJ.
aoi0 <- sf_project(st_crs(b_pump_sf), "OGC:CRS84", matrix(unclass(st_bbox(b_pump_sf)), 2, 2, byrow=TRUE))
aoi <- c(t(aoi0 + c(-0.1, +0.1)))
(o_aoi <- sf_proj_pipelines(st_crs(b_pump_sf), "OGC:CRS84", AOI=aoi))
## Candidate coordinate operations found: 5
## Strict containment: FALSE
## Axis order auth compl: FALSE
## Source: PROJCRS["Transverse_Mercator",
## BASEGEOGCRS["GCS_OSGB 1936",
## DATUM["OSGB_1936",
## ELLIPSOID["Airy 1830",6377563.396,299.3249646,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["Degree",0.0174532925199433]],
## ID["EPSG",4277]],
## CONVERSION["unnamed",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",49,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",-2,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",0.999601,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",400000,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",-100000,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["easting",east,
## ORDER[1],
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]],
## AXIS["northing",north,
## ORDER[2],
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]]]
## Target: OGC:CRS84
## Best instantiable operation has accuracy: 2 m
## Description: Inverse of unnamed + OSGB36 to WGS 84 (6) + axis order change
## (2D)
## Definition: +proj=pipeline +step +inv +proj=tmerc +lat_0=49 +lon_0=-2
## +k=0.999601 +x_0=400000 +y_0=-100000 +ellps=airy
## +step +proj=push +v_3 +step +proj=cart +ellps=airy
## +step +proj=helmert +x=446.448 +y=-125.157
## +z=542.06 +rx=0.15 +ry=0.247 +rz=0.842 +s=-20.489
## +convention=position_vector +step +inv +proj=cart
## +ellps=WGS84 +step +proj=pop +v_3 +step
## +proj=unitconvert +xy_in=rad +xy_out=deg
## Operation 5 is lacking 1 grid with accuracy 1 m
## Missing grid: uk_os_OSTN15_NTv2_OSGBtoETRS.tif
## URL: https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif
as.data.frame(o_aoi)[,c(5,8)]
## accuracy instantiable
## 1 2 TRUE
## 2 10 TRUE
## 3 21 TRUE
## 4 21 TRUE
## 5 1 FALSE
sf::sf_proj_pipelines()
accesses the PROJ metadata database to search through candidate coordinate operations, ranking them by accuracy, returning a data frame of operations. When an area-of-interest is provided, candidates not intersecting it are dropped. Coordinate operations that cannot be instantiated because of missing grids are also listed. Here without an area-of-interest: 9 candidate operations are found when the WKT string contains datum information. Of these, 8 may be instantiated, with 1 needing a grid. 4 operations cease to be candidates if we use an area-of-interest.
In sp/rgdal, the coordinate operation last used is returned, and can be retrieved using rgdal::get_last_coordOp()
; coordinate operations are represented as pipelines (Knudsen and Evers 2017; Evers and Knudsen 2017), introduced in PROJ 5 and using the PROJ key-value pair notation:
(helm <- as.data.frame(o_aoi)[o_aoi$accuracy==2, "definition"])
## [1] "+proj=pipeline +step +inv +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.999601 +x_0=400000 +y_0=-100000 +ellps=airy +step +proj=push +v_3 +step +proj=cart +ellps=airy +step +proj=helmert +x=446.448 +y=-125.157 +z=542.06 +rx=0.15 +ry=0.247 +rz=0.842 +s=-20.489 +convention=position_vector +step +inv +proj=cart +ellps=WGS84 +step +proj=pop +v_3 +step +proj=unitconvert +xy_in=rad +xy_out=deg"
b_pump_sf_ll <- st_transform(b_pump_sf, "OGC:CRS84", pipe=helm, aoi=aoi)
Here we can see that an inverse projection from the specified Transverse Mercator projection is made to geographical coordinates, followed by a seven-parameter Helmert transformation to WGS84 ellipsoid and datum. The parameters are contained in the best instantiable coordinate operation retrieved from the PROJ database. The +push +v_3
and +pop +v_3
operations are used when only horizontal components are needed in the Helmert transformation.
(o_aoi1 <- sf_proj_pipelines(st_crs(b_pump_sf1), "OGC:CRS84", AOI=aoi))
## Candidate coordinate operations found: 1
## Strict containment: FALSE
## Axis order auth compl: FALSE
## Source: PROJCRS["unknown",
## BASEGEOGCRS["unknown",
## DATUM["Unknown based on Airy 1830 ellipsoid",
## ELLIPSOID["Airy 1830",6377563.396,299.3249646,
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8901]]],
## CONVERSION["unknown",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",49,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",-2,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",0.999601,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",400000,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",-100000,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["(E)",east,
## ORDER[1],
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]],
## AXIS["(N)",north,
## ORDER[2],
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]]]
## Target: OGC:CRS84
## Best instantiable operation has only ballpark accuracy
## Description: Inverse of unknown + Ballpark geographic offset from unknown to
## WGS 84 (CRS84)
## Definition: +proj=pipeline +step +inv +proj=tmerc +lat_0=49 +lon_0=-2
## +k=0.999601 +x_0=400000 +y_0=-100000 +ellps=airy
## +step +proj=unitconvert +xy_in=rad +xy_out=deg
b_pump_sf1_ll <- st_transform(b_pump_sf1, "OGC:CRS84")
st_distance(b_pump_sf_ll, b_pump_sf1_ll)
## Units: [m]
## [,1]
## [1,] 125.0578
Going on to the case of the degraded representation, only 1 operation is found, with only ballpark accuracy. With our emulation of the dropping of +datum=
support in GDAL’s exportToProj4()
, we see that the coordinate operation pipeline only contains the inverse projection step, accounting for the observed shift of the Broad Street pump to Ingestre Place.
Finally, sp/rgdal may use the provision of on-demand downloading of transformation grids to provide more accuracy (CDN, from PROJ 7, https://cdn.proj.org ). Before finding and choosing to use a coordinate operation using an on-demand downloaded grid, the designated directory is empty:
sf_proj_network()
## [1] FALSE
Sys.setenv("PROJ_NETWORK"="ON")
sf_proj_network(TRUE)
## [1] "https://cdn.proj.org"
Using the CDN, all the candidate operations are instantiable, and the pipeline now shows a horizontal grid transformation rather than a Helmert transformation.
(og <- sf_proj_pipelines(st_crs(b_pump_sf), "OGC:CRS84", AOI=aoi))
## Candidate coordinate operations found: 5
## Strict containment: FALSE
## Axis order auth compl: FALSE
## Source: PROJCRS["Transverse_Mercator",
## BASEGEOGCRS["GCS_OSGB 1936",
## DATUM["OSGB_1936",
## ELLIPSOID["Airy 1830",6377563.396,299.3249646,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["Degree",0.0174532925199433]],
## ID["EPSG",4277]],
## CONVERSION["unnamed",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",49,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",-2,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",0.999601,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",400000,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",-100000,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["easting",east,
## ORDER[1],
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]],
## AXIS["northing",north,
## ORDER[2],
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]]]
## Target: OGC:CRS84
## Best instantiable operation has accuracy: 1 m
## Description: Inverse of unnamed + OSGB36 to WGS 84 (9) + axis order change
## (2D)
## Definition: +proj=pipeline +step +inv +proj=tmerc +lat_0=49 +lon_0=-2
## +k=0.999601 +x_0=400000 +y_0=-100000 +ellps=airy
## +step +proj=hgridshift
## +grids=uk_os_OSTN15_NTv2_OSGBtoETRS.tif +step
## +proj=unitconvert +xy_in=rad +xy_out=deg
list.files(sf_proj_search_paths()[1])
## character(0)
b_pump_sf_llgd <- st_transform(b_pump_sf, "OGC:CRS84")
list.files(sf_proj_search_paths()[1])
## character(0)
library(rgdal)
## Please note that rgdal will be retired by the end of 2023,
## plan transition to sf/stars/terra functions using GDAL and PROJ
## at your earliest convenience.
##
## rgdal: version: 1.5-27, (SVN revision 1155)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 3.4.0, released 2021/11/03
## Path to GDAL shared files: /usr/local/share/gdal
## GDAL binary built with GEOS: TRUE
## Loaded PROJ runtime: Rel. 8.2.0, November 1st, 2021, [PJ_VERSION: 820]
## Path to PROJ shared files: /tmp/RtmpNeLEVm/file26d0a951b362fb:/usr/local/share/proj:/usr/local/share/proj
## PROJ CDN enabled: TRUE
## Linking to sp version:1.4-6
## To mute warnings of possible GDAL/OSR exportToProj4() degradation,
## use options("rgdal_show_exportToProj4_warnings"="none") before loading sp or rgdal.
is_proj_CDN_enabled()
## [1] TRUE
b_pump_sp_llgd <- spTransform(as(b_pump_sf, "Spatial"), "OGC:CRS84")
list.files(sf_proj_search_paths()[1])
## [1] "cache.db"
DB <- file.path(DB0[1], "cache.db")
db <- dbConnect(SQLite(), dbname=DB)
cat(strwrap(paste(dbListTables(db), collapse=", ")), sep="\n")
## chunk_data, chunks, downloaded_file_properties, linked_chunks,
## linked_chunks_head_tail, properties, sqlite_sequence
dbReadTable(db, "chunks")
## id url offset data_id
## 1 1 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif 0 1
## 2 2 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif 2621440 2
## 3 3 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif 2637824 3
## 4 4 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif 2654208 4
## 5 5 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif 2670592 5
## 6 6 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif 2686976 6
## 7 7 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif 2703360 7
## 8 8 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif 2719744 8
## 9 9 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif 2736128 9
## 10 10 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif 2752512 10
## 11 11 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif 2768896 11
## 12 12 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif 2785280 12
## 13 13 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif 2801664 13
## 14 14 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif 2818048 14
## 15 15 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif 2834432 15
## 16 16 https://cdn.proj.org/uk_os_OSTN15_NTv2_OSGBtoETRS.tif 2850816 16
## data_size
## 1 16384
## 2 16384
## 3 16384
## 4 16384
## 5 16384
## 6 16384
## 7 16384
## 8 16384
## 9 16384
## 10 16384
## 11 16384
## 12 16384
## 13 16384
## 14 16384
## 15 16384
## 16 16384
dbDisconnect(db)
b_pump_sf_llgdsp <- st_as_sf(b_pump_sp_llgd)
Now the downloaded grid is cached in the database in the designated CDN directory, and may be used for other transformations using the same operation.
Sys.setenv("PROJ_NETWORK"="OFF")
sf_proj_network(FALSE)
## character(0)
c(st_distance(b_pump_sf_ll, b_pump_sf1_ll), st_distance(b_pump_sf_ll, b_pump_sf_llgd), st_distance(b_pump_sf_ll, b_pump_sf_llgdsp))
## Units: [m]
## [1] 125.057812 0.000000 1.751479
Once again, the distance between the point transformed from the sf object as read from file and the point with the degraded coordinate reference system emulating the effect of the change in behaviour of GDAL’s exportToProj4()
in GDAL 6 and later is about 125m. Using the CDN shifts the output point by 1.7m, here using both the new s2 and legacy interfaces for measurements in sf using geographical coordinates (Dunnington, Pebesma, and Rubak 2021):
sf_use_s2(TRUE)
## Spherical geometry (s2) switched on
c(st_distance(b_pump_sf_ll, b_pump_sf1_ll), st_distance(b_pump_sf_ll, b_pump_sf_llgd), st_distance(b_pump_sf_ll, b_pump_sf_llgdsp))
## Units: [m]
## [1] 124.728570 0.000000 1.745973
sf_use_s2(FALSE)
## Spherical geometry (s2) switched off
To be sure that sf uses the CDN, it appears that you need to set Sys.setenv("PROJ_NETWORK"="ON")
before sf is loaded and attached. This does not apply to rgdal, but possibly reflects the fact that rgdal uses PROJ directly, while sf::st_transform()
uses PROJ through GDAL (and sf::sf_proj_pipelines()
uses PROJ directly).
Although it appears that most of the consequences of the change in representation of coordinate reference systems from Proj4 to WKT2 strings have now been addressed, we still see signs on the mailing list and on Twitter that users, naturally directing their attention to their analytical or visualization work, may still be confused.
The extent of the spatial cluster of R packages is so great that it will undoubtedly take time before the dust settles.
However, we trust that the operation of upgrading representations is now largely complete.
Multiple warnings issued in sp workflows, now noisily drawing attention to possible degradations in workflows, will by default be muted when sp 1.5 and rgdal 1.6 are released.
Reading the saved "sf"
object, we can see that we have 71,353 observations, as in the article we are replicating:
df_tracts <- st_read("df_tracts.gpkg")
## Reading layer `df_tracts' from data source
## `/home/rsb/und/ecs530/2021/df_tracts.gpkg' using driver `GPKG'
## Simple feature collection with 71353 features and 28 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -124.7332 ymin: 24.5447 xmax: -66.9499 ymax: 49.38436
## Geodetic CRS: NAD83
Visualising that many objects is possible, but reducing the area of interest to the Chicago Metropolitan Area, as the source article does, makes plotting quicker:
chicago_MA <- read.table("Chicago_MA.txt", colClasses=c("character", "character"))
chicago_MA_tracts <- !is.na(match(substring(df_tracts$GEOID, 1, 5), chicago_MA$V2))
sum(chicago_MA_tracts)
## [1] 2195
We’ll use the tmap package with class interval styles from the classInt package:
library(tmap)
## Registered S3 methods overwritten by 'stars':
## method from
## st_bbox.SpatRaster sf
## st_bbox.SpatVector sf
## st_crs.SpatRaster sf
## st_crs.SpatVector sf
tm_shape(df_tracts[chicago_MA_tracts,]) + tm_fill("med_inc_cv", style="fisher", n=7, title="Coefficient of Variation")
In ESRI documentation, CV thresholds of 12 and 40 percent are proposed for the transformed reported MOE values: https://doc.arcgis.com/en/esri-demographics/data/acs.htm. We’ll create a classified variable (ordered factor):
df_tracts$mi_cv_esri <- cut(df_tracts$med_inc_cv, c(0, 0.12, 0.40, Inf), labels=c("High", "Medium", "Low"), right=TRUE, include.lowest=TRUE, ordered_result=TRUE)
table(df_tracts$mi_cv_esri)
##
## High Medium Low
## 48302 22610 441
and map it:
tm_shape(df_tracts[chicago_MA_tracts,]) + tm_fill("mi_cv_esri", title="Reliability")
As the Low reliability tracts are small in size, the "view"
mode for interactive mapping may help:
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(df_tracts[chicago_MA_tracts,]) + tm_fill("mi_cv_esri", title="Reliability")
tmap_mode("plot")
## tmap mode set to plotting
Or equivalently using mapview, plotting the CV values rather than the ordered factor, which is not yet well-supported:
library(mapview)
mapviewOptions(fgb = FALSE)
mapview(df_tracts[chicago_MA_tracts,"med_inc_cv"], layer.name="Coefficient of Variation")
classInt provides the key class interval determination for thematic mapping of continuous variables. The classIntervals()
function takes a numeric vector (now also of classes POSIXt or units), a target number of intervals, and a style of class interval. Other arguments control the closure and precision of the intervals found.
library(classInt)
args(classIntervals)
## function (var, n, style = "quantile", rtimes = 3, ..., intervalClosure = c("left",
## "right"), dataPrecision = NULL, warnSmallN = TRUE, warnLargeN = TRUE,
## largeN = 3000L, samp_prop = 0.1, gr = c("[", "]"))
## NULL
Lapa et al. (2001) (Leprosy surveillance in Olinda, Brazil, using spatial analysis techniques) made available the underlying data set of Olinda census tracts (setor) in the Corrego Alegre 1970-72 / UTM zone 25S projection (EPSG:22525). Marilia Sá Carvalho and I wrote a tutorial in 2003/4 based on this data set; there is more information in the tutorial.
We’ll find 7 intervals using Fisher natural breaks for the deprivation variable:
library(sf)
olinda_sirgas2000 <- st_read("olinda_sirgas2000.gpkg")
## Reading layer `olinda_sirgas2000' from data source
## `/home/rsb/und/ecs530/2021/olinda_sirgas2000.gpkg' using driver `GPKG'
## Simple feature collection with 243 features and 10 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 288984.5 ymin: 9111038 xmax: 298474.8 ymax: 9120522
## Projected CRS: SIRGAS 2000 / UTM zone 25S
(cI <- classIntervals(olinda_sirgas2000$DEPRIV, n=7, style="fisher"))
## style: fisher
## [0,0.1215) [0.1215,0.244) [0.244,0.339) [0.339,0.439) [0.439,0.5435)
## 37 53 37 26 37
## [0.5435,0.6695) [0.6695,0.907]
## 33 20
Sys.setenv("PROJ_NETWORK"="OFF")
sf_proj_network()
## [1] FALSE
list.files(sf_proj_search_paths()[1])
## [1] "cache.db"
sf_extSoftVersion()
## GEOS GDAL proj.4 GDAL_with_GEOS USE_PROJ_H
## "3.10.1" "3.4.0" "8.2.0" "true" "true"
## PROJ
## "8.2.0"
We also need to assign a palette of graphical values, most often colours, to use to fill the intervals, and can inspect the intervals and fill colours with a plot method:
The RColorBrewer package gives by permission access to the ColorBrewer palettes accesible from the ColorBrewer website. Note that ColorBrewer limits the number of classes tightly, only 3–9 sequential classes
library(RColorBrewer)
pal <- RColorBrewer::brewer.pal((length(cI$brks)-1), "Reds")
plot(cI, pal)
We can also display all the ColorBrewer palettes:
display.brewer.all()
Try exploring alternative class interval definitions and palettes, maybe also visiting http://hclwizard.org/ and its hclwizard()
Shiny app, returning a palette generating function on clicking the “Return to R” button:
library(colorspace)
hcl_palettes("sequential (single-hue)", n = 7, plot = TRUE)
pal <- hclwizard()
pal(6)
The end of rainbow discussion is informative:
wheel <- function(col, radius = 1, ...)
pie(rep(1, length(col)), col = col, radius = radius, ...)
opar <- par(mfrow=c(1,2))
wheel(rainbow_hcl(12))
wheel(rainbow(12))
par(opar)
See recent R blog.
See also treatments in Fundamentals of Data Visualization.
The sp package provided base graphics plot and image methods. sf provides plot methods using base graphics; the method for "sf"
objects re-arranges the plot window to provide a colour key, so extra steps are needed if overplotting is needed:
plot(olinda_sirgas2000[,"DEPRIV"], breaks=cI$brks, pal=pal)
(returns current par()
settings); the method also supports direct use of classInt:
plot(olinda_sirgas2000[,"DEPRIV"], nbreaks=7, breaks="fisher", pal=pal)
Earlier we used the plot method for "sfc"
objects which does not manipulate the graphics device, and is easier for overplotting.
tmap: Thematic maps show spatial distributions. The theme refers to the phenomena that is shown, which is often demographical, social, cultural, or economic. The best known thematic map type is the choropleth, in which regions are colored according to the distribution of a data variable. The R package tmap offers a coherent plotting system for thematic maps that is based on the layered grammar of graphics. Thematic maps are created by stacking layers, where per layer, data can be mapped to one or more aesthetics. It is also possible to generate small multiples. Thematic maps can be further embellished by configuring the map layout and by adding map attributes, such as a scale bar and a compass. Besides plotting thematic maps on the graphics device, they can also be made interactive as an HTML widget. In addition, the R package tmaptools contains several convenient functions for reading and processing spatial data. See (Tennekes 2018) and Chapter 8 in (Lovelace, Nowosad, and Muenchow 2019).
The tmap package provides cartographically informed, grammar of graphics (gg) based functionality now, like ggplot2 using grid graphics. John McIntosh tried with ggplot2, with quite nice results. I suggested he look at tmap, and things got better, because tmap can switch between interactive and static viewing. tmap also provides direct access to classInt class intervals.
library(tmap)
tmap_mode("plot")
## tmap mode set to plotting
o <- tm_shape(olinda_sirgas2000) + tm_fill("DEPRIV", style="fisher", n=7, palette="Reds")
class(o)
## [1] "tmap"
returns a "tmap"
object, a grid GROB (graphics object), with print methods.
o
Since the objects are GROBs, they can be updated, as in lattice with latticeExtra or ggplot2:
o + tm_borders(alpha=0.5, lwd=0.5)
There is also a Shiny tool for exploring palettes:
tmaptools::palette_explorer()
cartography helped to design cartographic representations such as proportional symbols, choropleth, typology, flows or discontinuities maps. It also offers several features that improve the graphic presentation of maps, for instance, map palettes, layout elements (scale, north arrow, title…), labels or legends. (Giraud and Lambert 2016, 2017), http://riatelab.github.io/cartography/vignettes/cheatsheet/cartography_cheatsheet.pdf. The package is associated with rosm: Download and plot Open Street Map http://www.openstreetmap.org/, Bing Maps http://www.bing.com/maps and other tiled map sources. Use to create basemaps quickly and add hillshade to vector-based maps. https://cran.r-project.org/web/packages/rosm/vignettes/rosm.html. mapsf has followed up on suggestions to improve cartography, using recent changes in base R colour handling: https://developer.r-project.org/Blog/public/2019/11/21/a-new-palette-for-r/index.html.
The package organizes extra palettes:
library(mapsf)
cols <- mf_get_pal(n = c(5, 5), pal = c("Reds 2", "Greens"))
plot(1:10, rep(1, 10), bg = cols, pch = 22, cex = 4, axes=FALSE, ann=FALSE)
The plotting functions (not methods) use base graphics:
mf_choro(olinda_sirgas2000, var="DEPRIV", breaks="fisher", nbreaks=7, pal=pal, leg_val_rnd=3)
(returns NULL)
The ggplot2 package provides the geom_sf()
facility for mapping:
library(ggplot2)
g <- ggplot(olinda_sirgas2000) + geom_sf(aes(fill=DEPRIV))
g
It is possible to set a theme that drops the arguably unnecessary graticule:
g + theme_void()
g + theme_void() + scale_fill_distiller(palette="Reds", direction=1)
but there is a lot of jumping through hoops to get a simple map. To get proper class intervals involves even more work, because ggplot2 takes specific, not general, positions on how graphics are observed. ColorBrewer eschews continuous colour scales based on cognitive research, but ggplot2 enforces them for continuous variables (similarly for graticules, which may make sense for data plots but not for maps).
Using tmap_mode()
, we can switch between presentation ("plot"
) and interactive ("view"
) plotting:
tmap_mode("view")
## tmap mode set to interactive viewing
o + tm_borders(alpha=0.5, lwd=0.5)
tmap_mode("plot")
## tmap mode set to plotting
mapview: Quickly and conveniently create interactive visualisations of spatial data with or without background maps. Attributes of displayed features are fully queryable via pop-up windows. Additional functionality includes methods to visualise true- and false-color raster images, bounding boxes, small multiples and 3D raster data cubes. It uses leaflet and other HTML packages.
library(mapview)
if (sf:::CPL_gdal_version() >= "3.1.0") mapviewOptions(fgb = FALSE)
mapview(olinda_sirgas2000, zcol="DEPRIV", col.regions=pal, at=cI$brks)
## Warning: Found less unique colors (7) than unique zcol values (8)!
## Interpolating color vector to match number of zcol values.
data("pol_pres15", package = "spDataLarge")
pol_pres15 <- st_buffer(pol_pres15, dist=0)
library(tmap)
o <- tm_shape(pol_pres15) + tm_facets(free.scales=FALSE) + tm_borders(lwd=0.5, alpha=0.4) + tm_layout(panel.labels=c("Duda", "Komorowski"))
o + tm_fill(c("I_Duda_share", "I_Komorowski_share"), n=6, style="pretty", title="I round\nshare of votes")
o + tm_fill(c("II_Duda_share", "II_Komorowski_share"), n=6, style="pretty", title="II round\nshare of votes")
sessionInfo()
sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Fedora 34 (Workstation Edition)
##
## Matrix products: default
## BLAS: /home/rsb/topics/R/R412-share/lib64/R/lib/libRblas.so
## LAPACK: /home/rsb/topics/R/R412-share/lib64/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
## [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
## [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggplot2_3.3.5 mapsf_0.3.0 colorspace_2.0-2 RColorBrewer_1.1-2
## [5] classInt_0.4-3 tmap_3.3-2 rgdal_1.5-27 mapview_2.10.0
## [9] sp_1.4-6 RSQLite_2.2.8 sf_1.0-4
##
## loaded via a namespace (and not attached):
## [1] sass_0.4.0 viridisLite_0.4.0 bit64_4.0.5
## [4] jsonlite_1.7.2 bslib_0.3.1 assertthat_0.2.1
## [7] highr_0.9 stats4_4.1.2 blob_1.2.2
## [10] yaml_2.2.1 pillar_1.6.4 lattice_0.20-45
## [13] glue_1.4.2 uuid_1.0-3 digest_0.6.28
## [16] leaflet.providers_1.9.0 htmltools_0.5.2 XML_3.99-0.8
## [19] pkgconfig_2.0.3 raster_3.5-2 stars_0.5-4
## [22] s2_1.0.7 purrr_0.3.4 scales_1.1.1
## [25] webshot_0.5.2 brew_1.0-6 svglite_2.0.0
## [28] terra_1.4-16 satellite_1.0.4 tibble_3.1.5
## [31] proxy_0.4-26 generics_0.1.1 farver_2.1.0
## [34] ellipsis_0.3.2 withr_2.4.2 cachem_1.0.6
## [37] leafsync_0.1.0 magrittr_2.0.1 crayon_1.4.2
## [40] memoise_2.0.0 evaluate_0.14 fansi_0.5.0
## [43] lwgeom_0.2-8 class_7.3-19 tools_4.1.2
## [46] lifecycle_1.0.1 stringr_1.4.0 munsell_0.5.0
## [49] compiler_4.1.2 jquerylib_0.1.4 e1071_1.7-9
## [52] systemfonts_1.0.3 rlang_0.4.12 tmaptools_3.1-1
## [55] units_0.7-2 grid_4.1.2 leafpop_0.1.0
## [58] dichromat_2.0-0 htmlwidgets_1.5.4 crosstalk_1.1.1
## [61] labeling_0.4.2 leafem_0.1.6 base64enc_0.1-3
## [64] rmarkdown_2.11 wk_0.5.0 gtable_0.3.0
## [67] codetools_0.2-18 abind_1.4-5 DBI_1.1.1
## [70] R6_2.5.1 knitr_1.36 dplyr_1.0.7
## [73] fastmap_1.1.0 bit_4.0.4 utf8_1.2.2
## [76] KernSmooth_2.23-20 stringi_1.7.5 parallel_4.1.2
## [79] Rcpp_1.0.7 vctrs_0.3.8 png_0.1-7
## [82] leaflet_2.0.4.1 tidyselect_1.1.1 xfun_0.27