Sys.setenv("PROJ_NETWORK"="ON")
library(sf)
Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.3.1; sf_use_s2() is TRUE
In section Section 1.1, we showed why spatial data should be treated as “special” until it is shown that they do not require the application of specific methods to handle combinations of spatial autocorrelation, spatial scale and spatial heterogeneity. We did not there develop the concepts and tools needed for understanding and handling spatial data before analysis, or the presentation of results from spatial econometric models in spatial form. This treatment builds on Pebesma and Bivand (2023) chapters 1–6 and the R packages used there, but will also to some extent indicate where that software usage is matched in Python and elsewhere, for example Rey, Arribas-Bel, and Wolf (2023); for a comparative description, see Bivand (2022). If the reader is unfamiliar with data of this kind, they may benefit from Lovelace, Nowosad, and Muenchow (2019).
Spatial data are observations on variables, be they categorical, ordinal or continuous. The observations are associated with a position in space, where space is often represented as (sets of) coordinate tuplets on a map; spatial data are also found in bioinformatics and medical imaging, but here we are concerned with geographical spatial data.
Geographical spatial data take two forms: spatial vectors and spatial rasters. Spatial vector objects are most often represented by geometries of sets of pairs of coordinates known as eastings and northings, or longitude and latitude, forming points, lines or polygons; three-dimensional spatial vectors are not often used but may be provided by data sources. Spatial raster objects are regular square or rectangular grids of cells, most like digital images. Spatial rasters are fixed to a spatial point of origin and the locations of cells can be found from the point of origin by stepping out from the origin cell and summing the distances crossing intervening calls measured by the size or resolution of the cells. Rasters have one or more bands, and may also be termed spatial data cubes, structured by two or three spatial dimensions (three if height/depth is included) and a time dimension in addition to bands containing sensor measurements.
While spatial raster objects include attribute data - observations on variables taken at the location of each cell, spatial vector objects do not have to do so. Observations on variables of interest are linked to geometries through a common key or identifier. The key linking the observations with the spatial vector geometries may often be a standard code published and updated by a government statistical office. Not infrequently, the data are acquired or downloaded including the key, and the geometries of the spatial objects are acquired or downloaded separately, to be merged by the observation keys afterwards. Noting the timestamp specifying the validity of geometries and keys is always sensible, as both keys and geometries change over time. We will return to the representation of time in spatial panel objects, where the geometries are invariant but variables of interest are observed over successive time periods.
For the present, spatial data will be taken as-is; we will return to reading and writing spatial data from and to files in Section Section 3.3. The comprehensive R archive network (CRAN) provides software packages and short summaries called task views listing packages useful for particular tasks. The Spatial task view (https://cran.r-project.org/view=Spatial) lists packages accessing or providing specific data sources of interest. Here we will use three such packages: chilemapas
with administrative boundaries for Chile, and rgugik
for Poland, elevatr
for spatial raster elevation data, and osmdata
for OpenStreetMap information. The representation of the spatial vector data is handled by the sf
package:
Sys.setenv("PROJ_NETWORK"="ON")
library(sf)
Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.3.1; sf_use_s2() is TRUE
library(chilemapas)
|> data.frame() |>
mapa_comunas st_as_sf(sf_column_name="geometry") -> mc_sf
mc_sf
Simple feature collection with 345 features and 3 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -109.4499 ymin: -56.52511 xmax: -66.41617 ymax: -17.49778
Geodetic CRS: SIRGAS 2000
First 10 features:
codigo_comuna codigo_provincia codigo_region geometry
1 01401 014 01 MULTIPOLYGON (((-68.86081 -...
2 01403 014 01 MULTIPOLYGON (((-68.65113 -...
3 01405 014 01 MULTIPOLYGON (((-68.65113 -...
4 01402 014 01 MULTIPOLYGON (((-69.31789 -...
5 01404 014 01 MULTIPOLYGON (((-69.39615 -...
6 01107 011 01 MULTIPOLYGON (((-70.1095 -2...
7 01101 011 01 MULTIPOLYGON (((-70.09894 -...
8 02104 021 02 MULTIPOLYGON (((-68.98863 -...
9 02101 021 02 MULTIPOLYGON (((-70.60654 -...
10 02201 022 02 MULTIPOLYGON (((-67.94302 -...
The mapa_comunas
data object stored in the chilemapas
package is represented as a tibble and an sf
object. Because most model fitting objects expect data as a data.frame
not a tbl
also known as a tibble, we coerce to data.frame
and re-construct the sf
object using the sf::st_as_sf
method for data.frame
objects. The disparity between data.frame
and tbl
objects is that when a single column of a data.frame
is selected with the [
method, it becomes a vector by default, but the [
method on a single column of a tbl
object returns a tbl
by default. The geometry
column is an sfc
object, a list column of sfg
objects, where sfg
means “Simple Feature” geometry.
st_geometry(mc_sf)
Geometry set for 345 features
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -109.4499 ymin: -56.52511 xmax: -66.41617 ymax: -17.49778
Geodetic CRS: SIRGAS 2000
First 5 geometries:
MULTIPOLYGON (((-68.86081 -21.28512, -68.92172 ...
MULTIPOLYGON (((-68.65113 -19.77188, -68.81182 ...
MULTIPOLYGON (((-68.65113 -19.77188, -68.63545 ...
MULTIPOLYGON (((-69.31789 -19.13651, -69.2717 -...
MULTIPOLYGON (((-69.39615 -19.06125, -69.40025 ...
The geometry column contains 345 multi-polygon objects, containing the boundaries of Chilean municipalities in 2017, clipped to the coastline. They are multi-polygons because at least some municipalities are composed of island and mainland parts. The short summary printed here also states that the geometries have a geodetic coordinate reference system (CRS), here specified as SIRGAS 2000. We return to coordinate reference systems in the next section Section 2.2, but note that area calculation is conducted on the specified ellipsoid using the s2
by default, not on the plane.
Let us find the area of Región del Maule, region "07"
, of which Taule is the capital city, using the region key column in the sf
object named mc_sf
. First we subset the municipalities to the desired region, then union the municipalities in Región del Maule before finding the area of the region as a whole.
|> subset(subset=codigo_region == "07") -> maule_sf
mc_sf |> st_union() |> st_area() |> units::set_units("km2") maule_sf
30297.84 [km^2]
The result agrees adequately with 30,296.1 square kilometres given by Wikipedia, as the details of administrative boundaries and hence the computed area are compromises between detail and the size of the data object stored in the package. By default the results of length or area calculations for spherical coordinates are returned by s2
in metres or square metres, so we convert to square kilometres.
sf
objects pay attention to the support of variables, whether they may be classed as constant, aggregate, or identity. Constant variables are constant across the whole area of a polygon, for example a land use category. Aggregate variables may be counts, rates based on counts, or densities of counts by polygon area. Identity variables uniquely identify the geometries. In the case of the Región del Maule data set, no definitions have so far been provided, so when we change support from polygon to point by computing the spherical centroid, a warning is given (see Pebesma and Bivand 2023, 49–52). st_agr
returns the current state of the factor (categorical object) for the non-geometry variables included in the sf
object:
st_agr(maule_sf)
codigo_comuna codigo_provincia codigo_region
<NA> <NA> <NA>
Levels: constant aggregate identity
|> st_centroid() -> maule_sf_pt1 maule_sf
Warning: st_centroid assumes attributes are constant over geometries
We can avoid the warning by taking the centroid of the geometries alone, of the sfc
object rather than the sf
object:
|> st_geometry() |> st_centroid() -> maule_sfc_pt2 maule_sf
Alternatively, since there are few non-geometry columns in the data set, we can assign the appropriate values to them, so that since the change of support does not change the meaning of the variables, no warning is given:
|>
maule_sf st_set_agr(c(codigo_comuna="identity",
codigo_provincia="identity",
codigo_region="constant")
-> maule_sf_agr
) st_agr(maule_sf_agr)
codigo_comuna codigo_provincia codigo_region
identity identity constant
Levels: constant aggregate identity
|> st_centroid() -> maule_sf_pt3 maule_sf_agr
The three remaining packages download spatial data from public sources rather than pre-packaging them for use. elevatr
can be used to download global elevation data as a spatial raster; until version 1 of the package is released, it returns a "RasterLayer"
as defined in the legacy raster
package, but from 1.0 will return a "SpatRaster"
defined in the terra
package. Here we specify our area of interest by passing the sf
spatial vector object for Región del Maule, and clip the output to the boundary of the region, setting negative values to missing:
library(elevatr)
elevatr v0.99.0 NOTE: Version 0.99.0 of 'elevatr' uses 'sf' and 'terra'. Use
of the 'sp', 'raster', and underlying 'rgdal' packages by 'elevatr' is being
deprecated; however, get_elev_raster continues to return a RasterLayer. This
will be dropped in future versions, so please plan accordingly.
library(terra)
terra 1.7.71
|> get_elev_raster(z = 7,
maule_sf clip = "locations", neg_to_na = TRUE) |>
rast() -> maule_elev
Mosaicing & Projecting
Clipping DEM to locations
Note: Elevation units are in meters.
maule_elev
class : SpatRaster
dimensions : 371, 493, 1 (nrow, ncol, nlyr)
resolution : 0.005009617, 0.005009617 (x, y)
extent : -72.78435, -70.3146, -36.54143, -34.68287 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs
source(s) : memory
name : filee6328d7c555
min value : 0
max value : 3936
The raster has one layer (band), with a square (geodetic) grid with a resolution of about 0.005 degrees, 18.03 arc seconds in both dimensions, 371 rows and 493 columns.
Let us now access OpenStreetMap data using osmdata
(Padgham et al. 2017), first subsetting to the city of Talca, the capital of Región del Maule. To set up a query, we need to build an OpenStreetMap Overpass query using a bounding box of our area of interest. Since it can be difficult to know what kinds of data have been contributed, we add the "names"
feature, pulling in everything in the rectangle defined in degrees of longitude and latitude - obviously the top and bottom of the rectangle are actually curves, because the rectangle is on the surface of an ellipsoid. Having created the query talca_q
, we can extract the data for all features with names:
|> subset(subset = codigo_comuna=="07101") -> talca_sf
maule_sf library(osmdata)
Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright
|> st_bbox() |> opq() |>
talca_sf add_osm_feature(key = "names") -> talca_q
|> osmdata_sf() -> talca_osm_sf talca_q
talca_osm_sf
Object of class 'osmdata' with:
$bbox : -35.5310255,-71.723806,-35.3118342,-71.4925696
$overpass_call : The call submitted to the overpass API
$meta : metadata including timestamp and version numbers
$osm_points : 'sf' Simple Features Collection with 134548 points
$osm_lines : 'sf' Simple Features Collection with 14580 linestrings
$osm_polygons : 'sf' Simple Features Collection with 1090 polygons
$osm_multilines : 'sf' Simple Features Collection with 20 multilinestrings
$osm_multipolygons : 'sf' Simple Features Collection with 22 multipolygons
The output object is a list of components, some of which are sf
objects. If we take the point objects, we can attempt to subset to those said to offer take-away service, that is those for which the takeaway
variable is not coded NA
- missing.
$osm_points -> talca_pts
talca_osm_sf|> subset(subset = !is.na(takeaway),
talca_pts select = c(osm_id, name, amenity, cuisine,
-> talca_takeaways
takeaway, geometry)) talca_takeaways
Simple feature collection with 11 features and 5 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -71.68909 ymin: -35.44643 xmax: -71.62727 ymax: -35.42225
Geodetic CRS: WGS 84
First 10 features:
osm_id name amenity cuisine takeaway
1237856768 1237856768 Rossini cafe sandwich yes
1264154772 1264154772 Telepizza fast_food pizza yes
2853291904 2853291904 La Cleta restaurant <NA> yes
2853679104 2853679104 3 Estrellas pub sandwich yes
2858522001 2858522001 Cafetería Riquíssimo restaurant regional yes
3377351170 3377351170 Me Gustó fast_food burger yes
3699062703 3699062703 Talca Sandwich fast_food burger yes
3766018631 3766018631 El Chilote restaurant fish_and_chips yes
7233542316 7233542316 McDonald's fast_food burger yes
7862158443 7862158443 Papa John's fast_food pizza yes
geometry
1237856768 POINT (-71.66275 -35.42618)
1264154772 POINT (-71.63046 -35.43272)
2853291904 POINT (-71.66563 -35.42866)
2853679104 POINT (-71.65372 -35.42822)
2858522001 POINT (-71.66378 -35.42624)
3377351170 POINT (-71.68909 -35.44643)
3699062703 POINT (-71.68062 -35.43947)
3766018631 POINT (-71.67028 -35.42225)
7233542316 POINT (-71.65486 -35.42561)
7862158443 POINT (-71.68665 -35.44302)
There only appear to be 11 such establishments, but perhaps these are the ones whose locations grateful clients (or ambitious proprietors) have contributed to OpenStreetMap.
The rgugik
package (Dyba and Nowosad 2021) does not bundle the municipality boundaries of Poland as chilemapas
did for Chile, but downloads them from the Polish Head Office of Geodesy and Cartography (GUGIK) online. The object returned from a query to the boundaries of the Gdańsk county administrative unit is an sf
object with a single variable, the identification key.
library(rgugik)
<- borders_get(county = "Gdańsk")
gd_sf gd_sf
Simple feature collection with 1 feature and 1 field
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 462961.1 ymin: 712372.4 xmax: 496776.7 ymax: 731607.8
Projected CRS: ETRF2000-PL / CS92
TERYT geom
1 2261 MULTIPOLYGON (((465078.9 73...
Again, we can use the sf
object to access elevation data, here returned with about 88 metre resolution, as the input object has a projected coordinate reference system:
|> get_elev_raster(z = 9,
gd_sf clip = "locations", neg_to_na = TRUE) |> rast() -> gd_elev
Mosaicing & Projecting
Clipping DEM to locations
Note: Elevation units are in meters.
gd_elev
class : SpatRaster
dimensions : 216, 379, 1 (nrow, ncol, nlyr)
resolution : 89.09542, 89.09542 (x, y)
extent : 462973.9, 496741.1, 712362.5, 731607.1 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=tmerc +lat_0=0 +lon_0=19 +k=0.9993 +x_0=500000 +y_0=-5300000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
source(s) : memory
name : filee63abcb0de
min value : 0
max value : 177
As we just observed, gd_sf
has a projected coordinate reference system, so needs to be transformed to a geodetic coordinate reference system before the creation of an Openpass query, here for "amenity"
features:
|> st_transform(crs = "OGC:CRS84") |> st_bbox() |>
gd_sf opq() |> add_osm_feature(key = "amenity") -> gd_q
|> osmdata_sf() -> gd_osm_sf gd_q
gd_osm_sf
Object of class 'osmdata' with:
$bbox : 54.274974489,18.429497095,54.447218154,18.9503794130001
$overpass_call : The call submitted to the overpass API
$meta : metadata including timestamp and version numbers
$osm_points : 'sf' Simple Features Collection with 64129 points
$osm_lines : 'sf' Simple Features Collection with 30 linestrings
$osm_polygons : 'sf' Simple Features Collection with 5946 polygons
$osm_multilines : 'sf' Simple Features Collection with 1 multilinestrings
$osm_multipolygons : 'sf' Simple Features Collection with 19 multipolygons
Once again we extract the locations of registered take-away outlets, of which many more seem to have been contributed to OpenStreetMap than in the case of the bounding box surrounding the city of Talca:
$osm_points -> gd_pts
gd_osm_sf|> subset(subset = !is.na(takeaway),
gd_pts select = c(osm_id, name, amenity, cuisine,
-> gd_takeaways
takeaway, geometry)) gd_takeaways
Simple feature collection with 160 features and 5 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 18.45562 ymin: 54.30241 xmax: 18.71603 ymax: 54.44475
Geodetic CRS: WGS 84
First 10 features:
osm_id name amenity cuisine
473433244 473433244 McDonald's fast_food burger
473433245 473433245 KFC fast_food chicken
478756186 478756186 McDonald's fast_food burger
746496055 746496055 Costa cafe coffee_shop
747182709 747182709 KFC fast_food chicken
1538112296 1538112296 Ping Pong ramen & sushi restaurant asian;ramen;sushi
1693970318 1693970318 KFC fast_food chicken
1763555507 1763555507 Zielona Papuga restaurant regional
1777255606 1777255606 Piwnica Rajców restaurant <NA>
1783311488 1783311488 Kreska Mąki fast_food pizza
takeaway geometry
473433244 yes POINT (18.6005 54.38275)
473433245 yes POINT (18.60085 54.3827)
478756186 yes POINT (18.58742 54.35775)
746496055 yes POINT (18.6487 54.34943)
747182709 yes POINT (18.6452 54.35604)
1538112296 yes POINT (18.5671 54.41832)
1693970318 yes POINT (18.57906 54.35757)
1763555507 no POINT (18.6172 54.37657)
1777255606 no POINT (18.65326 54.34866)
1783311488 yes POINT (18.58812 54.3596)
Since 2019, the representation of coordinate reference systems (CRS) have stabilised in the well-known text format known as WKT2-2019, which is an international standard. The representation supports both two and three dimensional structures, and as implemented in open geospatial software (PROJ) uses definitions stored in a regularly updated database bundled with the software. The database also contains tables defining transformations between CRS, now often based on open access transformation grid files downloaded on-the-fly if required. In parts of the world subject to rapid tectonic movement (the old city centre of Talca, Chile was badly damaged by an earthquake in 2010), CRS change frequently if more by centimetres than by metres.
st_crs(talca_sf)
Coordinate Reference System:
User input: EPSG:4674
wkt:
GEOGCRS["SIRGAS 2000",
DATUM["Sistema de Referencia Geocentrico para las AmericaS 2000",
ELLIPSOID["GRS 1980",6378137,298.257222101,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
USAGE[
SCOPE["unknown"],
AREA["Latin America - SIRGAS 2000 by country"],
BBOX[-59.87,-122.19,32.72,-25.28]],
ID["EPSG",4674]]
The definition used for stored objects in the chilemapas
package is the geographical CRS in decimal degrees called "SIRGAS 2000"
based on the "GRS 1980"
ellipsoid - geodetic reference system of 1980. The provided data have as is often the case swapped axes compared to the CRS definition. Most geographical information systems keep to x, y, that is easting, northing or longitude, latitude axis order. Before version 1.0, elevatr
reverts to the legacy PROJ4 string representation seen in the summaries above.
osmdata
and OpenStreetMap appear to assume that all modern geographical coordinate reference systems use "EPSG:4326"
as is said to apply to consumer global positioning systems (GPS) devices. This is somewhat accurate, but there is no fixed definition. The definition is usually updated locally for each major tectonic plate, so the WKT2-2019 representation of the CRS used here is as a datum ensemble, with at best two metre accuracy, often worse, as we are noe fourty years on from where the tectonic plates were when "WGS 84
was agreed in 1984. Some surveyors have argued that all coordinate tuples should be timestamped, in order to permit the tracking of the movement of points across the earth’s surface and their vertical change as well.
st_crs(talca_takeaways)
Coordinate Reference System:
User input: EPSG:4326
wkt:
GEOGCRS["WGS 84",
ENSEMBLE["World Geodetic System 1984 ensemble",
MEMBER["World Geodetic System 1984 (Transit)"],
MEMBER["World Geodetic System 1984 (G730)"],
MEMBER["World Geodetic System 1984 (G873)"],
MEMBER["World Geodetic System 1984 (G1150)"],
MEMBER["World Geodetic System 1984 (G1674)"],
MEMBER["World Geodetic System 1984 (G1762)"],
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ENSEMBLEACCURACY[2.0]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
USAGE[
SCOPE["Horizontal component of 3D system."],
AREA["World."],
BBOX[-90,-180,90,180]],
ID["EPSG",4326]]
So while the "SIRGAS 2000"
object and the "WGS 84
refer approximately to the same representation, some sf
predicates and operations on geometries may see them as differing, as for example checking that the Talca take-aways found in the bounding box do lie within the municipality boundaries:
try(st_within(talca_takeaways, talca_sf))
Error in st_geos_binop("within", x, y, sparse = sparse, prepared = prepared, :
st_crs(x) == st_crs(y) is not TRUE
Since we know that both are based on the"GRS 1980"
ellipsoid, we can overwrite the CRS of the OpenStreetMap take-away outlets location output to bring it into agreement with "SIRGAS 2000"
, resulting in finding that all the outlets all lie within the municipality boundaries:
|> st_crs() -> sirgas2000
talca_sf |> st_set_crs(sirgas2000) |>
talca_takeaways st_within(talca_sf) |> unlist() |> as.logical()
Warning: st_crs<- : replacing crs does not reproject data; use st_transform for
that
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
Alves Costa et al. (2023) document current status for the development of SIRGAS, and helpfully show how an increase in ground GPS/GNSS (Global Navigation Satellite Systems from all providers) station density helps track terrestrial movement, which in parts of Chile exceeds 20mm per year. While the movement of the earth’s crust is one source of inaccuracy, another is measurement error, and yet another computational error from repeated use of trigonometrical functions and approximated transformation coefficients. A lack of clarity in defining CRS can lead to the wrong assignation of for example points to polygons.
The Universal Transverse Mercator projection (zone 19 south) is often used for central Chile, and this specific version uses the same "SIRGAS 2000"
datum as the input data objects from chilemapas
:
st_crs("EPSG:31979")
Coordinate Reference System:
User input: EPSG:31979
wkt:
PROJCRS["SIRGAS 2000 / UTM zone 19S",
BASEGEOGCRS["SIRGAS 2000",
DATUM["Sistema de Referencia Geocentrico para las AmericaS 2000",
ELLIPSOID["GRS 1980",6378137,298.257222101,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4674]],
CONVERSION["UTM zone 19S",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",0,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",-69,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",0.9996,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",500000,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",10000000,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Engineering survey, topographic mapping."],
AREA["Brazil - between 72°W and 66°W, northern and southern hemispheres. In remainder of South America - between 72°W and 66°W, southern hemisphere, onshore and offshore."],
BBOX[-59.87,-72,2.15,-66]],
ID["EPSG",31979]]
If we check to see which candidate coordinate operations are available, we see that in fact projection accuracy is as good as possible, because the datum definitions of the two CRS are the same:
sf_proj_network(TRUE)
[1] "https://cdn.proj.org"
sf_proj_pipelines("EPSG:4674", "EPSG:31979")
Candidate coordinate operations found: 1
Strict containment: FALSE
Axis order auth compl: FALSE
Source: EPSG:4674
Target: EPSG:31979
Best instantiable operation has accuracy: 0 m
Description: axis order change (2D) + UTM zone 19S
Definition: +proj=pipeline +step +proj=unitconvert +xy_in=deg +xy_out=rad
+step +proj=utm +zone=19 +south +ellps=GRS80
If we replace the target CRS with "SIRGAS-Chile 2021 UTM 19S"
, sf_proj_pipelines
still only finds one candidate, but because no datum transformation from "SIRGAS 2000"
to "SIRGAS-Chile 2021"
is found in the database, we must accept unknown ballpark accuracy:
sf_proj_pipelines("EPSG:4674", "EPSG:20049")
Candidate coordinate operations found: 1
Strict containment: FALSE
Axis order auth compl: FALSE
Source: EPSG:4674
Target: EPSG:20049
Best instantiable operation has only ballpark accuracy
Description: axis order change (2D) + Ballpark geographic offset from SIRGAS
2000 to SIRGAS-Chile 2021 + UTM zone 19S
Definition: +proj=pipeline +step +proj=unitconvert +xy_in=deg +xy_out=rad
+step +proj=utm +zone=19 +south +ellps=GRS80
Many national and international mapping agencies have provided open access transformation grids that PROJ can use on-the-fly (https://cdn.proj.org/), but neither Chile not SIRGAS have contributed any such grid so far. Grids are more accurate than three or seven parameter transformations because they allow local crustal deformations to be applied, rather than averaged over the whole area of application. Accuracy is now much more important as GPS/GNSS has become established in navigation, precision farming, construction, etc.
Let us repeat the measurement of the area of Región del Maule after projection to "UTM 19S"
|> st_union() |> st_transform(crs = "EPSG:31979") |>
maule_sf st_area() |> units::set_units("km2")
30314.34 [km^2]
The area is now more than 16 square kilometres greater than the area from Wikipedia, and the area calculated by s2
on the surface of the earth. All projections distort the rounded surface of the earth by squeezing the centre of the area of interest and stretching the edges to force it into planar form, so depending on the statutory regulations, all measured areas are approximate in one way or another.
Let us now move to the CRS of the boundary of the county of Gdańsk:
st_crs(gd_sf)
Coordinate Reference System:
User input: EPSG:2180
wkt:
PROJCRS["ETRF2000-PL / CS92",
BASEGEOGCRS["ETRF2000-PL",
DATUM["ETRF2000 Poland",
ELLIPSOID["GRS 1980",6378137,298.257222101,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",9702]],
CONVERSION["Poland CS92",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",0,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",19,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",0.9993,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",500000,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",-5300000,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (x)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (y)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Topographic mapping (medium and small scale)."],
AREA["Poland - onshore and offshore."],
BBOX[49,14.14,55.93,24.15]],
ID["EPSG",2180]]
Poland currently uses the European Terrestrial Reference Frame (ETRF) and lies on a tectonic plate with little deformation activity, although the plate itself is moving up to 20mm a year. The projection "ETRF2000-PL / CS92"
was adopted following the dissolution of the Warsaw Pact - mapping everywhere is driven at base by the needs of military operations.
The CRS of the take-away outlets is as for Talca:
st_crs(gd_takeaways)
Coordinate Reference System:
User input: EPSG:4326
wkt:
GEOGCRS["WGS 84",
ENSEMBLE["World Geodetic System 1984 ensemble",
MEMBER["World Geodetic System 1984 (Transit)"],
MEMBER["World Geodetic System 1984 (G730)"],
MEMBER["World Geodetic System 1984 (G873)"],
MEMBER["World Geodetic System 1984 (G1150)"],
MEMBER["World Geodetic System 1984 (G1674)"],
MEMBER["World Geodetic System 1984 (G1762)"],
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ENSEMBLEACCURACY[2.0]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
USAGE[
SCOPE["Horizontal component of 3D system."],
AREA["World."],
BBOX[-90,-180,90,180]],
ID["EPSG",4326]]
and the transformation pipeline from "ETRF2000-PL / CS92"
to "WGS 84"
has an accuracy of 1 metre:
sf_proj_pipelines("EPSG:2180", "EPSG:4326")
Candidate coordinate operations found: 2
Strict containment: FALSE
Axis order auth compl: FALSE
Source: EPSG:2180
Target: EPSG:4326
Best instantiable operation has accuracy: 1 m
Description: axis order change (2D) + Inverse of Poland CS92 + ETRF2000-PL
to WGS 84 (1) + axis order change (2D)
Definition: +proj=pipeline +step +inv +proj=tmerc +lat_0=0 +lon_0=19
+k=0.9993 +x_0=500000 +y_0=-5300000 +ellps=GRS80
+step +proj=unitconvert +xy_in=rad +xy_out=deg
Since none of these projections/transformations have required datum transformation, let us use the example of "OSGB36 / British National Grid"
to "WGS 84"
. We can see that the "Ordnance Survey of Great Britain 1936"
datum uses a different ellipsoid: "Airy 1830"
from "GRS 1980"
:
st_crs("EPSG:27700")
Coordinate Reference System:
User input: EPSG:27700
wkt:
PROJCRS["OSGB36 / British National Grid",
BASEGEOGCRS["OSGB36",
DATUM["Ordnance Survey of Great Britain 1936",
ELLIPSOID["Airy 1830",6377563.396,299.3249646,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4277]],
CONVERSION["British National Grid",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",49,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",-2,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",0.9996012717,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",400000,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",-100000,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Engineering survey, topographic mapping."],
AREA["United Kingdom (UK) - offshore to boundary of UKCS within 49°45'N to 61°N and 9°W to 2°E; onshore Great Britain (England, Wales and Scotland). Isle of Man onshore."],
BBOX[49.75,-9.01,61.01,2.01]],
ID["EPSG",27700]]
If we then query the database to find the most accurate transformation with enabled access to transformation grids, we can achieve 1 metre:
sf_proj_pipelines("EPSG:27700", "EPSG:4326")
Candidate coordinate operations found: 9
Strict containment: FALSE
Axis order auth compl: FALSE
Source: EPSG:27700
Target: EPSG:4326
Best instantiable operation has accuracy: 1 m
Description: Inverse of British National Grid + OSGB36 to WGS 84 (9) + axis
order change (2D)
Definition: +proj=pipeline +step +inv +proj=tmerc +lat_0=49 +lon_0=-2
+k=0.9996012717 +x_0=400000 +y_0=-100000
+ellps=airy +step +proj=hgridshift
+grids=uk_os_OSTN15_NTv2_OSGBtoETRS.tif +step
+proj=unitconvert +xy_in=rad +xy_out=deg
The parts of the grid file uk_os_OSTN15_NTv2_OSGBtoETRS.tif
needed for the data in question can be downloaded automatically on-the-fly if PROJ network access is enabled, and cached locally for any software using PROJ for handling CRS; the grid file offsets are shown in Figure 2.4, page 25 of Pebesma and Bivand (2023).