2  Spatial data

In section Section 1.1, we showed why spatial data should be treated as “special” until it is shown that they do not require the application of specific methods to handle combinations of spatial autocorrelation, spatial scale and spatial heterogeneity. We did not there develop the concepts and tools needed for understanding and handling spatial data before analysis, or the presentation of results from spatial econometric models in spatial form. This treatment builds on Pebesma and Bivand (2023) chapters 1–6 and the R packages used there, but will also to some extent indicate where that software usage is matched in Python and elsewhere, for example Rey, Arribas-Bel, and Wolf (2023); for a comparative description, see Bivand (2022). If the reader is unfamiliar with data of this kind, they may benefit from Lovelace, Nowosad, and Muenchow (2019).

2.1 Spatial data basics

Spatial data are observations on variables, be they categorical, ordinal or continuous. The observations are associated with a position in space, where space is often represented as (sets of) coordinate tuplets on a map; spatial data are also found in bioinformatics and medical imaging, but here we are concerned with geographical spatial data.

Geographical spatial data take two forms: spatial vectors and spatial rasters. Spatial vector objects are most often represented by geometries of sets of pairs of coordinates known as eastings and northings, or longitude and latitude, forming points, lines or polygons; three-dimensional spatial vectors are not often used but may be provided by data sources. Spatial raster objects are regular square or rectangular grids of cells, most like digital images. Spatial rasters are fixed to a spatial point of origin and the locations of cells can be found from the point of origin by stepping out from the origin cell and summing the distances crossing intervening calls measured by the size or resolution of the cells. Rasters have one or more bands, and may also be termed spatial data cubes, structured by two or three spatial dimensions (three if height/depth is included) and a time dimension in addition to bands containing sensor measurements.

While spatial raster objects include attribute data - observations on variables taken at the location of each cell, spatial vector objects do not have to do so. Observations on variables of interest are linked to geometries through a common key or identifier. The key linking the observations with the spatial vector geometries may often be a standard code published and updated by a government statistical office. Not infrequently, the data are acquired or downloaded including the key, and the geometries of the spatial objects are acquired or downloaded separately, to be merged by the observation keys afterwards. Noting the timestamp specifying the validity of geometries and keys is always sensible, as both keys and geometries change over time. We will return to the representation of time in spatial panel objects, where the geometries are invariant but variables of interest are observed over successive time periods.

For the present, spatial data will be taken as-is; we will return to reading and writing spatial data from and to files in Section Section 3.3. The comprehensive R archive network (CRAN) provides software packages and short summaries called task views listing packages useful for particular tasks. The Spatial task view (https://cran.r-project.org/view=Spatial) lists packages accessing or providing specific data sources of interest. Here we will use three such packages: chilemapas with administrative boundaries for Chile, and rgugik for Poland, elevatr for spatial raster elevation data, and osmdata for OpenStreetMap information. The representation of the spatial vector data is handled by the sf package:

Sys.setenv("PROJ_NETWORK"="ON")
library(sf)
Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.3.1; sf_use_s2() is TRUE
library(chilemapas)
mapa_comunas |> data.frame() |> 
  st_as_sf(sf_column_name="geometry") -> mc_sf
mc_sf
Simple feature collection with 345 features and 3 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -109.4499 ymin: -56.52511 xmax: -66.41617 ymax: -17.49778
Geodetic CRS:  SIRGAS 2000
First 10 features:
   codigo_comuna codigo_provincia codigo_region                       geometry
1          01401              014            01 MULTIPOLYGON (((-68.86081 -...
2          01403              014            01 MULTIPOLYGON (((-68.65113 -...
3          01405              014            01 MULTIPOLYGON (((-68.65113 -...
4          01402              014            01 MULTIPOLYGON (((-69.31789 -...
5          01404              014            01 MULTIPOLYGON (((-69.39615 -...
6          01107              011            01 MULTIPOLYGON (((-70.1095 -2...
7          01101              011            01 MULTIPOLYGON (((-70.09894 -...
8          02104              021            02 MULTIPOLYGON (((-68.98863 -...
9          02101              021            02 MULTIPOLYGON (((-70.60654 -...
10         02201              022            02 MULTIPOLYGON (((-67.94302 -...

The mapa_comunas data object stored in the chilemapas package is represented as a tibble and an sf object. Because most model fitting objects expect data as a data.frame not a tbl also known as a tibble, we coerce to data.frame and re-construct the sf object using the sf::st_as_sf method for data.frame objects. The disparity between data.frame and tbl objects is that when a single column of a data.frame is selected with the [ method, it becomes a vector by default, but the [ method on a single column of a tbl object returns a tbl by default. The geometry column is an sfc object, a list column of sfg objects, where sfg means “Simple Feature” geometry.

st_geometry(mc_sf)
Geometry set for 345 features 
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -109.4499 ymin: -56.52511 xmax: -66.41617 ymax: -17.49778
Geodetic CRS:  SIRGAS 2000
First 5 geometries:
MULTIPOLYGON (((-68.86081 -21.28512, -68.92172 ...
MULTIPOLYGON (((-68.65113 -19.77188, -68.81182 ...
MULTIPOLYGON (((-68.65113 -19.77188, -68.63545 ...
MULTIPOLYGON (((-69.31789 -19.13651, -69.2717 -...
MULTIPOLYGON (((-69.39615 -19.06125, -69.40025 ...

The geometry column contains 345 multi-polygon objects, containing the boundaries of Chilean municipalities in 2017, clipped to the coastline. They are multi-polygons because at least some municipalities are composed of island and mainland parts. The short summary printed here also states that the geometries have a geodetic coordinate reference system (CRS), here specified as SIRGAS 2000. We return to coordinate reference systems in the next section Section 2.2, but note that area calculation is conducted on the specified ellipsoid using the s2 by default, not on the plane.

Let us find the area of Región del Maule, region "07", of which Taule is the capital city, using the region key column in the sf object named mc_sf. First we subset the municipalities to the desired region, then union the municipalities in Región del Maule before finding the area of the region as a whole.

mc_sf |> subset(subset=codigo_region == "07") -> maule_sf
maule_sf |> st_union() |> st_area() |> units::set_units("km2")
30297.84 [km^2]

The result agrees adequately with 30,296.1 square kilometres given by Wikipedia, as the details of administrative boundaries and hence the computed area are compromises between detail and the size of the data object stored in the package. By default the results of length or area calculations for spherical coordinates are returned by s2 in metres or square metres, so we convert to square kilometres.

sf objects pay attention to the support of variables, whether they may be classed as constant, aggregate, or identity. Constant variables are constant across the whole area of a polygon, for example a land use category. Aggregate variables may be counts, rates based on counts, or densities of counts by polygon area. Identity variables uniquely identify the geometries. In the case of the Región del Maule data set, no definitions have so far been provided, so when we change support from polygon to point by computing the spherical centroid, a warning is given (see Pebesma and Bivand 2023, 49–52). st_agr returns the current state of the factor (categorical object) for the non-geometry variables included in the sf object:

st_agr(maule_sf)
   codigo_comuna codigo_provincia    codigo_region 
            <NA>             <NA>             <NA> 
Levels: constant aggregate identity
maule_sf |> st_centroid() -> maule_sf_pt1
Warning: st_centroid assumes attributes are constant over geometries

We can avoid the warning by taking the centroid of the geometries alone, of the sfc object rather than the sf object:

maule_sf |> st_geometry() |> st_centroid() -> maule_sfc_pt2

Alternatively, since there are few non-geometry columns in the data set, we can assign the appropriate values to them, so that since the change of support does not change the meaning of the variables, no warning is given:

maule_sf |> 
  st_set_agr(c(codigo_comuna="identity",
               codigo_provincia="identity", 
               codigo_region="constant")
            ) -> maule_sf_agr
st_agr(maule_sf_agr)
   codigo_comuna codigo_provincia    codigo_region 
        identity         identity         constant 
Levels: constant aggregate identity
maule_sf_agr |> st_centroid() -> maule_sf_pt3

The three remaining packages download spatial data from public sources rather than pre-packaging them for use. elevatr can be used to download global elevation data as a spatial raster; until version 1 of the package is released, it returns a "RasterLayer" as defined in the legacy raster package, but from 1.0 will return a "SpatRaster" defined in the terra package. Here we specify our area of interest by passing the sf spatial vector object for Región del Maule, and clip the output to the boundary of the region, setting negative values to missing:

library(elevatr)
elevatr v0.99.0 NOTE: Version 0.99.0 of 'elevatr' uses 'sf' and 'terra'.  Use 
of the 'sp', 'raster', and underlying 'rgdal' packages by 'elevatr' is being 
deprecated; however, get_elev_raster continues to return a RasterLayer.  This 
will be dropped in future versions, so please plan accordingly.
library(terra)
terra 1.7.71
maule_sf |>  get_elev_raster(z = 7,
 clip = "locations", neg_to_na = TRUE) |> 
 rast() -> maule_elev
Mosaicing & Projecting
Clipping DEM to locations
Note: Elevation units are in meters.
maule_elev
class       : SpatRaster 
dimensions  : 371, 493, 1  (nrow, ncol, nlyr)
resolution  : 0.005009617, 0.005009617  (x, y)
extent      : -72.78435, -70.3146, -36.54143, -34.68287  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs 
source(s)   : memory
name        : filee6328d7c555 
min value   :               0 
max value   :            3936 

The raster has one layer (band), with a square (geodetic) grid with a resolution of about 0.005 degrees, 18.03 arc seconds in both dimensions, 371 rows and 493 columns.

Let us now access OpenStreetMap data using osmdata (Padgham et al. 2017), first subsetting to the city of Talca, the capital of Región del Maule. To set up a query, we need to build an OpenStreetMap Overpass query using a bounding box of our area of interest. Since it can be difficult to know what kinds of data have been contributed, we add the "names" feature, pulling in everything in the rectangle defined in degrees of longitude and latitude - obviously the top and bottom of the rectangle are actually curves, because the rectangle is on the surface of an ellipsoid. Having created the query talca_q, we can extract the data for all features with names:

maule_sf |> subset(subset = codigo_comuna=="07101") -> talca_sf
library(osmdata)
Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright
talca_sf |> st_bbox() |> opq() |>
 add_osm_feature(key = "names") -> talca_q
talca_q |> osmdata_sf() -> talca_osm_sf
talca_osm_sf
Object of class 'osmdata' with:
                 $bbox : -35.5310255,-71.723806,-35.3118342,-71.4925696
        $overpass_call : The call submitted to the overpass API
                 $meta : metadata including timestamp and version numbers
           $osm_points : 'sf' Simple Features Collection with 134548 points
            $osm_lines : 'sf' Simple Features Collection with 14580 linestrings
         $osm_polygons : 'sf' Simple Features Collection with 1090 polygons
       $osm_multilines : 'sf' Simple Features Collection with 20 multilinestrings
    $osm_multipolygons : 'sf' Simple Features Collection with 22 multipolygons

The output object is a list of components, some of which are sf objects. If we take the point objects, we can attempt to subset to those said to offer take-away service, that is those for which the takeaway variable is not coded NA - missing.

talca_osm_sf$osm_points -> talca_pts
talca_pts |> subset(subset = !is.na(takeaway), 
 select = c(osm_id, name, amenity, cuisine,
  takeaway, geometry)) -> talca_takeaways
talca_takeaways
Simple feature collection with 11 features and 5 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -71.68909 ymin: -35.44643 xmax: -71.62727 ymax: -35.42225
Geodetic CRS:  WGS 84
First 10 features:
               osm_id                 name    amenity        cuisine takeaway
1237856768 1237856768              Rossini       cafe       sandwich      yes
1264154772 1264154772            Telepizza  fast_food          pizza      yes
2853291904 2853291904             La Cleta restaurant           <NA>      yes
2853679104 2853679104          3 Estrellas        pub       sandwich      yes
2858522001 2858522001 Cafetería Riquíssimo restaurant       regional      yes
3377351170 3377351170             Me Gustó  fast_food         burger      yes
3699062703 3699062703       Talca Sandwich  fast_food         burger      yes
3766018631 3766018631           El Chilote restaurant fish_and_chips      yes
7233542316 7233542316           McDonald's  fast_food         burger      yes
7862158443 7862158443          Papa John's  fast_food          pizza      yes
                              geometry
1237856768 POINT (-71.66275 -35.42618)
1264154772 POINT (-71.63046 -35.43272)
2853291904 POINT (-71.66563 -35.42866)
2853679104 POINT (-71.65372 -35.42822)
2858522001 POINT (-71.66378 -35.42624)
3377351170 POINT (-71.68909 -35.44643)
3699062703 POINT (-71.68062 -35.43947)
3766018631 POINT (-71.67028 -35.42225)
7233542316 POINT (-71.65486 -35.42561)
7862158443 POINT (-71.68665 -35.44302)

There only appear to be 11 such establishments, but perhaps these are the ones whose locations grateful clients (or ambitious proprietors) have contributed to OpenStreetMap.

The rgugik package (Dyba and Nowosad 2021) does not bundle the municipality boundaries of Poland as chilemapas did for Chile, but downloads them from the Polish Head Office of Geodesy and Cartography (GUGIK) online. The object returned from a query to the boundaries of the Gdańsk county administrative unit is an sf object with a single variable, the identification key.

library(rgugik)
gd_sf <- borders_get(county = "Gdańsk")
gd_sf
Simple feature collection with 1 feature and 1 field
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 462961.1 ymin: 712372.4 xmax: 496776.7 ymax: 731607.8
Projected CRS: ETRF2000-PL / CS92
  TERYT                           geom
1  2261 MULTIPOLYGON (((465078.9 73...

Again, we can use the sf object to access elevation data, here returned with about 88 metre resolution, as the input object has a projected coordinate reference system:

gd_sf |>  get_elev_raster(z = 9,
 clip = "locations", neg_to_na = TRUE) |> rast() -> gd_elev
Mosaicing & Projecting
Clipping DEM to locations
Note: Elevation units are in meters.
gd_elev
class       : SpatRaster 
dimensions  : 216, 379, 1  (nrow, ncol, nlyr)
resolution  : 89.09542, 89.09542  (x, y)
extent      : 462973.9, 496741.1, 712362.5, 731607.1  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=tmerc +lat_0=0 +lon_0=19 +k=0.9993 +x_0=500000 +y_0=-5300000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
source(s)   : memory
name        : filee63abcb0de 
min value   :              0 
max value   :            177 

As we just observed, gd_sf has a projected coordinate reference system, so needs to be transformed to a geodetic coordinate reference system before the creation of an Openpass query, here for "amenity" features:

gd_sf |> st_transform(crs = "OGC:CRS84") |> st_bbox() |>
 opq() |> add_osm_feature(key = "amenity") -> gd_q
gd_q |> osmdata_sf() -> gd_osm_sf
gd_osm_sf
Object of class 'osmdata' with:
                 $bbox : 54.274974489,18.429497095,54.447218154,18.9503794130001
        $overpass_call : The call submitted to the overpass API
                 $meta : metadata including timestamp and version numbers
           $osm_points : 'sf' Simple Features Collection with 64129 points
            $osm_lines : 'sf' Simple Features Collection with 30 linestrings
         $osm_polygons : 'sf' Simple Features Collection with 5946 polygons
       $osm_multilines : 'sf' Simple Features Collection with 1 multilinestrings
    $osm_multipolygons : 'sf' Simple Features Collection with 19 multipolygons

Once again we extract the locations of registered take-away outlets, of which many more seem to have been contributed to OpenStreetMap than in the case of the bounding box surrounding the city of Talca:

gd_osm_sf$osm_points -> gd_pts
gd_pts |> subset(subset = !is.na(takeaway),
 select = c(osm_id, name, amenity, cuisine,
  takeaway, geometry)) -> gd_takeaways
gd_takeaways
Simple feature collection with 160 features and 5 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 18.45562 ymin: 54.30241 xmax: 18.71603 ymax: 54.44475
Geodetic CRS:  WGS 84
First 10 features:
               osm_id                    name    amenity           cuisine
473433244   473433244              McDonald's  fast_food            burger
473433245   473433245                     KFC  fast_food           chicken
478756186   478756186              McDonald's  fast_food            burger
746496055   746496055                   Costa       cafe       coffee_shop
747182709   747182709                     KFC  fast_food           chicken
1538112296 1538112296 Ping Pong ramen & sushi restaurant asian;ramen;sushi
1693970318 1693970318                     KFC  fast_food           chicken
1763555507 1763555507          Zielona Papuga restaurant          regional
1777255606 1777255606          Piwnica Rajców restaurant              <NA>
1783311488 1783311488             Kreska Mąki  fast_food             pizza
           takeaway                  geometry
473433244       yes  POINT (18.6005 54.38275)
473433245       yes  POINT (18.60085 54.3827)
478756186       yes POINT (18.58742 54.35775)
746496055       yes  POINT (18.6487 54.34943)
747182709       yes  POINT (18.6452 54.35604)
1538112296      yes  POINT (18.5671 54.41832)
1693970318      yes POINT (18.57906 54.35757)
1763555507       no  POINT (18.6172 54.37657)
1777255606       no POINT (18.65326 54.34866)
1783311488      yes  POINT (18.58812 54.3596)

2.2 Coordinate reference systems

Since 2019, the representation of coordinate reference systems (CRS) have stabilised in the well-known text format known as WKT2-2019, which is an international standard. The representation supports both two and three dimensional structures, and as implemented in open geospatial software (PROJ) uses definitions stored in a regularly updated database bundled with the software. The database also contains tables defining transformations between CRS, now often based on open access transformation grid files downloaded on-the-fly if required. In parts of the world subject to rapid tectonic movement (the old city centre of Talca, Chile was badly damaged by an earthquake in 2010), CRS change frequently if more by centimetres than by metres.

st_crs(talca_sf)
Coordinate Reference System:
  User input: EPSG:4674 
  wkt:
GEOGCRS["SIRGAS 2000",
    DATUM["Sistema de Referencia Geocentrico para las AmericaS 2000",
        ELLIPSOID["GRS 1980",6378137,298.257222101,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["unknown"],
        AREA["Latin America - SIRGAS 2000 by country"],
        BBOX[-59.87,-122.19,32.72,-25.28]],
    ID["EPSG",4674]]

The definition used for stored objects in the chilemapas package is the geographical CRS in decimal degrees called "SIRGAS 2000" based on the "GRS 1980" ellipsoid - geodetic reference system of 1980. The provided data have as is often the case swapped axes compared to the CRS definition. Most geographical information systems keep to x, y, that is easting, northing or longitude, latitude axis order. Before version 1.0, elevatr reverts to the legacy PROJ4 string representation seen in the summaries above.

osmdata and OpenStreetMap appear to assume that all modern geographical coordinate reference systems use "EPSG:4326" as is said to apply to consumer global positioning systems (GPS) devices. This is somewhat accurate, but there is no fixed definition. The definition is usually updated locally for each major tectonic plate, so the WKT2-2019 representation of the CRS used here is as a datum ensemble, with at best two metre accuracy, often worse, as we are noe fourty years on from where the tectonic plates were when "WGS 84 was agreed in 1984. Some surveyors have argued that all coordinate tuples should be timestamped, in order to permit the tracking of the movement of points across the earth’s surface and their vertical change as well.

st_crs(talca_takeaways)
Coordinate Reference System:
  User input: EPSG:4326 
  wkt:
GEOGCRS["WGS 84",
    ENSEMBLE["World Geodetic System 1984 ensemble",
        MEMBER["World Geodetic System 1984 (Transit)"],
        MEMBER["World Geodetic System 1984 (G730)"],
        MEMBER["World Geodetic System 1984 (G873)"],
        MEMBER["World Geodetic System 1984 (G1150)"],
        MEMBER["World Geodetic System 1984 (G1674)"],
        MEMBER["World Geodetic System 1984 (G1762)"],
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ENSEMBLEACCURACY[2.0]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Horizontal component of 3D system."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]

So while the "SIRGAS 2000" object and the "WGS 84 refer approximately to the same representation, some sf predicates and operations on geometries may see them as differing, as for example checking that the Talca take-aways found in the bounding box do lie within the municipality boundaries:

try(st_within(talca_takeaways, talca_sf))
Error in st_geos_binop("within", x, y, sparse = sparse, prepared = prepared,  : 
  st_crs(x) == st_crs(y) is not TRUE

Since we know that both are based on the"GRS 1980" ellipsoid, we can overwrite the CRS of the OpenStreetMap take-away outlets location output to bring it into agreement with "SIRGAS 2000", resulting in finding that all the outlets all lie within the municipality boundaries:

talca_sf |> st_crs() -> sirgas2000
talca_takeaways |> st_set_crs(sirgas2000) |>
 st_within(talca_sf) |> unlist() |> as.logical()
Warning: st_crs<- : replacing crs does not reproject data; use st_transform for
that
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

Alves Costa et al. (2023) document current status for the development of SIRGAS, and helpfully show how an increase in ground GPS/GNSS (Global Navigation Satellite Systems from all providers) station density helps track terrestrial movement, which in parts of Chile exceeds 20mm per year. While the movement of the earth’s crust is one source of inaccuracy, another is measurement error, and yet another computational error from repeated use of trigonometrical functions and approximated transformation coefficients. A lack of clarity in defining CRS can lead to the wrong assignation of for example points to polygons.

The Universal Transverse Mercator projection (zone 19 south) is often used for central Chile, and this specific version uses the same "SIRGAS 2000" datum as the input data objects from chilemapas:

st_crs("EPSG:31979")
Coordinate Reference System:
  User input: EPSG:31979 
  wkt:
PROJCRS["SIRGAS 2000 / UTM zone 19S",
    BASEGEOGCRS["SIRGAS 2000",
        DATUM["Sistema de Referencia Geocentrico para las AmericaS 2000",
            ELLIPSOID["GRS 1980",6378137,298.257222101,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4674]],
    CONVERSION["UTM zone 19S",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",-69,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9996,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",500000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",10000000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Engineering survey, topographic mapping."],
        AREA["Brazil - between 72°W and 66°W, northern and southern hemispheres. In remainder of South America - between 72°W and 66°W, southern hemisphere, onshore and offshore."],
        BBOX[-59.87,-72,2.15,-66]],
    ID["EPSG",31979]]

If we check to see which candidate coordinate operations are available, we see that in fact projection accuracy is as good as possible, because the datum definitions of the two CRS are the same:

sf_proj_network(TRUE)
[1] "https://cdn.proj.org"
sf_proj_pipelines("EPSG:4674", "EPSG:31979")
Candidate coordinate operations found:  1 
Strict containment:     FALSE 
Axis order auth compl:  FALSE 
Source:  EPSG:4674 
Target:  EPSG:31979 
Best instantiable operation has accuracy: 0 m
Description: axis order change (2D) + UTM zone 19S 
Definition:  +proj=pipeline +step +proj=unitconvert +xy_in=deg +xy_out=rad
             +step +proj=utm +zone=19 +south +ellps=GRS80

If we replace the target CRS with "SIRGAS-Chile 2021 UTM 19S", sf_proj_pipelines still only finds one candidate, but because no datum transformation from "SIRGAS 2000" to "SIRGAS-Chile 2021" is found in the database, we must accept unknown ballpark accuracy:

sf_proj_pipelines("EPSG:4674", "EPSG:20049")
Candidate coordinate operations found:  1 
Strict containment:     FALSE 
Axis order auth compl:  FALSE 
Source:  EPSG:4674 
Target:  EPSG:20049 
Best instantiable operation has only ballpark accuracy 
Description: axis order change (2D) + Ballpark geographic offset from SIRGAS
             2000 to SIRGAS-Chile 2021 + UTM zone 19S
Definition:  +proj=pipeline +step +proj=unitconvert +xy_in=deg +xy_out=rad
             +step +proj=utm +zone=19 +south +ellps=GRS80

Many national and international mapping agencies have provided open access transformation grids that PROJ can use on-the-fly (https://cdn.proj.org/), but neither Chile not SIRGAS have contributed any such grid so far. Grids are more accurate than three or seven parameter transformations because they allow local crustal deformations to be applied, rather than averaged over the whole area of application. Accuracy is now much more important as GPS/GNSS has become established in navigation, precision farming, construction, etc.

Let us repeat the measurement of the area of Región del Maule after projection to "UTM 19S"

maule_sf |> st_union() |> st_transform(crs = "EPSG:31979") |>
 st_area() |> units::set_units("km2")
30314.34 [km^2]

The area is now more than 16 square kilometres greater than the area from Wikipedia, and the area calculated by s2 on the surface of the earth. All projections distort the rounded surface of the earth by squeezing the centre of the area of interest and stretching the edges to force it into planar form, so depending on the statutory regulations, all measured areas are approximate in one way or another.

Let us now move to the CRS of the boundary of the county of Gdańsk:

st_crs(gd_sf)
Coordinate Reference System:
  User input: EPSG:2180 
  wkt:
PROJCRS["ETRF2000-PL / CS92",
    BASEGEOGCRS["ETRF2000-PL",
        DATUM["ETRF2000 Poland",
            ELLIPSOID["GRS 1980",6378137,298.257222101,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",9702]],
    CONVERSION["Poland CS92",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",19,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9993,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",500000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",-5300000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (x)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (y)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Topographic mapping (medium and small scale)."],
        AREA["Poland - onshore and offshore."],
        BBOX[49,14.14,55.93,24.15]],
    ID["EPSG",2180]]

Poland currently uses the European Terrestrial Reference Frame (ETRF) and lies on a tectonic plate with little deformation activity, although the plate itself is moving up to 20mm a year. The projection "ETRF2000-PL / CS92" was adopted following the dissolution of the Warsaw Pact - mapping everywhere is driven at base by the needs of military operations.

The CRS of the take-away outlets is as for Talca:

st_crs(gd_takeaways)
Coordinate Reference System:
  User input: EPSG:4326 
  wkt:
GEOGCRS["WGS 84",
    ENSEMBLE["World Geodetic System 1984 ensemble",
        MEMBER["World Geodetic System 1984 (Transit)"],
        MEMBER["World Geodetic System 1984 (G730)"],
        MEMBER["World Geodetic System 1984 (G873)"],
        MEMBER["World Geodetic System 1984 (G1150)"],
        MEMBER["World Geodetic System 1984 (G1674)"],
        MEMBER["World Geodetic System 1984 (G1762)"],
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ENSEMBLEACCURACY[2.0]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Horizontal component of 3D system."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]

and the transformation pipeline from "ETRF2000-PL / CS92" to "WGS 84" has an accuracy of 1 metre:

sf_proj_pipelines("EPSG:2180", "EPSG:4326")
Candidate coordinate operations found:  2 
Strict containment:     FALSE 
Axis order auth compl:  FALSE 
Source:  EPSG:2180 
Target:  EPSG:4326 
Best instantiable operation has accuracy: 1 m
Description: axis order change (2D) + Inverse of Poland CS92 + ETRF2000-PL
             to WGS 84 (1) + axis order change (2D)
Definition:  +proj=pipeline +step +inv +proj=tmerc +lat_0=0 +lon_0=19
             +k=0.9993 +x_0=500000 +y_0=-5300000 +ellps=GRS80
             +step +proj=unitconvert +xy_in=rad +xy_out=deg

Since none of these projections/transformations have required datum transformation, let us use the example of "OSGB36 / British National Grid" to "WGS 84". We can see that the "Ordnance Survey of Great Britain 1936" datum uses a different ellipsoid: "Airy 1830" from "GRS 1980":

st_crs("EPSG:27700")
Coordinate Reference System:
  User input: EPSG:27700 
  wkt:
PROJCRS["OSGB36 / British National Grid",
    BASEGEOGCRS["OSGB36",
        DATUM["Ordnance Survey of Great Britain 1936",
            ELLIPSOID["Airy 1830",6377563.396,299.3249646,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4277]],
    CONVERSION["British National Grid",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",49,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",-2,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9996012717,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",400000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",-100000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Engineering survey, topographic mapping."],
        AREA["United Kingdom (UK) - offshore to boundary of UKCS within 49°45'N to 61°N and 9°W to 2°E; onshore Great Britain (England, Wales and Scotland). Isle of Man onshore."],
        BBOX[49.75,-9.01,61.01,2.01]],
    ID["EPSG",27700]]

If we then query the database to find the most accurate transformation with enabled access to transformation grids, we can achieve 1 metre:

sf_proj_pipelines("EPSG:27700", "EPSG:4326")
Candidate coordinate operations found:  9 
Strict containment:     FALSE 
Axis order auth compl:  FALSE 
Source:  EPSG:27700 
Target:  EPSG:4326 
Best instantiable operation has accuracy: 1 m
Description: Inverse of British National Grid + OSGB36 to WGS 84 (9) + axis
             order change (2D)
Definition:  +proj=pipeline +step +inv +proj=tmerc +lat_0=49 +lon_0=-2
             +k=0.9996012717 +x_0=400000 +y_0=-100000
             +ellps=airy +step +proj=hgridshift
             +grids=uk_os_OSTN15_NTv2_OSGBtoETRS.tif +step
             +proj=unitconvert +xy_in=rad +xy_out=deg

The parts of the grid file uk_os_OSTN15_NTv2_OSGBtoETRS.tif needed for the data in question can be downloaded automatically on-the-fly if PROJ network access is enabled, and cached locally for any software using PROJ for handling CRS; the grid file offsets are shown in Figure 2.4, page 25 of Pebesma and Bivand (2023).